François Pinard's site

Python style

I discuss a single stylistic issue (one has to start somewhere!). And I'm not even finished with this one yet.





1   Explicit closing of files?

1.1   Summary

There is a recurring debate among Python users about whether we should explicitly close files in Python. Many people have shown strong and diverging opinions, and I am not neutral in all this.

As most of us came to Python while already having other languages as background, and as many languages mandate explicit closes, we are prone to blindly applying the forcefully acquired aesthetics to Python. Moreover, there are genuine cases where explicitly closing files in Python is adequate. Also, depending on how you define what Python is, there are viewpoints from which explicit closes are always adequate.

On the other hand, the reference implementation of Python does timely automatic finalisation of files, as for any other kind of object. Most practical usages of files in Python programs could take advantage of this finalisation, yielding a significant increase in elegance and legibility, with no real loss in maintainability. This explains my position on the matter, as legibility and maintainability are Python virtues which much appeal to me.

1.2   Python finalisation

Python, as created and maintained by Guido van Rossum and team, is sometimes nicknamed CPython, to distinguish it from JPython in particular, the former being written in C, the latter in Java. Now that JPython has been renamed Jython, I'll merely use Python to name the mainstream, reference implementation.

Python uses reference-counting for reclaiming dead objects, so when the last reference to a file is cut, that file gets automatically and immediately closed by the Python system. Consider:



   buffer = file(NAME).read()

In the above statement, as the user does not keep a reference to the file, it gets automatically closed as soon as its contents have been read.
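The same behaviour can be observed on a modern CPython 3, where the file built-in no longer exists and open is used instead. This is only a minimal sketch: the helper name and the scratch file are assumptions made for illustration, and the check relies on the operating-system descriptor becoming invalid once the file object is finalised, which holds on CPython but is not promised by other implementations.

```python
import os
import tempfile


def is_closed_after_last_reference(path):
    """Return True if dropping the last reference closed the file."""
    handle = open(path)
    descriptor = handle.fileno()   # remember the OS-level descriptor
    del handle                     # cut the last reference to the file object
    try:
        os.fstat(descriptor)       # fails once the descriptor has been closed
    except OSError:
        return True
    return False


# Hypothetical scratch file, only there so that something can be opened.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as scratch:
    scratch.write("some contents")

closed = is_closed_after_last_reference(scratch.name)
buffer = open(scratch.name).read()   # the one-liner from the text, Python 3 spelling
os.remove(scratch.name)
```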

The Python Reference Manual says: Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether -- it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.

Indeed, there are other implementations of Python which delay object finalisation. The classic example is Jython, which relies on the garbage-collector in the Java runtime system, so a file which is not explicitly closed by the Jython programmer, before the last reference is cut, will be closed in some indeterminate future by the garbage-collector. So, in Jython, the above example might better be written as:



   handle = file(NAME)
   buffer = handle.read()
   handle.close()
   del handle

yet in practice, I guess Jython programmers would not worry about the del statement and merely let the closed file object lie around.

1.3   Choosing Python

In the following code fragment:



   file(FILE, 'w').write(DATA)

a Python programmer knows that FILE has received DATA, all secured to disk, by the end of the statement.
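This can be checked with a small sketch, written in Python 3 spelling (open rather than file) and assuming CPython. FILE and DATA here stand for a hypothetical temporary path and a short payload; the payload is deliberately tiny, so it would still be sitting in the write buffer if the file had not been finalised.

```python
import os
import tempfile

# Hypothetical path and payload standing in for FILE and DATA.
directory = tempfile.mkdtemp()
FILE = os.path.join(directory, "stuff.txt")
DATA = "hello, world"

open(FILE, "w").write(DATA)

# On CPython the file object is finalised at the end of the statement above,
# so its buffer has already been flushed to the operating system.
size_on_disk = os.path.getsize(FILE)
round_trip = open(FILE).read()

os.remove(FILE)
os.rmdir(directory)
```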

Some say that this works purely as an accident, or artifact of the current implementation of Python, which may unexpectedly change behaviour one day, also insisting that Python is not defined that way, and that programmers should not depend on any particular features of garbage collection — not even on its existence.

This seems quite overcautious to me. Python has fundamentally relied on reference counting since its inception, and this is not going to change. While the Python garbage collector may be tuned or deactivated, reference counting may not be shut down, and likely never will be: the amount of breakage that would result all over is unthinkable.

The Python Reference Manual allows for alternative Python implementations to delay finalisation, and so, use other memory management techniques than reference counting. I guess this is so Jython (at the time the Reference Manual was amended for this), and then others, could use the name Python despite such differences. Moreover, the writer of Jython is also a member of the Python development team.

1.4   Elegance considerations

There might be cases when one moves between Python and Jython, indeed. When one knows s/he works with Python only, it is good style to rely on the refcount behaviour, as it yields code which is not only more legible, but also more elegant and concise. It means that you understand and accept in advance having to revise your code if you ever want to use other implementations of Python, like Jython. As someone pointed out to me very recently, the Python reference tries to describe a common language, but there is no CPython-specific guide. If there were one, the refcount behaviour would most probably be described as dependable and reliable, even through future versions, as far as Python programming is concerned.

And besides, it seems the few implementations of Python do not support exactly the same language: extensions here may not be available there. Defining good style as the common subset of all Python implementations, and everything else as bad style, seems questionable. The only thing is that you have to be aware of the implications of your choices.

For this Jython-forces-you-to-explicit-closes matter, my feeling is that Jython encourages bad style here, much more than it defines good style. Surely, there has never been any bad intent from Jython's author. We understand the limitation comes from the fact that Jython relies on the Java system for collecting garbage. One has to close explicitly in Jython for practical implementation considerations; this has nothing to do with good style, and does not define what good style may be.

It is surely good to explicitly close() a file when one is done with it but, for other reasons, needs to keep a reference to it. Yet the nicest is not keeping a reference to the file, whenever that can be avoided easily. Some consider this laziness. This is no laziness at all; it is rather an active search for legibility, with the code going straight to the essentials, and showing those essentials as clearly as possible.

Someone writes: The opened file (when not needed) may cause problems to other applications that try to do something with the same file. One should not consider any open file object as the exclusive property of the running application. Because of that, the file should be closed as soon as possible.

But this is exactly what happens. There is no problem, only fears. Once you get used to the paradigm, you discover that it is safe and dependable.

It is only natural that people coming from the culture of other programming languages want to use Python in a way that best reflects the idioms they already know. However, Python has its own paradigms, which can be put to good use for what they are, independently of our past experiences, provided they fit our own values. My own values are mainly about maintainability, and this one alone implies legibility, compactness, simplicity, humility.

Depending on how s/he studied computers, the programmer may think that arguments are passed on a stack (conceptual or not), or s/he may just think with higher concepts and even ignore that a stack is involved. After a while, the first category comes to take the stack cleanup for granted, the same as Lisp people, for example, take the garbage collector for granted, and would never think of explicitly freeing cons cells. We may decide to "take for granted" a lot of other idioms, if we only choose to. It's not that difficult. After we make such choices, we discover the elegance we might have been reluctant to see. Still using Lisp as an example, I surely knew people who were extremely reluctant to see that Lisp was quite elegant, as they were stuck in assembler-level thinking.

Elegance roots itself in that conceptual simplicity of automatic finalisation of various Python objects.

1.5   Blind principles

A few programmers will close all files explicitly out of blind adherence to some principles, while not fully understanding the meaning of the principles they quote.

Explicit is better than implicit

This statement, part of the Python credo, is sometimes used as a justification for explicit closes, instead of letting the system take care of them.

But the system implicitly takes care of a great many things, which do not honestly bother programmers otherwise. For example, Python local variables allocated within a function are automatically deleted on exit from that function, but no one in their right mind would think about del-eting all local variables. There is something implicit in recursion that could have been made explicit through a programmer-managed stack. And so on. So, the blind argument above just does not hold water; it looks more like a justification than a reason.
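The point about local variables can be made visible with a weak reference, which observes an object without keeping it alive. This is a minimal sketch assuming CPython; the Resource class and the work function are hypothetical names invented for the illustration.

```python
import weakref


class Resource:
    """Hypothetical stand-in for any object held in a local variable."""


def work(observers):
    local = Resource()
    observers.append(weakref.ref(local))   # watch the object without owning it
    return "done"                          # no `del local` is written here


observers = []
result = work(observers)

# On CPython the local died together with the function's frame,
# so the weak reference no longer leads anywhere.
local_is_gone = observers[0]() is None
```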

There is an equilibrium between explicit and implicit, and a sense of good taste goes into that equilibrium. Good taste in Python is not the same as for Perl or for C; we need to dust ourselves off a bit when we switch between languages.

Never depend on the implementation

As various implementations of Python vary significantly, both for the language features they offer, and the libraries they rely upon, using an implementation is effectively selecting a dialect and a set of paradigms which prevent real portability between implementations.

It is a false feeling to think that, by not depending on this particular aspect of Python, Python programs will truly become portable. Taking Jython as an example, this particular implementation itself happily claims to correct many deficiencies of Python, lags significantly behind Python development, and invites programmers to depend on many Java libraries and classes which are unavailable to Python. There is nothing inherently wrong in all this, except in the false feeling that Python and Jython are the same language. When one is choosing one of them, s/he is definitely not choosing the other.

Porting from Python to Jython or another variant, or vice-versa, involves many issues, and there is an unavoidable cost to porting, of which open file concerns are only a fraction. To guard against all such issues in Python for any program that might never be ported is likely to cost much, much more.

Python itself, the mainstream dialect, has been ported to many platforms, and there lies real Python portability. The language is defined by its implementation; programmers do not refrain from using Python specificities, nor from depending on permanent characteristics of one given implementation. The behaviour on timely finalisation is there to stay. We may depend on it. So, in the case at stake here, the statement of the principle above is more rhetorical than practical.

It is bad programming practice

The Jython documentation states that depending on timely finalisation is bad programming practice, and some people blindly translate this statement to Python itself. Wrongly, of course.

Others assert that programmers are going to be bitten, sooner or later, because functions might unexpectedly keep references in a cache, or create reference cycles preventing finalisation, or other such mysteries. They assert their gut feelings and experience, saying that relying on Python automatic finalisation is not serious, is sloppy, etc., and then go to great lengths trying to substantiate their emotion, though not that successfully in my opinion.

Often, experience might come from languages other than Python. We should be careful when we extend our experience from one language to another, recognising the specificities of the new language. Our experience is often helpful, but in some cases, if we are not careful, it may blind us. Best practice might well be to avoid noisy code.

The truth is that Python does not play such tricks on programmers. Programmers themselves, on average, are competent and careful enough that they have a rather good idea of what their programs are doing, and they can feel themselves in peaceful and confident control.

1.6   Debugging

What happens when it is not possible to write into the 'stuffile'? […] Because of the brevity of the code it is also not so explicitly visible what the write() will do when the file cannot be opened.

Python will traceback, magically, wonderfully, with adequate information.
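As a minimal sketch of what that traceback carries, the failure can be caught and inspected instead of being left to end the program. The example uses Python 3 spelling, and the path is hypothetical, chosen inside a directory assumed not to exist so that the open must fail.

```python
import os
import tempfile

# Hypothetical target inside a directory assumed not to exist.
missing_dir = os.path.join(tempfile.gettempdir(), "no-such-directory-for-this-sketch")
target = os.path.join(missing_dir, "stuffile")

try:
    open(target, "w").write("some data")
except OSError as error:
    # Left uncaught, this same exception would end the program with a traceback.
    failure = error

reported_path = failure.filename   # the exception names the offending file
```

The one-liner never reaches write(): the exception is raised by open itself, and it identifies both the reason and the file involved.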

1.7   Unintegrated material

What follows, dumped almost raw below, still has to be integrated in the document above.

I can come up with a perfectly good example where



        open(FILE, 'w').write(CONTENTS)

will fail to close the file in Python.

There are no guarantees the reference to the object returned is the only reference to that object. Assume the following (perverse) implementation of open:



   __open = open      # keep the genuine built-in reachable
   __files = []

   def open(path, mode):
       f = __open(path, mode)
       __files.append(f)   # hidden extra reference
       return f

or:



    class file:
        __files = []

        def __init__(self, path, mode):
            file.__files.append(self)

Whilst I know this is not the implementation, there is nothing that prevents it from being like this. So you get back an object that has one more reference count than expected. It never gets collected.
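The objection can be reproduced with a runnable Python 3 sketch. Everything named here (hoarding_open, _registry, the scratch path) is hypothetical and only mimics the perverse implementation described above; the real open does nothing of the sort.

```python
import builtins
import os
import tempfile

_registry = []   # hypothetical hidden cache, as in the objection above


def hoarding_open(path, mode="r"):
    """Like open(), but secretly keeps a reference to every file it returns."""
    handle = builtins.open(path, mode)
    _registry.append(handle)
    return handle


directory = tempfile.mkdtemp()
path = os.path.join(directory, "stuff.txt")

hoarding_open(path, "w").write("CONTENTS")

# The registry still refers to the file, so it was never finalised
# and the short payload is still waiting in its buffer.
still_open = not _registry[0].closed
size_before_close = os.path.getsize(path)

_registry.pop().close()                 # an explicit close flushes it
size_after_close = os.path.getsize(path)

os.remove(path)
os.rmdir(directory)
```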

You should never rely on automatic freeing of any resources unless it is guaranteed. Always explicitly free resources when you have finished with them.

What happens if close fails during GC? Will it still raise an exception? If so, the exception could happen at an unfortunate place in the code.

Um. Explicit close does raise an exception if it fails, right?

From: Dennis Lee Bieber

Um. Explicit close does raise an exception if it fails, right?



      [wulfraed@beastie wulfraed]$ python
      Python 2.2 (#1, Nov  5 2002, 15:43:24)
      [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)] on linux-i386
      Type "help", "copyright", "credits" or "license" for more information.

      >>> f = file("t.t","w")
      >>> f.close()
      >>> f.close()
      >>> f
      <closed file 't.t', mode 'w' at 0x809dc88>
      >>> del f
      >>> f.close()
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      NameError: name 'f' is not defined
      >>>

Looks like it doesn't care… As long as the file /had/ been opened first (doesn't look like the interactive interpreter collected f until the del either).

[Alex Martelli]

It's very common use in classic Python, and sometimes handy for interactive or near-one-liner use, but in any serious program explicitly closing the file just as soon as you're done with it is better than relying on the interpreter closing it for you.

In more recent versions of Python, the open built-in function has been replaced by the file built-in constructor, and open has been made an alias for file, so the above may more adequately be written:



    stuff = file("filename", "r").read()

The file object created by the file constructor has its own destructor, which closes the file automatically if not done already. Using file is Pythonesque. For simple usages like the one above, you can safely rely on Python for destroying the file object, even in extra-serious programs ☺. Better is a question of opinion, but in the case here, it seems to me that cluttering your code with an assignment, only meant for an explicit close, does not buy you legibility, and moreover, your assignment will leave your closed file object uselessly hanging around. Best is exactly what you did. (I would not have used the , "r" part in this case.)

it will make your code more usable in Jython,

This is likely the only real advantage of cluttering your code, when you know or bet that you are going to use Jython one of these days.

From: Alex Martelli

Now try/finally IS slightly bothersome to use -- you have to put the initialization before the try, not in it, for example -- yet its semantics are truly crucial. This is the reason easier-to-use syntax alternatives to try/finally are considered on Python-Dev -- encourage everybody to rely on guaranteed finalization rather than implementation accidents that may work on some correct implementations of Python but not on others. But even now, though the syntactic enhancements are not there yet and nobody can be sure if and how they'll eventuate, the semantics of try/finally are just as important -- and, IMHO, in any real program well worth the little bothers that try/finally gives.
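For comparison, the idiom described in this message, with the initialisation placed before the try, can be sketched as follows. The second function uses the with statement, which Python gained later (in version 2.5) as the kind of easier syntax alluded to here. The function names and the scratch file are assumptions for illustration.

```python
import os
import tempfile


def read_with_try_finally(path):
    handle = open(path)          # initialisation goes before the try, not in it
    try:
        return handle.read()
    finally:
        handle.close()           # runs whether read() succeeded or raised


def read_with_statement(path):
    with open(path) as handle:   # the file is closed on leaving the block
        return handle.read()


directory = tempfile.mkdtemp()
path = os.path.join(directory, "stuff.txt")
with open(path, "w") as scratch:
    scratch.write("stuff")

first = read_with_try_finally(path)
second = read_with_statement(path)

os.remove(path)
os.rmdir(directory)
```

Both forms give finalisation that does not depend on the implementation, at the cost of a named reference and a few more lines than the one-liner.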

Grant Edwards

I think we're losing one of the big advantages of Python if users can't rely on objects going away when they're no longer used and instead have to explicitly free() them. The current reference-counting problem that Python has with circular references is something I can live with. Having to explicitly dispose of objects when I'm done with them would completely ruin the feel of the language.
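The circular reference limitation mentioned here can be shown with a minimal sketch, assuming CPython. The Node class is hypothetical, and the cycle collector is switched off during the demonstration only so that the moment of collection is under our control.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.partner = None


gc.disable()                      # keep the cycle collector out of the way for now
try:
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a <-> b forms a reference cycle
    watcher = weakref.ref(a)
    del a, b
    # Reference counting alone cannot reclaim the pair: each keeps the other alive.
    alive_after_del = watcher() is not None
    gc.collect()                  # the cycle collector finds and frees them
    alive_after_collect = watcher() is not None
finally:
    gc.enable()
```

Objects caught in such a cycle are still reclaimed, only later; a file reached only through a cycle would therefore be closed at collection time rather than at once.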