Table of contents
- 1. .
- 2. Migrating to Python 3
- 3. Opinions
- 4. Strings
- 5. Tips and tricks
- 6. Tools
Python has become my day to day language of choice for various little scripts and works. But, of course, it is not limited to quickies, and is also quite convenient for bigger, and much bigger, projects.
I'm saving a few ramblings here! On other pages, I discuss a few Python-related matters, like:
- Python for system administrative tasks, in From Cfengine to Python
- Using Greg Ewing's Pyrex for interfacing C or getting speed
- Away from os.path.walk
- Scheme and Python
Amusingly, I stumbled on some text of mine within the official heapq documentation (the Theory section), repeated verbatim from a message I once sent on the Python mailing list.
2 Migrating to Python 3
Well, well! Moving the bulk of my scripts from Python 2 to Python 3 requires more work than expected. I did adapt Pymacs a while ago to serve some of my users, it required a good burst of energy, but it went rather well, leading me to think that it should go easily for the remainder of my things.
2.1 Why the move?
I have a small script named pre-install.py which bends the Python installation to my own habits. The script creates /usr/lib/pythonN.M/distutils/distutils.cfg in such a way that all later installations default the prefix to /usr/local/. I give my own user full access to this hierarchy and consequently, do not need to be root for installing Python packages. As far as my needs go, all Python packages correctly obey this configuration file, to the point I can forget it. The only exception I remember is the pyo package from Olivier Bélanger, otherwise a very impressive piece of work.
The script also rewrites /usr/lib/pythonN.M/sitecustomize.py in such a way that sys.path is augmented with /usr/local/lib/pythonN.M/site-packages. Consequently, the Python packages I install are automatically found. The same file also sets the system default encoding to UTF-8, which was very convenient to me.
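As a rough illustration only, the path-augmenting part of such a sitecustomize.py could look like the sketch below. The helper names are hypothetical and not taken from the actual pre-install.py script; only the /usr/local/lib/pythonN.M/site-packages layout comes from the text above.

```python
import sys

def local_site_packages(prefix="/usr/local", version=None):
    """Return PREFIX/lib/pythonN.M/site-packages for the given (N, M)."""
    major, minor = version if version is not None else sys.version_info[:2]
    return "%s/lib/python%d.%d/site-packages" % (prefix, major, minor)

def augment(path_list, directory):
    """Append DIRECTORY to PATH_LIST unless it is already there."""
    if directory not in path_list:
        path_list.append(directory)
    return path_list

# In a real sitecustomize.py, this would be applied to sys.path itself:
# augment(sys.path, local_site_packages())
```

Keeping the computation in small functions makes it easy to check the resulting directory name without touching the running interpreter.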
However, this last bit may not apply for Python 3, which does not allow setting a default encoding. Many people consider that setting a default encoding is a bad idea. Moreover, as many of my users do not have one, I may not always reproduce the problem they see. So, I decided to merely get rid of it for both Python 2 and Python 3, moving as much of my code to Python 3 as reasonably doable.
2.2 No big deal!
The 2to3 program does a reasonable first approximation, and I patiently applied it to hundreds of Python modules today, perusing the result each time for minor adjustments. In particular, I removed UTF-8 coding biscuits, and those lines saying __metaclass__ = type, as 2to3 failed to clean them out.
It is more productive doing it in bulk as I just did, than one file after another with full testing, which would take me forever. Some scripts I use often, others only seldom, to the point I do not want to spend too much time on their totality right now. Debugging is undoubtedly going to be required, but the need should fade out rather quickly in practice.
Here are a few more details about some problems I met. If we consider only one module at a time, it is rather easy on the average to get it back into a working state. The problem mainly comes from the quantity of scripts I have. On the other hand, problems repeat between scripts, so these problems get easier to solve as I go.
Some Python modules simply cannot be converted; they are stuck with Python 2.
- Some depend on external modules which have not been ported. While it's fun that lxml and numpy have been adapted, it's a blow to learn that pygtk is not going to be adapted soon. I miss PIL as well. To a lesser extent, audiolib, matplotlib and a few others.
- Embedded Pythons may create problems. Blender already uses Python 3, which is nice. But Gvim is tied to Python 2, so I should refrain from converting a module to Python 3 if I intend to use it from within Gvim.
A few other external packages, relatively minor in size, I have to port myself. All in all, after having transformed hundreds of modules, I see more clearly that writing single sources which work for both Python 2 and Python 3 would slow me down considerably, so I did not attempt it. And even if I was using pppp (the pre-processor I wrote for Pymacs), it would still uglify the code. So, I'm slowly giving in to the idea of multiple sources for multiple Python versions, which is a pity!
I could swear I saw a case where the str type was not directly hashable, requiring a prior encoding as bytes. Cannot reproduce it. Sigh!
The sort function no longer accepts a comparison function; it now requires a function mapping each item to a sort key instead. Granted, the key approach is speedier for any sizable list. Yet, writing such a key function for complex comparisons, in a clean way, may be inconvenient at least.
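For the cases where a key is awkward to write, the standard functools.cmp_to_key adapter wraps an old-style comparison function into a key. The sketch below, with made-up data and a hypothetical compare function, sorts the same list both ways.

```python
from functools import cmp_to_key

def compare(a, b):
    """Old-style comparison: longer strings first, then alphabetically."""
    if len(a) != len(b):
        return len(b) - len(a)
    return (a > b) - (a < b)

words = ["pear", "fig", "apple", "kiwi"]

# Reusing the comparison function unchanged, through the adapter.
by_cmp = sorted(words, key=cmp_to_key(compare))

# The same ordering expressed directly as a key.
by_key = sorted(words, key=lambda word: (-len(word), word))
```

The adapter is convenient while porting; a direct key remains the faster choice once the ordering can be expressed as a tuple.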
As suggested in the Python documentation, I use the subprocess module as much as possible with Python 2, to replace os.system, os.popen and friends. It's a bit cumbersome to use. With Python 3, the associated pipes deliver byte streams instead of character streams, an annoyance really: it forces me to add some more complexity over code that is a bit heavy already. Here and there, it begins to look like Java code! Python 3 already helps for the stream resulting from open, sys.argv and a few other places; the effort should not have stopped there.
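Later Python 3 releases soften this a little: subprocess.run (added in Python 3.5) accepts universal_newlines=True, which decodes the pipes into str. The two helpers below are hypothetical wrappers, given only to contrast the byte and text behaviours.

```python
import subprocess
import sys

def run_bytes(args):
    """Run ARGS and return its standard output as raw bytes (the default)."""
    completed = subprocess.run(args, stdout=subprocess.PIPE, check=True)
    return completed.stdout

def run_text(args):
    """Run ARGS and return its standard output decoded as str."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # also spelled text=True since Python 3.7
        check=True,
    )
    return completed.stdout

# A child Python process is used here only to have a portable command.
command = [sys.executable, "-c", "print('hello')"]
```

The decoding uses the locale's preferred encoding unless an explicit encoding argument is given.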
The csv module in Python 2 insisted that we declare files to be binary. The csv module in Python 3 insists (at run time!) that we do not declare files as binary. The module documentation introduces the new newline= keyword instead.
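A minimal sketch of the Python 3 usage follows: files are opened in text mode with newline='' so that the csv module handles line endings itself. The file name and rows are invented for the example.

```python
import csv
import os
import tempfile

rows = [["name", "note"], ["pear", "two\nlines"]]

# A throwaway file, only so the sketch can be run as is.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Text mode, not binary; newline="" leaves newline handling to csv.
with open(path, "w", newline="", encoding="utf-8") as handle:
    csv.writer(handle).writerows(rows)

with open(path, newline="", encoding="utf-8") as handle:
    read_back = list(csv.reader(handle))
```

With newline='' on both sides, a field containing an embedded newline survives the round trip.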
Undoubtedly, that little saga will continue and yield new discoveries…
3 Opinions
3.1 Learning takes time
The same as with text editors, it is important to go further than butterflying superficially over a lot of programming languages. One has to select and learn at least one programming language well, and this undoubtedly takes time.
Even for an experienced programmer, it takes much time to establish a real proper style and the feeling that our programs are genuine art. We sometimes transport previous habits into new languages, which are not always proper. Some people wrongly consider they know Python because they quickly started using it. It takes quite a good while before one is really acquainted with libraries, overall culture, and intimate internals. Such investments also explain why people make religions of languages, it is not easy to take a few years of life, and work and sweat, and throw these overboard.
For example, I did a lot of C programming following GNU standards, and have a natural tendency to apply those standards when I write Python code. Some parts fit well, some are less welcome. I noticed, in particular, that Guido objects quite strongly to GNU standards for Python, and I had to decide where I prefer to follow Guido, and where Guido is just wrong ☺. On the other hand, breaking from previous standards is sometimes helpful, even on minor things, when these minor things are ubiquitous. This helps build walls around a growing style, and later, a spinal cord. A minor but good example, in my case, is whether to space after a function name (GNU) or not (Guido). This detail helps me to recover GNU standards when I write C, and Guido style when writing Python, as a style sticks together.
Take Java for another example. I'm sure I could start using it within a week. But what would this be worth? The number of libraries available for Java is frighteningly overwhelming. In no way could someone ever say, within a week or even within a month, that s/he got a serious overview of what's available. We should count this whole time as part of the learning process, if we want to be serious about what we mean by knowing a language.
3.2 Legibility and other virtues
Some like the Python simplicity of everything, the way almost anyone can understand it. I, for one, consider it to be a very appealing and worthy feature that Python stands in a niche similar to Perl, while being so readable. What makes Python so great? A humble syntax which is nevertheless very effective. I rewrote tens of thousands of Perl lines into Python, and do not have the shadow of a regret (besides the human aspect of Larry Wall, whom I always appreciated; but since his personality is almost fully buried in the Perl crowd and riot, I did lose the pleasure of Larry long before switching away from Perl). Going to Python, code acquires much light and clarity, and maintainability.
Python also got me to lose most of my interest in C programming. I still do it once in a while, because some special applications might need C speed here and there, yet much less often in practice than one might think. Prototyping a program in Python is a lot faster than in C, and code is more solid right on the first writing, at least because it is so much more legible. Debugging is much easier anyway, for when bugs remain.
One thing that I especially appreciate, but this is a relative detail, is how error processing works in Python. In Perl or C, an error is usually ignored unless you take action to act on it. In Python, you have to take positive action if you want an error to be ignored. That single point makes Python programming, even lazy, a lot safer than Perl or C.
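The contrast can be sketched with a file which does not exist. Both function names below are hypothetical: the first does no error handling and still cannot silently go on, while the second shows the explicit step needed to ignore the error.

```python
import os
import tempfile

# A path which is guaranteed not to exist, inside a fresh directory.
missing = os.path.join(tempfile.mkdtemp(), "no-such-file")

def read_strict(path):
    """No error handling at all: a missing file raises an exception."""
    with open(path) as handle:
        return handle.read()

def read_lenient(path, default=""):
    """Ignoring the error takes an explicit, visible decision."""
    try:
        return read_strict(path)
    except OSError:
        return default
```

Calling read_strict(missing) stops the program with a FileNotFoundError unless someone catches it; read_lenient(missing) returns the default instead.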
Another point, which I often notice by comparing my Python code to that of C coders around me, who are very competent people in my opinion, is that they are more reluctant than I am to resort to hashing, threads, client/server design, and other such things, at least for simpler or less important projects. While these things are quickly and simply written in Python, they always require some amount of courage in C. Perl is easier than C, yet Perl code gets a bit discouraging too when the time comes to read it back, enough to become an impediment in bigger or more serious projects. ☺
3.3 Auto-checking of errors
It is extremely heavy to correctly error-check all calls to library functions. I remember having worked in some C code in which the original programmer tested the return value of each and every printf, with appropriate error reporting. This yielded code which is noisy, hard to read, and irritating to maintain.
My feeling is that C programmers (and I still work at C programs once in a while ☺) do reasonable error checking, but are rarely fully thorough with it. A compromise is needed, because both extremes (no error checking at all, checking every possible error) are just unbearable in real programs.
I'm very convinced that Python has a fruitful approach on this. All errors are handled in some way, and the user always has control for errors to be reported with nicer diagnostics, or handled more gracefully, at the price of writing only a few lines then. That's very bearable, and better.
3.4 General appreciation
Python has been happiness so far. For one, I'm very, very satisfied, to the point I lost much of my interest for C programming (and away from all those heavy portability efforts that plague most of my previous works).
Python is excellent for quickly writing prototypes, and moreover, for reading them back while revisiting them later. This is especially important for code that we write quickly, because we have less time than usual for learning what we do while doing it, per our previous habits. When I was writing a lot of assembler, I was meditating for weeks on the code and the project. When one writes equivalent code in one hour or so, one surely loses something. To counter-balance, Python can be written very legibly all along. Of course, a minimum of comments always helps!
Prototypes that work well enough often turn into production programs that stay un-rewritten forever, with everybody staying happy. If a special need for extra speed arises, a few profiling runs, followed by the writing of a specialised C extension, is often the good way to go.
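Such a profiling run can be as small as the sketch below, which uses the standard cProfile and pstats modules. The slow_sum function is a hypothetical stand-in for whatever part of a prototype is suspected of being slow.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """A deliberately naive loop, standing in for a suspected hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def profile_call(func, *args):
    """Run FUNC under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()

result, report = profile_call(slow_sum, 1000)
```

The report lists the most expensive calls first, which is usually enough to decide whether an algorithmic change or an extension is worth the effort.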
There are many Python libraries (or modules, or packages) already. They may not be as numerous as in some other languages, but on the other hand, many of them are extremely well designed, and are pure pleasure to use. In a way, my feeling is that in Python, there is less garbage to choose from ☺.
I was recently playing with PIL (Python Imaging Library) and PyFT (Python Freetype interface) in various ways, and found it astonishing how easily one can get a lot of quality out of relatively short Python programs (also provided, in this case, that both packages allow easy handling of alpha channels). This has been my experience all along with Python, and in various fields. Lots of results for modest investments in code which always stays legible.
I somewhat regret having discovered something like Python so late in my programmer life. On the other hand, I guess that younger people, lacking experience, may not fully appreciate the worth of the tool given to them…
3.5 Version fatigue
Some languages (think PL/I, some LISP implementations, and surely many others) are full of constructs people do not use. Each user has his/her own preferred subset, and it often happens that a program written by someone may not easily be read by someone else, without keeping the language reference manual nearby.
I essentially learned Python with version 1.5.2, and it was still true at the time that there was (about) only one way to do it, for most of it. This was a good guarantee towards legibility. I'm now a happy user of Python 2.2.1, in which there are now many ways to do many things. I quite appreciate the novelties, they are often incredibly handsome. Yet, legibility (of others' programs) is getting impaired by the multiplicity.
To really maintain legibility in the long run, Python designers have to consider that they have to clean up a bit behind themselves. Of course, "You do not like it, so do not use it" conveys health and sanity. But we have to stay aware that this attitude might not be fully proper in the long run.
Some people find C to be a simple language, and also wonder about the simplicity of Python. For one, I do not find that C is a simple language. It has fairly intricate points here and there, and I guess there are very, very few people in this crowd, and in many other crowds as well, truly able to reply correctly to any question about C, the language, not even considering its libraries. C++ as a language is slightly more regular than C, so it might be slightly easier in that respect, but it is also much more complex, so all summed, C++ is undoubtedly a difficult language.
No doubt to me that Python is an easier language: one can learn most of it quickly, and run quite a long while on that knowledge. But really knowing Python, deeply, inside out, is a challenge in itself: there is a great deal of detail to know. So, although Python allows us to be much more efficient and productive than with other languages, I would never go as far as saying that Python is fully easy.
There is a kind of fatigue which may apply to Python, by which a language becomes so featured over time that people may naturally come to limit themselves to a sufficient subset of the language and be perfectly happy, until they have to read the code written by guys speaking another subset of the language possibilities. Legibility becomes subjective and questionable. In the past, this has been true for some comprehensive implementations of LISP, and as I heard (but did not experience) for PL/I.
There was a time, not so long ago, when there was only one way to do it in Python, and this one way was the good way, necessarily. This is not true anymore, and we ought to recognise that this impacts legibility. Some recent additions are undoubtedly very nice, that really bring something new, and generators are of this kind. Even moving the furniture around may be very good, like for when the underlying mechanics get revised, acquiring power and expressiveness on the road, while keeping the same surface aspect. At least, people recognise the furniture, and could appreciate the new order.
Where it might hurt, however, is when the Python place gets crowded with furniture, that is, when Python gets new syntaxes and functions above those which already exist, while keeping the old stuff around more or less forever for compatibility reasons, with no firm intention or plan for deprecation, and no tools to help users switch from a paradigm to its replacement. This ends up messy, as each programmer then uses some preferred subset.
3.6 Module names
One suggests options to name an option processing module. My feeling is that Python should much avoid, for a library module, a name which is likely to be a user variable name. This would rule out options.
In my experience so far, the most irritating cases of Python hoarding common words for itself have been string and socket. I know that some people write s for a string and would write o for options, but this algebraic style is not ideal. I find that using real words, like counter, ordinal, cursor, index or such, yields more readable programs.
When one imports a module, one has to give up using the module name for other purposes. Currently, I think all my callable scripts which handle options already use options for a variable name, so I would prefer that options be left alone.
This is why I think Python should not offer a module named text for example. As a principle for the future, let simple, common words be available to users for naming their own variables.
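The clash is easy to reproduce with the string module. In the sketch below (function names are hypothetical), the second function innocently names its parameter string, which hides the module of the same name inside that function.

```python
import string

def count_letters(text):
    """Count ASCII letters, using the string module."""
    return sum(1 for char in text if char in string.ascii_letters)

def count_letters_shadowed(string):
    """Same intent, but the parameter name hides the module."""
    # Here, string is the str argument, which has no ascii_letters attribute.
    return sum(1 for char in string if char in string.ascii_letters)
```

Calling count_letters_shadowed on any non-empty text fails with an AttributeError, whereas the first version works as intended.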
3.7 Execution speed
Someone bluntly said, on the Python development mailing list, that Python is just not fast enough for more serious projects and, as evidence, mentioned that Python would not be up to the task of an HTTP daemon processing millions of hits per day.
If one really needs more than 625 hits per second on average, and looks for Python developments that could help, I would guess that Python developers might be interested in them. But if the need is not real, then this discussion is merely discursive and academic, and might be better held elsewhere than on the developer mailing list.
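To put the two units side by side, the small conversion below relates hits per second to hits per day. It is plain arithmetic: 625 hits per second, sustained over the 86,400 seconds of a day, amounts to 54 million hits per day, while a single million hits per day is under a dozen per second. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def per_day(hits_per_second):
    """Daily total for a rate sustained all day long."""
    return hits_per_second * SECONDS_PER_DAY

def per_second(hits_per_day):
    """Average rate for a given daily total."""
    return hits_per_day / SECONDS_PER_DAY

daily_at_625 = per_day(625)            # 54000000
average_for_one_million = per_second(1000000)
```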
I would be tempted to consider many architectural avenues, before bluntly deciding that the choice of Python as an implementation language is the problem. With millions of hits per day, I would most probably be big enough to have a flurry of other problems of all kinds, and then, Python would be more on the side of many solutions than the source of my problem.
The truth is that it might well depend on projects. Here, we have fairly serious, production-level projects, and Python is quite fast enough for them. And very dependable too, in our experience. A real blessing. Oh, we got a bit of profiling and tuning to do here and there, once in a while. Then, a few thoughtful algorithmic changes give us great rewards. Extensions are also possible.
Of course, speed increases in Python are welcome, but we should not let ourselves go frantic about speed. I guess that serious projects are often more interested in good and dependable results, and ease in development and maintenance, than in saving a few minutes or hours on a production run. So, for one at least, I think I like the current orientation, priorities and attitudes of Python developers about speed. They undoubtedly care about it, but intelligently, by protecting dependability and usability.
3.8 Writing efficiency
For one, I would want to use Python over C++ mainly because I can write and debug Python much faster than I would do in C++, and because I have more pleasure from the overall legibility of the resulting code, giving me the feeling that later maintenance is likely to be pleasurable.
The strong typing of C++ is surely a good side of the language, seen as a helping device for programmers, but speaking for myself, it is not enough of an incentive for switching, at least for most programs I write.
Memory is cheap, and computers are fast. I am expensive and slow ☺. So, the sacrifice you refer to does not hurt much, in practice, and Python is often the right solution. Moreover, whenever I'm really starving for speed, or memory, there are good Python-friendly solutions. The problem here is likely to properly consider and evaluate all the costs. One might discover that Python is surprisingly, radically economical.
4 Strings
4.1 Single or double quotes?
Many languages allow strings interchangeably delimited either by single quotes or double quotes, given the chosen delimiter is properly escaped within the string, of course. A few languages, like the Unix shell or Perl, make a distinction, by allowing string interpolation only when double quotes are used. Other languages, like SQL, use single quotes and double quotes to mean quite different things.
Being myself anal and needing a reason and motive for everything, I gave myself an artificial rule which I try to follow, but this is only a convention of mine, Python does not force me into anything. If the string contains natural language, and is to be ultimately read by a human, I use "". This is also a way for me to remember that the string is likely to be translated in internationalized programs. If, on the other hand, the string is meant mainly for the machine or algorithm, I use ''. Even if slightly fuzzy, this convention works nicely for me. I do prefer '' over "" in general as I find the former more readable, but the fact that the apostrophe is commonly used in human texts makes "" more adequate in such cases. I usually prefer using \" within " strings and \' within ' strings, rather than breaking my own convention.
If the string is text, or even sentence fragments meant to be read by a human, I use double quotes. If the string is anything else, a file name, a character, the name of an identifier, etc., I use single quotes. When in doubt, I ask myself: would this string be translated or not if the program was later internationalised? I agree with you that single quotes are less noisy, yet the fact is that single quotes often appear in human text (as apostrophes).
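Applied to a small, invented function, the convention reads as follows: the file extension is a machine-oriented string and gets single quotes, while both messages are meant for a human, could be translated, and get double quotes.

```python
import os

def summary(path):
    """Say something to a human about PATH, judging by its extension."""
    extension = os.path.splitext(path)[1]
    if extension == '.py':                            # for the machine
        return "This looks like a Python script."    # for a human
    return "I can't tell what this file is."          # apostrophe, no escape
```

The last message also shows the practical side of the rule: an apostrophe in human text needs no escaping inside double quotes.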
Some people tend to merely transport into Python habits coming from other languages. We need a very good reason for doing that, in my opinion, as reasons which are good in some languages are pretty meaningless in Python. For example, in Perl and shell, I use double quotes when I want to allow interpolation, and single quotes when I want to inhibit it: so I could translate this usage into using double quotes for formats and single quotes for the rest, but it looks to me that this is poor usage of the available difference. C mandates double quotes for strings and single quotes for characters; this distinction is neither really useful nor worthwhile in Python. Best is to develop a style which better fits Python idiosyncrasies.
4.2 Multi-line strings
The principle for single or double quotes generalizes to triple quoted strings, like those we often see for doc strings. I use triple double quoted strings for multi-line strings holding documentation or other human text. However, if a doc string contains formal grammar fragments, like with PLY or SPARK, I then ought to use triple single quoted strings. All in all, I find it to be quite a good convention.
For Python, I tend to use raw strings (the r prefix to strings) consistently for regular expressions (especially with SPARK and PLY these days ☺), even if there are no embedded backslashes. I use triple single quoted strings for grammar rules, and only in SPARK so far. I limit triple double quoted strings to long strings, meant to be read by a human, having embedded newlines; the first line is always written with an escaped newline immediately after the initial triple double quote, so the first line of text is flushed left, aligned with the others.
There is a trend, when writing doc strings, towards indenting all lines while writing them, and using some code to strip the indentation at run time. In fact, I rather like my doc-strings and other triple-quoted strings flushed left. So, I can see them in the code exactly as they will appear on the screen. If I used artificial margins in Python so my doc-strings appeared to be indented more than the surrounding, and wrote my code this way, it would appear artificially constricted on the left once printed. It's not worth it. For me, best is to always use """\ as the opening triple-quote, and write flushed left until the closing """. As most long strings end with a new line, the closing """ is usually flushed left just as well. My opinion is that it is nice this way. Don't touch the thing! ☺
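A short sketch of that layout, with an invented usage message, is given below. The backslash right after the opening quotes swallows the first newline, so the text starts at the left margin and appears in the source exactly as it will be printed.

```python
def usage():
    """Return the help text of a hypothetical demo program."""
    return """\
Usage: demo [OPTION]... FILE...

  -h   display this help and exit
  -v   be verbose
"""
```

Nothing needs to be stripped at run time: the returned string has no leading blank line and no extra margin.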
The $ PEP gave me a strange feeling. Either the PEP should contain a serious and detailed study about how % is going to become deprecated, or the PEP should design $ so nicely that it appears to be a born-twin of the current % format, long lost then recently rediscovered. The PEP should convince us that it would be heart-breaking to separate so loving brothers. Now, it looks like these two do not much belong to the same family, they just randomly met in Python, they are not especially fit with one another.
I have a hard time believing that % will effectively fade out in favour of $. As a few people tried to stress, changes in Python are welcome when they add real new capabilities, but they are less welcome when they merely add diversity over old substance: the language is then hurt each time, losing bits of simplicity (and even legibility, through the development of Python subsets in user habits). Each individual loss may be seen as insignificant when discussed separately but when the pace of change is high, the losses accumulate, especially if the cleanup does not occur.
This is why any change in current string interpolation should be crafted so it fits very naturally with what already exists, and does not look like another feature patched over other features. A forever transition period between two interpolation paradigms, foreign to one another, might give exactly that bad impression.
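For comparison, here are the two notations side by side, assuming that the $ proposal discussed is the one which later surfaced in the standard library as string.Template; that identification is an assumption, and the data is invented.

```python
from string import Template

values = {'name': 'Guido', 'count': 3}

# The long-standing % notation, with named fields.
with_percent = "%(name)s has %(count)d messages" % values

# The $ notation, as offered by string.Template.
with_dollar = Template("$name has $count messages").substitute(values)
```

Both produce the same text, through two mechanisms which indeed share little beyond the dictionary of values.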
The ultimate goal of internationalisation, for a non English speaking user and even programmer, is to see his/her own language all over the screen. This means from the shell, from the system libraries, from all applications, big or small, everything. For what is provided by other programmers or maintainers, this may occur sooner or later, depending on the language, the interest of the maintainer, and the development dynamic. The far-reaching hope is that it will eventually occur.
For the little things a user/programmer writes himself/herself, and this is where Python pops up, there are two ways. The simplest is to write all strings in native language. The other way, meant to help exchange with various friends or get feedback from a wider community, is to do things properly, and internationalise even small scripts from the start. It is easy to develop such an attitude, yet currently, examples do not abound.
I surely had it for a few languages, although it was rather demanding on me, at the time gettext was not yet available; in fact, my works were used to benchmark various ideas before gettext was first written. The mantra I repeated all along had two key points:
- internationalisation will only be successful if designed to be unobtrusive, otherwise average maintainers and implementors will resist it.
- programmer duties and translation duties are to be kept separate, so these activities could be done asynchronously from one another.
I really, really think that with enough and proper care, Python could be set so internationalisation of Python scripts is just unobtrusive routine. There should not be one way to write Python when one does not internationalise, and another different way to use it when one internationalises. The full power and facilities of Python should be available at all times, unrelated to internationalisation intents. Non-English people should not have to pay a penalty, or if they do, the penalty should be minimised As Much As Possible.
Our BDFL, Guido, should favour internationalisation as a principle in the evolution for the language, that is, more than a random negligible feature. I sincerely hope he will do. For many people, internationalisation issues cannot be separated out that simply, or otherwise dismissed. We should rather learn to collaborate at properly addressing and solving them at each evolutionary step, so Python really remains a language for everybody.
About separation of programmer and translation duties, we've met those two goals only partly in practice. For C programs, the character overhead per localised string is low: the three characters _(), while exceptionally not obeying the GNU standard about a space before the opening parenthesis. The glue code is still small, yet not as small as I would have wanted. I wrote the Emacs PO mode so marking strings in a C project can be done rather quickly by maintainers, and so translators can do their job alone. These are on the positive side.
But I think we failed at the level of release engineering, as the combined complexity of Automake, Autoconf, Libtool and Gettext installation scripts is merely frightening, and very discouraging for the casual user. There were reasons behind releng choices, but they would make a long story. Also, people in the development allowed more fundamental unneeded complexities, which had the sad effect of anchoring the original plans to the point of being stuck. On the other hand, people not understanding where we were aiming are happily unaware of what we are missing. Maintainers may become incredibly stubborn, when they have ideas. Eh, that's life… Sigh!
Python can do better on all fronts. By the way, I hope that distutils can be adapted to address internationalisation-related release engineering difficulties, so these merely vanish in practice for Python lovers. We could also have other standard helper tools for non-installed scripts.
- Explicit closing of files?
There is a recurring debate among Python users about whether we should explicitly close files in Python, or not. Many people have shown strong and diverging opinions, and I am not neutral in all this.
As most of us came to Python while already having other languages as background, and as many languages mandate explicit closes, we are prone to blindly applying the forcefully acquired aesthetics to Python. Moreover, there are genuine cases where explicitly closing files in Python is adequate. Also, depending on how you define what Python is, there are viewpoints from which explicit closes are always adequate.
On the other hand, the reference implementation of Python does timely automatic finalisation of files as for any other kind of objects. Most practical usages of files in Python programs could take advantage of this finalisation, yielding a significant increase in elegance and legibility, with no real loss in maintainability. This explains my position on the matter, as legibility and maintainability are Python virtues which much appeal to me.
- Python finalisation
Python, as created and maintained by Guido van Rossum and team, is sometimes nicknamed CPython, to distinguish it from JPython in particular, the former being written in C, the latter in Java. Now that JPython has been renamed Jython, I'll merely use Python to name the mainstream, reference implementation.
Python uses reference-counting for reclaiming dead objects, so when the last reference to a file is cut, that file gets automatically and immediately closed by the Python system. Consider:
buffer = file(NAME).read()
In the above statement, as the user does not keep a reference to the file, it gets automatically closed as soon as its contents have been read.
The Python Reference Manual says: Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether – it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable.
Indeed, there are other implementations of Python which delay object finalisation. The classic example is Jython, which relies on the garbage-collector in the Java runtime system, so a file which is not explicitly closed by the Jython programmer, before the last reference is cut, will be closed in some indeterminate future by the garbage-collector. So, in Jython, the above example might better be written as:
handle = file(NAME)
buffer = handle.read()
handle.close()
del handle
yet in practice, I guess Jython programmers would not worry about the del statement and merely let the closed file object lie around.
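For an implementation which delays finalisation, the explicit form could be sketched as follows. This is only an illustration: it uses open() rather than the older file() builtin, and a temporary file so that it can run anywhere. The try/finally merely ensures that the close also happens when read() fails.

```python
import os
import tempfile

# Create a small file to read back (illustration only).
fd, name = tempfile.mkstemp()
os.write(fd, b'some contents\n')
os.close(fd)

handle = open(name)
try:
    buffer = handle.read()
finally:
    handle.close()               # does not wait for any garbage collector

print(handle.closed, repr(buffer))
os.remove(name)
```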
- Choosing Python
In the following code fragment:
open(FILE, 'w').write(DATA)
a Python programmer knows that FILE has received DATA, all secured to disk, by the end of the statement.
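This claim can be checked with a short sketch, assuming the reference implementation and a fragment of the form open(FILE, 'w').write(DATA). The file name and the data below are made up for the illustration.

```python
import os
import tempfile

FILE = os.path.join(tempfile.mkdtemp(), 'stuffile')   # hypothetical name
DATA = 'a few bytes\n'

open(FILE, 'w').write(DATA)      # no reference kept, no explicit close

# With reference counting, the file has been flushed and closed by now,
# so reading it back right away yields the full contents.
check = open(FILE).read()
print(check == DATA)
```

Python 3 may report a ResourceWarning for the unclosed file when such warnings are enabled; they are ignored by default.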
Some say that this works purely as an accident, or artifact of the current implementation of Python, which may unexpectedly change behaviour one day, also insisting that Python is not defined that way, and that programmers should not depend on any particular features of garbage collection — not even on its existence.
This seems quite overcautious to me. Python has fundamentally relied on reference counting since its inception, and this is not going to change. While the Python garbage collector may be tuned or deactivated, reference counting may not be shut down, and will likely never be: the amount of breakage that would result all over is unthinkable.
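The distinction between the garbage collector and reference counting can be sketched as follows, assuming the reference implementation. The gc module controls only the collector for reference cycles; the Witness class is a made-up name for this illustration.

```python
import gc

finalised = []


class Witness:                   # hypothetical name
    def __del__(self):
        finalised.append(True)


gc.disable()                     # switches off the cyclic collector only
try:
    witness = Witness()
    del witness                  # last reference cut: the count drops to zero
    print(finalised)             # the object is already finalised
finally:
    gc.enable()
```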
The Python Reference Manual allows for alternative Python implementations to delay finalisation, and so, use other memory management techniques than reference counting. I guess this is so Jython (at the time the Reference Manual was amended for this), and then others, could use the name Python despite such differences. Moreover, the writer of Jython is also a member of the Python development team.
- Elegance considerations
There might be cases when one moves between Python and Jython, indeed. When one knows s/he works with Python only, it is good style to rely on the refcount behaviour, as it yields code which is not only more legible, but also more elegant and concise. It means that you understand and accept in advance having to revise your code if you ever want to use other implementations of Python, like Jython. As someone was pointing out to me very recently, the Python reference tries to describe a common language, but there is no C-Python specific guide. If there were one, the refcount behaviour would most probably be described as dependable and reliable, even through future versions, as far as Python programming is concerned.
And besides, it seems the few implementations of Python do not support exactly the same language: extensions here may not be available there. Defining good style as the common subset of all Python implementations, and everything else as bad style, seems questionable. The only thing is that you have to be aware of the implications of your choices.
There is no common opinion on what poor writing style is. Yet, that does not mean that the concept is vacuous. My mother often said to me:
Des goûts et des couleurs, on ne discute pas. Mais il y en a de meilleurs que d'autres…
(roughly translated: People should never argue about what is good taste. Yet, some people have better taste than others.)
In practice, there is some common sense that could emerge, and on which we could rely, more or less. More than less, actually! ☺
For this Jython-forces-you-to-explicit-closes matter, my feeling is that Jython encourages bad style here, much more than it defines good style. Surely, there has never been any bad intent from Jython's author. We understand the limitation comes from the fact that Jython relies on the Java system for collecting garbage. One has to close explicitly in Jython for practical implementation considerations; this has nothing to do with good style, and does not define what good style may be.
It is surely good to explicitly close() when one is done with a file but, for other reasons, needs to keep a reference to this file. Yet, the nicest is not keeping a reference to the file, whenever it can be avoided easily. Some consider this as laziness. This is no laziness at all; it is rather an active search for legibility, with the code going straight to the essentials, and showing those essentials as clearly as possible.
Someone writes: The opened file (when not needed) may cause problems to other applications that try to do something with the same file. One should not consider any open file object as the exclusive property of the running application. Because of that, the file should be closed as soon as possible.
But this is exactly what happens. There is no problem, only fears. Once you get used to the paradigm, you discover that it is safe and dependable.
It is only natural that people, coming from the culture of other programming languages, want to use Python in a way that best reflects the idioms they already know. However, Python has its own paradigms, that can be put to good use for what they are, and independently of our past experiences, provided they fit our own values. My own values are mainly about maintainability, and this one alone implies legibility, compactness, simplicity, humility.
Depending on how s/he studied computers, the programmer may think that arguments are passed on a stack (conceptual or not), or s/he may just think with higher concepts and even ignore that a stack is involved. After a while, the first category comes to take the stack cleanup for granted, the same as Lisp people, for example, take the garbage collector for granted, and would never think of explicitly freeing cons cells. We may decide to "take for granted" a lot of other idioms, if we only choose to. It's not that difficult. After we make such choices, we discover the elegance we might have been reluctant to see. Still using Lisp as an example, I surely knew people who were extremely reluctant to see that Lisp was quite elegant, as they were stuck in assembler-level thinking.
Elegance roots itself in that conceptual simplicity of automatic finalisation of various Python objects.
- Blind principles
A few programmers will close all files explicitly by blindly sticking to some principles, while not fully understanding the meaning of the principles they quote.
Explicit is better than implicit
This statement, part of the Python credo, is sometimes used as a justification for explicit closes, instead of letting the system take care of them.
But the system implicitly takes care of a great many things, which do not honestly bother programmers otherwise. For example, Python local variables allocated within a function are automatically deleted on exit from that function, but no one in their right mind would think about del-eting all local variables. There is something implicit in recursion that could have been made explicit through a programmer-managed stack. And so on. So, the blind argument above just does not hold water; it looks more like a justification than a reason.
There is an equilibrium between explicit and implicit, and a sense of good taste goes in that equilibrium. Good taste in Python is not the same as for Perl or for C; we need to dust ourselves off a bit when we switch between languages.
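The implicit deletion of local variables can be observed with a weak reference, as in this small sketch. It assumes the reference implementation; Thing, work and probe are made-up names.

```python
import weakref


class Thing:                     # hypothetical name
    pass


def work():
    local = Thing()              # lives only while work() runs
    return weakref.ref(local)    # a weak reference does not keep it alive


probe = work()
print(probe() is None)           # the local object is gone, no del was needed
```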
Never depend on the implementation
As various implementations of Python vary significantly, both for the language features they offer, and the libraries they rely upon, using an implementation is effectively selecting a dialect and a set of paradigms which prevent real portability between implementations.
It is a false feeling to think that, by not depending on this particular aspect of Python, Python programs will truly become portable. Taking Jython as an example, this particular implementation itself happily claims to correct many deficiencies of Python, lags significantly behind Python development, and invites programmers to depend on many Java libraries and classes which are unavailable to Python. There is nothing inherently wrong in all this, except in the false feeling that Python and Jython are the same language. When one is choosing one of them, s/he is definitely not choosing the other.
Porting from Python to Jython or another variant, or vice-versa, involves many issues and there is an unavoidable cost to porting, while open file concerns are only a fraction of that cost. To guard against all such issues in Python for any program that might never be ported is likely to cost much, much more.
Python itself, the mainstream dialect, has been ported to many platforms, and there lies real Python portability. The language is defined by its implementation; programmers do not refrain from using Python specificities, nor from depending on permanent characteristics of one given implementation. The behaviour on timely finalisation is there to stay. We may depend on it. So, in the case at stake here, the statement of the principle above is more rhetorical than practical.
It is bad programming practice
The Jython documentation states that depending on timely finalisation is bad programming practice, and some people blindly translate this statement to Python itself. Wrongly, of course.
Others assert that programmers are going to be bitten, sooner or later, because functions might unexpectedly keep references in a cache, or create reference cycles preventing finalisation, or such other mysteries. They assert their gut feelings and experience, saying that relying on Python automatic finalisation is not serious, is sloppy, etc., and then go to great lengths trying, not that successfully in my opinion, to substantiate their emotion.
Often, experience might come from languages other than Python. We should be careful, when we extend our experience from one language to another, to recognise the specificities of the new language. Our experience is often helpful, but in some cases, if we are not careful, it may blind us. Best practice might well be to avoid noisy code.
The truth is that Python does not play such trickeries on programmers. Programmers themselves, on the wide average, are competent and careful enough that they have a rather good idea of what their programs are doing, and they can feel themselves in peaceful and confident control.
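For completeness, the reference-cycle case raised by those critics can be reproduced on purpose. The sketch below assumes the reference implementation, version 3.4 or later; Node and log are made-up names. Note that the cycle is something the program builds itself, in plain sight.

```python
import gc

log = []


class Node:                      # hypothetical name
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        log.append(self.name)


gc.disable()                     # keep the cyclic collector out of the way
a, b = Node('a'), Node('b')
a.other, b.other = b, a          # deliberate reference cycle
del a, b
before = list(log)               # nothing finalised yet: counts are not zero
gc.collect()                     # the cyclic collector reclaims the pair
after = sorted(log)
gc.enable()
print(before, after)
```

Objects which are not part of a cycle, such as a file opened and used within a single statement, are not affected by this.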
What happens when it is not possible to write into the 'stuffile'. […] Because of the brevity of the code it is also not so explicitly visible what the write() will do when the file cannot be opened.
Python will traceback, magically, wonderfully, with adequate information.
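As a sketch of what that information looks like, the following tries to write into a directory which does not exist. The path is made up for the illustration; without the try/except, Python would print a traceback carrying the same details.

```python
import os
import tempfile

# A path inside a directory which does not exist (illustration only).
missing = os.path.join(tempfile.mkdtemp(), 'no-such-dir', 'stuffile')

try:
    open(missing, 'w').write('contents')
except OSError as error:
    # The exception names both the kind of failure and the offending file.
    report = '%s: %s' % (type(error).__name__, error.filename)
    print(report)
```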
- Other notes
In the Language Reference, section 3.1 talks about finalization and garbage collection as implementation specific, and even strongly recommends using methods such as close() on objects that involve external resources, though this is indirectly qualified by "but since garbage collection is not guaranteed to happen…". In C-Python, currently, such a guarantee exists. I received confirmation that Python will never give up refcounting.
Some rightly argue that if the finalization was postponed, that exception might crop up in the strangest place. Yet, in many cases, we know that the finalization is not going to be postponed, and in such cases, I think we might avoid making it explicit.
It is generally considered bad programming practice to depend on features which the manual claims may not exist, and I heartily agree that it is bad practice not to keep oneself between the tracks set by specifications. Without really knowing, I imagine that the Python documentation on this topic might be a politeness from Guido towards other implementations, encouraging them, so they could claim being called Python nevertheless. I wonder if the quote has been there since the beginnings of Python. ☺ The current (C-)Python implementation of reference counting, where objects are collected as soon as they stop being reachable, is of high quality in that respect, and after checking with knowledgeable people, I got that the dependability of refcounts could be considered as cast in stone, exactly as if it had been documented as such. If I had not got this confirmation first, I would never have started to use things like open(FILE, 'w').write(CONTENTS). Now, I really see it as part of the (unwritten) specifications of (C-)Python, and perfectly legitimate.
Because Jython exists, which uses delayed garbage collection, some consider that I am trading off safety for saving a line or two. But I'm not currently using Jython, and doing lines = open(file).readlines() looks perfectly safe to me; I'm not trading any safety. The goal of using such writing is not saving a line or two, but writing more legibly, by eliminating unnecessary noise. Saving lines is not very important. Saving spurious lines is, however. All those things are quite debatable, as there are some aesthetic considerations associated with legibility; moreover, aesthetics do not always yield to legibility. I think Python is nice because it has one artist in charge, able to impose a common view. If it was democratic, it would lose much.
If immediate finalization was documented to say that it can be relied on, now and forever, this would be ideal, indeed. Despite my current choices and what I was told, there is always a tiny, unlikely risk that Python changes its behaviour on this, and I would surely prefer not to have to revise the code I wrote. But if I ever have to revise it, for Jython or otherwise, I will be happy having enjoyed the current legibility and simplicity for a long while! Once again, I would not have started depending on immediate finalisation before getting a confirmation of some sort that it is dependable. On one hand, I have always been strict, anal, at using only documented features, in all systems and languages I've used in my life. On the other hand, in this precise case, the advantages were so appealing that I felt compelled to ask for clarification about the dependability of immediate finalisation.
- Non-ASCII identifiers?
There is a lot of in-house development, not meant to be exported, that would be so much more comfortable if we could use our own language while programming. Many years ago, we experienced that university-wide, by modifying the Pascal compiler so we could use French identifiers whenever we felt like it (as well as a lot of other software and even hardware), and we kept modifying compilers while new Pascal versions were released. Moving on to other sites and languages, my co-workers and I did not try redoing such patches all the time, everywhere. Yet, I would deeply like Python to be on our side, and favourable to restoring our Lost Paradise.
Someone argued with me, saying that "it would be nice" is not a strong-enough rationale for such a change. But this argument does not hold up. A great deal of recent Python changes were to make it nicer in various ways. None were strictly unavoidable, the proof being that Python 1.5.2 has been successfully used for many things, and could still be. We should not merely vary the height of the strong-enough rationale bar depending on our own tastes, as this merely gives a logical sounding to relatively pure emotions.
Having the capability of writing identifiers with national letters is not going to break other people's code; the contrary assertion looks a bit like gratuitous FUD to me. Existing code is not going to break, as long as English identifiers stay a subset of nationally written identifiers, which is the case for most character sets, Unicode among them, as they allow ASCII letters as a subset of all letters. OK, there might be transient implementation bugs, which are a normal part of any release cycle. Python has undergone changes which were much deeper and much more drastic than this one would be, and the fear of transient bugs has not been a stopper.
If many people had experienced the pleasure of naming variables properly for their national language while programming, I guess most of them would be rather enthusiastic proponents of having this capability with Python, today. As very few people experienced it, they can only imagine, without really knowing, all the comfort that results. Python is dynamic and interesting enough, in my opinion, for opening and leading a worthy trend in this area.
Some see an advantage in forbidding non-English-speaking people from using their own language with Python, as it prevents people from writing code that cannot be universally exported. People can understand two different, orthogonal things in this issue: keywords and user identifiers. I'm not really asking that keywords be translated, because Python keywords and syntax are modelled after the English language. This may be debated of course, but is a lower priority issue.
However, identifiers created by local programmers, and especially identifiers naming functions or methods, should be writable in national language without forcing us to make orthographical mistakes all over (I usually choose English identifiers over disgustingly written French identifiers).
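As an illustration, and assuming Python 3, which accepts non-ASCII letters in identifiers (PEP 3131), a function may be named with proper French spelling. The names carré, côté and aire are made up for this sketch.

```python
# Illustration only: French names made up for this sketch (Python 3).
def carré(côté):
    """Return the area of a square whose side is côté."""
    return côté * côté


aire = carré(3)

# Plain ASCII names remain valid identifiers, so existing code is unaffected.
print(aire, 'carré'.isidentifier(), 'square'.isidentifier())
```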
There is a background irritation at not being able to program in my own language; this irritation is permanent and never fades out, a bit like the fossil radiation after the big bang! ☺ I surely like Python a lot, but I would like it even more if it was on the side of programmers of all nations, and not forcing everyone to wide portability: there are many cases where planetary portability is just not a concern.
All development is done in house by French people. All documentation, external or internal, comments, identifier and function names, everything is in French. Some of the developers here have had a long programming life, while they only barely read English. It is surely a constant frustration, for some of us, having to mangle identifiers by ravelling out their necessary diacritics. It does not look good, it does not smell good, and in many cases, mangling identifiers significantly decreases program legibility.
Now, I keep reading strange arguments from people opposed to our using national letters in identifiers, disturbed by the fact they would have a hard time reading our code or publishing it. Even worse, some want to protect us (and the world) against ourselves, using made up, irrational arguments, producing false logic out of their own emotions and feelings. They would like us to think, write, and publish in English. Is it some anachronistic colonialism? Quite possible. It surely has some success, as you may find some French people that will only swear in English! ☺
For one, in my programming life, I surely chose to write a lot of English code, and I still think English is a good vehicle for planetary communication. However, I like it to be my choice. I always felt very open and collaborative with similarly minded people, and for them, happily rewrote my things from French to English in view of sharing, whenever I saw some mutual advantage to it.
I resent when people want to force me into English when I have no real reason to do so. Let me choose to use my own language, as nicely as I can, when working in-shop with people sharing this language with me, for programs that will likely never be published outside anyway. Internationalisation is already granted in our overall view of today's programming, as a way for letting people be comfortable with computers, each in his/her own language. This comfort should extend widely to naming main programming objects (functions, classes, variables, modules) as legibly as possible. Here, I mean legible in an ideal way for the team or the local community, and not necessarily legible to the whole planet. It does not always have to be planetary, you know.
For keywords, the need is less stringent, as syntactical constructs are part of a language. When English is opaque to a programmer, he/she can easily learn that small set of words making the syntax, understanding their effect, even while not necessarily understanding the real English meaning of those keywords. This is not a real obstacle in practice.
Some suggested that we create our own Frenchified version of Python that lets us use all the characters we want, for our own in-house use. No doubt that we, you and me and everybody, could all have our own little version of Python. To tell the whole truth, this very suggestion has already been discussed in-house, and the decision has been to stick to mainstream Python. We could not justify to our administration that we start modifying our sources, in such a way that we ought to invest in maintenance each time a new Python version appears, forever. On the other hand, we may reasonably guess that many people in this world would love being as comfortable as possible using Python, while naming identifiers naturally. It is not so unreasonable that we keep some hope that Guido will soon choose to help us all, not only me.
It is true that many Python tools are not prepared to handle internationalised identifiers, and it is very unlikely that these tools will get ready before Python opens itself to internationalised identifiers. Let's open Python first, tools will undoubtedly follow. There will be some adaptation period, but after some while, everything will fall in place, things will become smooth again and just natural to everybody, to the point many of us might remember the current times, and wonder what was all that fuss about. ☺
for x in 1, 2, 3: pass
because the in is nothing more than a mandated keyword. On the other hand, it always saddened me a little bit that I may not write:
if x in 1, 2, 3: pass
because the in here is an operator, and as such, has its priority relative to a comma. It might not be easy adjusting this in Python without breaking anything.
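The priority of in relative to the comma can be seen in a small sketch. Here the first item is written (1, 2) rather than 1, only so that the expression can be evaluated; the names loose, tight and accepted are made up for the illustration.

```python
import ast

x = 1

# As an expression, "in" binds tighter than the comma: this is a 3-tuple
# whose first item is a comparison, not a membership test against a tuple.
loose = x in (1, 2), 2, 3

# With explicit parentheses, the membership test applies to the whole tuple.
tight = x in (1, 2, 3)

# In an "if" header, the unparenthesised form is rejected by the parser.
try:
    ast.parse('if x in 1, 2, 3: pass')
    accepted = True
except SyntaxError:
    accepted = False

print(loose, tight, accepted)
```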
Some people prefer to use parentheses for tuples all the time. Whatever language I use, I try to avoid abusing parentheses, as using them often leaves me with the bad feeling that I do not know the language enough, and that I write it fuzzily, adding parentheses just in case… ☺
Of course, for many languages, parentheses could be effectively used to increase legibility. But for marking tuples, Python does not need them in most cases, it may be more noisy than useful having them to mark tuples when it is not required to do so. Python sources are cleaner without such spurious parentheses, and it is a good habit learning to avoid them. There is one case, however, where using parentheses with tuples is worth doing, and this is for long lists of tuple elements spanning more than one line. Parentheses combined with proper indenting help the eye at catching the overall structure. For short tuples that fit on a line, parentheses are overkill to me.
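Both cases might look like the following sketch, where all names and values are made up for the illustration.

```python
# Short tuples fit on a line: no parentheses needed.
point = 3, 4
x, y = point

# A long tuple spanning several lines: parentheses combined with
# indenting let the eye catch the overall structure.
months = (
    'January', 'February', 'March', 'April',
    'May', 'June', 'July', 'August',
    'September', 'October', 'November', 'December',
)

print(x, y, len(months))
```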
5 Tips and tricks
Not much in that area yet. One has to start somewhere! ☺
People ask me, once in a while, why the asterisk in main(*sys.argv[1:]), a construct which I use rather systematically in my Python programs. Once again this morning, someone reported that as a possible typo, asking if this writing was normal.
Yes indeed, that's normal. It corresponds to the asterisk in:
def main(self, *arguments):
The idea is a convenience for whoever calls main interactively or from elsewhere. Arguments are given separated, a bit like when using a shell:
main('arg1', 'arg2', 'arg3')
instead of having to be:
main(['arg1', 'arg2', 'arg3'])
This construct is very common for me. My old Pynits tool, which I wrote to nit-pick on my own Python sources, considers as an error when a source file is empty. If I ask Pynits to attempt an auto-correction for this error, it generates the skeleton of a Python program which, of course, uses the above idiom.
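A minimal skeleton using this idiom might look as follows. It is only a sketch: joining the arguments into a string is a made-up placeholder for whatever the program really does.

```python
import sys


class Main:

    def main(self, *arguments):
        # arguments is a tuple, whether the call comes from the shell
        # entry point below or from another Python module.
        return ' '.join(arguments)


main = Main().main

if __name__ == '__main__':
    # The asterisk spreads the command line words into separate arguments.
    print(main(*sys.argv[1:]))
```

From Python, main('arg1', 'arg2', 'arg3') then reads much like the shell command would.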
- hpk42 / pycmd — Bitbucket
- I say things » Blog Archive » line-by-line memory usage of a Python program
- Pygments — Python syntax highlighter
- blessings 1.5 : Python Package Index (curses replacement)
- Clint, Command Line Library for Python
- pyBead | Free Development software downloads at SourceForge.net (Remoting code through Web servers, like a kind of Pyro with SSL)
- Python Package Index : Pygments 1.4 (syntax highlighting package)
- kennethreitz/clint - GitHub (command line interface tools)
- kasun/YapDi - GitHub (daemons)
- tomerfiliba/plumbum (attempt at a Python shell)
- Plumbum: Shell Combinators and More — Plumbum: Shell Combinators
- Python Object Sharing (POSH)
- PyXR 0.9.4.13 Readme
- To check all the Python files in a directory:
flymake-helper -f8 $(file * | grep python | sed s/:.*//) 2>&1 | tee /tmp/GREP; gvim -q /tmp/GREP
Quite recently, I formed an opinion about a few tools which automatically criticise Python code that is already written. These tools are (in the order in which I discovered them): pylint, PyChecker, pep8 and pyflakes. They are all directly available under Ubuntu.
The pylint program is extremely critical, and has a ton of options for controlling its verbosity. I do not really have the impression that, without a fair amount of tuning of these options, I could reach compromises that would allow me both to raise the quality of my own code and to fully satisfy this tool.
The pychecker program is much more reasonable, although it imports the code it analyses, which may have the unpleasant side effect of executing part of it at an inopportune time.
As for pyflakes, it replaces the previous one rather well, without having to import or execute the code. It also seemed much, much faster to me. But the tool does not really know Python 3, and this remains to be seen for the others…
The pep8 tool performs a much more elementary check, and points out deviations from the PEP 8 standard, which dictates how to lay out Python sources reasonably. I do not completely agree with all the PEP 8 conventions but, for the sake of uniformity, I am willing to follow most of them. One point particularly saddens me: while the GNU standards suggest placing the operator at the beginning of the continuation line, PEP 8 rather suggests placing it at the end of the continued line, a poor choice for clarity and legibility.
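The two layouts can be compared side by side in a short sketch; the variable names and values are made up for the illustration.

```python
price, quantity, tax, discount = 10.0, 3, 4.5, 2.0   # made-up values

# GNU style: the operator opens the continuation line.
total_gnu = (price * quantity
             + tax
             - discount)

# The other style: the operator ends the continued line.
total_pep8 = (price * quantity +
              tax -
              discount)

print(total_gnu == total_pep8)
```

Both forms compute the same value, so the difference is purely one of legibility. Later revisions of PEP 8 suggest, for new code, breaking before binary operators as in the first form.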
It seems rather utopian to me to hope for discipline in big things when one does not manage to have it in small ones, and this is true of any form of writing, programming included. Some big projects would greatly benefit from developing at least small uniformities and disciplines, those of PEP 8 for example. Tools like those cited above could be profitable and instructive. As for the habit of rigour, one has to start somewhere! ☺
According to their documentation, it seems that pylint and pep8 are extensible and able to integrate supplementary whims, and as I have a few, this does not displease me ☺! I will have to look at all this more closely at some point. Meanwhile, I am integrating pep8 and pyflakes into my habits, with pylint as an option. Here are the recipes I currently use. First, an auxiliary, executable script:
#!/usr/bin/env python3
# François Pinard <email@example.com>, 2012.

"""\
Help Emacs to run flymake.

Usage: flymake-helper [OPTION]... FILE...

Options:
  -8   run pep8
  -f   run pyflakes
  -l   run pylint

Run linters on all FILEs and collect diagnostics on standard output.
If none of -8fl, -f is assumed.
"""

import os
import subprocess
import sys

pep8org = os.path.expanduser('~/etc/bin/pep8org')
pyflakes = 'pyflakes'
pylint = 'epylint'


class Main:
    pep8 = False
    pyflakes = False
    pylint = False

    def main(self, *arguments):
        if not arguments:
            sys.stdout.write(__doc__)
            sys.exit(0)
        import getopt
        options, arguments = getopt.getopt(arguments, '8fl')
        for option, value in options:
            if option == '-8':
                self.pep8 = True
            elif option == '-f':
                self.pyflakes = True
            elif option == '-l':
                self.pylint = True
        if not (self.pep8 or self.pyflakes or self.pylint):
            self.pyflakes = True

        for argument in arguments:
            if self.pep8:
                subprocess.call([pep8org, '--repeat', argument])
            if self.pyflakes:
                subprocess.call([pyflakes, argument])
            if self.pylint:
                subprocess.call([pylint, argument])

main = Main().main

if __name__ == '__main__':
    main(*sys.argv[1:])
I put this script in ~/etc/bin/flymake-helper, which is explicitly mentioned in the few Emacs Lisp fragments that follow, and which live in ~/.emacs. Adjust, of course, according to the location you would choose yourself:
(defvar fp-pep8-flag t)

(defun fp-toggle-pep8 ()
  (interactive)
  (setq fp-pep8-flag (not fp-pep8-flag))
  (let ((buffer (current-buffer)))
    (when (and buffer
               (eq (buffer-local-value 'major-mode buffer) 'python-mode))
      (flymake-start-syntax-check))
    (message (concat "PEP8 " (if fp-pep8-flag "active" "inactive")))))

(defvar fp-pyflakes-flag t)

(defun fp-toggle-pyflakes ()
  (interactive)
  (setq fp-pyflakes-flag (not fp-pyflakes-flag))
  (let ((buffer (current-buffer)))
    (when (and buffer
               (eq (buffer-local-value 'major-mode buffer) 'python-mode))
      (flymake-start-syntax-check))
    (message (concat "pyflakes " (if fp-pyflakes-flag "active" "inactive")))))

(defvar fp-pylint-flag nil)

(defun fp-toggle-pylint ()
  (interactive)
  (setq fp-pylint-flag (not fp-pylint-flag))
  (let ((buffer (current-buffer)))
    (when (and buffer
               (eq (buffer-local-value 'major-mode buffer) 'python-mode))
      (flymake-start-syntax-check))
    (message (concat "pylint " (if fp-pylint-flag "active" "inactive")))))

(defun fp-flymake-py-init ()
  (let* ((temp-file (flymake-init-create-temp-buffer-copy
                     'flymake-create-temp-inplace))
         (local-file (file-relative-name
                      temp-file
                      (file-name-directory buffer-file-name))))
    (let ((arguments (list local-file)))
      (when fp-pep8-flag (push "-8" arguments))
      (when fp-pyflakes-flag (push "-f" arguments))
      (when fp-pylint-flag (push "-l" arguments))
      (list "~/etc/bin/flymake-helper" arguments))))

(add-to-list 'flymake-allowed-file-name-masks
             '("\\.py\\'" fp-flymake-py-init))

;; Python files do not always end in ".py".
(defadvice flymake-get-file-name-mode-and-masks
  (around fp-flymake-get-mode activate)
  "Python files do not always end in `.py'."
  (let ((buffer (get-file-buffer file-name)))
    (if (and buffer
             (eq (buffer-local-value 'major-mode buffer) 'python-mode))
        (setq ad-return-value '(fp-flymake-py-init))
      ad-do-it)))

(global-set-key "\C-cp7" 'fp-toggle-pyflakes)
(global-set-key "\C-cp8" 'fp-toggle-pep8)
(global-set-key "\C-cp9" 'fp-toggle-pylint)
In my case, with the code just quoted, the Emacs commands C-c p 7 and C-c p 8 turn off, on the fly, the diagnostics issued by pyflakes and pep8, or toggle them afterwards. The command C-c p 9 turns on, on the fly, the diagnostics issued by pylint, or toggles them afterwards.