XML
Table des matières
1 .
You may enjoy reading Erik Naggum's classic rant.
2 XML among markup languages
Some people consider XML (eXtensible Markup Language) for a standardisation device, let's say for parsing, validation, editing, querying, transformation or storage. XML tools and processors are indeed available for a variety of systems and programming languages. XML serves a purpose indeed, but XML being in the fashion, it is mandated far too often that it should, nowadays. One should resort to XML only for applications needing to handle complex structures between heterogenous and incompatible systems and languages.
There are XML-fanatics — the same as there are Unicode-fanatics ☺. A while ago, I've seen a pushy trend by which /etc/passwd and a lot of other simple system files would be reformulated as XML, to be XML all over. (Maybe after an announcement of Microsoft — real or made-up, I do not know — that they aim such a thing for the next MS-Windows.)
standardised is like a buzzword, here. This is no more advantage being XML-standardised for the only sake of being XML-standardised than going XML for the only sake of going XML. There should be a real, sounded, clear reason to go XML for an application, and administrators should firmly resist XML until such reason has been demonstrated.
Note that if I'm speaking about Python, below, this is merely as the language I most often use in these times, but the same considerations might likely apply to many other languages.
2.1 Simple structures
Any file is a hierarchy of some sort. We often see a file being a sequence of lines, a line being a sequence of fields or tokens, and tokens being a sequence of characters. In many, many, really many applications, this organisation in lines and fields is wholly satisfactory. Reusing the enumeration above, it is easy to parse, easy to validate, easy to edit, easy to query, easy to transform and easy to store. Let's be honest. People are comfortable with lines and fields, examples and tools merely abound. XML becomes more sensible when you have a lot of structure, something which is complex, difficult, and which you have to exchange with away parties. For simple things, it is just annoying and heavy overkill, really…
Some might wonder what benefits could be gained from coming up with one's own formats. For simple things, this is obvious: ease, speed, simplicity, readability. Don't fear it. The world will survive, you know, even if you sometimes don't use XML. ☺
2.2 Configuration files
As Guido van Rossum once put it:
- The most important user of a config file is not the programmer who has to get data out of it; the most important user is the user who has to edit the config file. The outrageous verbosity of XML makes the above example a complete usability liability.
- Now, if you're talking about config files that represent options that the user edits in a convenient application-specific options dialog, that's a different story; I think XML is well-suited for that; but I'm talking about the classic configuration file pattern where you use your favorite flat-file text editor to edit the options file. In that situation, using XML is insane.
XML may sometimes be useful when a configuration has to be a tree of thick (contents heavy) nodes. For simpler configurations files, and this covers a flurry of cases, Guido above expresses my feelings exactly.
We should not go overboard with XML. Configuration files with lines and words for structuring of a two-level hierarchy have long proven their value. ConfigParser .ini files add a third level to these. and with good sense, we can still go another long way without resorting to XML.
Besides, when configuration files are going to have some more complex structure, but explicitly made to drive big applications written in Python — a common case for most of us presumably — I found out that Python itself is a very convenient and powerful tool for expressing such parameterisation. Users can easily modify a carefully designed Python-written configuration template without knowing a bit about Python itself. If they know the concepts of the applications they are configuring, in my experience at least, they quickly get what needs to be grasped about the bits of syntax to respect and follow. With proper care, this can be made incredibly powerful, while staying way more legible than XML. Also compare speed and simplicity, for the application programmer, for bringing such configuration in memory, all typing concerns already addressed.
Of course, XML better inter-operates with foreign programming languages and systems, and would undoubtedly allow easier communication with alien races from neighbouring galaxies ☺. But deep down, I do not think that this need in configuration neutrality is common enough to warrant the added complexity, and loss of legibility, for most practical cases.
2.3 Homogeneous environments
As a Python lover, I'm tempted to prefer easy and legible over standardised wherever I can, wherever they contradict one another. People who consider XML as legible most usually confuse the meanings of legible and decipherable, but they are still speakable. On the other hand, XML fanatics confuse much more widely and inconsiderately.
Speaking for my own situation only, as a Python lover, XML is gross overkill even for quite complex things. It is extremely simple to pickle rather complex structures, transmit them over wires to applications on other machines, and unpickle them there. Using Python as an API for such usages is natural and very comfortable, and not to say, immensely faster than XML.
Of course, I would prefer XML is I had to speak outside a Python environment, with people offering nothing simpler than an XML interfaces. I've looked into some of these fashioned avenues. So far, they invariably seem extremely complex and hairy to me, at least for what they provide. XML is there to give users a reinsurance on the fact they have a last-resort control, after all, to inspect what is going on, or to intervene if they ever need to. So, for them, I really understand how valuable XML may be. It's a good thing.
In my simple situations, Python is much, much better than XML as a solution. Moreover, if I have no choice than communicate with an outside world groking XML, Python offers me a good set of XML tools and interfaces ! For one, when I really need a marking language for my users, without having XML imposed from the outside, SGML is often a better solution, as it is closer to humans than XML. There might also be better solutions than SGML, too. XML is mainly there to help implementors. Surely, I like humans far more than I like machines, and this feeling mainly drives my efforts. ☺
Some people argue that XML is more readable than pickles. I can read a pickle with Python and dump it as a readable, pretty-printed Python structure. Conversely, I can have Python to read in a text containing the source of a Python structure, and produce a pickle from the result. The programs to do so are very small. In practice, I do not maintain my original structures as pickles, but rather as very straight Python source files containing about nothing but data structures. These are easy to edit.
For most people, even to those not familiar with Python, a Python structured constant is probably easier to read that any XML rendering of it. Python will parse and validate it for me. I can save the original text, or the pickle if I prefer so, in files and databases, and transmit either over networks. Python, the language, is also a wonderful and probably unequalled generic framework for transforming data structures.
About being compatible with nearly any language off-the-shelf (as someone put it), of course, this is where some difficulty may rise. Until I have this problem, I'll rather stay comfortable with Python than push myself into miseries I do not need. Then, I could decide to transmit either lines and fields, or XML if available at the other end, or even analyse and produce source or data files for the other language, probably all from within Python.