François Pinard's site

Pyrex


1   What is Pyrex?

Pyrex is a kind of Python compiler, written by Greg Ewing, which produces C code. The generated C code allows for two-way interfacing between real Python code and any C module. However, Pyrex does not exactly compile pure Python. First, a few Python features are missing (yet in my experience, this does not create a problem in practice). Second, one may decide to declare, using C-like specifiers, a fixed type for some variables, arguments or functions. Such declarations, when they exist, let Pyrex produce speed optimisations.

For interfacing between Python and C, Python documents an API for C. Such interfaces are somewhat cumbersome to write perfectly, as tiny errors (particularly in the area of reference count administration) could produce far-reaching and delayed effects, the kind of bugs over which one might lose some hair.
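To give a feel for the reference counts that hand-written interface code must keep balanced, here is a minimal Python sketch, assuming CPython, where `sys.getrefcount` reports an object's current count. The variable names are illustrative only. In C, each of these changes would be an explicit `Py_INCREF` or `Py_DECREF`, and forgetting one is exactly the kind of tiny error described above.

```python
import sys

# A fresh object, so nothing else in the interpreter refers to it.
data = ["some", "payload"]

# getrefcount itself holds a temporary reference while it runs,
# so only differences between readings are meaningful.
before = sys.getrefcount(data)

alias = data                        # one more reference to the same list
with_alias = sys.getrefcount(data)

del alias                           # the extra reference goes away again
after = sys.getrefcount(data)

# Show how the count moved relative to the first reading.
print(with_alias - before, after - before)
```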

Pyrex invites you to think in C (yet also in Python) while using Python syntax, a rather elegant compromise in my opinion. For me, besides allowing nearly-Python syntax while thinking in C, Pyrex automatically and safely takes care of the error-prone reference count administration. These two features combine into an immense advantage.

While it is true that Pyrex could be used to speed up Python, I see it more as an elegant and simple way to build interfaces from Python to C libraries or C modules, and vice-versa. In my applications at least, the speed-up of Python-style code is welcome of course, but incidental.

I have had success using Pyrex to recover nearly all of the speed of C, once the bottlenecks have been analysed, without actually writing any new C. So the argument that Python is slow does not hold so well anymore.

All in all, Pyrex is amazingly well thought out; it is a wonderful tool.

2   Miscellaneous notes

  • Documentation
    • Mention the possibility of embedding
      • 2002-07-25 Okay, I'll do that
  • Python syntax
    • class Erreur(Exception): pass not accepted
      • 2003-03-12 On my list
    • try: else clause not accepted
      • 2003-03-09 On my list
    • fiches_CP[:] = []
      • 2003-02-27 I will investigate
  • Code generation
    • warning: assignment from incompatible pointer type
      • 2003-03-08 Submitted
      • 2003-03-09 … that is worrying. I'll check it out

3   Cheat sheet

The following set of facts is a reminder of the real documentation, and in no way a replacement for it.

3.1   Installation details

3.2   Python aspects

  • Permanently missing
    • import *
    • yield
    • def nesting
    • class not at top level
  • Temporarily missing
    • Unicode constants
    • class or def within statements
    • incrementing operators
    • list comprehensions
  • Other details
    • Unless declared, refer to "builtin" scope, not module scope
    • Built-in functions recognized, not looked up:
      • abs delattr dir divmod getattr getattr3 hasattr hash intern isinstance issubclass iter len pow (3 args) reload repr setattr

3.3   C aspects

  • Special constants
    • c prefix to character literals
  • Automatically converted CTYPE
    • [unsigned] {char|short|int|long}
    • float|double|long|long double
    • char *
  • Textual replacements
    • -> becomes .
    • *p becomes p[0]
    • (TYPE)VAR becomes <TYPE>VAR (do not mix with Python objects)
  • Pointers
    • & and NULL as in C
    • NULL is a keyword
  • Functions
    • No ... except for declaring a functional argument
  • Other details
    • Fully avoid const
    • Py_ssize_t predefined
    • __cdecl and __stdcall usable in Windows context

3.4   Pyrex pre-processor

  • DEF VARIABLE = EXPRESSION
    • Pre-defined DEF variables:
      • UNAME_SYSNAME UNAME_NODENAME UNAME_RELEASE UNAME_VERSION UNAME_MACHINE None True False abs bool chr cmp complex dict divmod enumerate float hash hex int len list long map max min oct ord pow range reduce repr round slice str sum tuple xrange zip
  • IF EXPRESSION:
  • ELIF EXPRESSION:
  • ELSE:

3.5   Pyrex simple statements

  • Notes on C types
    • object implied when none of void char int float
    • CTYPE * for pointers
    • CTYPE[LENGTH] for arrays
  • cdef [extern|public] CTYPE VARIABLE [CNAME],
    • Unless extern, allocation occurs in generated module
    • With extern or public, names are given unmodified to linker
    • If any public, PACKAGE.NAME.h gets generated
  • ctypedef CTYPE NEW_TYPE
  • include STRING
  • cimport MODULE,
  • from MODULE cimport NAME [as NAME],

3.6   Pyrex compound statements

  • cdef [public] [api]:
    • cdef implied for all declarations
    • If any api, PACKAGE.NAME_api.h gets generated
  • cdef extern from "FILE.h":
    • cdef extern implied for all declarations
    • Declarations meant for Pyrex, not for generated C code
    • Use pass if the only goal is generating #include
  • cdef extern from *:
    • No #include generated
  • {cdef|ctypedef} [public] {struct|union|enum} [NEW_TYPE [CNAME]]:
    • For enums, constant [= VALUE], separated by \n or ,
    • ctypedef where C uses typedef {struct|union|enum} …
  • {cdef|ctypedef} class EXTENSION_TYPE:
    • May be sub-classed from within Python
    • Class attributes using cdef are reachable from Pyrex only
    • __SPECIAL__ attributes may sometimes work differently
  • def FUNCTION(ARGUMENTS):
    • Accept and return object only
    • Callable from a Pyrex module or through Python import
    • ARGUMENTS:
      • NAME — Python generic
      • CTYPE NAME
        • May only be int or char *
        • Translated, then fixed
      • * — Means no more positional arguments
  • cdef [extern|public] CTYPE FUNCTION [CNAME] (ARGUMENTS) [nogil|with gil] [EXCEPTION_VALUE]:
    • Accept and return any C type
    • Callable from a Pyrex module, or through C linker
    • EXCEPTION_VALUE
      • empty: an error within function will not be propagated
      • except VALUE: VALUE only if an exception was raised
      • except ?VALUE: VALUE might mean an exception occurred
      • except *: have to explicitly check if an exception occurred
  • for VARIABLE from INTEGER <[=] VARIABLE <[=] INTEGER:
    • <[=] may also be >[=]
    • Allow for break, continue, else
  • with nogil:
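
As a reminder of which values the integer for loop above visits, here is a small Python sketch. The helper `for_from` is hypothetical and only models the loop bounds as I understand them from the Pyrex documentation: strict comparisons exclude the bound, and `>` comparisons count downwards.

```python
def for_from(first, first_op, second_op, last):
    """Return the values a Pyrex `for i from first OP i OP last` loop visits.

    Hypothetical helper, for illustration only; the two operators must
    point the same way, as in `0 <= i < n` or `n > i >= 0`.
    """
    if first_op in ("<", "<=") and second_op in ("<", "<="):
        # Ascending loop: a strict first comparison skips the first bound.
        start = first if first_op == "<=" else first + 1
        stop = last + 1 if second_op == "<=" else last
        return list(range(start, stop))
    if first_op in (">", ">=") and second_op in (">", ">="):
        # Descending loop: the same rules, mirrored.
        start = first if first_op == ">=" else first - 1
        stop = last - 1 if second_op == ">=" else last
        return list(range(start, stop, -1))
    raise ValueError("both comparisons must point in the same direction")


print(for_from(0, "<=", "<", 4))   # models: for i from 0 <= i < 4
print(for_from(4, ">", ">=", 0))   # models: for i from 4 > i >= 0
```

In other words, `for i from 0 <= i < n` covers the same values as Python's `range(n)`, while Pyrex can compile it to a plain C loop when the loop variable is declared as a C integer.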

3.7   Useful declarations



   cdef extern from "Python.h":
     object PyString_FromStringAndSize(char *, int)

   cdef extern from "Numeric/arrayobject.h":
     struct PyArray_Descr:
       int type_num, elsize
       char type
     ctypedef class PyArrayObject[type PyArray_Type]:
       cdef char *data
       cdef int nd
       cdef int *dimensions, *strides
       cdef object base
       cdef PyArray_Descr *descr
       cdef int flags
     void import_array()

4   Usage notes

I would not hesitate to use a global variable to get the effect of a class variable. When one writes Pyrex, one has to think in assembler (or C!) a bit. Pyrex is the right tool for establishing a compromise between Python elegance and C speed, so I'm also ready to think in compromise mode when using it.

If we want to feel that we deeply understand the relation between the Pyrex special constructs and the speed we can get from them, we have to get used to the exact meaning of these constructs, and look at the generated C code once in a while. That is easy, and well worth the effort. Moreover, I surely learned many good things about Python internals by using Pyrex as a teacher. ☺

5   Embedding Python in C

One nice possibility of Pyrex, which is not documented enough in my opinion, is its ability to embed Python within what would otherwise be a pure C application. Instead of long explanations, let me take an excerpt of a real example. The following is statique.pyx; it links with and uses ammodule.pyx, compiled separately. That could have been importable Python code just as well. Of course, main_application may be anything:



  cdef extern void Py_Initialize()
  cdef extern void Py_Finalize()
  cdef extern void initstatique()

  cdef extern int main(int argc, char **argv):
      # This function should not use any Python object.
      cdef int status
      Py_Initialize()
      initstatique()
      status = main_wrapper(argc, argv)
      Py_Finalize()
      return status

  cdef int main_wrapper(int argc, char **argv):
      # Python objects are safe to use from now on.
      cdef int index, status
      cdef char *argument
      try:
          sys_argv = []
          for index from 0 <= index < argc:
              sys_argv.append(argv[index])
          import sys
          sys.argv = sys_argv
          result = main_application()
          raise SystemExit, result or 0
      except SystemExit, result:
          result = str(result)
          try:
              status = int(result)
          except ValueError:
              if not result.endswith('\n'):
                  result = result + '\n'
              sys.stderr.write(result)
              status = 1
      except:
          import traceback
          traceback.print_exc()
          status = 1
      return status

  cdef extern void initammodule()

  def main_application():
      initammodule()
      import ammodule as am
      [... rest of main application ...]

I sometimes abuse this feature to link statically, for the sake of gdb debugging. In the case above, it was just some code to trigger a bug I wanted to explore. The initammodule() call is needed here because of static linking, but is seemingly not required when dynamically loading through import. Here is an example of the actual gcc and link calls:



  pyrexc statique.pyx
  pyrexc ammodule.pyx
  gcc -g -O -fPIC -I/usr/include/python2.2 -I. -g -c -o statique.o statique.c
  gcc -g -O -fPIC -I/usr/include/python2.2 -I. -g -c -o ammodule.o ammodule.c
  gcc -o statique statique.o ammodule.o [other .o files or .a files] \
    -Xlinker -export-dynamic -L /usr/lib/python2.2/config -lpython2.2 \
    -ldl -lpthread -lutil -lm [other -l libraries]
  ./statique

In fact, the actual setup I used is a bit simpler for my users, as a site-wide common Makefile hides the pyrexc calls and linking mechanics within built-in rules. We also once used a home-grown pre-processor for Python code which takes care of part of the main expansion, above.
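
The way main_wrapper in statique.pyx turns the application's result into a process exit status can be sketched in plain Python. The helper name `exit_status` and its `stream` parameter are hypothetical, and the sketch only models the path where main_application returns normally: a number (or nothing) becomes the status, and any other text is treated as an error message.

```python
import io
import sys


def exit_status(result, stream=sys.stderr):
    """Map a main-application result to a process exit status.

    Hypothetical helper mirroring the SystemExit branch of main_wrapper.
    """
    # An empty result counts as success, as in `result or 0`.
    text = str(result or 0)
    try:
        return int(text)
    except ValueError:
        # Not a number: treat it as an error message and report failure.
        if not text.endswith("\n"):
            text += "\n"
        stream.write(text)
        return 1


# Usage: capture the message in a buffer instead of the real stderr.
buffer = io.StringIO()
print(exit_status(None), exit_status(3), exit_status("boom", buffer))
```

Keeping this conversion inside main_wrapper rather than main matters in the original, since main itself should not touch any Python object.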

6   Python-free applications?

The Recodec project uses Python for prototyping what could become the next major version of my older Recode project, itself written in C, and which is traditionally installed as a single Python-free, big executable. The migration plans from Recodec to Recode are still fuzzy. One avenue to ponder is Pyrex, if I could find a way to cut the wires between Pyrex and the Python runtime library.

While Pyrex generates C code, that code includes many things for Python module administration, reference counting, and library calls. Extracting the pure C code out of this might be difficult. I sometimes dream that it would be possible to provide a module containing many stub routines or minimalistic routines, standing in for the Python library, but I do not know yet how feasible this would be.

For the next Recode, Pyrex would then allow me to stick to the pleasure of Python-like syntax, of course. Here, I could build and debug prototypes in Python, with the plan of later transforming them into C, somehow, without rewriting much of them, for environments like Recode where the whole of Python, for various reasons, is not welcome.

The main concern is speed. Not only CPU speed, but also the saving, by loading a single executable file, of the many hundreds of disk accesses which occur each time the Python interpreter is started.

In this project, I want to explore how fully one could get rid of the Python interpreter while using Pyrex, maybe through a set of added stub routines, but this project is low priority and I have not gotten there yet. One of these days, hopefully. The nice thing with Pyrex is that you can use the Python interpreter, or not use it, more or less depending on the way you declare things and the way you code. So, in a way, you have full control over the compromise between speed and convenience. The temptation is always strong to use Python facilities, but I guess that with enough discipline, you can shift the equilibrium wherever you want.

If you tolerate having the Python interpreter around, I presume that you can take good advantage of it, and be glad of its presence for the many parts of your programs where speed is not that critical. If you do not want the Python interpreter around in the long run, and if it is possible to fully get rid of it (I do not know yet), then there is no doubt to me that one needs a fairly strict discipline while using Pyrex.

7   To be refiled

7.1   Details

7.2   Docs