Commit | Line | Data |
---|---|---|
7c673cae FG |
1 | +++++++++++++++++++++++++++++++++++++++++++ |
2 | Building Hybrid Systems with Boost.Python | |
3 | +++++++++++++++++++++++++++++++++++++++++++ | |
4 | ||
5 | :Author: David Abrahams | |
6 | :Contact: dave@boost-consulting.com | |
7 | :organization: `Boost Consulting`_ | |
8 | :date: 2003-05-14 | |
9 | ||
10 | :Author: Ralf W. Grosse-Kunstleve | |
11 | ||
12 | :copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved | |
13 | ||
14 | .. contents:: Table of Contents | |
15 | ||
16 | .. _`Boost Consulting`: http://www.boost-consulting.com | |
17 | ||
18 | ========== | |
19 | Abstract | |
20 | ========== | |
21 | ||
22 | Boost.Python is an open source C++ library which provides a concise | |
23 | IDL-like interface for binding C++ classes and functions to | |
24 | Python. Leveraging the full power of C++ compile-time introspection | |
25 | and of recently developed metaprogramming techniques, this is achieved | |
26 | entirely in pure C++, without introducing a new syntax. | |
27 | Boost.Python's rich set of features and high-level interface make it | |
28 | possible to engineer packages from the ground up as hybrid systems, | |
29 | giving programmers easy and coherent access to both the efficient | |
30 | compile-time polymorphism of C++ and the extremely convenient run-time | |
31 | polymorphism of Python. | |
32 | ||
33 | ============== | |
34 | Introduction | |
35 | ============== | |
36 | ||
37 | Python and C++ are in many ways as different as two languages could | |
38 | be: while C++ is usually compiled to machine-code, Python is | |
39 | interpreted. Python's dynamic type system is often cited as the | |
40 | foundation of its flexibility, while in C++ static typing is the | |
41 | cornerstone of its efficiency. C++ has an intricate and difficult | |
42 | compile-time meta-language, while in Python, practically everything | |
43 | happens at runtime. | |
44 | ||
45 | Yet for many programmers, these very differences mean that Python and | |
46 | C++ complement one another perfectly. Performance bottlenecks in | |
47 | Python programs can be rewritten in C++ for maximal speed, and | |
48 | authors of powerful C++ libraries choose Python as a middleware | |
49 | language for its flexible system integration capabilities. | |
50 | Furthermore, the surface differences mask some strong similarities: | |
51 | ||
52 | * 'C'-family control structures (if, while, for...) | |
53 | ||
54 | * Support for object-orientation, functional programming, and generic | |
55 | programming (these are both *multi-paradigm* programming languages.) | |
56 | ||
57 | * Comprehensive operator overloading facilities, recognizing the | |
58 | importance of syntactic variability for readability and | |
59 | expressivity. | |
60 | ||
61 | * High-level concepts such as collections and iterators. | |
62 | ||
63 | * High-level encapsulation facilities (C++: namespaces, Python: modules) | |
64 | to support the design of re-usable libraries. | |
65 | ||
66 | * Exception-handling for effective management of error conditions. | |
67 | ||
68 | * C++ idioms in common use, such as handle/body classes and | |
69 | reference-counted smart pointers, mirror Python reference semantics. | |
70 | ||
71 | Given Python's rich 'C' interoperability API, it should in principle | |
72 | be possible to expose C++ type and function interfaces to Python with | |
73 | an analogous interface to their C++ counterparts. However, the | |
74 | facilities provided by Python alone for integration with C++ are | |
75 | relatively meager. Compared to C++ and Python, 'C' has only very | |
76 | rudimentary abstraction facilities, and support for exception-handling | |
77 | is completely missing. 'C' extension module writers are required to | |
78 | manually manage Python reference counts, which is both annoyingly | |
79 | tedious and extremely error-prone. Traditional extension modules also | |
80 | tend to contain a great deal of boilerplate code repetition which | |
81 | makes them difficult to maintain, especially when wrapping an evolving | |
82 | API. | |
83 | ||
84 | These limitations have led to the development of a variety of wrapping | |
85 | systems. SWIG_ is probably the most popular package for the | |
86 | integration of C/C++ and Python. A more recent development is SIP_, | |
87 | which was specifically designed for interfacing Python with the Qt_ | |
88 | graphical user interface library. Both SWIG and SIP introduce their | |
89 | own specialized languages for customizing inter-language bindings. | |
90 | This has certain advantages, but having to deal with three different | |
91 | languages (Python, C/C++ and the interface language) also introduces | |
92 | practical and mental difficulties. The CXX_ package demonstrates an | |
93 | interesting alternative. It shows that at least some parts of | |
94 | Python's 'C' API can be wrapped and presented through a much more | |
95 | user-friendly C++ interface. However, unlike SWIG and SIP, CXX does | |
96 | not include support for wrapping C++ classes as new Python types. | |
97 | ||
98 | The features and goals of Boost.Python_ overlap significantly with | |
99 | many of these other systems. That said, Boost.Python attempts to | |
100 | maximize convenience and flexibility without introducing a separate | |
101 | wrapping language. Instead, it presents the user with a high-level | |
102 | C++ interface for wrapping C++ classes and functions, managing much of | |
103 | the complexity behind-the-scenes with static metaprogramming. | |
104 | Boost.Python also goes beyond the scope of earlier systems by | |
105 | providing: | |
106 | ||
107 | * Support for C++ virtual functions that can be overridden in Python. | |
108 | ||
109 | * Comprehensive lifetime management facilities for low-level C++ | |
110 | pointers and references. | |
111 | ||
112 | * Support for organizing extensions as Python packages, | |
113 | with a central registry for inter-language type conversions. | |
114 | ||
115 | * A safe and convenient mechanism for tying into Python's powerful | |
116 | serialization engine (pickle). | |
117 | ||
118 | * Coherence with the rules for handling C++ lvalues and rvalues that | |
119 | can only come from a deep understanding of both the Python and C++ | |
120 | type systems. | |
121 | ||
122 | The key insight that sparked the development of Boost.Python is that | |
123 | much of the boilerplate code in traditional extension modules could be | |
124 | eliminated using C++ compile-time introspection. Each argument of a | |
125 | wrapped C++ function must be extracted from a Python object using a | |
126 | procedure that depends on the argument type. Similarly, the function's | |
127 | return type determines how the return value will be converted from C++ | |
128 | to Python. Of course argument and return types are part of each | |
129 | function's type, and this is exactly the source from which | |
130 | Boost.Python deduces most of the information required. | |
131 | ||
132 | This approach leads to *user-guided wrapping*: as much information is | |
133 | extracted directly from the source code to be wrapped as is possible | |
134 | within the framework of pure C++, and some additional information is | |
135 | supplied explicitly by the user. Mostly the guidance is mechanical | |
136 | and little real intervention is required. Because the interface | |
137 | specification is written in the same full-featured language as the | |
138 | code being exposed, the user has unprecedented power available when | |
139 | she does need to take control. | |
140 | ||
141 | .. _Python: http://www.python.org/ | |
142 | .. _SWIG: http://www.swig.org/ | |
143 | .. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php | |
144 | .. _Qt: http://www.trolltech.com/ | |
145 | .. _CXX: http://cxx.sourceforge.net/ | |
146 | .. _Boost.Python: http://www.boost.org/libs/python/doc | |
147 | ||
148 | =========================== | |
149 | Boost.Python Design Goals | |
150 | =========================== | |
151 | ||
152 | The primary goal of Boost.Python is to allow users to expose C++ | |
153 | classes and functions to Python using nothing more than a C++ | |
154 | compiler. In broad strokes, the user experience should be one of | |
155 | directly manipulating C++ objects from Python. | |
156 | ||
157 | However, it's also important not to translate all interfaces *too* | |
158 | literally: the idioms of each language must be respected. For | |
159 | example, though C++ and Python both have an iterator concept, they are | |
160 | expressed very differently. Boost.Python has to be able to bridge the | |
161 | interface gap. | |
162 | ||
163 | It must be possible to insulate Python users from crashes resulting | |
164 | from trivial misuses of C++ interfaces, such as accessing | |
165 | already-deleted objects. By the same token the library should | |
166 | insulate C++ users from the low-level Python 'C' API, replacing | |
167 | error-prone 'C' interfaces like manual reference-count management and | |
168 | raw ``PyObject`` pointers with more-robust alternatives. | |
169 | ||
170 | Support for component-based development is crucial, so that C++ types | |
171 | exposed in one extension module can be passed to functions exposed in | |
172 | another without loss of crucial information like C++ inheritance | |
173 | relationships. | |
174 | ||
175 | Finally, all wrapping must be *non-intrusive*, without modifying or | |
176 | even seeing the original C++ source code. Existing C++ libraries have | |
177 | to be wrappable by third parties who only have access to header files | |
178 | and binaries. | |
179 | ||
180 | ========================== | |
181 | Hello Boost.Python World | |
182 | ========================== | |
183 | ||
184 | And now for a preview of Boost.Python, and how it improves on the raw | |
185 | facilities offered by Python. Here's a function we might want to | |
186 | expose:: | |
187 | ||
188 | char const* greet(unsigned x) | |
189 | { | |
190 | static char const* const msgs[] = { "hello", "Boost.Python", "world!" }; | |
191 | ||
192 | if (x > 2) | |
193 | throw std::range_error("greet: index out of range"); | |
194 | ||
195 | return msgs[x]; | |
196 | } | |
197 | ||
198 | To wrap this function in standard C++ using the Python 'C' API, we'd | |
199 | need something like this:: | |
200 | ||
201 | extern "C" // all Python interactions use 'C' linkage and calling convention | |
202 | { | |
203 | // Wrapper to handle argument/result conversion and checking | |
204 | PyObject* greet_wrap(PyObject* self, PyObject* args) | |
205 | { | |
206 | int x; | |
207 | if (PyArg_ParseTuple(args, "i", &x)) // extract/check arguments | |
208 | { | |
209 | char const* result = greet(x); // invoke wrapped function | |
210 | return PyString_FromString(result); // convert result to Python | |
211 | } | |
212 | return 0; // error occurred | |
213 | } | |
214 | ||
215 | // Table of wrapped functions to be exposed by the module | |
216 | static PyMethodDef methods[] = { | |
217 | { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" } | |
218 | , { NULL, NULL, 0, NULL } // sentinel | |
219 | }; | |
220 | ||
221 | // module initialization function | |
222 | DL_EXPORT(void) inithello() | |
223 | { | |
224 | (void) Py_InitModule("hello", methods); // add the methods to the module | |
225 | } | |
226 | } | |
227 | ||
228 | Now here's the wrapping code we'd use to expose it with Boost.Python:: | |
229 | ||
230 | #include <boost/python.hpp> | |
231 | using namespace boost::python; | |
232 | BOOST_PYTHON_MODULE(hello) | |
233 | { | |
234 | def("greet", greet, "return one of 3 parts of a greeting"); | |
235 | } | |
236 | ||
237 | and here it is in action:: | |
238 | ||
239 | >>> import hello | |
240 | >>> for x in range(3): | |
241 | ... print hello.greet(x) | |
242 | ... | |
243 | hello | |
244 | Boost.Python | |
245 | world! | |
246 | ||
247 | Aside from the fact that the 'C' API version is much more verbose, | |
248 | it's worth noting a few things that it doesn't handle correctly: | |
249 | ||
250 | * The original function accepts an unsigned integer, and the Python | |
251 | 'C' API only gives us a way of extracting signed integers. The | |
252 | Boost.Python version will raise a Python exception if we try to pass | |
253 | a negative number to ``hello.greet``, but the other one will proceed | |
254 | to do whatever the C++ implementation does when converting a | |
255 | negative integer to unsigned (usually wrapping to some very large | |
256 | number), and pass the incorrect translation on to the wrapped | |
257 | function. | |
258 | ||
259 | * That brings us to the second problem: if the C++ ``greet()`` | |
260 | function is called with a number greater than 2, it will throw an | |
261 | exception. Typically, if a C++ exception propagates across the | |
262 | boundary with code generated by a 'C' compiler, it will cause a | |
263 | crash. As you can see in the first version, there's no C++ | |
264 | scaffolding there to prevent this from happening. Functions wrapped | |
265 | by Boost.Python automatically include an exception-handling layer | |
266 | which protects Python users by translating unhandled C++ exceptions | |
267 | into a corresponding Python exception. | |
268 | ||
269 | * A slightly more-subtle limitation is that the argument conversion | |
270 | used in the Python 'C' API case can only get that integer ``x`` in | |
271 | *one way*. PyArg_ParseTuple can't convert Python ``long`` objects | |
272 | (arbitrary-precision integers) which happen to fit in an ``unsigned | |
273 | int`` but not in a ``signed long``, nor will it ever handle a | |
274 | wrapped C++ class with a user-defined implicit ``operator unsigned | |
275 | int()`` conversion. Boost.Python's dynamic type conversion | |
276 | registry allows users to add arbitrary conversion methods. | |
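
The first point in the list above can be illustrated from Python itself. The sketch below is not part of Boost.Python. It uses the standard ``ctypes`` module to mimic what a C or C++ implementation typically does when a negative ``int`` is silently converted to ``unsigned``, and contrasts that with a conversion that checks the range first. The helper names ``unchecked_to_unsigned`` and ``checked_to_unsigned`` are hypothetical.

```python
import ctypes

# Largest value representable in the platform's C ``unsigned int``.
UINT_MAX = 2 ** (8 * ctypes.sizeof(ctypes.c_uint)) - 1


def unchecked_to_unsigned(x):
    """Mimic a silent C-style int -> unsigned conversion (modular wrap-around)."""
    return ctypes.c_uint(x).value


def checked_to_unsigned(x):
    """Reject values an unsigned int cannot represent, instead of wrapping."""
    if not 0 <= x <= UINT_MAX:
        raise OverflowError("value %r does not fit in an unsigned int" % (x,))
    return x
```

With the unchecked helper, ``-1`` becomes ``UINT_MAX``, which is far outside the valid index range of ``greet``. The checked helper turns the same input into a Python exception, which is the behaviour the text attributes to the Boost.Python wrapper.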
277 | ||
278 | ================== | |
279 | Library Overview | |
280 | ================== | |
281 | ||
282 | This section outlines some of the library's major features. Except as | |
283 | necessary to avoid confusion, details of library implementation are | |
284 | omitted. | |
285 | ||
286 | ------------------ | |
287 | Exposing Classes | |
288 | ------------------ | |
289 | ||
290 | C++ classes and structs are exposed with a similarly-terse interface. | |
291 | Given:: | |
292 | ||
293 | struct World | |
294 | { | |
295 | void set(std::string msg) { this->msg = msg; } | |
296 | std::string greet() { return msg; } | |
297 | std::string msg; | |
298 | }; | |
299 | ||
300 | The following code will expose it in our extension module:: | |
301 | ||
302 | #include <boost/python.hpp> | |
303 | BOOST_PYTHON_MODULE(hello) | |
304 | { | |
305 | class_<World>("World") | |
306 | .def("greet", &World::greet) | |
307 | .def("set", &World::set) | |
308 | ; | |
309 | } | |
310 | ||
311 | Although this code has a certain pythonic familiarity, people | |
312 | sometimes find the syntax a bit confusing because it doesn't look like | |
313 | most of the C++ code they're used to. All the same, this is just | |
314 | standard C++. Because of their flexible syntax and operator | |
315 | overloading, C++ and Python are great for defining domain-specific | |
316 | (sub)languages | |
317 | (DSLs), and that's what we've done in Boost.Python. To break it down:: | |
318 | ||
319 | class_<World>("World") | |
320 | ||
321 | constructs an unnamed object of type ``class_<World>`` and passes | |
322 | ``"World"`` to its constructor. This creates a new-style Python class | |
323 | called ``World`` in the extension module, and associates it with the | |
324 | C++ type ``World`` in the Boost.Python type conversion registry. We | |
325 | might have also written:: | |
326 | ||
327 | class_<World> w("World"); | |
328 | ||
329 | but that would've been more verbose, since we'd have to name ``w`` | |
330 | again to invoke its ``def()`` member function:: | |
331 | ||
332 | w.def("greet", &World::greet) | |
333 | ||
334 | There's nothing special about the location of the dot for member | |
335 | access in the original example: C++ allows any amount of whitespace on | |
336 | either side of a token, and placing the dot at the beginning of each | |
337 | line allows us to chain as many successive calls to member functions | |
338 | as we like with a uniform syntax. The other key fact that allows | |
339 | chaining is that ``class_<>`` member functions all return a reference | |
340 | to ``*this``. | |
341 | ||
342 | So the example is equivalent to:: | |
343 | ||
344 | class_<World> w("World"); | |
345 | w.def("greet", &World::greet); | |
346 | w.def("set", &World::set); | |
347 | ||
348 | It's occasionally useful to be able to break down the components of a | |
349 | Boost.Python class wrapper in this way, but the rest of this article | |
350 | will stick to the terse syntax. | |
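
The chaining idiom is not specific to C++. As a rough Python analogy, each call below records a binding and returns the builder itself, so calls can be strung together just as in the terse form. The ``ClassBuilder`` class and its ``def_`` method are hypothetical and are not part of Boost.Python.

```python
class ClassBuilder:
    """Hypothetical stand-in for ``class_<>``: collects method bindings by name."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def def_(self, name, function):
        # Record the binding, then return the builder itself (the analogue of
        # returning a reference to ``*this``) so that calls can be chained.
        self.methods[name] = function
        return self


# Terse, chained form.
chained = (
    ClassBuilder("World")
    .def_("greet", lambda self: "howdy")
    .def_("set", lambda self, msg: None)
)

# Equivalent step-by-step form.
stepwise = ClassBuilder("World")
stepwise.def_("greet", lambda self: "howdy")
stepwise.def_("set", lambda self, msg: None)
```

Both forms leave the builder with the same two bindings. The only thing chaining relies on is that every call hands back the object it was invoked on.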
351 | ||
352 | For completeness, here's the wrapped class in use: :: | |
353 | ||
354 | >>> import hello | |
355 | >>> planet = hello.World() | |
356 | >>> planet.set('howdy') | |
357 | >>> planet.greet() | |
358 | 'howdy' | |
359 | ||
360 | Constructors | |
361 | ============ | |
362 | ||
363 | Since our ``World`` class is just a plain ``struct``, it has an | |
364 | implicit no-argument (nullary) constructor. Boost.Python exposes the | |
365 | nullary constructor by default, which is why we were able to write: :: | |
366 | ||
367 | >>> planet = hello.World() | |
368 | ||
369 | However, well-designed classes in any language may require constructor | |
370 | arguments in order to establish their invariants. Unlike Python, | |
371 | where ``__init__`` is just a specially-named method, in C++ | |
372 | constructors cannot be handled like ordinary member functions. In | |
373 | particular, we can't take their address: ``&World::World`` is an | |
374 | error. The library provides a different interface for specifying | |
375 | constructors. Given:: | |
376 | ||
377 | struct World | |
378 | { | |
379 | World(std::string msg); // added constructor | |
380 | ... | |
381 | ||
382 | we can modify our wrapping code as follows:: | |
383 | ||
384 | class_<World>("World", init<std::string>()) | |
385 | ... | |
386 | ||
387 | Of course, a C++ class may have additional constructors, and we can | |
388 | expose those as well by passing more instances of ``init<...>`` to | |
389 | ``def()``:: | |
390 | ||
391 | class_<World>("World", init<std::string>()) | |
392 | .def(init<double, double>()) | |
393 | ... | |
394 | ||
395 | Boost.Python allows wrapped functions, member functions, and | |
396 | constructors to be overloaded to mirror C++ overloading. | |
397 | ||
398 | Data Members and Properties | |
399 | =========================== | |
400 | ||
401 | Any publicly-accessible data members in a C++ class can be easily | |
402 | exposed as either ``readonly`` or ``readwrite`` attributes:: | |
403 | ||
404 | class_<World>("World", init<std::string>()) | |
405 | .def_readonly("msg", &World::msg) | |
406 | ... | |
407 | ||
408 | and can be used directly in Python: :: | |
409 | ||
410 | >>> planet = hello.World('howdy') | |
411 | >>> planet.msg | |
412 | 'howdy' | |
413 | ||
414 | This does *not* result in adding attributes to the ``World`` instance | |
415 | ``__dict__``, which can result in substantial memory savings when | |
416 | wrapping large data structures. In fact, no instance ``__dict__`` | |
417 | will be created at all unless attributes are explicitly added from | |
418 | Python. Boost.Python owes this capability to the new Python 2.2 type | |
419 | system, in particular the descriptor interface and ``property`` type. | |
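
A pure-Python analogy may make the role of descriptors clearer. The sketch below assumes Python 3 and uses ``__slots__`` as a stand-in for storage held on the C++ side. The ``msg`` property lives on the class, and instances carry no ``__dict__`` at all. It illustrates the mechanism only and is not Boost.Python's implementation.

```python
class World:
    # __slots__ suppresses the per-instance __dict__; here it stands in for
    # data that a wrapped class would keep in the C++ object.
    __slots__ = ("_msg",)

    def __init__(self, msg):
        self._msg = msg

    @property
    def msg(self):
        # Read-only attribute implemented as a descriptor on the class.
        return self._msg


planet = World("howdy")
```

Reading ``planet.msg`` goes through the descriptor stored on ``World``, and assigning to it raises ``AttributeError`` because no setter was supplied, much like an attribute exposed with ``def_readonly``.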
420 | ||
421 | In C++, publicly-accessible data members are considered a sign of poor | |
422 | design because they break encapsulation, and style guides usually | |
423 | dictate the use of "getter" and "setter" functions instead. In | |
424 | Python, however, ``__getattr__``, ``__setattr__``, and since 2.2, | |
425 | ``property`` mean that attribute access is just one more | |
426 | well-encapsulated syntactic tool at the programmer's disposal. | |
427 | Boost.Python bridges this idiomatic gap by making Python ``property`` | |
428 | creation directly available to users. If ``msg`` were private, we | |
429 | could still expose it as an attribute in Python as follows:: | |
430 | ||
431 | class_<World>("World", init<std::string>()) | |
432 | .add_property("msg", &World::greet, &World::set) | |
433 | ... | |
434 | ||
435 | The example above mirrors the familiar usage of properties in Python | |
436 | 2.2+: :: | |
437 | ||
438 | >>> class World(object): | |
439 | ... def __init__(self, msg): | |
440 | ... self.__msg = msg | |
441 | ... def greet(self): | |
442 | ... return self.__msg | |
443 | ... def set(self, msg): | |
444 | ... self.__msg = msg | |
445 | ... msg = property(greet, set) | |
446 | ||
447 | Operator Overloading | |
448 | ==================== | |
449 | ||
450 | The ability to write arithmetic operators for user-defined types has | |
451 | been a major factor in the success of both languages for numerical | |
452 | computation, and the success of packages like NumPy_ attests to the | |
453 | power of exposing operators in extension modules. Boost.Python | |
454 | provides a concise mechanism for wrapping operator overloads. The | |
455 | example below shows a fragment from a wrapper for the Boost rational | |
456 | number library:: | |
457 | ||
458 | class_<rational<int> >("rational_int") | |
459 | .def(init<int, int>()) // constructor, e.g. rational_int(3,4) | |
460 | .def("numerator", &rational<int>::numerator) | |
461 | .def("denominator", &rational<int>::denominator) | |
462 | .def(-self) // __neg__ (unary minus) | |
463 | .def(self + self) // __add__ (homogeneous) | |
464 | .def(self * self) // __mul__ | |
465 | .def(self + int()) // __add__ (heterogeneous) | |
466 | .def(int() + self) // __radd__ | |
467 | ... | |
468 | ||
469 | The magic is performed using a simplified application of "expression | |
470 | templates" [VELD1995]_, a technique originally developed for | |
471 | optimization of high-performance matrix algebra expressions. The | |
472 | essence is that instead of performing the computation immediately, | |
473 | operators are overloaded to construct a type *representing* the | |
474 | computation. In matrix algebra, dramatic optimizations are often | |
475 | available when the structure of an entire expression can be taken into | |
476 | account, rather than evaluating each operation "greedily". | |
477 | Boost.Python uses the same technique to build an appropriate Python | |
478 | method object based on expressions involving ``self``. | |
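
The idea of overloading operators to build a description of a computation, rather than to perform it, can be sketched in Python. This is a run-time analogy only: Boost.Python does the equivalent work at compile time with C++ templates. The names ``SelfPlaceholder``, ``OperatorSpec`` and ``self_`` are hypothetical.

```python
from collections import namedtuple

# Describes one operator to expose: the Python special method name and the
# type of the other operand (None for unary operators).
OperatorSpec = namedtuple("OperatorSpec", ["method", "other_type"])


class SelfPlaceholder:
    """Stands for 'an instance of the class being wrapped'."""

    @staticmethod
    def _operand_type(other):
        # The placeholder itself means "same type as the wrapped class".
        return "self" if isinstance(other, SelfPlaceholder) else type(other)

    def __neg__(self):
        return OperatorSpec("__neg__", None)

    def __add__(self, other):
        return OperatorSpec("__add__", self._operand_type(other))

    def __radd__(self, other):
        return OperatorSpec("__radd__", self._operand_type(other))

    def __mul__(self, other):
        return OperatorSpec("__mul__", self._operand_type(other))


self_ = SelfPlaceholder()

# Each expression yields a description of an operator, not a computed value.
specs = [-self_, self_ + self_, self_ * self_, self_ + int(), int() + self_]
```

Each entry in ``specs`` names the special method to generate and the type of the other operand, which is the information a wrapper needs in order to register ``__neg__``, ``__add__``, ``__mul__`` and ``__radd__``.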
479 | ||
480 | .. _NumPy: http://www.pfdubois.com/numpy/ | |
481 | ||
482 | Inheritance | |
483 | =========== | |
484 | ||
485 | C++ inheritance relationships can be represented to Boost.Python by adding | |
486 | an optional ``bases<...>`` argument to the ``class_<...>`` template | |
487 | parameter list as follows:: | |
488 | ||
489 | class_<Derived, bases<Base1,Base2> >("Derived") | |
490 | ... | |
491 | ||
492 | This has two effects: | |
493 | ||
494 | 1. When the ``class_<...>`` is created, Python type objects | |
495 | corresponding to ``Base1`` and ``Base2`` are looked up in | |
496 | Boost.Python's registry, and are used as bases for the new Python | |
497 | ``Derived`` type object, so methods exposed for the Python ``Base1`` | |
498 | and ``Base2`` types are automatically members of the ``Derived`` | |
499 | type. Because the registry is global, this works correctly even if | |
500 | ``Derived`` is exposed in a different module from either of its | |
501 | bases. | |
502 | ||
503 | 2. C++ conversions from ``Derived`` to its bases are added to the | |
504 | Boost.Python registry. Thus wrapped C++ methods expecting (a | |
505 | pointer or reference to) an object of either base type can be | |
506 | called with an object wrapping a ``Derived`` instance. Wrapped | |
507 | member functions of class ``T`` are treated as though they have an | |
508 | implicit first argument of ``T&``, so these conversions are | |
509 | necessary to allow the base class methods to be called for derived | |
510 | objects. | |
511 | ||
512 | Of course it's possible to derive new Python classes from wrapped C++ | |
513 | class instances. Because Boost.Python uses the new-style class | |
514 | system, that works very much as for the Python built-in types. There | |
515 | is one significant detail in which it differs: the built-in types | |
516 | generally establish their invariants in their ``__new__`` function, so | |
517 | that derived classes do not need to call ``__init__`` on the base | |
518 | class before invoking its methods : :: | |
519 | ||
520 | >>> class L(list): | |
521 | ... def __init__(self): | |
522 | ... pass | |
523 | ... | |
524 | >>> L().reverse() | |
525 | >>> | |
526 | ||
527 | Because C++ object construction is a one-step operation, C++ instance | |
528 | data cannot be constructed until the arguments are available, in the | |
529 | ``__init__`` function: :: | |
530 | ||
531 | >>> class D(SomeBoostPythonClass): | |
532 | ... def __init__(self): | |
533 | ... pass | |
534 | ... | |
535 | >>> D().some_boost_python_method() | |
536 | Traceback (most recent call last): | |
537 | File "<stdin>", line 1, in ? | |
538 | TypeError: bad argument type for built-in operation | |
539 | ||
540 | This happened because Boost.Python couldn't find instance data of type | |
541 | ``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__`` | |
542 | function masked construction of the base class. It could be corrected | |
543 | by either removing ``D``'s ``__init__`` function or having it call | |
544 | ``SomeBoostPythonClass.__init__(...)`` explicitly. | |
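
The failure mode and both fixes can be reproduced without an extension module. In the sketch below, ``Wrapped`` is a hypothetical pure-Python stand-in for a Boost.Python class: its instance data exists only after ``Wrapped.__init__`` has run. A real wrapped class reports the problem as the ``TypeError`` shown above, whereas this stand-in raises ``AttributeError``.

```python
class Wrapped:
    """Stand-in for a wrapped C++ class: state is created in __init__."""

    def __init__(self):
        self._cxx_data = "constructed"

    def method(self):
        return self._cxx_data


class Broken(Wrapped):
    def __init__(self):
        pass  # masks Wrapped.__init__, so no instance data is created


class Fixed(Wrapped):
    def __init__(self):
        Wrapped.__init__(self)  # construct the base explicitly


class NoInit(Wrapped):
    pass  # no __init__ of its own: Wrapped.__init__ runs as usual
```

``Fixed().method()`` and ``NoInit().method()`` both succeed, while ``Broken().method()`` fails because the base-class state was never created.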
545 | ||
546 | Virtual Functions | |
547 | ================= | |
548 | ||
549 | Deriving new types in Python from extension classes is not very | |
550 | interesting unless they can be used polymorphically from C++. In | |
551 | other words, Python method implementations should appear to override | |
552 | the implementation of C++ virtual functions when called *through base | |
553 | class pointers/references from C++*. Since the only way to alter the | |
554 | behavior of a virtual function is to override it in a derived class, | |
555 | the user must build a special derived class to dispatch a polymorphic | |
556 | class' virtual functions:: | |
557 | ||
558 | // | |
559 | // interface to wrap: | |
560 | // | |
561 | class Base | |
562 | { | |
563 | public: | |
564 | virtual int f(std::string x) { return 42; } | |
565 | virtual ~Base(); | |
566 | }; | |
567 | ||
568 | int calls_f(Base& b, std::string x) { return b.f(x); } | |
569 | ||
570 | // | |
571 | // Wrapping Code | |
572 | // | |
573 | ||
574 | // Dispatcher class | |
575 | struct BaseWrap : Base | |
576 | { | |
577 | // Store a pointer to the Python object | |
578 | BaseWrap(PyObject* self_) : self(self_) {} | |
579 | PyObject* self; | |
580 | ||
581 | // Default implementation, for when f is not overridden | |
582 | int f_default(std::string x) { return this->Base::f(x); } | |
583 | // Dispatch implementation | |
584 | int f(std::string x) { return call_method<int>(self, "f", x); } | |
585 | }; | |
586 | ||
587 | ... | |
588 | def("calls_f", calls_f); | |
589 | class_<Base, BaseWrap>("Base") | |
590 | .def("f", &Base::f, &BaseWrap::f_default) | |
591 | ; | |
592 | ||
593 | Now here's some Python code which demonstrates: :: | |
594 | ||
595 | >>> class Derived(Base): | |
596 | ... def f(self, s): | |
597 | ... return len(s) | |
598 | ... | |
599 | >>> calls_f(Base(), 'foo') | |
600 | 42 | |
601 | >>> calls_f(Derived(), 'forty-two') | |
602 | 9 | |
603 | ||
604 | Things to notice about the dispatcher class: | |
605 | ||
606 | * The key element which allows overriding in Python is the | |
607 | ``call_method`` invocation, which uses the same global type | |
608 | conversion registry as the C++ function wrapping does to convert its | |
609 | arguments from C++ to Python and its return type from Python to C++. | |
610 | ||
611 | * Any constructor signatures you wish to wrap must be replicated with | |
612 | an initial ``PyObject*`` argument | |
613 | ||
614 | * The dispatcher must store this argument so that it can be used to | |
615 | invoke ``call_method`` | |
616 | ||
617 | * The ``f_default`` member function is needed when the function being | |
618 | exposed is not pure virtual; there's no other way ``Base::f`` can be | |
619 | called on an object of type ``BaseWrap``, since it overrides ``f``. | |
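
The control flow of the dispatcher can be modelled loosely in Python. The sketch below is an assumption-laden illustration, not the library's code. Its ``call_method`` function is a hypothetical Python model of the lookup-call-convert sequence, with a plain ``isinstance`` check standing in for the registry-based conversion of the result.

```python
def call_method(result_type, obj, name, *args):
    """Look up `name` on the Python object, call it, and check the result type."""
    result = getattr(obj, name)(*args)
    if not isinstance(result, result_type):
        raise TypeError("%s() returned %r, expected %s"
                        % (name, result, result_type.__name__))
    return result


class Base:
    """Plays the role of the wrapped class as seen from Python."""

    def f(self, x):
        # Corresponds to f_default: the original, non-overridden behaviour.
        return 42


def calls_f(b, x):
    # Models C++ code invoking the virtual function through a Base reference:
    # the dispatcher forwards the call to whatever `f` the Python object has.
    return call_method(int, b, "f", x)


class Derived(Base):
    def f(self, s):
        return len(s)
```

Here ``calls_f(Base(), 'foo')`` reaches the default implementation, and ``calls_f(Derived(), 'forty-two')`` reaches the Python override, mirroring the interactive session above.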
620 | ||
621 | Deeper Reflection on the Horizon? | |
622 | ================================= | |
623 | ||
624 | Admittedly, this formula is tedious to repeat, especially on a project | |
625 | with many polymorphic classes. That it is necessary reflects some | |
626 | limitations in C++'s compile-time introspection capabilities: there's | |
627 | no way to enumerate the members of a class and find out which are | |
628 | virtual functions. At least one very promising project has been | |
629 | started to write a front-end which can generate these dispatchers (and | |
630 | other wrapping code) automatically from C++ headers. | |
631 | ||
632 | Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on | |
633 | GCC_XML_, which generates an XML version of GCC's internal program | |
634 | representation. Since GCC is a highly-conformant C++ compiler, this | |
635 | ensures correct handling of the most-sophisticated template code and | |
636 | full access to the underlying type system. In keeping with the | |
637 | Boost.Python philosophy, a Pyste interface description is neither | |
638 | intrusive on the code being wrapped, nor expressed in some unfamiliar | |
639 | language: instead it is a 100% pure Python script. If Pyste is | |
640 | successful it will mark a move away from wrapping everything directly | |
641 | in C++ for many of our users. It will also allow us the choice to | |
642 | shift some of the metaprogram code from C++ to Python. We expect that | |
643 | soon, not only our users but the Boost.Python developers themselves | |
644 | will be "thinking hybrid" about their own code. | |
645 | ||
646 | .. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html | |
647 | .. _`Pyste`: http://www.boost.org/libs/python/pyste | |
648 | ||
649 | --------------- | |
650 | Serialization | |
651 | --------------- | |
652 | ||
653 | *Serialization* is the process of converting objects in memory to a | |
654 | form that can be stored on disk or sent over a network connection. The | |
655 | serialized object (most often a plain string) can be retrieved and | |
656 | converted back to the original object. A good serialization system will | |
657 | automatically convert entire object hierarchies. Python's standard | |
658 | ``pickle`` module is just such a system. It leverages the language's strong | |
659 | runtime introspection facilities for serializing practically arbitrary | |
660 | user-defined objects. With a few simple and unintrusive provisions this | |
661 | powerful machinery can be extended to also work for wrapped C++ objects. | |
662 | Here is an example:: | |
663 | ||
664 | #include <string> | |
665 | ||
666 | struct World | |
667 | { | |
668 | World(std::string a_msg) : msg(a_msg) {} | |
669 | std::string greet() const { return msg; } | |
670 | std::string msg; | |
671 | }; | |
672 | ||
673 | #include <boost/python.hpp> | |
674 | using namespace boost::python; | |
675 | ||
676 | struct World_picklers : pickle_suite | |
677 | { | |
678 | static tuple | |
679 | getinitargs(World const& w) { return make_tuple(w.greet()); } | |
680 | }; | |
681 | ||
682 | BOOST_PYTHON_MODULE(hello) | |
683 | { | |
684 | class_<World>("World", init<std::string>()) | |
685 | .def("greet", &World::greet) | |
686 | .def_pickle(World_picklers()) | |
687 | ; | |
688 | } | |
689 | ||
690 | Now let's create a ``World`` object and put it to rest on disk:: | |
691 | ||
692 | >>> import hello | |
693 | >>> import pickle | |
694 | >>> a_world = hello.World("howdy") | |
695 | >>> pickle.dump(a_world, open("my_world", "wb")) | |
696 | ||
697 | In a potentially *different script* on a potentially *different | |
698 | computer* with a potentially *different operating system*:: | |
699 | ||
700 | >>> import pickle | |
701 | >>> resurrected_world = pickle.load(open("my_world", "rb")) | |
702 | >>> resurrected_world.greet() | |
703 | 'howdy' | |
704 | ||
705 | Of course the ``cPickle`` module can also be used for faster | |
706 | processing. | |
707 | ||
708 | Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol | |
709 | defined in the standard Python documentation. Like a ``__getinitargs__`` | |
710 | function in Python, the ``pickle_suite``'s ``getinitargs()`` is responsible for | |
711 | creating the argument tuple that will be used to reconstruct the pickled | |
712 | object. The other elements of the Python pickling protocol, | |
713 | ``__getstate__`` and ``__setstate__``, can be optionally provided via C++ | |
714 | ``getstate`` and ``setstate`` functions. C++'s static type system allows the | |
715 | library to ensure at compile-time that nonsensical combinations of | |
716 | functions (e.g. ``getstate`` without ``setstate``) are not used. | |
717 | ||
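The division of labour between the three hooks can be sketched in pure
Python. This is only an illustration of the protocol, not Boost.Python's
implementation: the ``World`` class below is a stand-in for the wrapped C++
class, the ``visits`` attribute is a hypothetical piece of extra state, and
the explicit ``__reduce__`` is an assumption made so that the sketch behaves
the same on current Python versions.

```python
import pickle


class World:
    """Pure-Python stand-in for the wrapped C++ ``World`` class."""

    def __init__(self, msg):
        self.msg = msg
        self.visits = 0  # hypothetical state that is not a constructor argument

    def greet(self):
        return self.msg

    def __getinitargs__(self):
        # Arguments used to re-create the object when unpickling.
        return (self.msg,)

    def __getstate__(self):
        # Remaining state that the constructor cannot restore.
        return {"visits": self.visits}

    def __setstate__(self, state):
        # Applied to the freshly constructed object after unpickling.
        self.visits = state["visits"]

    def __reduce__(self):
        # Hand pickle the (callable, args, state) triple explicitly.
        return (type(self), self.__getinitargs__(), self.__getstate__())


original = World("howdy")
original.visits = 3
restored = pickle.loads(pickle.dumps(original))
```

On unpickling, the class is first called with the init-args tuple and the
state is then applied through ``__setstate__``, which is why providing one
state function without the other makes little sense.
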
718 | Enabling serialization of more complex C++ objects requires a little | |
719 | more work than is shown in the example above. Fortunately the | |
720 | ``object`` interface (see next section) greatly helps in keeping the | |
721 | code manageable. | |
722 | ||
723 | ------------------ | |
724 | Object interface | |
725 | ------------------ | |
726 | ||
727 | Experienced 'C' language extension module authors will be familiar | |
728 | with the ubiquitous ``PyObject*``, manual reference-counting, and the | |
729 | need to remember which API calls return "new" (owned) references or | |
730 | "borrowed" (raw) references. These constraints are not just | |
731 | cumbersome but also a major source of errors, especially in the | |
732 | presence of exceptions. | |
733 | ||
734 | Boost.Python provides a class ``object`` which automates reference | |
735 | counting and provides conversion to Python from C++ objects of | |
736 | arbitrary type. This significantly reduces the learning effort for | |
737 | prospective extension module writers. | |
738 | ||
739 | Creating an ``object`` from any other type is extremely simple:: | |
740 | ||
741 | object s("hello, world"); // s manages a Python string | |
742 | ||
743 | ``object`` has templated interactions with all other types, with | |
744 | automatic to-python conversions. It happens so naturally that it's | |
745 | easily overlooked:: | |
746 | ||
747 | object ten_Os = 10 * s[4]; // -> "oooooooooo" | |
748 | ||
749 | In the example above, ``4`` and ``10`` are converted to Python objects | |
750 | before the indexing and multiplication operations are invoked. | |
751 | ||
752 | The ``extract<T>`` class template can be used to convert Python objects | |
753 | to C++ types:: | |
754 | ||
755 | double x = extract<double>(o); | |
756 | ||
757 | If a conversion in either direction cannot be performed, an | |
758 | appropriate exception is thrown at runtime. | |
759 | ||
760 | The ``object`` type is accompanied by a set of derived types | |
761 | that mirror the Python built-in types such as ``list``, ``dict``, | |
762 | ``tuple``, etc. as much as possible. This enables convenient | |
763 | manipulation of these high-level types from C++:: | |
764 | ||
765 | dict d; | |
766 | d["some"] = "thing"; | |
767 | d["lucky_number"] = 13; | |
768 | list l = d.keys(); | |
769 | ||
770 | This almost looks and works like regular Python code, but it is pure | |
771 | C++. Of course we can wrap C++ functions which accept or return | |
772 | ``object`` instances. | |
773 | ||
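For comparison, the C++ fragments in this section mirror the Python below
nearly line for line. This is only a sketch: the value bound to ``o`` is a
hypothetical stand-in, and ``float(o)`` is a rough analogue of
``extract<double>(o)`` rather than an exact equivalent.

```python
# Python counterparts of the C++ fragments in this section.
s = "hello, world"
ten_Os = 10 * s[4]        # same indexing and repetition as the C++ expression

o = 2.5                   # hypothetical value standing in for ``o``
x = float(o)              # rough analogue of extract<double>(o)

d = {}
d["some"] = "thing"
d["lucky_number"] = 13
l = list(d.keys())        # keys come back in insertion order on Python 3.7+
```
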
774 | ================= | |
775 | Thinking hybrid | |
776 | ================= | |
777 | ||
778 | Because of the practical and mental difficulties of combining | |
779 | programming languages, it is common to settle on a single language at the | |
780 | outset of any development effort. For many applications, performance | |
781 | considerations dictate the use of a compiled language for the core | |
782 | algorithms. Unfortunately, due to the complexity of the static type | |
783 | system, the price we pay for runtime performance is often a | |
784 | significant increase in development time. Experience shows that | |
785 | writing maintainable C++ code usually takes longer and requires *far* | |
786 | more hard-earned working experience than developing comparable Python | |
787 | code. Even when developers are comfortable working exclusively in | |
788 | compiled languages, they often augment their systems by some type of | |
789 | ad hoc scripting layer for the benefit of their users without ever | |
790 | availing themselves of the same advantages. | |
791 | ||
792 | Boost.Python enables us to *think hybrid*. Python can be used for | |
793 | rapidly prototyping a new application; its ease of use and the large | |
794 | pool of standard libraries give us a head start on the way to a | |
795 | working system. If necessary, the working code can be used to | |
796 | discover rate-limiting hotspots. To maximize performance these can | |
797 | be reimplemented in C++, together with the Boost.Python bindings | |
798 | needed to tie them back into the existing higher-level procedure. | |
799 | ||
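As a minimal sketch of the hotspot-discovery step, the standard ``cProfile``
module can be pointed at a working prototype. The function names and the
workload below are hypothetical and chosen only for illustration; the
function that dominates the report would be a natural candidate for a C++
reimplementation behind a Boost.Python binding.

```python
import cProfile
import io
import pstats


def pairwise_energy(points):
    """Hypothetical O(n^2) inner loop, a typical candidate for C++."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for xj, yj in points[i + 1:]:
            total += 1.0 / (1e-9 + (xi - xj) ** 2 + (yi - yj) ** 2)
    return total


def run_prototype():
    """Hypothetical higher-level procedure that stays in Python."""
    points = [(i * 0.5, (i * 7) % 11) for i in range(300)]
    return pairwise_energy(points)


# Profile one run of the prototype.
profiler = cProfile.Profile()
profiler.enable()
result = run_prototype()
profiler.disable()

# Collect the five entries with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```
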
800 | Of course, this *top-down* approach is less attractive if it is clear | |
801 | from the start that many algorithms will eventually have to be | |
802 | implemented in C++. Fortunately Boost.Python also enables us to | |
803 | pursue a *bottom-up* approach. We have used this approach very | |
804 | successfully in the development of a toolbox for scientific | |
805 | applications. The toolbox started out mainly as a library of C++ | |
806 | classes with Boost.Python bindings, and for a while the growth was | |
807 | mainly concentrated on the C++ parts. However, as the toolbox is | |
808 | becoming more complete, more and more newly added functionality can be | |
809 | implemented in Python. | |
810 | ||
811 | .. image:: images/python_cpp_mix.png | |
812 | ||
813 | This figure shows the estimated ratio of newly added C++ and Python | |
814 | code over time as new algorithms are implemented. We expect this | |
815 | ratio to level out near 70% Python. Being able to solve new problems | |
816 | mostly in Python rather than a more difficult statically typed | |
817 | language is the return on our investment in Boost.Python. The ability | |
818 | to access all of our code from Python allows a broader group of | |
819 | developers to use it in the rapid development of new applications. | |
820 | ||
821 | ===================== | |
822 | Development history | |
823 | ===================== | |
824 | ||
825 | The first version of Boost.Python was developed in 2000 by Dave | |
826 | Abrahams at Dragon Systems, where he was privileged to have Tim Peters | |
827 | as a guide to "The Zen of Python". One of Dave's jobs was to develop | |
828 | a Python-based natural language processing system. Since it was | |
829 | eventually going to be targeting embedded hardware, it was always | |
830 | assumed that the compute-intensive core would be rewritten in C++ to | |
831 | optimize speed and memory footprint [#proto]_. The project also wanted to | |
832 | test all of its C++ code using Python test scripts [#test]_. The only | |
833 | tool we knew of for binding C++ and Python was SWIG_, and at the time | |
834 | its handling of C++ was weak. It would be false to claim any deep | |
835 | insight into the possible advantages of Boost.Python's approach at | |
836 | this point. Dave's interest and expertise in fancy C++ template | |
837 | tricks had just reached the point where he could do some real damage, | |
838 | and Boost.Python emerged as it did because it filled a need and | |
839 | because it seemed like a cool thing to try. | |
840 | ||
841 | This early version was aimed at many of the same basic goals we've | |
842 | described in this paper, differing most noticeably by having a | |
843 | slightly more cumbersome syntax and by lack of special support for | |
844 | operator overloading, pickling, and component-based development. | |
845 | These last three features were quickly added by Ullrich Koethe and | |
846 | Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived | |
847 | on the scene to contribute enhancements like support for nested | |
848 | modules and static member functions. | |
849 | ||
850 | By early 2001 development had stabilized and few new features were | |
851 | being added; however, a disturbing new fact came to light: Ralf had | |
852 | begun testing Boost.Python on pre-release versions of a compiler using | |
853 | the EDG_ front-end, and the mechanism at the core of Boost.Python | |
854 | responsible for handling conversions between Python and C++ types was | |
855 | failing to compile. As it turned out, we had been exploiting a very | |
856 | common bug in the implementation of all the C++ compilers we had | |
857 | tested. We knew that as C++ compilers rapidly became more | |
858 | standards-compliant, the library would begin failing on more | |
859 | platforms. Unfortunately, because the mechanism was so central to the | |
860 | functioning of the library, fixing the problem looked very difficult. | |
861 | ||
862 | Fortunately, later that year Lawrence Berkeley and later Lawrence | |
863 | Livermore National Labs contracted with `Boost Consulting`_ for support | |
864 | and development of Boost.Python, and there was a new opportunity to | |
865 | address fundamental issues and ensure a future for the library. A | |
866 | redesign effort began with the low level type conversion architecture, | |
867 | building in standards-compliance and support for component-based | |
868 | development (in contrast to version 1 where conversions had to be | |
869 | explicitly imported and exported across module boundaries). A new | |
870 | analysis of the relationship between the Python and C++ objects was | |
871 | done, resulting in more intuitive handling for C++ lvalues and | |
872 | rvalues. | |
873 | ||
874 | The emergence of a powerful new type system in Python 2.2 made the | |
875 | choice of whether to maintain compatibility with Python 1.5.2 easy: | |
876 | the opportunity to throw away a great deal of elaborate code for | |
877 | emulating classic Python classes alone was too good to pass up. In | |
878 | addition, Python iterators and descriptors provided crucial and | |
879 | elegant tools for representing similar C++ constructs. The | |
880 | development of the generalized ``object`` interface allowed us to | |
881 | further shield C++ programmers from the dangers and syntactic burdens | |
882 | of the Python 'C' API. A great number of other features including C++ | |
883 | exception translation, improved support for overloaded functions, and | |
884 | most significantly, CallPolicies for handling pointers and | |
885 | references, were added during this period. | |
886 | ||
887 | In October 2002, version 2 of Boost.Python was released. Development | |
888 | since then has concentrated on improved support for C++ runtime | |
889 | polymorphism and smart pointers. Peter Dimov's ingenious | |
890 | ``boost::shared_ptr`` design in particular has allowed us to give the | |
891 | hybrid developer a consistent interface for moving objects back and | |
892 | forth across the language barrier without loss of information. At | |
893 | first, we were concerned that the sophistication and complexity of the | |
894 | Boost.Python v2 implementation might discourage contributors, but the | |
895 | emergence of Pyste_ and several other significant feature | |
896 | contributions have laid those fears to rest. Daily questions on the | |
897 | Python C++-sig and a backlog of desired improvements show that the | |
898 | library is getting used. To us, the future looks bright. | |
899 | ||
900 | .. _`EDG`: http://www.edg.com | |
901 | ||
902 | ============= | |
903 | Conclusions | |
904 | ============= | |
905 | ||
906 | Boost.Python achieves seamless interoperability between two rich and | |
907 | complementary language environments. Because it leverages template | |
908 | metaprogramming to introspect about types and functions, the user | |
909 | never has to learn a third syntax: the interface definitions are | |
910 | written in concise and maintainable C++. Also, the wrapping system | |
911 | doesn't have to parse C++ headers or represent the type system: the | |
912 | compiler does that work for us. | |
913 | ||
914 | Computationally intensive tasks play to the strengths of C++ and are | |
915 | often impossible to implement efficiently in pure Python, while jobs | |
916 | like serialization that are trivial in Python can be very difficult in | |
917 | pure C++. Given the luxury of building a hybrid software system from | |
918 | the ground up, we can approach design with new confidence and power. | |
919 | ||
920 | =========== | |
921 | Citations | |
922 | =========== | |
923 | ||
924 | .. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report, | |
925 | Vol. 7 No. 5 June 1995, pp. 26-31. | |
926 | http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html | |
927 | ||
928 | =========== | |
929 | Footnotes | |
930 | =========== | |
931 | ||
932 | .. [#proto] In retrospect, it seems that "thinking hybrid" from the | |
933 | ground up might have been better for the NLP system: the | |
934 | natural component boundaries defined by the pure python | |
935 | prototype turned out to be inappropriate for getting the | |
936 | desired performance and memory footprint out of the C++ core, | |
937 | which eventually caused some redesign overhead on the Python | |
938 | side when the core was moved to C++. | |
939 | ||
940 | .. [#test] We also have some reservations about driving all C++ | |
941 | testing through a Python interface, unless that's the only way | |
942 | it will be ultimately used. Any transition across language | |
943 | boundaries with such different object models will inevitably | |
944 | mask bugs. | |
945 | ||
946 | .. [#feature] These features were expressed very differently in v1 of | |
947 | Boost.Python. |