1 +++++++++++++++++++++++++++++++++++++++++++
2 Building Hybrid Systems with Boost.Python
3 +++++++++++++++++++++++++++++++++++++++++++
4
5 :Author: David Abrahams
6 :Contact: dave@boost-consulting.com
7 :organization: `Boost Consulting`_
8 :date: 2003-05-14
9
10 :Author: Ralf W. Grosse-Kunstleve
11
12 :copyright: Copyright David Abrahams and Ralf W. Grosse-Kunstleve 2003. All rights reserved
13
14 .. contents:: Table of Contents
15
16 .. _`Boost Consulting`: http://www.boost-consulting.com
17
18 ==========
19 Abstract
20 ==========
21
22 Boost.Python is an open source C++ library which provides a concise
23 IDL-like interface for binding C++ classes and functions to
24 Python. Leveraging the full power of C++ compile-time introspection
25 and of recently developed metaprogramming techniques, this is achieved
26 entirely in pure C++, without introducing a new syntax.
27 Boost.Python's rich set of features and high-level interface make it
28 possible to engineer packages from the ground up as hybrid systems,
29 giving programmers easy and coherent access to both the efficient
30 compile-time polymorphism of C++ and the extremely convenient run-time
31 polymorphism of Python.
32
33 ==============
34 Introduction
35 ==============
36
37 Python and C++ are in many ways as different as two languages could
38 be: while C++ is usually compiled to machine-code, Python is
39 interpreted. Python's dynamic type system is often cited as the
40 foundation of its flexibility, while in C++ static typing is the
41 cornerstone of its efficiency. C++ has an intricate and difficult
42 compile-time meta-language, while in Python, practically everything
43 happens at runtime.
44
45 Yet for many programmers, these very differences mean that Python and
46 C++ complement one another perfectly. Performance bottlenecks in
47 Python programs can be rewritten in C++ for maximal speed, and
48 authors of powerful C++ libraries choose Python as a middleware
49 language for its flexible system integration capabilities.
50 Furthermore, the surface differences mask some strong similarities:
51
52 * 'C'-family control structures (if, while, for...)
53
54 * Support for object-orientation, functional programming, and generic
55 programming (these are both *multi-paradigm* programming languages.)
56
57 * Comprehensive operator overloading facilities, recognizing the
58 importance of syntactic variability for readability and
59 expressivity.
60
61 * High-level concepts such as collections and iterators.
62
63 * High-level encapsulation facilities (C++: namespaces, Python: modules)
64 to support the design of re-usable libraries.
65
66 * Exception-handling for effective management of error conditions.
67
68 * C++ idioms in common use, such as handle/body classes and
69 reference-counted smart pointers mirror Python reference semantics.
70
71 Given Python's rich 'C' interoperability API, it should in principle
72 be possible to expose C++ type and function interfaces to Python with
73 an analogous interface to their C++ counterparts. However, the
74 facilities provided by Python alone for integration with C++ are
75 relatively meager. Compared to C++ and Python, 'C' has only very
76 rudimentary abstraction facilities, and support for exception-handling
77 is completely missing. 'C' extension module writers are required to
78 manually manage Python reference counts, which is both annoyingly
79 tedious and extremely error-prone. Traditional extension modules also
80 tend to contain a great deal of boilerplate code repetition which
81 makes them difficult to maintain, especially when wrapping an evolving
82 API.
83
These limitations have led to the development of a variety of wrapping
85 systems. SWIG_ is probably the most popular package for the
86 integration of C/C++ and Python. A more recent development is SIP_,
87 which was specifically designed for interfacing Python with the Qt_
88 graphical user interface library. Both SWIG and SIP introduce their
89 own specialized languages for customizing inter-language bindings.
90 This has certain advantages, but having to deal with three different
91 languages (Python, C/C++ and the interface language) also introduces
92 practical and mental difficulties. The CXX_ package demonstrates an
93 interesting alternative. It shows that at least some parts of
94 Python's 'C' API can be wrapped and presented through a much more
95 user-friendly C++ interface. However, unlike SWIG and SIP, CXX does
96 not include support for wrapping C++ classes as new Python types.
97
98 The features and goals of Boost.Python_ overlap significantly with
99 many of these other systems. That said, Boost.Python attempts to
100 maximize convenience and flexibility without introducing a separate
101 wrapping language. Instead, it presents the user with a high-level
102 C++ interface for wrapping C++ classes and functions, managing much of
103 the complexity behind-the-scenes with static metaprogramming.
104 Boost.Python also goes beyond the scope of earlier systems by
105 providing:
106
107 * Support for C++ virtual functions that can be overridden in Python.
108
109 * Comprehensive lifetime management facilities for low-level C++
110 pointers and references.
111
112 * Support for organizing extensions as Python packages,
113 with a central registry for inter-language type conversions.
114
115 * A safe and convenient mechanism for tying into Python's powerful
116 serialization engine (pickle).
117
118 * Coherence with the rules for handling C++ lvalues and rvalues that
119 can only come from a deep understanding of both the Python and C++
120 type systems.
121
122 The key insight that sparked the development of Boost.Python is that
123 much of the boilerplate code in traditional extension modules could be
124 eliminated using C++ compile-time introspection. Each argument of a
125 wrapped C++ function must be extracted from a Python object using a
126 procedure that depends on the argument type. Similarly the function's
127 return type determines how the return value will be converted from C++
128 to Python. Of course argument and return types are part of each
129 function's type, and this is exactly the source from which
130 Boost.Python deduces most of the information required.
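
As a rough illustration of this kind of compile-time introspection, the
sketch below recovers a function's return type and argument count purely
from its type. It is our own minimal example in standard C++11, not
Boost.Python's actual machinery; the names ``signature_of``, ``arity_of``,
``scale`` and ``answer`` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical trait, not a Boost.Python name. The primary template is left
// undefined so that only function-pointer types are accepted.
template <class F>
struct signature_of;

// Partial specialization: the compiler pattern-matches the function type
// and hands us its return type R and its argument types Args.
template <class R, class... Args>
struct signature_of<R (*)(Args...)>
{
    typedef R result_type;
    static constexpr std::size_t arity = sizeof...(Args);
};

// Deduces F from the argument so callers need not spell out the type.
template <class F>
constexpr std::size_t arity_of(F)
{
    return signature_of<F>::arity;
}

// Example functions to inspect (hypothetical).
inline double scale(double value, int factor) { return value * factor; }
inline int answer() { return 42; }
```

Given only ``&scale``, the compiler can tell that two arguments must be
extracted and that a ``double`` must be converted on return; a wrapping
library can select the matching conversions from exactly this information.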
131
132 This approach leads to *user guided wrapping*: as much information is
133 extracted directly from the source code to be wrapped as is possible
134 within the framework of pure C++, and some additional information is
135 supplied explicitly by the user. Mostly the guidance is mechanical
136 and little real intervention is required. Because the interface
137 specification is written in the same full-featured language as the
138 code being exposed, the user has unprecedented power available when
139 she does need to take control.
140
141 .. _Python: http://www.python.org/
142 .. _SWIG: http://www.swig.org/
143 .. _SIP: http://www.riverbankcomputing.co.uk/sip/index.php
144 .. _Qt: http://www.trolltech.com/
145 .. _CXX: http://cxx.sourceforge.net/
146 .. _Boost.Python: http://www.boost.org/libs/python/doc
147
148 ===========================
149 Boost.Python Design Goals
150 ===========================
151
152 The primary goal of Boost.Python is to allow users to expose C++
153 classes and functions to Python using nothing more than a C++
154 compiler. In broad strokes, the user experience should be one of
155 directly manipulating C++ objects from Python.
156
157 However, it's also important not to translate all interfaces *too*
158 literally: the idioms of each language must be respected. For
159 example, though C++ and Python both have an iterator concept, they are
160 expressed very differently. Boost.Python has to be able to bridge the
161 interface gap.
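
To make the iterator gap concrete: C++ describes a traversal with a pair of
iterators, while Python asks a single object for the next element until it
signals exhaustion. The following sketch adapts one style to the other in
plain C++. It only illustrates the idea and assumes nothing about how
Boost.Python does it; ``stop_iteration``, ``next_adapter`` and
``sum_with_next`` are hypothetical names.

```cpp
#include <exception>
#include <iterator>
#include <vector>

// Stand-in for Python's StopIteration (hypothetical name).
struct stop_iteration : std::exception {};

// Adapts a C++ [first, last) iterator pair to a Python-like next() protocol.
template <class Iterator>
class next_adapter
{
public:
    typedef typename std::iterator_traits<Iterator>::value_type value_type;

    next_adapter(Iterator first, Iterator last) : cur_(first), end_(last) {}

    // Returns the current element and advances; signals exhaustion by
    // throwing, as a Python iterator does.
    value_type next()
    {
        if (cur_ == end_)
            throw stop_iteration();
        return *cur_++;
    }

private:
    Iterator cur_;
    Iterator end_;
};

// Consumes a sequence the way a Python for-loop would.
inline int sum_with_next(std::vector<int> const& values)
{
    next_adapter<std::vector<int>::const_iterator> it(values.begin(), values.end());
    int total = 0;
    try
    {
        for (;;)
            total += it.next();
    }
    catch (stop_iteration const&)
    {
        // end of sequence reached
    }
    return total;
}
```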
162
163 It must be possible to insulate Python users from crashes resulting
164 from trivial misuses of C++ interfaces, such as accessing
165 already-deleted objects. By the same token the library should
166 insulate C++ users from low-level Python 'C' API, replacing
167 error-prone 'C' interfaces like manual reference-count management and
168 raw ``PyObject`` pointers with more-robust alternatives.
169
170 Support for component-based development is crucial, so that C++ types
171 exposed in one extension module can be passed to functions exposed in
172 another without loss of crucial information like C++ inheritance
173 relationships.
174
175 Finally, all wrapping must be *non-intrusive*, without modifying or
176 even seeing the original C++ source code. Existing C++ libraries have
177 to be wrappable by third parties who only have access to header files
178 and binaries.
179
180 ==========================
181 Hello Boost.Python World
182 ==========================
183
184 And now for a preview of Boost.Python, and how it improves on the raw
185 facilities offered by Python. Here's a function we might want to
186 expose::
187
    #include <stdexcept>

    char const* greet(unsigned x)
    {
        static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

        if (x > 2)
            throw std::range_error("greet: index out of range");

        return msgs[x];
    }
197
198 To wrap this function in standard C++ using the Python 'C' API, we'd
199 need something like this::
200
    #include <Python.h>

    extern "C" // all Python interactions use 'C' linkage and calling convention
    {
        // Wrapper to handle argument/result conversion and checking
        static PyObject* greet_wrap(PyObject* self, PyObject* args)
        {
            int x;
            if (PyArg_ParseTuple(args, "i", &x))    // extract/check arguments
            {
                char const* result = greet(x);      // invoke wrapped function
                return PyString_FromString(result); // convert result to Python
            }
            return 0;                               // error occurred
        }

        // Table of wrapped functions to be exposed by the module
        static PyMethodDef methods[] = {
            { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
            , { NULL, NULL, 0, NULL } // sentinel
        };

        // module initialization function; Python 2 looks it up as init<module name>
        DL_EXPORT(void) inithello()
        {
            (void) Py_InitModule("hello", methods); // add the methods to the module
        }
    }
227
228 Now here's the wrapping code we'd use to expose it with Boost.Python::
229
230 #include <boost/python.hpp>
231 using namespace boost::python;
232 BOOST_PYTHON_MODULE(hello)
233 {
234 def("greet", greet, "return one of 3 parts of a greeting");
235 }
236
237 and here it is in action::
238
    >>> import hello
    >>> for x in range(3):
    ...     print hello.greet(x)
    ...
    hello
    Boost.Python
    world!
246
247 Aside from the fact that the 'C' API version is much more verbose,
248 it's worth noting a few things that it doesn't handle correctly:
249
250 * The original function accepts an unsigned integer, and the Python
251 'C' API only gives us a way of extracting signed integers. The
252 Boost.Python version will raise a Python exception if we try to pass
253 a negative number to ``hello.greet``, but the other one will proceed
to do whatever the C++ implementation does when converting a
negative integer to unsigned (usually wrapping to some very large
256 number), and pass the incorrect translation on to the wrapped
257 function.
258
259 * That brings us to the second problem: if the C++ ``greet()``
260 function is called with a number greater than 2, it will throw an
261 exception. Typically, if a C++ exception propagates across the
262 boundary with code generated by a 'C' compiler, it will cause a
263 crash. As you can see in the first version, there's no C++
264 scaffolding there to prevent this from happening. Functions wrapped
265 by Boost.Python automatically include an exception-handling layer
266 which protects Python users by translating unhandled C++ exceptions
267 into a corresponding Python exception.
268
269 * A slightly more-subtle limitation is that the argument conversion
270 used in the Python 'C' API case can only get that integer ``x`` in
271 *one way*. PyArg_ParseTuple can't convert Python ``long`` objects
272 (arbitrary-precision integers) which happen to fit in an ``unsigned
273 int`` but not in a ``signed long``, nor will it ever handle a
274 wrapped C++ class with a user-defined implicit ``operator unsigned
275 int()`` conversion. Boost.Python's dynamic type conversion
276 registry allows users to add arbitrary conversion methods.
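
The first two points can be sketched without any Python at all. The example
below is a simplified stand-in for the kind of protective layer described
above, under loud assumptions: a plain ``call_result`` struct takes the place
of a raised Python exception, and ``guarded_greet`` is a hypothetical name,
not a Boost.Python function.

```cpp
#include <exception>
#include <limits>
#include <stdexcept>
#include <string>

// Stand-in for "a Python exception was raised": a real wrapper would set
// the interpreter's error indicator instead of filling in a struct.
struct call_result
{
    bool ok;
    std::string value;   // result on success
    std::string error;   // description on failure
};

// The function from the article: throws for indices above 2.
inline char const* greet(unsigned x)
{
    static char const* const msgs[] = { "hello", "Boost.Python", "world!" };
    if (x > 2)
        throw std::range_error("greet: index out of range");
    return msgs[x];
}

// Hypothetical guard layer: rejects values that do not fit in an unsigned
// and keeps any C++ exception from crossing the language boundary.
inline call_result guarded_greet(long x)
{
    call_result r = { false, "", "" };
    if (x < 0 || static_cast<unsigned long>(x) > std::numeric_limits<unsigned>::max())
    {
        r.error = "value out of range for unsigned int";
        return r;
    }
    try
    {
        r.value = greet(static_cast<unsigned>(x));
        r.ok = true;
    }
    catch (std::exception const& e)
    {
        r.error = e.what();   // translate instead of propagating
    }
    catch (...)
    {
        r.error = "unidentifiable C++ exception";
    }
    return r;
}
```

Calling ``guarded_greet(-1)`` reports an error instead of silently wrapping
to a large index, and ``guarded_greet(7)`` turns the ``std::range_error``
into an ordinary failure value.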
277
278 ==================
279 Library Overview
280 ==================
281
282 This section outlines some of the library's major features. Except as
necessary to avoid confusion, details of library implementation are
284 omitted.
285
286 ------------------
287 Exposing Classes
288 ------------------
289
290 C++ classes and structs are exposed with a similarly-terse interface.
291 Given::
292
293 struct World
294 {
295 void set(std::string msg) { this->msg = msg; }
296 std::string greet() { return msg; }
297 std::string msg;
298 };
299
300 The following code will expose it in our extension module::
301
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
        ;
    }
310
311 Although this code has a certain pythonic familiarity, people
sometimes find the syntax a bit confusing because it doesn't look like
313 most of the C++ code they're used to. All the same, this is just
314 standard C++. Because of their flexible syntax and operator
315 overloading, C++ and Python are great for defining domain-specific
316 (sub)languages
317 (DSLs), and that's what we've done in Boost.Python. To break it down::
318
319 class_<World>("World")
320
321 constructs an unnamed object of type ``class_<World>`` and passes
322 ``"World"`` to its constructor. This creates a new-style Python class
323 called ``World`` in the extension module, and associates it with the
324 C++ type ``World`` in the Boost.Python type conversion registry. We
325 might have also written::
326
327 class_<World> w("World");
328
329 but that would've been more verbose, since we'd have to name ``w``
330 again to invoke its ``def()`` member function::
331
332 w.def("greet", &World::greet)
333
334 There's nothing special about the location of the dot for member
335 access in the original example: C++ allows any amount of whitespace on
336 either side of a token, and placing the dot at the beginning of each
337 line allows us to chain as many successive calls to member functions
338 as we like with a uniform syntax. The other key fact that allows
339 chaining is that ``class_<>`` member functions all return a reference
340 to ``*this``.
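
The chaining idiom itself is easy to reproduce in isolation. The sketch
below is a hypothetical miniature (``class_sketch`` and ``describe_world``
are our names); it merely records method names and makes no claim about how
``class_<>`` is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of the chaining idiom; it only records names.
class class_sketch
{
public:
    explicit class_sketch(std::string const& name) : name_(name) {}

    // Returning *this by reference is what makes ".def(...).def(...)" work.
    class_sketch& def(std::string const& method_name)
    {
        methods_.push_back(method_name);
        return *this;
    }

    std::string const& name() const { return name_; }
    std::vector<std::string> const& methods() const { return methods_; }

private:
    std::string name_;
    std::vector<std::string> methods_;
};

// Builds a description with one chained expression on an unnamed temporary,
// with the dot placed at the start of each line as in the article.
inline std::vector<std::string> describe_world()
{
    return class_sketch("World")
        .def("greet")
        .def("set")
        .methods();
}
```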
341
342 So the example is equivalent to::
343
344 class_<World> w("World");
345 w.def("greet", &World::greet);
346 w.def("set", &World::set);
347
348 It's occasionally useful to be able to break down the components of a
349 Boost.Python class wrapper in this way, but the rest of this article
350 will stick to the terse syntax.
351
352 For completeness, here's the wrapped class in use: ::
353
354 >>> import hello
355 >>> planet = hello.World()
356 >>> planet.set('howdy')
357 >>> planet.greet()
358 'howdy'
359
360 Constructors
361 ============
362
363 Since our ``World`` class is just a plain ``struct``, it has an
364 implicit no-argument (nullary) constructor. Boost.Python exposes the
365 nullary constructor by default, which is why we were able to write: ::
366
367 >>> planet = hello.World()
368
369 However, well-designed classes in any language may require constructor
370 arguments in order to establish their invariants. Unlike Python,
where ``__init__`` is just a specially-named method, in C++
372 constructors cannot be handled like ordinary member functions. In
373 particular, we can't take their address: ``&World::World`` is an
374 error. The library provides a different interface for specifying
375 constructors. Given::
376
377 struct World
378 {
379 World(std::string msg); // added constructor
380 ...
381
382 we can modify our wrapping code as follows::
383
384 class_<World>("World", init<std::string>())
385 ...
386
Of course, a C++ class may have additional constructors, and we can
388 expose those as well by passing more instances of ``init<...>`` to
389 ``def()``::
390
391 class_<World>("World", init<std::string>())
392 .def(init<double, double>())
393 ...
394
395 Boost.Python allows wrapped functions, member functions, and
396 constructors to be overloaded to mirror C++ overloading.
397
398 Data Members and Properties
399 ===========================
400
401 Any publicly-accessible data members in a C++ class can be easily
402 exposed as either ``readonly`` or ``readwrite`` attributes::
403
404 class_<World>("World", init<std::string>())
405 .def_readonly("msg", &World::msg)
406 ...
407
408 and can be used directly in Python: ::
409
410 >>> planet = hello.World('howdy')
411 >>> planet.msg
412 'howdy'
413
414 This does *not* result in adding attributes to the ``World`` instance
415 ``__dict__``, which can result in substantial memory savings when
416 wrapping large data structures. In fact, no instance ``__dict__``
417 will be created at all unless attributes are explicitly added from
418 Python. Boost.Python owes this capability to the new Python 2.2 type
419 system, in particular the descriptor interface and ``property`` type.
420
421 In C++, publicly-accessible data members are considered a sign of poor
422 design because they break encapsulation, and style guides usually
423 dictate the use of "getter" and "setter" functions instead. In
424 Python, however, ``__getattr__``, ``__setattr__``, and since 2.2,
425 ``property`` mean that attribute access is just one more
426 well-encapsulated syntactic tool at the programmer's disposal.
427 Boost.Python bridges this idiomatic gap by making Python ``property``
428 creation directly available to users. If ``msg`` were private, we
could still expose it as an attribute in Python as follows::
430
431 class_<World>("World", init<std::string>())
432 .add_property("msg", &World::greet, &World::set)
433 ...
434
435 The example above mirrors the familiar usage of properties in Python
436 2.2+: ::
437
    >>> class World(object):
    ...     def __init__(self, msg):
    ...         self.__msg = msg
    ...     def greet(self):
    ...         return self.__msg
    ...     def set(self, msg):
    ...         self.__msg = msg
    ...     msg = property(greet, set)
446
447 Operator Overloading
448 ====================
449
450 The ability to write arithmetic operators for user-defined types has
451 been a major factor in the success of both languages for numerical
452 computation, and the success of packages like NumPy_ attests to the
453 power of exposing operators in extension modules. Boost.Python
454 provides a concise mechanism for wrapping operator overloads. The
455 example below shows a fragment from a wrapper for the Boost rational
456 number library::
457
    class_<rational<int> >("rational_int")
        .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
        .def("numerator", &rational<int>::numerator)
        .def("denominator", &rational<int>::denominator)
        .def(-self)        // __neg__ (unary minus)
        .def(self + self)  // __add__ (homogeneous)
        .def(self * self)  // __mul__
        .def(self + int()) // __add__ (heterogeneous)
        .def(int() + self) // __radd__
        ...
468
469 The magic is performed using a simplified application of "expression
470 templates" [VELD1995]_, a technique originally developed for
471 optimization of high-performance matrix algebra expressions. The
472 essence is that instead of performing the computation immediately,
473 operators are overloaded to construct a type *representing* the
474 computation. In matrix algebra, dramatic optimizations are often
475 available when the structure of an entire expression can be taken into
476 account, rather than evaluating each operation "greedily".
477 Boost.Python uses the same technique to build an appropriate Python
478 method object based on expressions involving ``self``.
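
A stripped-down version of the idea fits in a few lines. In the sketch
below, operators applied to a ``self`` placeholder compute nothing; each one
returns an object whose *type* records which operation was written, and a
helper maps that type to a Python special-method name. All names here
(``self_t``, ``operation``, ``method_name`` and the tag types) are our own
assumptions for illustration and differ from the library's real internals.

```cpp
#include <string>

// Hypothetical placeholder type and object standing in for "self".
struct self_t {};
static const self_t self = self_t();

// Tags naming the Python special method each expression corresponds to.
struct neg_tag  { static char const* name() { return "__neg__"; } };
struct add_tag  { static char const* name() { return "__add__"; } };
struct radd_tag { static char const* name() { return "__radd__"; } };
struct mul_tag  { static char const* name() { return "__mul__"; } };

// The "expression": an empty object whose type describes the operation.
template <class Tag, class Other>
struct operation {};

// Operators build descriptions instead of performing arithmetic.
inline operation<neg_tag, void> operator-(self_t)
{
    return operation<neg_tag, void>();
}

inline operation<add_tag, self_t> operator+(self_t, self_t)
{
    return operation<add_tag, self_t>();
}

inline operation<mul_tag, self_t> operator*(self_t, self_t)
{
    return operation<mul_tag, self_t>();
}

template <class T>
operation<add_tag, T> operator+(self_t, T const&)
{
    return operation<add_tag, T>();
}

template <class T>
operation<radd_tag, T> operator+(T const&, self_t)
{
    return operation<radd_tag, T>();
}

// A def()-like function can read the method name off the expression type.
template <class Tag, class Other>
std::string method_name(operation<Tag, Other>)
{
    return Tag::name();
}
```

Here ``method_name(int() + self)`` yields ``"__radd__"`` although no addition
is ever carried out; a real wrapper would additionally use the ``Other`` type
to generate the function that performs the C++ operation.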
479
480 .. _NumPy: http://www.pfdubois.com/numpy/
481
482 Inheritance
483 ===========
484
485 C++ inheritance relationships can be represented to Boost.Python by adding
486 an optional ``bases<...>`` argument to the ``class_<...>`` template
487 parameter list as follows::
488
489 class_<Derived, bases<Base1,Base2> >("Derived")
490 ...
491
492 This has two effects:
493
494 1. When the ``class_<...>`` is created, Python type objects
495 corresponding to ``Base1`` and ``Base2`` are looked up in
496 Boost.Python's registry, and are used as bases for the new Python
497 ``Derived`` type object, so methods exposed for the Python ``Base1``
498 and ``Base2`` types are automatically members of the ``Derived``
499 type. Because the registry is global, this works correctly even if
500 ``Derived`` is exposed in a different module from either of its
501 bases.
502
503 2. C++ conversions from ``Derived`` to its bases are added to the
504 Boost.Python registry. Thus wrapped C++ methods expecting (a
505 pointer or reference to) an object of either base type can be
506 called with an object wrapping a ``Derived`` instance. Wrapped
507 member functions of class ``T`` are treated as though they have an
508 implicit first argument of ``T&``, so these conversions are
necessary to allow the base class methods to be called for derived
510 objects.
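
The second effect can be pictured as a global table of upcast functions
keyed by source and target type. The sketch below is only a model of that
idea in standard C++11; the registry layout and the names ``cast_registry``,
``register_base`` and ``convert`` are assumptions of ours, not the library's
data structures.

```cpp
#include <map>
#include <typeindex>
#include <typeinfo>
#include <utility>

struct Base1 { int id1; Base1() : id1(1) {} };
struct Base2 { int id2; Base2() : id2(2) {} };
struct Derived : Base1, Base2 {};

// Hypothetical global registry: (source type, target type) -> upcast function.
typedef void* (*cast_function)(void*);
typedef std::pair<std::type_index, std::type_index> cast_key;

inline std::map<cast_key, cast_function>& cast_registry()
{
    static std::map<cast_key, cast_function> registry;
    return registry;
}

// Performs the static upcast. A real function is needed because, with
// multiple inheritance, a base subobject may live at a different address
// than the derived object.
template <class Source, class Target>
void* upcast(void* p)
{
    return static_cast<Target*>(static_cast<Source*>(p));
}

// Records that Source can be converted to its base class Target.
template <class Source, class Target>
void register_base()
{
    cast_registry()[cast_key(typeid(Source), typeid(Target))] = &upcast<Source, Target>;
}

// Looks up a registered conversion; returns a null pointer if none exists.
template <class Target>
Target* convert(void* p, std::type_info const& source)
{
    std::map<cast_key, cast_function>::const_iterator it =
        cast_registry().find(cast_key(source, typeid(Target)));
    if (it == cast_registry().end())
        return nullptr;
    return static_cast<Target*>(it->second(p));
}
```

Because the table is global, a conversion registered by one module is
visible to lookups made from another, which is the property the list above
relies on.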
511
512 Of course it's possible to derive new Python classes from wrapped C++
513 class instances. Because Boost.Python uses the new-style class
514 system, that works very much as for the Python built-in types. There
515 is one significant detail in which it differs: the built-in types
516 generally establish their invariants in their ``__new__`` function, so
517 that derived classes do not need to call ``__init__`` on the base
class before invoking its methods: ::
519
    >>> class L(list):
    ...     def __init__(self):
    ...         pass
    ...
    >>> L().reverse()
    >>>
526
527 Because C++ object construction is a one-step operation, C++ instance
528 data cannot be constructed until the arguments are available, in the
529 ``__init__`` function: ::
530
    >>> class D(SomeBoostPythonClass):
    ...     def __init__(self):
    ...         pass
    ...
    >>> D().some_boost_python_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: bad argument type for built-in operation
539
540 This happened because Boost.Python couldn't find instance data of type
541 ``SomeBoostPythonClass`` within the ``D`` instance; ``D``'s ``__init__``
542 function masked construction of the base class. It could be corrected
543 by either removing ``D``'s ``__init__`` function or having it call
544 ``SomeBoostPythonClass.__init__(...)`` explicitly.
545
546 Virtual Functions
547 =================
548
549 Deriving new types in Python from extension classes is not very
550 interesting unless they can be used polymorphically from C++. In
551 other words, Python method implementations should appear to override
552 the implementation of C++ virtual functions when called *through base
553 class pointers/references from C++*. Since the only way to alter the
554 behavior of a virtual function is to override it in a derived class,
555 the user must build a special derived class to dispatch a polymorphic
556 class' virtual functions::
557
    //
    // interface to wrap:
    //
    class Base
    {
     public:
        virtual int f(std::string x) { return 42; }
        virtual ~Base() {}
    };

    // takes a non-const reference because Base::f is a non-const member
    int calls_f(Base& b, std::string x) { return b.f(x); }

    //
    // Wrapping Code
    //

    // Dispatcher class
    struct BaseWrap : Base
    {
        // Store a pointer to the Python object
        BaseWrap(PyObject* self_) : self(self_) {}
        PyObject* self;

        // Default implementation, for when f is not overridden
        int f_default(std::string x) { return this->Base::f(x); }
        // Dispatch implementation
        int f(std::string x) { return call_method<int>(self, "f", x); }
    };

    ...
        def("calls_f", calls_f);
        class_<Base, BaseWrap>("Base")
            .def("f", &Base::f, &BaseWrap::f_default)
            ;
592
593 Now here's some Python code which demonstrates: ::
594
    >>> class Derived(Base):
    ...     def f(self, s):
    ...         return len(s)
    ...
    >>> calls_f(Base(), 'foo')
    42
    >>> calls_f(Derived(), 'forty-two')
    9
603
604 Things to notice about the dispatcher class:
605
606 * The key element which allows overriding in Python is the
607 ``call_method`` invocation, which uses the same global type
608 conversion registry as the C++ function wrapping does to convert its
609 arguments from C++ to Python and its return type from Python to C++.
610
611 * Any constructor signatures you wish to wrap must be replicated with
612 an initial ``PyObject*`` argument
613
614 * The dispatcher must store this argument so that it can be used to
615 invoke ``call_method``
616
617 * The ``f_default`` member function is needed when the function being
618 exposed is not pure virtual; there's no other way ``Base::f`` can be
619 called on an object of type ``BaseWrap``, since it overrides ``f``.
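
The dispatch pattern can be tried out without an interpreter. In the sketch
below a ``std::function`` stands in for the Python-side override that
``call_method`` would find on the stored ``PyObject*``; that substitution,
and the names ``BaseDispatcher``, ``override_f`` and ``length_of``, are
assumptions made purely for illustration.

```cpp
#include <functional>
#include <string>

class Base
{
public:
    virtual int f(std::string) { return 42; }
    virtual ~Base() {}
};

// C++ code that only knows about Base.
inline int calls_f(Base& b, std::string x) { return b.f(x); }

// Hypothetical dispatcher: override_f plays the role of a method defined
// in a Python subclass; an empty function means "not overridden".
struct BaseDispatcher : Base
{
    std::function<int(std::string)> override_f;

    // Default implementation: the only way to reach Base::f from here,
    // because this class overrides f.
    int f_default(std::string x) { return this->Base::f(x); }

    // Dispatch implementation: forward to the override when one exists.
    int f(std::string x)
    {
        return override_f ? override_f(x) : f_default(x);
    }
};

// Plays the part of the Python method "def f(self, s): return len(s)".
inline int length_of(std::string s) { return static_cast<int>(s.size()); }
```

A ``BaseDispatcher`` with no override behaves like ``Base`` when passed to
``calls_f``, while one whose ``override_f`` is set to ``length_of`` returns
the string length, mirroring the Python session above.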
620
621 Deeper Reflection on the Horizon?
622 =================================
623
624 Admittedly, this formula is tedious to repeat, especially on a project
with many polymorphic classes. That it is necessary reflects some
626 limitations in C++'s compile-time introspection capabilities: there's
627 no way to enumerate the members of a class and find out which are
628 virtual functions. At least one very promising project has been
629 started to write a front-end which can generate these dispatchers (and
630 other wrapping code) automatically from C++ headers.
631
632 Pyste_ is being developed by Bruno da Silva de Oliveira. It builds on
633 GCC_XML_, which generates an XML version of GCC's internal program
634 representation. Since GCC is a highly-conformant C++ compiler, this
635 ensures correct handling of the most-sophisticated template code and
636 full access to the underlying type system. In keeping with the
637 Boost.Python philosophy, a Pyste interface description is neither
638 intrusive on the code being wrapped, nor expressed in some unfamiliar
639 language: instead it is a 100% pure Python script. If Pyste is
640 successful it will mark a move away from wrapping everything directly
641 in C++ for many of our users. It will also allow us the choice to
642 shift some of the metaprogram code from C++ to Python. We expect that
643 soon, not only our users but the Boost.Python developers themselves
644 will be "thinking hybrid" about their own code.
645
646 .. _`GCC_XML`: http://www.gccxml.org/HTML/Index.html
647 .. _`Pyste`: http://www.boost.org/libs/python/pyste
648
649 ---------------
650 Serialization
651 ---------------
652
653 *Serialization* is the process of converting objects in memory to a
654 form that can be stored on disk or sent over a network connection. The
655 serialized object (most often a plain string) can be retrieved and
656 converted back to the original object. A good serialization system will
657 automatically convert entire object hierarchies. Python's standard
658 ``pickle`` module is just such a system. It leverages the language's strong
659 runtime introspection facilities for serializing practically arbitrary
660 user-defined objects. With a few simple and unintrusive provisions this
661 powerful machinery can be extended to also work for wrapped C++ objects.
662 Here is an example::
663
664 #include <string>
665
666 struct World
667 {
668 World(std::string a_msg) : msg(a_msg) {}
669 std::string greet() const { return msg; }
670 std::string msg;
671 };
672
673 #include <boost/python.hpp>
674 using namespace boost::python;
675
676 struct World_picklers : pickle_suite
677 {
678 static tuple
679 getinitargs(World const& w) { return make_tuple(w.greet()); }
680 };
681
682 BOOST_PYTHON_MODULE(hello)
683 {
684 class_<World>("World", init<std::string>())
685 .def("greet", &World::greet)
686 .def_pickle(World_picklers())
687 ;
688 }
689
690 Now let's create a ``World`` object and put it to rest on disk::
691
692 >>> import hello
693 >>> import pickle
694 >>> a_world = hello.World("howdy")
695 >>> pickle.dump(a_world, open("my_world", "w"))
696
697 In a potentially *different script* on a potentially *different
698 computer* with a potentially *different operating system*::
699
700 >>> import pickle
701 >>> resurrected_world = pickle.load(open("my_world", "r"))
702 >>> resurrected_world.greet()
703 'howdy'
704
705 Of course the ``cPickle`` module can also be used for faster
706 processing.
707
708 Boost.Python's ``pickle_suite`` fully supports the ``pickle`` protocol
defined in the standard Python documentation. Like a ``__getinitargs__``
function in Python, the ``pickle_suite``'s ``getinitargs()`` is responsible for
creating the argument tuple that will be used to reconstruct the pickled
object. The other elements of the Python pickling protocol,
``__getstate__`` and ``__setstate__``, can be optionally provided via C++
``getstate`` and ``setstate`` functions. C++'s static type system allows the
715 library to ensure at compile-time that nonsensical combinations of
716 functions (e.g. getstate without setstate) are not used.
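
One way such a compile-time check could be written is sketched below using
C++11 member detection and ``static_assert``. This is our own illustration
of the principle, not the mechanism the library uses; ``has_getstate``,
``has_setstate``, ``is_consistent_suite``, ``def_pickle_sketch`` and the
three example suites are hypothetical.

```cpp
#include <type_traits>

// Detect whether a suite type declares a member named getstate.
template <class T, class = void>
struct has_getstate : std::false_type {};

template <class T>
struct has_getstate<T, decltype(void(&T::getstate))> : std::true_type {};

// Detect whether a suite type declares a member named setstate.
template <class T, class = void>
struct has_setstate : std::false_type {};

template <class T>
struct has_setstate<T, decltype(void(&T::setstate))> : std::true_type {};

// A suite is consistent when it provides both functions or neither.
template <class Suite>
struct is_consistent_suite
    : std::integral_constant<bool,
          has_getstate<Suite>::value == has_setstate<Suite>::value>
{};

// A def_pickle-like entry point would refuse to compile for a bad suite.
template <class Suite>
void def_pickle_sketch(Suite const&)
{
    static_assert(is_consistent_suite<Suite>::value,
                  "getstate and setstate must be provided together");
}

// Example suites (hypothetical); int stands in for the real state types.
struct init_only    { static int getinitargs() { return 0; } };
struct full_suite   { static int getstate() { return 0; }
                      static void setstate(int) {} };
struct broken_suite { static int getstate() { return 0; } };
```

With these definitions ``def_pickle_sketch(broken_suite())`` is rejected by
the compiler, whereas the other two suites are accepted.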
717
718 Enabling serialization of more complex C++ objects requires a little
719 more work than is shown in the example above. Fortunately the
720 ``object`` interface (see next section) greatly helps in keeping the
721 code manageable.
722
723 ------------------
724 Object interface
725 ------------------
726
727 Experienced 'C' language extension module authors will be familiar
728 with the ubiquitous ``PyObject*``, manual reference-counting, and the
729 need to remember which API calls return "new" (owned) references or
730 "borrowed" (raw) references. These constraints are not just
731 cumbersome but also a major source of errors, especially in the
732 presence of exceptions.
733
734 Boost.Python provides a class ``object`` which automates reference
735 counting and provides conversion to Python from C++ objects of
736 arbitrary type. This significantly reduces the learning effort for
737 prospective extension module writers.
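
The reference-counting half of that job is the familiar RAII idiom. The
sketch below models it with a bare counter in place of a real ``PyObject``;
``counted``, ``handle_sketch`` and the two helper functions are hypothetical
names, and the real ``object`` class does considerably more.

```cpp
#include <stdexcept>
#include <utility>

// Stand-in for PyObject: nothing but a reference count.
struct counted { long refcount; };

inline void incref(counted* p) { ++p->refcount; }
inline void decref(counted* p) { --p->refcount; }

// Hypothetical owning handle: every copy takes a reference and every
// destructor releases one, so the count stays balanced automatically.
class handle_sketch
{
public:
    explicit handle_sketch(counted* p) : p_(p) { incref(p_); }
    handle_sketch(handle_sketch const& other) : p_(other.p_) { incref(p_); }
    handle_sketch& operator=(handle_sketch other)
    {
        std::swap(p_, other.p_);   // old reference is released with 'other'
        return *this;
    }
    ~handle_sketch() { decref(p_); }

    long use_count() const { return p_->refcount; }

private:
    counted* p_;
};

// Two handles are alive when the count is read; both are released on return.
inline long count_inside_scope(counted& target)
{
    handle_sketch a(&target);
    handle_sketch b(a);
    return b.use_count();
}

// The reference is released during stack unwinding, with no cleanup code.
inline void throws_while_holding(counted& target)
{
    handle_sketch h(&target);
    throw std::runtime_error("failure while a reference is held");
}
```

The second helper shows the case that manual reference counting most often
gets wrong: the count returns to its starting value even though the function
exits by throwing.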
738
739 Creating an ``object`` from any other type is extremely simple::
740
741 object s("hello, world"); // s manages a Python string
742
743 ``object`` has templated interactions with all other types, with
744 automatic to-python conversions. It happens so naturally that it's
745 easily overlooked::
746
747 object ten_Os = 10 * s[4]; // -> "oooooooooo"
748
749 In the example above, ``4`` and ``10`` are converted to Python objects
750 before the indexing and multiplication operations are invoked.
751
752 The ``extract<T>`` class template can be used to convert Python objects
753 to C++ types::
754
755 double x = extract<double>(o);
756
757 If a conversion in either direction cannot be performed, an
758 appropriate exception is thrown at runtime.
759
760 The ``object`` type is accompanied by a set of derived types
761 that mirror the Python built-in types such as ``list``, ``dict``,
762 ``tuple``, etc. as much as possible. This enables convenient
763 manipulation of these high-level types from C++::
764
765 dict d;
766 d["some"] = "thing";
767 d["lucky_number"] = 13;
768 list l = d.keys();
769
770 This almost looks and works like regular Python code, but it is pure
771 C++. Of course we can wrap C++ functions which accept or return
772 ``object`` instances.
773
774 =================
775 Thinking hybrid
776 =================
777
778 Because of the practical and mental difficulties of combining
programming languages, it is common to settle on a single language at the
780 outset of any development effort. For many applications, performance
781 considerations dictate the use of a compiled language for the core
782 algorithms. Unfortunately, due to the complexity of the static type
783 system, the price we pay for runtime performance is often a
784 significant increase in development time. Experience shows that
785 writing maintainable C++ code usually takes longer and requires *far*
786 more hard-earned working experience than developing comparable Python
787 code. Even when developers are comfortable working exclusively in
788 compiled languages, they often augment their systems by some type of
789 ad hoc scripting layer for the benefit of their users without ever
790 availing themselves of the same advantages.
791
Boost.Python enables us to *think hybrid*. Python can be used for
rapidly prototyping a new application; its ease of use and the large
pool of standard libraries give us a head start on the way to a
working system. If necessary, the working code can be used to
discover rate-limiting hotspots. To maximize performance these can
be reimplemented in C++, together with the Boost.Python bindings
needed to tie them back into the existing higher-level procedure.

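One way to find such hotspots in a Python prototype is the standard
library's ``cProfile`` module. The sketch below is illustrative only:
``pairwise_energy``, ``make_points``, ``run_prototype`` and
``hottest_function`` are hypothetical names, not part of Boost.Python
or of any system described in this paper.

```python
import cProfile
import pstats


def pairwise_energy(points):
    """Deliberately naive O(n^2) loop: a typical candidate for C++."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for xj, yj in points[i + 1:]:
            total += (xi - xj) ** 2 + (yi - yj) ** 2
    return total


def make_points(n):
    """Cheap set-up code that can comfortably stay in Python."""
    return [(float(i), float(i % 7)) for i in range(n)]


def run_prototype(n=300):
    """Top-level procedure of the hypothetical prototype."""
    return pairwise_energy(make_points(n))


def hottest_function(func, *args):
    """Profile func(*args); return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, name) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


hotspot = hottest_function(run_prototype, 300)
print(hotspot)
```

A function that dominates the profile in this way is a natural
candidate for reimplementation in C++ behind a Boost.Python binding,
while the surrounding set-up code remains in Python.
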
Of course, this *top-down* approach is less attractive if it is clear
from the start that many algorithms will eventually have to be
implemented in C++. Fortunately, Boost.Python also enables us to
pursue a *bottom-up* approach. We have used this approach very
successfully in the development of a toolbox for scientific
applications. The toolbox started out mainly as a library of C++
classes with Boost.Python bindings, and for a while the growth was
mainly concentrated on the C++ parts. However, as the toolbox is
becoming more complete, more and more newly added functionality can be
implemented in Python.

.. image:: images/python_cpp_mix.png

This figure shows the estimated ratio of newly added C++ and Python
code over time as new algorithms are implemented. We expect this
ratio to level out near 70% Python. Being able to solve new problems
mostly in Python rather than a more difficult statically typed
language is the return on our investment in Boost.Python. The ability
to access all of our code from Python allows a broader group of
developers to use it in the rapid development of new applications.

=====================
 Development history
=====================

The first version of Boost.Python was developed in 2000 by Dave
Abrahams at Dragon Systems, where he was privileged to have Tim Peters
as a guide to "The Zen of Python". One of Dave's jobs was to develop
a Python-based natural language processing system. Since it was
eventually going to be targeting embedded hardware, it was always
assumed that the compute-intensive core would be rewritten in C++ to
optimize speed and memory footprint [#proto]_. The project also wanted to
test all of its C++ code using Python test scripts [#test]_. The only
tool we knew of for binding C++ and Python was SWIG_, and at the time
its handling of C++ was weak. It would be false to claim any deep
insight into the possible advantages of Boost.Python's approach at
this point. Dave's interest and expertise in fancy C++ template
tricks had just reached the point where he could do some real damage,
and Boost.Python emerged as it did because it filled a need and
because it seemed like a cool thing to try.

This early version was aimed at many of the same basic goals we've
described in this paper, differing most noticeably by having a
slightly more cumbersome syntax and by a lack of special support for
operator overloading, pickling, and component-based development.
These last three features were quickly added by Ullrich Koethe and
Ralf Grosse-Kunstleve [#feature]_, and other enthusiastic contributors arrived
on the scene to contribute enhancements like support for nested
modules and static member functions.

By early 2001 development had stabilized and few new features were
being added; however, a disturbing new fact came to light: Ralf had
begun testing Boost.Python on pre-release versions of a compiler using
the EDG_ front-end, and the mechanism at the core of Boost.Python
responsible for handling conversions between Python and C++ types was
failing to compile. As it turned out, we had been exploiting a very
common bug in the implementation of all the C++ compilers we had
tested. We knew that as C++ compilers rapidly became more
standards-compliant, the library would begin failing on more
platforms. Unfortunately, because the mechanism was so central to the
functioning of the library, fixing the problem looked very difficult.

Fortunately, later that year Lawrence Berkeley and later Lawrence
Livermore National Labs contracted with `Boost Consulting`_ for support
and development of Boost.Python, and there was a new opportunity to
address fundamental issues and ensure a future for the library. A
redesign effort began with the low-level type conversion architecture,
building in standards-compliance and support for component-based
development (in contrast to version 1, where conversions had to be
explicitly imported and exported across module boundaries). A new
analysis of the relationship between the Python and C++ objects was
done, resulting in more intuitive handling for C++ lvalues and
rvalues.

The emergence of a powerful new type system in Python 2.2 made the
choice of whether to maintain compatibility with Python 1.5.2 easy:
the opportunity to throw away a great deal of elaborate code for
emulating classic Python classes alone was too good to pass up. In
addition, Python iterators and descriptors provided crucial and
elegant tools for representing similar C++ constructs. The
development of the generalized ``object`` interface allowed us to
further shield C++ programmers from the dangers and syntactic burdens
of the Python 'C' API. A great number of other features including C++
exception translation, improved support for overloaded functions, and
most significantly, CallPolicies for handling pointers and
references, were added during this period.

In October 2002, version 2 of Boost.Python was released. Development
since then has concentrated on improved support for C++ runtime
polymorphism and smart pointers. Peter Dimov's ingenious
``boost::shared_ptr`` design in particular has allowed us to give the
hybrid developer a consistent interface for moving objects back and
forth across the language barrier without loss of information. At
first, we were concerned that the sophistication and complexity of the
Boost.Python v2 implementation might discourage contributors, but the
emergence of Pyste_ and several other significant feature
contributions have laid those fears to rest. Daily questions on the
Python C++-sig and a backlog of desired improvements show that the
library is getting used. To us, the future looks bright.

.. _`EDG`: http://www.edg.com

=============
 Conclusions
=============

Boost.Python achieves seamless interoperability between two rich and
complementary language environments. Because it leverages template
metaprogramming to introspect about types and functions, the user
never has to learn a third syntax: the interface definitions are
written in concise and maintainable C++. Also, the wrapping system
doesn't have to parse C++ headers or represent the type system: the
compiler does that work for us.

Computationally intensive tasks play to the strengths of C++ and are
often impossible to implement efficiently in pure Python, while jobs
like serialization that are trivial in Python can be very difficult in
pure C++. Given the luxury of building a hybrid software system from
the ground up, we can approach design with new confidence and power.

===========
 Citations
===========

.. [VELD1995] T. Veldhuizen, "Expression Templates," C++ Report,
   Vol. 7 No. 5 June 1995, pp. 26-31.
   http://osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html

===========
 Footnotes
===========

.. [#proto] In retrospect, it seems that "thinking hybrid" from the
   ground up might have been better for the NLP system: the
   natural component boundaries defined by the pure Python
   prototype turned out to be inappropriate for getting the
   desired performance and memory footprint out of the C++ core,
   which eventually caused some redesign overhead on the Python
   side when the core was moved to C++.

.. [#test] We also have some reservations about driving all C++
   testing through a Python interface, unless that's the only way
   it will be ultimately used. Any transition across language
   boundaries with such different object models can inevitably
   mask bugs.

.. [#feature] These features were expressed very differently in v1 of
   Boost.Python.