Commit | Line | Data |
---|---|---|
4710c53d | 1 | '''"Executable documentation" for the pickle module.\r |
2 | \r | |
3 | Extensive comments about the pickle protocols and pickle-machine opcodes\r | |
4 | can be found here. Some functions meant for external use:\r | |
5 | \r | |
6 | genops(pickle)\r | |
7 | Generate all the opcodes in a pickle, as (opcode, arg, position) triples.\r | |
8 | \r | |
9 | dis(pickle, out=None, memo=None, indentlevel=4)\r | |
10 | Print a symbolic disassembly of a pickle.\r | |
11 | '''\r | |
12 | \r | |
13 | __all__ = ['dis', 'genops', 'optimize']\r | |
14 | \r | |
15 | # Other ideas:\r | |
16 | #\r | |
17 | # - A pickle verifier: read a pickle and check it exhaustively for\r | |
18 | # well-formedness. dis() does a lot of this already.\r | |
19 | #\r | |
20 | # - A protocol identifier: examine a pickle and return its protocol number\r | |
21 | # (== the highest .proto attr value among all the opcodes in the pickle).\r | |
22 | # dis() already prints this info at the end.\r | |
23 | #\r | |
24 | # - A pickle optimizer: for example, tuple-building code is sometimes more\r | |
25 | # elaborate than necessary, catering for the possibility that the tuple\r | |
26 | # is recursive. Or lots of times a PUT is generated that's never accessed\r | |
27 | # by a later GET.\r | |
28 | \r | |
29 | \r | |
30 | """\r | |
31 | "A pickle" is a program for a virtual pickle machine (PM, but more accurately\r | |
32 | called an unpickling machine). It's a sequence of opcodes, interpreted by the\r | |
33 | PM, building an arbitrarily complex Python object.\r | |
34 | \r | |
35 | For the most part, the PM is very simple: there are no looping, testing, or\r | |
36 | conditional instructions, no arithmetic and no function calls. Opcodes are\r | |
37 | executed once each, from first to last, until a STOP opcode is reached.\r | |
38 | \r | |
39 | The PM has two data areas, "the stack" and "the memo".\r | |
40 | \r | |
41 | Many opcodes push Python objects onto the stack; e.g., INT pushes a Python\r | |
42 | integer object on the stack, whose value is gotten from a decimal string\r | |
43 | literal immediately following the INT opcode in the pickle bytestream. Other\r | |
44 | opcodes take Python objects off the stack. The result of unpickling is\r | |
45 | whatever object is left on the stack when the final STOP opcode is executed.\r | |
46 | \r | |
47 | The memo is simply an array of objects, or it can be implemented as a dict\r | |
48 | mapping little integers to objects. The memo serves as the PM's "long term\r | |
49 | memory", and the little integers indexing the memo are akin to variable\r | |
50 | names. Some opcodes copy the top stack object into the memo at a given\r | |
51 | index, and others push the memo object at a given index onto the stack.\r | |
52 | \r | |
53 | At heart, that's all the PM has. Subtleties arise for these reasons:\r | |
54 | \r | |
55 | + Object identity. Objects can be arbitrarily complex, and subobjects\r | |
56 | may be shared (for example, the list [a, a] refers to the same object a\r | |
57 | twice). It can be vital that unpickling recreate an isomorphic object\r | |
58 | graph, faithfully reproducing sharing.\r | |
59 | \r | |
60 | + Recursive objects. For example, after "L = []; L.append(L)", L is a\r | |
61 | list, and L[0] is the same list. This is related to the object identity\r | |
62 | point, and some sequences of pickle opcodes are subtle in order to\r | |
63 | get the right result in all cases.\r | |
64 | \r | |
65 | + Things pickle doesn't know everything about. Examples of things pickle\r | |
66 | does know everything about are Python's builtin scalar and container\r | |
67 | types, like ints and tuples. They generally have opcodes dedicated to\r | |
68 | them. For things like module references and instances of user-defined\r | |
69 | classes, pickle's knowledge is limited. Historically, many enhancements\r | |
70 | have been made to the pickle protocol in order to do a better (faster,\r | |
71 | and/or more compact) job on those.\r | |
72 | \r | |
73 | + Backward compatibility and micro-optimization. As explained below,\r | |
74 | pickle opcodes never go away, not even when better ways to do a thing\r | |
75 | get invented. The repertoire of the PM just keeps growing over time.\r | |
76 | For example, protocol 0 had two opcodes for building Python integers (INT\r | |
77 | and LONG), protocol 1 added three more for more-efficient pickling of short\r | |
78 | integers, and protocol 2 added two more for more-efficient pickling of\r | |
79 | long integers (before protocol 2, the only ways to pickle a Python long\r | |
80 | took time quadratic in the number of digits, for both pickling and\r | |
81 | unpickling). "Opcode bloat" isn't so much a subtlety as a source of\r | |
82 | wearying complication.\r | |
83 | \r | |
84 | \r | |
85 | Pickle protocols:\r | |
86 | \r | |
87 | For compatibility, the meaning of a pickle opcode never changes. Instead new\r | |
88 | pickle opcodes get added, and each version's unpickler can handle all the\r | |
89 | pickle opcodes in all protocol versions to date. So old pickles continue to\r | |
90 | be readable forever. The pickler can generally be told to restrict itself to\r | |
91 | the subset of opcodes available under previous protocol versions too, so that\r | |
92 | users can create pickles under the current version readable by older\r | |
93 | versions. However, a pickle does not contain its version number embedded\r | |
94 | within it. If an older unpickler tries to read a pickle using a later\r | |
95 | protocol, the result is most likely an exception due to seeing an unknown (in\r | |
96 | the older unpickler) opcode.\r | |
97 | \r | |
98 | The original pickle used what's now called "protocol 0", and what was called\r | |
99 | "text mode" before Python 2.3. The entire pickle bytestream is made up of\r | |
100 | printable 7-bit ASCII characters, plus the newline character, in protocol 0.\r | |
101 | That's why it was called text mode. Protocol 0 is small and elegant, but\r | |
102 | sometimes painfully inefficient.\r | |
103 | \r | |
104 | The second major set of additions is now called "protocol 1", and was called\r | |
105 | "binary mode" before Python 2.3. This added many opcodes with arguments\r | |
106 | consisting of arbitrary bytes, including NUL bytes and unprintable "high bit"\r | |
107 | bytes. Binary mode pickles can be substantially smaller than equivalent\r | |
108 | text mode pickles, and sometimes faster too; e.g., BININT represents a 4-byte\r | |
109 | int as 4 bytes following the opcode, which is cheaper to unpickle than the\r | |
110 | (perhaps) 11-character decimal string attached to INT. Protocol 1 also added\r | |
111 | a number of opcodes that operate on many stack elements at once (like APPENDS\r | |
112 | and SETITEMS), and "shortcut" opcodes (like EMPTY_DICT and EMPTY_TUPLE).\r | |
113 | \r | |
114 | The third major set of additions came in Python 2.3, and is called "protocol\r | |
115 | 2". This added:\r | |
116 | \r | |
117 | - A better way to pickle instances of new-style classes (NEWOBJ).\r | |
118 | \r | |
119 | - A way for a pickle to identify its protocol (PROTO).\r | |
120 | \r | |
121 | - Time- and space-efficient pickling of long ints (LONG{1,4}).\r | |
122 | \r | |
123 | - Shortcuts for small tuples (TUPLE{1,2,3}).\r | |
124 | \r | |
125 | - Dedicated opcodes for bools (NEWTRUE, NEWFALSE).\r | |
126 | \r | |
127 | - The "extension registry", a vector of popular objects that can be pushed\r | |
128 | efficiently by index (EXT{1,2,4}). This is akin to the memo and GET, but\r | |
129 | the registry contents are predefined (there's nothing akin to the memo's\r | |
130 | PUT).\r | |
131 | \r | |
132 | Another independent change with Python 2.3 is the abandonment of any\r | |
133 | pretense that it might be safe to load pickles received from untrusted\r | |
134 | parties -- no sufficient security analysis has been done to guarantee\r | |
135 | this and there isn't a use case that warrants the expense of such an\r | |
136 | analysis.\r | |
137 | \r | |
138 | To this end, all tests for __safe_for_unpickling__ or for\r | |
139 | copy_reg.safe_constructors are removed from the unpickling code.\r | |
140 | References to these variables in the descriptions below are to be seen\r | |
141 | as describing unpickling in Python 2.2 and before.\r | |
142 | """\r | |
143 | \r | |
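The machine model described in the docstring above can be observed directly. The following is a minimal sketch, assuming Python 3 and its standard-library `pickle` and `pickletools` modules (the listing itself targets Python 2); the variable names are illustrative only. It pickles a list that refers to the same sublist twice under protocol 0, then lists the opcodes with `genops()`.

```python
import pickle
import pickletools

# Illustrative data: the outer list refers to the same inner list twice.
shared = [1, 2]
data = pickle.dumps([shared, shared], protocol=0)

# genops() yields (opcode, arg, position) triples; collect the opcode names.
names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
print(names)

# Unpickling reproduces the sharing: both elements are the same object.
result = pickle.loads(data)
print(result[0] is result[1])
```

The opcode list ends with STOP, and the second reference to the inner list is expressed as a memo fetch (GET) of an object stored earlier (PUT) rather than as a second copy.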
144 | # Meta-rule: Descriptions are stored in instances of descriptor objects,\r | |
145 | # with plain constructors. No meta-language is defined from which\r | |
146 | # descriptors could be constructed. If you want, e.g., XML, write a little\r | |
147 | # program to generate XML from the objects.\r | |
148 | \r | |
149 | ##############################################################################\r | |
150 | # Some pickle opcodes have an argument, following the opcode in the\r | |
151 | # bytestream. An argument is of a specific type, described by an instance\r | |
152 | # of ArgumentDescriptor. These are not to be confused with arguments taken\r | |
153 | # off the stack -- ArgumentDescriptor applies only to arguments embedded in\r | |
154 | # the opcode stream, immediately following an opcode.\r | |
155 | \r | |
156 | # Represents the number of bytes consumed by an argument delimited by the\r | |
157 | # next newline character.\r | |
158 | UP_TO_NEWLINE = -1\r | |
159 | \r | |
160 | # Represents the number of bytes consumed by a two-argument opcode where\r | |
161 | # the first argument gives the number of bytes in the second argument.\r | |
162 | TAKEN_FROM_ARGUMENT1 = -2 # num bytes is 1-byte unsigned int\r | |
163 | TAKEN_FROM_ARGUMENT4 = -3 # num bytes is 4-byte signed little-endian int\r | |
164 | \r | |
165 | class ArgumentDescriptor(object):\r | |
166 | __slots__ = (\r | |
167 | # name of descriptor record, also a module global name; a string\r | |
168 | 'name',\r | |
169 | \r | |
170 | # length of argument, in bytes; an int; UP_TO_NEWLINE and\r | |
171 | # TAKEN_FROM_ARGUMENT{1,4} are negative values for variable-length\r | |
172 | # cases\r | |
173 | 'n',\r | |
174 | \r | |
175 | # a function taking a file-like object, reading this kind of argument\r | |
176 | # from the object at the current position, advancing the current\r | |
177 | # position by n bytes, and returning the value of the argument\r | |
178 | 'reader',\r | |
179 | \r | |
180 | # human-readable docs for this arg descriptor; a string\r | |
181 | 'doc',\r | |
182 | )\r | |
183 | \r | |
184 | def __init__(self, name, n, reader, doc):\r | |
185 | assert isinstance(name, str)\r | |
186 | self.name = name\r | |
187 | \r | |
188 | assert isinstance(n, int) and (n >= 0 or\r | |
189 | n in (UP_TO_NEWLINE,\r | |
190 | TAKEN_FROM_ARGUMENT1,\r | |
191 | TAKEN_FROM_ARGUMENT4))\r | |
192 | self.n = n\r | |
193 | \r | |
194 | self.reader = reader\r | |
195 | \r | |
196 | assert isinstance(doc, str)\r | |
197 | self.doc = doc\r | |
198 | \r | |
199 | from struct import unpack as _unpack\r | |
200 | \r | |
201 | def read_uint1(f):\r | |
202 | r"""\r | |
203 | >>> import StringIO\r | |
204 | >>> read_uint1(StringIO.StringIO('\xff'))\r | |
205 | 255\r | |
206 | """\r | |
207 | \r | |
208 | data = f.read(1)\r | |
209 | if data:\r | |
210 | return ord(data)\r | |
211 | raise ValueError("not enough data in stream to read uint1")\r | |
212 | \r | |
213 | uint1 = ArgumentDescriptor(\r | |
214 | name='uint1',\r | |
215 | n=1,\r | |
216 | reader=read_uint1,\r | |
217 | doc="One-byte unsigned integer.")\r | |
218 | \r | |
219 | \r | |
220 | def read_uint2(f):\r | |
221 | r"""\r | |
222 | >>> import StringIO\r | |
223 | >>> read_uint2(StringIO.StringIO('\xff\x00'))\r | |
224 | 255\r | |
225 | >>> read_uint2(StringIO.StringIO('\xff\xff'))\r | |
226 | 65535\r | |
227 | """\r | |
228 | \r | |
229 | data = f.read(2)\r | |
230 | if len(data) == 2:\r | |
231 | return _unpack("<H", data)[0]\r | |
232 | raise ValueError("not enough data in stream to read uint2")\r | |
233 | \r | |
234 | uint2 = ArgumentDescriptor(\r | |
235 | name='uint2',\r | |
236 | n=2,\r | |
237 | reader=read_uint2,\r | |
238 | doc="Two-byte unsigned integer, little-endian.")\r | |
239 | \r | |
240 | \r | |
241 | def read_int4(f):\r | |
242 | r"""\r | |
243 | >>> import StringIO\r | |
244 | >>> read_int4(StringIO.StringIO('\xff\x00\x00\x00'))\r | |
245 | 255\r | |
246 | >>> read_int4(StringIO.StringIO('\x00\x00\x00\x80')) == -(2**31)\r | |
247 | True\r | |
248 | """\r | |
249 | \r | |
250 | data = f.read(4)\r | |
251 | if len(data) == 4:\r | |
252 | return _unpack("<i", data)[0]\r | |
253 | raise ValueError("not enough data in stream to read int4")\r | |
254 | \r | |
255 | int4 = ArgumentDescriptor(\r | |
256 | name='int4',\r | |
257 | n=4,\r | |
258 | reader=read_int4,\r | |
259 | doc="Four-byte signed integer, little-endian, 2's complement.")\r | |
260 | \r | |
261 | \r | |
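The fixed-width readers above all follow one pattern: read exactly n bytes, fail if fewer arrive, then unpack. A hedged sketch of that pattern for Python 3 follows, assuming `io.BytesIO` in place of `StringIO`; the helper name `read_fixed` is hypothetical and not part of this module.

```python
import io
import struct

def read_fixed(f, fmt, what):
    """Read one struct-formatted value from f, or raise ValueError.

    read_fixed is a hypothetical helper, not part of pickletools.
    """
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("not enough data in stream to read %s" % what)
    return struct.unpack(fmt, data)[0]

# "<B", "<H" and "<i" correspond to uint1, uint2 and int4 respectively.
print(read_fixed(io.BytesIO(b"\xff"), "<B", "uint1"))
print(read_fixed(io.BytesIO(b"\xff\xff"), "<H", "uint2"))
print(read_fixed(io.BytesIO(b"\x00\x00\x00\x80"), "<i", "int4"))
```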
262 | def read_stringnl(f, decode=True, stripquotes=True):\r | |
263 | r"""\r | |
264 | >>> import StringIO\r | |
265 | >>> read_stringnl(StringIO.StringIO("'abcd'\nefg\n"))\r | |
266 | 'abcd'\r | |
267 | \r | |
268 | >>> read_stringnl(StringIO.StringIO("\n"))\r | |
269 | Traceback (most recent call last):\r | |
270 | ...\r | |
271 | ValueError: no string quotes around ''\r | |
272 | \r | |
273 | >>> read_stringnl(StringIO.StringIO("\n"), stripquotes=False)\r | |
274 | ''\r | |
275 | \r | |
276 | >>> read_stringnl(StringIO.StringIO("''\n"))\r | |
277 | ''\r | |
278 | \r | |
279 | >>> read_stringnl(StringIO.StringIO('"abcd"'))\r | |
280 | Traceback (most recent call last):\r | |
281 | ...\r | |
282 | ValueError: no newline found when trying to read stringnl\r | |
283 | \r | |
284 | Embedded escapes are undone in the result.\r | |
285 | >>> read_stringnl(StringIO.StringIO(r"'a\n\\b\x00c\td'" + "\n'e'"))\r | |
286 | 'a\n\\b\x00c\td'\r | |
287 | """\r | |
288 | \r | |
289 | data = f.readline()\r | |
290 | if not data.endswith('\n'):\r | |
291 | raise ValueError("no newline found when trying to read stringnl")\r | |
292 | data = data[:-1] # lose the newline\r | |
293 | \r | |
294 | if stripquotes:\r | |
295 | for q in "'\"":\r | |
296 | if data.startswith(q):\r | |
297 | if not data.endswith(q):\r | |
298 | raise ValueError("string quote %r not found at both "\r | |
299 | "ends of %r" % (q, data))\r | |
300 | data = data[1:-1]\r | |
301 | break\r | |
302 | else:\r | |
303 | raise ValueError("no string quotes around %r" % data)\r | |
304 | \r | |
305 | # I'm not sure when 'string_escape' was added to the std codecs; it's\r | |
306 | # crazy not to use it if it's there.\r | |
307 | if decode:\r | |
308 | data = data.decode('string_escape')\r | |
309 | return data\r | |
310 | \r | |
311 | stringnl = ArgumentDescriptor(\r | |
312 | name='stringnl',\r | |
313 | n=UP_TO_NEWLINE,\r | |
314 | reader=read_stringnl,\r | |
315 | doc="""A newline-terminated string.\r | |
316 | \r | |
317 | This is a repr-style string, with embedded escapes, and\r | |
318 | bracketing quotes.\r | |
319 | """)\r | |
320 | \r | |
321 | def read_stringnl_noescape(f):\r | |
322 | return read_stringnl(f, decode=False, stripquotes=False)\r | |
323 | \r | |
324 | stringnl_noescape = ArgumentDescriptor(\r | |
325 | name='stringnl_noescape',\r | |
326 | n=UP_TO_NEWLINE,\r | |
327 | reader=read_stringnl_noescape,\r | |
328 | doc="""A newline-terminated string.\r | |
329 | \r | |
330 | This is a str-style string, without embedded escapes,\r | |
331 | or bracketing quotes. It should consist solely of\r | |
332 | printable ASCII characters.\r | |
333 | """)\r | |
334 | \r | |
335 | def read_stringnl_noescape_pair(f):\r | |
336 | r"""\r | |
337 | >>> import StringIO\r | |
338 | >>> read_stringnl_noescape_pair(StringIO.StringIO("Queue\nEmpty\njunk"))\r | |
339 | 'Queue Empty'\r | |
340 | """\r | |
341 | \r | |
342 | return "%s %s" % (read_stringnl_noescape(f), read_stringnl_noescape(f))\r | |
343 | \r | |
344 | stringnl_noescape_pair = ArgumentDescriptor(\r | |
345 | name='stringnl_noescape_pair',\r | |
346 | n=UP_TO_NEWLINE,\r | |
347 | reader=read_stringnl_noescape_pair,\r | |
348 | doc="""A pair of newline-terminated strings.\r | |
349 | \r | |
350 | These are str-style strings, without embedded\r | |
351 | escapes, or bracketing quotes. They should\r | |
352 | consist solely of printable ASCII characters.\r | |
353 | The pair is returned as a single string, with\r | |
354 | a single blank separating the two strings.\r | |
355 | """)\r | |
356 | \r | |
357 | def read_string4(f):\r | |
358 | r"""\r | |
359 | >>> import StringIO\r | |
360 | >>> read_string4(StringIO.StringIO("\x00\x00\x00\x00abc"))\r | |
361 | ''\r | |
362 | >>> read_string4(StringIO.StringIO("\x03\x00\x00\x00abcdef"))\r | |
363 | 'abc'\r | |
364 | >>> read_string4(StringIO.StringIO("\x00\x00\x00\x03abcdef"))\r | |
365 | Traceback (most recent call last):\r | |
366 | ...\r | |
367 | ValueError: expected 50331648 bytes in a string4, but only 6 remain\r | |
368 | """\r | |
369 | \r | |
370 | n = read_int4(f)\r | |
371 | if n < 0:\r | |
372 | raise ValueError("string4 byte count < 0: %d" % n)\r | |
373 | data = f.read(n)\r | |
374 | if len(data) == n:\r | |
375 | return data\r | |
376 | raise ValueError("expected %d bytes in a string4, but only %d remain" %\r | |
377 | (n, len(data)))\r | |
378 | \r | |
379 | string4 = ArgumentDescriptor(\r | |
380 | name="string4",\r | |
381 | n=TAKEN_FROM_ARGUMENT4,\r | |
382 | reader=read_string4,\r | |
383 | doc="""A counted string.\r | |
384 | \r | |
385 | The first argument is a 4-byte little-endian signed int giving\r | |
386 | the number of bytes in the string, and the second argument is\r | |
387 | that many bytes.\r | |
388 | """)\r | |
389 | \r | |
390 | \r | |
391 | def read_string1(f):\r | |
392 | r"""\r | |
393 | >>> import StringIO\r | |
394 | >>> read_string1(StringIO.StringIO("\x00"))\r | |
395 | ''\r | |
396 | >>> read_string1(StringIO.StringIO("\x03abcdef"))\r | |
397 | 'abc'\r | |
398 | """\r | |
399 | \r | |
400 | n = read_uint1(f)\r | |
401 | assert n >= 0\r | |
402 | data = f.read(n)\r | |
403 | if len(data) == n:\r | |
404 | return data\r | |
405 | raise ValueError("expected %d bytes in a string1, but only %d remain" %\r | |
406 | (n, len(data)))\r | |
407 | \r | |
408 | string1 = ArgumentDescriptor(\r | |
409 | name="string1",\r | |
410 | n=TAKEN_FROM_ARGUMENT1,\r | |
411 | reader=read_string1,\r | |
412 | doc="""A counted string.\r | |
413 | \r | |
414 | The first argument is a 1-byte unsigned int giving the number\r | |
415 | of bytes in the string, and the second argument is that many\r | |
416 | bytes.\r | |
417 | """)\r | |
418 | \r | |
419 | \r | |
420 | def read_unicodestringnl(f):\r | |
421 | r"""\r | |
422 | >>> import StringIO\r | |
423 | >>> read_unicodestringnl(StringIO.StringIO("abc\uabcd\njunk"))\r | |
424 | u'abc\uabcd'\r | |
425 | """\r | |
426 | \r | |
427 | data = f.readline()\r | |
428 | if not data.endswith('\n'):\r | |
429 | raise ValueError("no newline found when trying to read "\r | |
430 | "unicodestringnl")\r | |
431 | data = data[:-1] # lose the newline\r | |
432 | return unicode(data, 'raw-unicode-escape')\r | |
433 | \r | |
434 | unicodestringnl = ArgumentDescriptor(\r | |
435 | name='unicodestringnl',\r | |
436 | n=UP_TO_NEWLINE,\r | |
437 | reader=read_unicodestringnl,\r | |
438 | doc="""A newline-terminated Unicode string.\r | |
439 | \r | |
440 | This is raw-unicode-escape encoded, so consists of\r | |
441 | printable ASCII characters, and may contain embedded\r | |
442 | escape sequences.\r | |
443 | """)\r | |
444 | \r | |
445 | def read_unicodestring4(f):\r | |
446 | r"""\r | |
447 | >>> import StringIO\r | |
448 | >>> s = u'abcd\uabcd'\r | |
449 | >>> enc = s.encode('utf-8')\r | |
450 | >>> enc\r | |
451 | 'abcd\xea\xaf\x8d'\r | |
452 | >>> n = chr(len(enc)) + chr(0) * 3 # little-endian 4-byte length\r | |
453 | >>> t = read_unicodestring4(StringIO.StringIO(n + enc + 'junk'))\r | |
454 | >>> s == t\r | |
455 | True\r | |
456 | \r | |
457 | >>> read_unicodestring4(StringIO.StringIO(n + enc[:-1]))\r | |
458 | Traceback (most recent call last):\r | |
459 | ...\r | |
460 | ValueError: expected 7 bytes in a unicodestring4, but only 6 remain\r | |
461 | """\r | |
462 | \r | |
463 | n = read_int4(f)\r | |
464 | if n < 0:\r | |
465 | raise ValueError("unicodestring4 byte count < 0: %d" % n)\r | |
466 | data = f.read(n)\r | |
467 | if len(data) == n:\r | |
468 | return unicode(data, 'utf-8')\r | |
469 | raise ValueError("expected %d bytes in a unicodestring4, but only %d "\r | |
470 | "remain" % (n, len(data)))\r | |
471 | \r | |
472 | unicodestring4 = ArgumentDescriptor(\r | |
473 | name="unicodestring4",\r | |
474 | n=TAKEN_FROM_ARGUMENT4,\r | |
475 | reader=read_unicodestring4,\r | |
476 | doc="""A counted Unicode string.\r | |
477 | \r | |
478 | The first argument is a 4-byte little-endian signed int\r | |
479 | giving the number of bytes in the string, and the second\r | |
480 | argument -- the UTF-8 encoding of the Unicode string --\r | |
481 | contains that many bytes.\r | |
482 | """)\r | |
483 | \r | |
484 | \r | |
485 | def read_decimalnl_short(f):\r | |
486 | r"""\r | |
487 | >>> import StringIO\r | |
488 | >>> read_decimalnl_short(StringIO.StringIO("1234\n56"))\r | |
489 | 1234\r | |
490 | \r | |
491 | >>> read_decimalnl_short(StringIO.StringIO("1234L\n56"))\r | |
492 | Traceback (most recent call last):\r | |
493 | ...\r | |
494 | ValueError: trailing 'L' not allowed in '1234L'\r | |
495 | """\r | |
496 | \r | |
497 | s = read_stringnl(f, decode=False, stripquotes=False)\r | |
498 | if s.endswith("L"):\r | |
499 | raise ValueError("trailing 'L' not allowed in %r" % s)\r | |
500 | \r | |
501 | # It's not necessarily true that the result fits in a Python short int:\r | |
502 | # the pickle may have been written on a 64-bit box. There's also a hack\r | |
503 | # for True and False here.\r | |
504 | if s == "00":\r | |
505 | return False\r | |
506 | elif s == "01":\r | |
507 | return True\r | |
508 | \r | |
509 | try:\r | |
510 | return int(s)\r | |
511 | except OverflowError:\r | |
512 | return long(s)\r | |
513 | \r | |
514 | def read_decimalnl_long(f):\r | |
515 | r"""\r | |
516 | >>> import StringIO\r | |
517 | \r | |
518 | >>> read_decimalnl_long(StringIO.StringIO("1234\n56"))\r | |
519 | Traceback (most recent call last):\r | |
520 | ...\r | |
521 | ValueError: trailing 'L' required in '1234'\r | |
522 | \r | |
523 | Someday the trailing 'L' will probably go away from this output.\r | |
524 | \r | |
525 | >>> read_decimalnl_long(StringIO.StringIO("1234L\n56"))\r | |
526 | 1234L\r | |
527 | \r | |
528 | >>> read_decimalnl_long(StringIO.StringIO("123456789012345678901234L\n6"))\r | |
529 | 123456789012345678901234L\r | |
530 | """\r | |
531 | \r | |
532 | s = read_stringnl(f, decode=False, stripquotes=False)\r | |
533 | if not s.endswith("L"):\r | |
534 | raise ValueError("trailing 'L' required in %r" % s)\r | |
535 | return long(s)\r | |
536 | \r | |
537 | \r | |
538 | decimalnl_short = ArgumentDescriptor(\r | |
539 | name='decimalnl_short',\r | |
540 | n=UP_TO_NEWLINE,\r | |
541 | reader=read_decimalnl_short,\r | |
542 | doc="""A newline-terminated decimal integer literal.\r | |
543 | \r | |
544 | This never has a trailing 'L', and the integer fit\r | |
545 | in a short Python int on the box where the pickle\r | |
546 | was written -- but there's no guarantee it will fit\r | |
547 | in a short Python int on the box where the pickle\r | |
548 | is read.\r | |
549 | """)\r | |
550 | \r | |
551 | decimalnl_long = ArgumentDescriptor(\r | |
552 | name='decimalnl_long',\r | |
553 | n=UP_TO_NEWLINE,\r | |
554 | reader=read_decimalnl_long,\r | |
555 | doc="""A newline-terminated decimal integer literal.\r | |
556 | \r | |
557 | This has a trailing 'L', and can represent integers\r | |
558 | of any size.\r | |
559 | """)\r | |
560 | \r | |
561 | \r | |
562 | def read_floatnl(f):\r | |
563 | r"""\r | |
564 | >>> import StringIO\r | |
565 | >>> read_floatnl(StringIO.StringIO("-1.25\n6"))\r | |
566 | -1.25\r | |
567 | """\r | |
568 | s = read_stringnl(f, decode=False, stripquotes=False)\r | |
569 | return float(s)\r | |
570 | \r | |
571 | floatnl = ArgumentDescriptor(\r | |
572 | name='floatnl',\r | |
573 | n=UP_TO_NEWLINE,\r | |
574 | reader=read_floatnl,\r | |
575 | doc="""A newline-terminated decimal floating literal.\r | |
576 | \r | |
577 | In general this requires 17 significant digits for roundtrip\r | |
578 | identity, and pickling then unpickling infinities, NaNs, and\r | |
579 | minus zero doesn't work across boxes, or on some boxes even\r | |
580 | on itself (e.g., Windows can't read the strings it produces\r | |
581 | for infinities or NaNs).\r | |
582 | """)\r | |
583 | \r | |
584 | def read_float8(f):\r | |
585 | r"""\r | |
586 | >>> import StringIO, struct\r | |
587 | >>> raw = struct.pack(">d", -1.25)\r | |
588 | >>> raw\r | |
589 | '\xbf\xf4\x00\x00\x00\x00\x00\x00'\r | |
590 | >>> read_float8(StringIO.StringIO(raw + "\n"))\r | |
591 | -1.25\r | |
592 | """\r | |
593 | \r | |
594 | data = f.read(8)\r | |
595 | if len(data) == 8:\r | |
596 | return _unpack(">d", data)[0]\r | |
597 | raise ValueError("not enough data in stream to read float8")\r | |
598 | \r | |
599 | \r | |
600 | float8 = ArgumentDescriptor(\r | |
601 | name='float8',\r | |
602 | n=8,\r | |
603 | reader=read_float8,\r | |
604 | doc="""An 8-byte binary representation of a float, big-endian.\r | |
605 | \r | |
606 | The format is unique to Python, and shared with the struct\r | |
607 | module (format string '>d') "in theory" (the struct and cPickle\r | |
608 | implementations don't share the code -- they should). It's\r | |
609 | strongly related to the IEEE-754 double format, and, in normal\r | |
610 | cases, is in fact identical to the big-endian 754 double format.\r | |
611 | On other boxes the dynamic range is limited to that of a 754\r | |
612 | double, and "add a half and chop" rounding is used to reduce\r | |
613 | the precision to 53 bits. However, even on a 754 box,\r | |
614 | infinities, NaNs, and minus zero may not be handled correctly\r | |
615 | (may not survive roundtrip pickling intact).\r | |
616 | """)\r | |
617 | \r | |
618 | # Protocol 2 formats\r | |
619 | \r | |
620 | from pickle import decode_long\r | |
621 | \r | |
622 | def read_long1(f):\r | |
623 | r"""\r | |
624 | >>> import StringIO\r | |
625 | >>> read_long1(StringIO.StringIO("\x00"))\r | |
626 | 0L\r | |
627 | >>> read_long1(StringIO.StringIO("\x02\xff\x00"))\r | |
628 | 255L\r | |
629 | >>> read_long1(StringIO.StringIO("\x02\xff\x7f"))\r | |
630 | 32767L\r | |
631 | >>> read_long1(StringIO.StringIO("\x02\x00\xff"))\r | |
632 | -256L\r | |
633 | >>> read_long1(StringIO.StringIO("\x02\x00\x80"))\r | |
634 | -32768L\r | |
635 | """\r | |
636 | \r | |
637 | n = read_uint1(f)\r | |
638 | data = f.read(n)\r | |
639 | if len(data) != n:\r | |
640 | raise ValueError("not enough data in stream to read long1")\r | |
641 | return decode_long(data)\r | |
642 | \r | |
643 | long1 = ArgumentDescriptor(\r | |
644 | name="long1",\r | |
645 | n=TAKEN_FROM_ARGUMENT1,\r | |
646 | reader=read_long1,\r | |
647 | doc="""A binary long, little-endian, using 1-byte size.\r | |
648 | \r | |
649 | This first reads one byte as an unsigned size, then reads that\r | |
650 | many bytes and interprets them as a little-endian 2's-complement long.\r | |
651 | If the size is 0, that's taken as a shortcut for the long 0L.\r | |
652 | """)\r | |
653 | \r | |
654 | def read_long4(f):\r | |
655 | r"""\r | |
656 | >>> import StringIO\r | |
657 | >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x00"))\r | |
658 | 255L\r | |
659 | >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\xff\x7f"))\r | |
660 | 32767L\r | |
661 | >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\xff"))\r | |
662 | -256L\r | |
663 | >>> read_long4(StringIO.StringIO("\x02\x00\x00\x00\x00\x80"))\r | |
664 | -32768L\r | |
665 | >>> read_long4(StringIO.StringIO("\x00\x00\x00\x00"))\r | |
666 | 0L\r | |
667 | """\r | |
668 | \r | |
669 | n = read_int4(f)\r | |
670 | if n < 0:\r | |
671 | raise ValueError("long4 byte count < 0: %d" % n)\r | |
672 | data = f.read(n)\r | |
673 | if len(data) != n:\r | |
674 | raise ValueError("not enough data in stream to read long4")\r | |
675 | return decode_long(data)\r | |
676 | \r | |
677 | long4 = ArgumentDescriptor(\r | |
678 | name="long4",\r | |
679 | n=TAKEN_FROM_ARGUMENT4,\r | |
680 | reader=read_long4,\r | |
681 | doc="""A binary representation of a long, little-endian.\r | |
682 | \r | |
683 | This first reads four bytes as a signed size (but requires the\r | |
684 | size to be >= 0), then reads that many bytes and interprets them\r | |
685 | as a little-endian 2's-complement long. If the size is 0, that's taken\r | |
686 | as a shortcut for the long 0L, although LONG1 should really be used\r | |
687 | then instead (and in any case where # of bytes < 256).\r | |
688 | """)\r | |
689 | \r | |
690 | \r | |
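The long1 and long4 formats differ only in the width of the size field; the payload is decoded the same way. The sketch below shows that decoding for Python 3, assuming `int.from_bytes` as a stand-in for `decode_long`; the function names ending in `_sketch` are hypothetical.

```python
import io

def decode_long_sketch(data):
    """Interpret bytes as a little-endian 2's-complement integer."""
    # int.from_bytes returns 0 for empty input, matching the size-0 shortcut.
    return int.from_bytes(data, byteorder="little", signed=True)

def read_long1_sketch(f):
    """Read a one-byte unsigned size, then that many payload bytes."""
    size = f.read(1)
    if len(size) != 1:
        raise ValueError("not enough data in stream to read size byte")
    n = size[0]
    data = f.read(n)
    if len(data) != n:
        raise ValueError("not enough data in stream to read long1")
    return decode_long_sketch(data)

print(read_long1_sketch(io.BytesIO(b"\x02\x00\xff")))
```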
691 | ##############################################################################\r | |
692 | # Object descriptors. The stack used by the pickle machine holds objects,\r | |
693 | # and in the stack_before and stack_after attributes of OpcodeInfo\r | |
694 | # descriptors we need names to describe the various types of objects that can\r | |
695 | # appear on the stack.\r | |
696 | \r | |
697 | class StackObject(object):\r | |
698 | __slots__ = (\r | |
699 | # name of descriptor record, for info only\r | |
700 | 'name',\r | |
701 | \r | |
702 | # type of object, or tuple of type objects (meaning the object can\r | |
703 | # be of any type in the tuple)\r | |
704 | 'obtype',\r | |
705 | \r | |
706 | # human-readable docs for this kind of stack object; a string\r | |
707 | 'doc',\r | |
708 | )\r | |
709 | \r | |
710 | def __init__(self, name, obtype, doc):\r | |
711 | assert isinstance(name, str)\r | |
712 | self.name = name\r | |
713 | \r | |
714 | assert isinstance(obtype, type) or isinstance(obtype, tuple)\r | |
715 | if isinstance(obtype, tuple):\r | |
716 | for contained in obtype:\r | |
717 | assert isinstance(contained, type)\r | |
718 | self.obtype = obtype\r | |
719 | \r | |
720 | assert isinstance(doc, str)\r | |
721 | self.doc = doc\r | |
722 | \r | |
723 | def __repr__(self):\r | |
724 | return self.name\r | |
725 | \r | |
726 | \r | |
727 | pyint = StackObject(\r | |
728 | name='int',\r | |
729 | obtype=int,\r | |
730 | doc="A short (as opposed to long) Python integer object.")\r | |
731 | \r | |
732 | pylong = StackObject(\r | |
733 | name='long',\r | |
734 | obtype=long,\r | |
735 | doc="A long (as opposed to short) Python integer object.")\r | |
736 | \r | |
737 | pyinteger_or_bool = StackObject(\r | |
738 | name='int_or_bool',\r | |
739 | obtype=(int, long, bool),\r | |
740 | doc="A Python integer object (short or long), or "\r | |
741 | "a Python bool.")\r | |
742 | \r | |
743 | pybool = StackObject(\r | |
744 | name='bool',\r | |
745 | obtype=(bool,),\r | |
746 | doc="A Python bool object.")\r | |
747 | \r | |
748 | pyfloat = StackObject(\r | |
749 | name='float',\r | |
750 | obtype=float,\r | |
751 | doc="A Python float object.")\r | |
752 | \r | |
753 | pystring = StackObject(\r | |
754 | name='str',\r | |
755 | obtype=str,\r | |
756 | doc="A Python string object.")\r | |
757 | \r | |
758 | pyunicode = StackObject(\r | |
759 | name='unicode',\r | |
760 | obtype=unicode,\r | |
761 | doc="A Python Unicode string object.")\r | |
762 | \r | |
763 | pynone = StackObject(\r | |
764 | name="None",\r | |
765 | obtype=type(None),\r | |
766 | doc="The Python None object.")\r | |
767 | \r | |
768 | pytuple = StackObject(\r | |
769 | name="tuple",\r | |
770 | obtype=tuple,\r | |
771 | doc="A Python tuple object.")\r | |
772 | \r | |
773 | pylist = StackObject(\r | |
774 | name="list",\r | |
775 | obtype=list,\r | |
776 | doc="A Python list object.")\r | |
777 | \r | |
778 | pydict = StackObject(\r | |
779 | name="dict",\r | |
780 | obtype=dict,\r | |
781 | doc="A Python dict object.")\r | |
782 | \r | |
783 | anyobject = StackObject(\r | |
784 | name='any',\r | |
785 | obtype=object,\r | |
786 | doc="Any kind of object whatsoever.")\r | |
787 | \r | |
788 | markobject = StackObject(\r | |
789 | name="mark",\r | |
790 | obtype=StackObject,\r | |
791 | doc="""'The mark' is a unique object.\r | |
792 | \r | |
793 | Opcodes that operate on a variable number of objects\r | |
794 | generally don't embed the count of objects in the opcode,\r | |
795 | or pull it off the stack. Instead the MARK opcode is used\r | |
796 | to push a special marker object on the stack, and then\r | |
797 | some other opcodes grab all the objects from the top of\r | |
798 | the stack down to (but not including) the topmost marker\r | |
799 | object.\r | |
800 | """)\r | |
801 | \r | |
802 | stackslice = StackObject(\r | |
803 | name="stackslice",\r | |
804 | obtype=StackObject,\r | |
805 | doc="""An object representing a contiguous slice of the stack.\r | |
806 | \r | |
807 | This is used in conjunction with markobject, to represent all\r | |
808 | of the stack following the topmost markobject. For example,\r | |
809 | the POP_MARK opcode changes the stack from\r | |
810 | \r | |
811 | [..., markobject, stackslice]\r | |
812 | to\r | |
813 | [...]\r | |
814 | \r | |
815 | No matter how many objects are on the stack after the topmost\r | |
816 | markobject, POP_MARK gets rid of all of them (including the\r | |
817 | topmost markobject too).\r | |
818 | """)\r | |
819 | \r | |
820 | ##############################################################################\r | |
821 | # Descriptors for pickle opcodes.\r | |
822 | \r | |
823 | class OpcodeInfo(object):\r | |
824 | \r | |
825 | __slots__ = (\r | |
826 | # symbolic name of opcode; a string\r | |
827 | 'name',\r | |
828 | \r | |
829 | # the code used in a bytestream to represent the opcode; a\r | |
830 | # one-character string\r | |
831 | 'code',\r | |
832 | \r | |
833 | # If the opcode has an argument embedded in the byte string, an\r | |
834 | # instance of ArgumentDescriptor specifying its type. Note that\r | |
835 | # arg.reader(s) can be used to read and decode the argument from\r | |
836 | # the bytestream s, and arg.doc documents the format of the raw\r | |
837 | # argument bytes. If the opcode doesn't have an argument embedded\r | |
838 | # in the bytestream, arg should be None.\r | |
839 | 'arg',\r | |
840 | \r | |
841 | # what the stack looks like before this opcode runs; a list\r | |
842 | 'stack_before',\r | |
843 | \r | |
844 | # what the stack looks like after this opcode runs; a list\r | |
845 | 'stack_after',\r | |
846 | \r | |
847 | # the protocol number in which this opcode was introduced; an int\r | |
848 | 'proto',\r | |
849 | \r | |
850 | # human-readable docs for this opcode; a string\r | |
851 | 'doc',\r | |
852 | )\r | |
853 | \r | |
854 | def __init__(self, name, code, arg,\r | |
855 | stack_before, stack_after, proto, doc):\r | |
856 | assert isinstance(name, str)\r | |
857 | self.name = name\r | |
858 | \r | |
859 | assert isinstance(code, str)\r | |
860 | assert len(code) == 1\r | |
861 | self.code = code\r | |
862 | \r | |
863 | assert arg is None or isinstance(arg, ArgumentDescriptor)\r | |
864 | self.arg = arg\r | |
865 | \r | |
866 | assert isinstance(stack_before, list)\r | |
867 | for x in stack_before:\r | |
868 | assert isinstance(x, StackObject)\r | |
869 | self.stack_before = stack_before\r | |
870 | \r | |
871 | assert isinstance(stack_after, list)\r | |
872 | for x in stack_after:\r | |
873 | assert isinstance(x, StackObject)\r | |
874 | self.stack_after = stack_after\r | |
875 | \r | |
876 | assert isinstance(proto, int) and 0 <= proto <= 2\r | |
877 | self.proto = proto\r | |
878 | \r | |
879 | assert isinstance(doc, str)\r | |
880 | self.doc = doc\r | |
881 | \r | |
882 | I = OpcodeInfo\r | |
883 | opcodes = [\r | |
884 | \r | |
885 | # Ways to spell integers.\r | |
886 | \r | |
887 | I(name='INT',\r | |
888 | code='I',\r | |
889 | arg=decimalnl_short,\r | |
890 | stack_before=[],\r | |
891 | stack_after=[pyinteger_or_bool],\r | |
892 | proto=0,\r | |
893 | doc="""Push an integer or bool.\r | |
894 | \r | |
895 | The argument is a newline-terminated decimal literal string.\r | |
896 | \r | |
897 | The intent may have been that this always fit in a short Python int,\r | |
898 | but INT can be generated in pickles written on a 64-bit box that\r | |
899 | require a Python long on a 32-bit box. The difference between this\r | |
900 | and LONG then is that INT skips a trailing 'L', and produces a short\r | |
901 | int whenever possible.\r | |
902 | \r | |
903 | Another difference arises because, when bool was introduced as a\r | |
904 | distinct type in 2.3, builtin names True and False were also added to\r | |
905 | 2.2.2, mapping to ints 1 and 0. For compatibility in both directions,\r | |
906 | True gets pickled as INT + "01\\n", and False as INT + "00\\n".\r | |
907 | Leading zeroes are never produced for a genuine integer. The 2.3\r | |
908 | (and later) unpicklers special-case these and return bool instead;\r | |
909 | earlier unpicklers ignore the leading "0" and return the int.\r | |
910 | """),\r | |
911 | \r | |
912 | I(name='BININT',\r | |
913 | code='J',\r | |
914 | arg=int4,\r | |
915 | stack_before=[],\r | |
916 | stack_after=[pyint],\r | |
917 | proto=1,\r | |
918 | doc="""Push a four-byte signed integer.\r | |
919 | \r | |
920 | This handles the full range of Python (short) integers on a 32-bit\r | |
921 | box, directly as binary bytes (1 for the opcode and 4 for the integer).\r | |
922 | If the integer is non-negative and fits in 1 or 2 bytes, pickling via\r | |
923 | BININT1 or BININT2 saves space.\r | |
924 | """),\r | |
925 | \r | |
926 | I(name='BININT1',\r | |
927 | code='K',\r | |
928 | arg=uint1,\r | |
929 | stack_before=[],\r | |
930 | stack_after=[pyint],\r | |
931 | proto=1,\r | |
932 | doc="""Push a one-byte unsigned integer.\r | |
933 | \r | |
934 | This is a space optimization for pickling very small non-negative ints,\r | |
935 | in range(256).\r | |
936 | """),\r | |
937 | \r | |
938 | I(name='BININT2',\r | |
939 | code='M',\r | |
940 | arg=uint2,\r | |
941 | stack_before=[],\r | |
942 | stack_after=[pyint],\r | |
943 | proto=1,\r | |
944 | doc="""Push a two-byte unsigned integer.\r | |
945 | \r | |
946 | This is a space optimization for pickling small positive ints, in\r | |
947 | range(256, 2**16). Integers in range(256) can also be pickled via\r | |
948 | BININT2, but BININT1 instead saves a byte.\r | |
949 | """),\r | |
950 | \r | |
951 | I(name='LONG',\r | |
952 | code='L',\r | |
953 | arg=decimalnl_long,\r | |
954 | stack_before=[],\r | |
955 | stack_after=[pylong],\r | |
956 | proto=0,\r | |
957 | doc="""Push a long integer.\r | |
958 | \r | |
959 | The same as INT, except that the literal ends with 'L', and always\r | |
960 | unpickles to a Python long. There doesn't seem to be a real purpose to the\r | |
961 | trailing 'L'.\r | |
962 | \r | |
963 | Note that LONG takes time quadratic in the number of digits when\r | |
964 | unpickling (this is simply due to the nature of decimal->binary\r | |
965 | conversion). Proto 2 added linear-time (in C; still quadratic-time\r | |
966 | in Python) LONG1 and LONG4 opcodes.\r | |
967 | """),\r | |
968 | \r | |
969 | I(name="LONG1",\r | |
970 | code='\x8a',\r | |
971 | arg=long1,\r | |
972 | stack_before=[],\r | |
973 | stack_after=[pylong],\r | |
974 | proto=2,\r | |
975 | doc="""Long integer using one-byte length.\r | |
976 | \r | |
977 | A more efficient encoding of a Python long; the long1 encoding\r | |
978 | says it all."""),\r | |
979 | \r | |
980 | I(name="LONG4",\r | |
981 | code='\x8b',\r | |
982 | arg=long4,\r | |
983 | stack_before=[],\r | |
984 | stack_after=[pylong],\r | |
985 | proto=2,\r | |
986 | doc="""Long integer using four-byte length.\r | |
987 | \r | |
988 | A more efficient encoding of a Python long; the long4 encoding\r | |
989 | says it all."""),\r | |
990 | \r | |
991 | # Ways to spell strings (8-bit, not Unicode).\r | |
992 | \r | |
993 | I(name='STRING',\r | |
994 | code='S',\r | |
995 | arg=stringnl,\r | |
996 | stack_before=[],\r | |
997 | stack_after=[pystring],\r | |
998 | proto=0,\r | |
999 | doc="""Push a Python string object.\r | |
1000 | \r | |
1001 | The argument is a repr-style string, with bracketing quote characters,\r | |
1002 | and perhaps embedded escapes. The argument extends until the next\r | |
1003 | newline character.\r | |
1004 | """),\r | |
1005 | \r | |
1006 | I(name='BINSTRING',\r | |
1007 | code='T',\r | |
1008 | arg=string4,\r | |
1009 | stack_before=[],\r | |
1010 | stack_after=[pystring],\r | |
1011 | proto=1,\r | |
1012 | doc="""Push a Python string object.\r | |
1013 | \r | |
1014 | There are two arguments: the first is a 4-byte little-endian signed int\r | |
1015 | giving the number of bytes in the string, and the second is that many\r | |
1016 | bytes, which are taken literally as the string content.\r | |
1017 | """),\r | |
1018 | \r | |
1019 | I(name='SHORT_BINSTRING',\r | |
1020 | code='U',\r | |
1021 | arg=string1,\r | |
1022 | stack_before=[],\r | |
1023 | stack_after=[pystring],\r | |
1024 | proto=1,\r | |
1025 | doc="""Push a Python string object.\r | |
1026 | \r | |
1027 | There are two arguments: the first is a 1-byte unsigned int giving\r | |
1028 | the number of bytes in the string, and the second is that many bytes,\r | |
1029 | which are taken literally as the string content.\r | |
1030 | """),\r | |
1031 | \r | |
1032 | # Ways to spell None.\r | |
1033 | \r | |
1034 | I(name='NONE',\r | |
1035 | code='N',\r | |
1036 | arg=None,\r | |
1037 | stack_before=[],\r | |
1038 | stack_after=[pynone],\r | |
1039 | proto=0,\r | |
1040 | doc="Push None on the stack."),\r | |
1041 | \r | |
1042 | # Ways to spell bools, starting with proto 2. See INT for how this was\r | |
1043 | # done before proto 2.\r | |
1044 | \r | |
1045 | I(name='NEWTRUE',\r | |
1046 | code='\x88',\r | |
1047 | arg=None,\r | |
1048 | stack_before=[],\r | |
1049 | stack_after=[pybool],\r | |
1050 | proto=2,\r | |
1051 | doc="""True.\r | |
1052 | \r | |
1053 | Push True onto the stack."""),\r | |
1054 | \r | |
1055 | I(name='NEWFALSE',\r | |
1056 | code='\x89',\r | |
1057 | arg=None,\r | |
1058 | stack_before=[],\r | |
1059 | stack_after=[pybool],\r | |
1060 | proto=2,\r | |
1061 | doc="""False.\r | |
1062 | \r | |
1063 | Push False onto the stack."""),\r | |
1064 | \r | |
1065 | # Ways to spell Unicode strings.\r | |
1066 | \r | |
1067 | I(name='UNICODE',\r | |
1068 | code='V',\r | |
1069 | arg=unicodestringnl,\r | |
1070 | stack_before=[],\r | |
1071 | stack_after=[pyunicode],\r | |
1072 | proto=0, # this may be pure-text, but it's a later addition\r | |
1073 | doc="""Push a Python Unicode string object.\r | |
1074 | \r | |
1075 | The argument is a raw-unicode-escape encoding of a Unicode string,\r | |
1076 | and so may contain embedded escape sequences. The argument extends\r | |
1077 | until the next newline character.\r | |
1078 | """),\r | |
1079 | \r | |
1080 | I(name='BINUNICODE',\r | |
1081 | code='X',\r | |
1082 | arg=unicodestring4,\r | |
1083 | stack_before=[],\r | |
1084 | stack_after=[pyunicode],\r | |
1085 | proto=1,\r | |
1086 | doc="""Push a Python Unicode string object.\r | |
1087 | \r | |
1088 | There are two arguments: the first is a 4-byte little-endian signed int\r | |
1089 | giving the number of bytes in the string. The second is that many\r | |
1090 | bytes, and is the UTF-8 encoding of the Unicode string.\r | |
1091 | """),\r | |
1092 | \r | |
1093 | # Ways to spell floats.\r | |
1094 | \r | |
1095 | I(name='FLOAT',\r | |
1096 | code='F',\r | |
1097 | arg=floatnl,\r | |
1098 | stack_before=[],\r | |
1099 | stack_after=[pyfloat],\r | |
1100 | proto=0,\r | |
1101 | doc="""Newline-terminated decimal float literal.\r | |
1102 | \r | |
1103 | The argument is repr(a_float), and in general requires 17 significant\r | |
1104 | digits for roundtrip conversion to be an identity (this is so for\r | |
1105 | IEEE-754 double precision values, which is what Python float maps to\r | |
1106 | on most boxes).\r | |
1107 | \r | |
1108 | In general, FLOAT cannot be used to transport infinities, NaNs, or\r | |
1109 | minus zero across boxes (or even on a single box, if the platform C\r | |
1110 | library can't read the strings it produces for such things -- Windows\r | |
1111 | is like that), but may do less damage than BINFLOAT on boxes with\r | |
1112 | greater precision or dynamic range than IEEE-754 double.\r | |
1113 | """),\r | |
1114 | \r | |
1115 | I(name='BINFLOAT',\r | |
1116 | code='G',\r | |
1117 | arg=float8,\r | |
1118 | stack_before=[],\r | |
1119 | stack_after=[pyfloat],\r | |
1120 | proto=1,\r | |
1121 | doc="""Float stored in binary form, with 8 bytes of data.\r | |
1122 | \r | |
1123 | This generally requires less than half the space of FLOAT encoding.\r | |
1124 | In general, BINFLOAT cannot be used to transport infinities, NaNs, or\r | |
1125 | minus zero; it raises an exception if the exponent exceeds the range of\r | |
1126 | an IEEE-754 double, and retains no more than 53 bits of precision (if\r | |
1127 | there are more than that, "add a half and chop" rounding is used to\r | |
1128 | cut it back to 53 significant bits).\r | |
1129 | """),\r | |
1130 | \r | |
1131 | # Ways to build lists.\r | |
1132 | \r | |
1133 | I(name='EMPTY_LIST',\r | |
1134 | code=']',\r | |
1135 | arg=None,\r | |
1136 | stack_before=[],\r | |
1137 | stack_after=[pylist],\r | |
1138 | proto=1,\r | |
1139 | doc="Push an empty list."),\r | |
1140 | \r | |
1141 | I(name='APPEND',\r | |
1142 | code='a',\r | |
1143 | arg=None,\r | |
1144 | stack_before=[pylist, anyobject],\r | |
1145 | stack_after=[pylist],\r | |
1146 | proto=0,\r | |
1147 | doc="""Append an object to a list.\r | |
1148 | \r | |
1149 | Stack before: ... pylist anyobject\r | |
1150 | Stack after: ... pylist+[anyobject]\r | |
1151 | \r | |
1152 | although pylist is really extended in-place.\r | |
1153 | """),\r | |
1154 | \r | |
1155 | I(name='APPENDS',\r | |
1156 | code='e',\r | |
1157 | arg=None,\r | |
1158 | stack_before=[pylist, markobject, stackslice],\r | |
1159 | stack_after=[pylist],\r | |
1160 | proto=1,\r | |
1161 | doc="""Extend a list by a slice of stack objects.\r | |
1162 | \r | |
1163 | Stack before: ... pylist markobject stackslice\r | |
1164 | Stack after: ... pylist+stackslice\r | |
1165 | \r | |
1166 | although pylist is really extended in-place.\r | |
1167 | """),\r | |
1168 | \r | |
1169 | I(name='LIST',\r | |
1170 | code='l',\r | |
1171 | arg=None,\r | |
1172 | stack_before=[markobject, stackslice],\r | |
1173 | stack_after=[pylist],\r | |
1174 | proto=0,\r | |
1175 | doc="""Build a list out of the topmost stack slice, after markobject.\r | |
1176 | \r | |
1177 | All the stack entries following the topmost markobject are placed into\r | |
1178 | a single Python list, which single list object replaces all of the\r | |
1179 | stack from the topmost markobject onward. For example,\r | |
1180 | \r | |
1181 | Stack before: ... markobject 1 2 3 'abc'\r | |
1182 | Stack after: ... [1, 2, 3, 'abc']\r | |
1183 | """),\r | |
1184 | \r | |
1185 | # Ways to build tuples.\r | |
1186 | \r | |
1187 | I(name='EMPTY_TUPLE',\r | |
1188 | code=')',\r | |
1189 | arg=None,\r | |
1190 | stack_before=[],\r | |
1191 | stack_after=[pytuple],\r | |
1192 | proto=1,\r | |
1193 | doc="Push an empty tuple."),\r | |
1194 | \r | |
1195 | I(name='TUPLE',\r | |
1196 | code='t',\r | |
1197 | arg=None,\r | |
1198 | stack_before=[markobject, stackslice],\r | |
1199 | stack_after=[pytuple],\r | |
1200 | proto=0,\r | |
1201 | doc="""Build a tuple out of the topmost stack slice, after markobject.\r | |
1202 | \r | |
1203 | All the stack entries following the topmost markobject are placed into\r | |
1204 | a single Python tuple, which single tuple object replaces all of the\r | |
1205 | stack from the topmost markobject onward. For example,\r | |
1206 | \r | |
1207 | Stack before: ... markobject 1 2 3 'abc'\r | |
1208 | Stack after: ... (1, 2, 3, 'abc')\r | |
1209 | """),\r | |
1210 | \r | |
1211 | I(name='TUPLE1',\r | |
1212 | code='\x85',\r | |
1213 | arg=None,\r | |
1214 | stack_before=[anyobject],\r | |
1215 | stack_after=[pytuple],\r | |
1216 | proto=2,\r | |
1217 | doc="""Build a one-tuple out of the topmost item on the stack.\r | |
1218 | \r | |
1219 | This code pops one value off the stack and pushes a tuple of\r | |
1220 | length 1 whose one item is that value back onto it. In other\r | |
1221 | words:\r | |
1222 | \r | |
1223 | stack[-1] = tuple(stack[-1:])\r | |
1224 | """),\r | |
1225 | \r | |
1226 | I(name='TUPLE2',\r | |
1227 | code='\x86',\r | |
1228 | arg=None,\r | |
1229 | stack_before=[anyobject, anyobject],\r | |
1230 | stack_after=[pytuple],\r | |
1231 | proto=2,\r | |
1232 | doc="""Build a two-tuple out of the top two items on the stack.\r | |
1233 | \r | |
1234 | This code pops two values off the stack and pushes a tuple of\r | |
1235 | length 2 whose items are those values back onto it. In other\r | |
1236 | words:\r | |
1237 | \r | |
1238 | stack[-2:] = [tuple(stack[-2:])]\r | |
1239 | """),\r | |
1240 | \r | |
1241 | I(name='TUPLE3',\r | |
1242 | code='\x87',\r | |
1243 | arg=None,\r | |
1244 | stack_before=[anyobject, anyobject, anyobject],\r | |
1245 | stack_after=[pytuple],\r | |
1246 | proto=2,\r | |
1247 | doc="""Build a three-tuple out of the top three items on the stack.\r | |
1248 | \r | |
1249 | This code pops three values off the stack and pushes a tuple of\r | |
1250 | length 3 whose items are those values back onto it. In other\r | |
1251 | words:\r | |
1252 | \r | |
1253 | stack[-3:] = [tuple(stack[-3:])]\r | |
1254 | """),\r | |
1255 | \r | |
1256 | # Ways to build dicts.\r | |
1257 | \r | |
1258 | I(name='EMPTY_DICT',\r | |
1259 | code='}',\r | |
1260 | arg=None,\r | |
1261 | stack_before=[],\r | |
1262 | stack_after=[pydict],\r | |
1263 | proto=1,\r | |
1264 | doc="Push an empty dict."),\r | |
1265 | \r | |
1266 | I(name='DICT',\r | |
1267 | code='d',\r | |
1268 | arg=None,\r | |
1269 | stack_before=[markobject, stackslice],\r | |
1270 | stack_after=[pydict],\r | |
1271 | proto=0,\r | |
1272 | doc="""Build a dict out of the topmost stack slice, after markobject.\r | |
1273 | \r | |
1274 | All the stack entries following the topmost markobject are placed into\r | |
1275 | a single Python dict, which single dict object replaces all of the\r | |
1276 | stack from the topmost markobject onward. The stack slice alternates\r | |
1277 | key, value, key, value, .... For example,\r | |
1278 | \r | |
1279 | Stack before: ... markobject 1 2 3 'abc'\r | |
1280 | Stack after: ... {1: 2, 3: 'abc'}\r | |
1281 | """),\r | |
1282 | \r | |
1283 | I(name='SETITEM',\r | |
1284 | code='s',\r | |
1285 | arg=None,\r | |
1286 | stack_before=[pydict, anyobject, anyobject],\r | |
1287 | stack_after=[pydict],\r | |
1288 | proto=0,\r | |
1289 | doc="""Add a key+value pair to an existing dict.\r | |
1290 | \r | |
1291 | Stack before: ... pydict key value\r | |
1292 | Stack after: ... pydict\r | |
1293 | \r | |
1294 | where pydict has been modified via pydict[key] = value.\r | |
1295 | """),\r | |
1296 | \r | |
1297 | I(name='SETITEMS',\r | |
1298 | code='u',\r | |
1299 | arg=None,\r | |
1300 | stack_before=[pydict, markobject, stackslice],\r | |
1301 | stack_after=[pydict],\r | |
1302 | proto=1,\r | |
1303 | doc="""Add an arbitrary number of key+value pairs to an existing dict.\r | |
1304 | \r | |
1305 | The slice of the stack following the topmost markobject is taken as\r | |
1306 | an alternating sequence of keys and values, added to the dict\r | |
1307 | immediately under the topmost markobject. Everything at and after the\r | |
1308 | topmost markobject is popped, leaving the mutated dict at the top\r | |
1309 | of the stack.\r | |
1310 | \r | |
1311 | Stack before: ... pydict markobject key_1 value_1 ... key_n value_n\r | |
1312 | Stack after: ... pydict\r | |
1313 | \r | |
1314 | where pydict has been modified via pydict[key_i] = value_i for i in\r | |
1315 | 1, 2, ..., n, and in that order.\r | |
1316 | """),\r | |
1317 | \r | |
1318 | # Stack manipulation.\r | |
1319 | \r | |
1320 | I(name='POP',\r | |
1321 | code='0',\r | |
1322 | arg=None,\r | |
1323 | stack_before=[anyobject],\r | |
1324 | stack_after=[],\r | |
1325 | proto=0,\r | |
1326 | doc="Discard the top stack item, shrinking the stack by one item."),\r | |
1327 | \r | |
1328 | I(name='DUP',\r | |
1329 | code='2',\r | |
1330 | arg=None,\r | |
1331 | stack_before=[anyobject],\r | |
1332 | stack_after=[anyobject, anyobject],\r | |
1333 | proto=0,\r | |
1334 | doc="Push the top stack item onto the stack again, duplicating it."),\r | |
1335 | \r | |
1336 | I(name='MARK',\r | |
1337 | code='(',\r | |
1338 | arg=None,\r | |
1339 | stack_before=[],\r | |
1340 | stack_after=[markobject],\r | |
1341 | proto=0,\r | |
1342 | doc="""Push markobject onto the stack.\r | |
1343 | \r | |
1344 | markobject is a unique object, used by other opcodes to identify a\r | |
1345 | region of the stack containing a variable number of objects for them\r | |
1346 | to work on. See markobject.doc for more detail.\r | |
1347 | """),\r | |
1348 | \r | |
1349 | I(name='POP_MARK',\r | |
1350 | code='1',\r | |
1351 | arg=None,\r | |
1352 | stack_before=[markobject, stackslice],\r | |
1353 | stack_after=[],\r | |
1354 | proto=1,\r | |
1355 | doc="""Pop all the stack objects at and above the topmost markobject.\r | |
1356 | \r | |
1357 | When an opcode using a variable number of stack objects is done,\r | |
1358 | POP_MARK is used to remove those objects, and to remove the markobject\r | |
1359 | that delimited their starting position on the stack.\r | |
1360 | """),\r | |
1361 | \r | |
1362 | # Memo manipulation. There are really only two operations (get and put),\r | |
1363 | # each in all-text, "short binary", and "long binary" flavors.\r | |
1364 | \r | |
1365 | I(name='GET',\r | |
1366 | code='g',\r | |
1367 | arg=decimalnl_short,\r | |
1368 | stack_before=[],\r | |
1369 | stack_after=[anyobject],\r | |
1370 | proto=0,\r | |
1371 | doc="""Read an object from the memo and push it on the stack.\r | |
1372 | \r | |
1373 | The index of the memo object to push is given by the newline-terminated\r | |
1374 | decimal string following. BINGET and LONG_BINGET are space-optimized\r | |
1375 | versions.\r | |
1376 | """),\r | |
1377 | \r | |
1378 | I(name='BINGET',\r | |
1379 | code='h',\r | |
1380 | arg=uint1,\r | |
1381 | stack_before=[],\r | |
1382 | stack_after=[anyobject],\r | |
1383 | proto=1,\r | |
1384 | doc="""Read an object from the memo and push it on the stack.\r | |
1385 | \r | |
1386 | The index of the memo object to push is given by the 1-byte unsigned\r | |
1387 | integer following.\r | |
1388 | """),\r | |
1389 | \r | |
1390 | I(name='LONG_BINGET',\r | |
1391 | code='j',\r | |
1392 | arg=int4,\r | |
1393 | stack_before=[],\r | |
1394 | stack_after=[anyobject],\r | |
1395 | proto=1,\r | |
1396 | doc="""Read an object from the memo and push it on the stack.\r | |
1397 | \r | |
1398 | The index of the memo object to push is given by the 4-byte signed\r | |
1399 | little-endian integer following.\r | |
1400 | """),\r | |
1401 | \r | |
1402 | I(name='PUT',\r | |
1403 | code='p',\r | |
1404 | arg=decimalnl_short,\r | |
1405 | stack_before=[],\r | |
1406 | stack_after=[],\r | |
1407 | proto=0,\r | |
1408 | doc="""Store the stack top into the memo. The stack is not popped.\r | |
1409 | \r | |
1410 | The index of the memo location to write into is given by the newline-\r | |
1411 | terminated decimal string following. BINPUT and LONG_BINPUT are\r | |
1412 | space-optimized versions.\r | |
1413 | """),\r | |
1414 | \r | |
1415 | I(name='BINPUT',\r | |
1416 | code='q',\r | |
1417 | arg=uint1,\r | |
1418 | stack_before=[],\r | |
1419 | stack_after=[],\r | |
1420 | proto=1,\r | |
1421 | doc="""Store the stack top into the memo. The stack is not popped.\r | |
1422 | \r | |
1423 | The index of the memo location to write into is given by the 1-byte\r | |
1424 | unsigned integer following.\r | |
1425 | """),\r | |
1426 | \r | |
1427 | I(name='LONG_BINPUT',\r | |
1428 | code='r',\r | |
1429 | arg=int4,\r | |
1430 | stack_before=[],\r | |
1431 | stack_after=[],\r | |
1432 | proto=1,\r | |
1433 | doc="""Store the stack top into the memo. The stack is not popped.\r | |
1434 | \r | |
1435 | The index of the memo location to write into is given by the 4-byte\r | |
1436 | signed little-endian integer following.\r | |
1437 | """),\r | |
1438 | \r | |
1439 | # Access the extension registry (predefined objects). Akin to the GET\r | |
1440 | # family.\r | |
1441 | \r | |
1442 | I(name='EXT1',\r | |
1443 | code='\x82',\r | |
1444 | arg=uint1,\r | |
1445 | stack_before=[],\r | |
1446 | stack_after=[anyobject],\r | |
1447 | proto=2,\r | |
1448 | doc="""Extension code.\r | |
1449 | \r | |
1450 | This code and the similar EXT2 and EXT4 allow using a registry\r | |
1451 | of popular objects that are pickled by name, typically classes.\r | |
1452 | It is envisioned that through a global negotiation and\r | |
1453 | registration process, third parties can set up a mapping between\r | |
1454 | ints and object names.\r | |
1455 | \r | |
1456 | In order to guarantee pickle interchangeability, the extension\r | |
1457 | code registry ought to be global, although a range of codes may\r | |
1458 | be reserved for private use.\r | |
1459 | \r | |
1460 | EXT1 has a 1-byte integer argument. This is used to index into the\r | |
1461 | extension registry, and the object at that index is pushed on the stack.\r | |
1462 | """),\r | |
1463 | \r | |
1464 | I(name='EXT2',\r | |
1465 | code='\x83',\r | |
1466 | arg=uint2,\r | |
1467 | stack_before=[],\r | |
1468 | stack_after=[anyobject],\r | |
1469 | proto=2,\r | |
1470 | doc="""Extension code.\r | |
1471 | \r | |
1472 | See EXT1. EXT2 has a two-byte integer argument.\r | |
1473 | """),\r | |
1474 | \r | |
1475 | I(name='EXT4',\r | |
1476 | code='\x84',\r | |
1477 | arg=int4,\r | |
1478 | stack_before=[],\r | |
1479 | stack_after=[anyobject],\r | |
1480 | proto=2,\r | |
1481 | doc="""Extension code.\r | |
1482 | \r | |
1483 | See EXT1. EXT4 has a four-byte integer argument.\r | |
1484 | """),\r | |
1485 | \r | |
1486 | # Push a class object, or module function, on the stack, via its module\r | |
1487 | # and name.\r | |
1488 | \r | |
1489 | I(name='GLOBAL',\r | |
1490 | code='c',\r | |
1491 | arg=stringnl_noescape_pair,\r | |
1492 | stack_before=[],\r | |
1493 | stack_after=[anyobject],\r | |
1494 | proto=0,\r | |
1495 | doc="""Push a global object (module.attr) on the stack.\r | |
1496 | \r | |
1497 | Two newline-terminated strings follow the GLOBAL opcode. The first is\r | |
1498 | taken as a module name, and the second as a class name. The class\r | |
1499 | object module.class is pushed on the stack. More accurately, the\r | |
1500 | object returned by self.find_class(module, class) is pushed on the\r | |
1501 | stack, so unpickling subclasses can override this form of lookup.\r | |
1502 | """),\r | |
1503 | \r | |
1504 | # Ways to build objects of classes pickle doesn't know about directly\r | |
1505 | # (user-defined classes). I despair of documenting this accurately\r | |
1506 | # and comprehensibly -- you really have to read the pickle code to\r | |
1507 | # find all the special cases.\r | |
1508 | \r | |
1509 | I(name='REDUCE',\r | |
1510 | code='R',\r | |
1511 | arg=None,\r | |
1512 | stack_before=[anyobject, anyobject],\r | |
1513 | stack_after=[anyobject],\r | |
1514 | proto=0,\r | |
1515 | doc="""Push an object built from a callable and an argument tuple.\r | |
1516 | \r | |
1517 | The opcode is named as a reminder of the __reduce__() method.\r | |
1518 | \r | |
1519 | Stack before: ... callable pytuple\r | |
1520 | Stack after: ... callable(*pytuple)\r | |
1521 | \r | |
1522 | The callable and the argument tuple are the first two items returned\r | |
1523 | by a __reduce__ method. Applying the callable to the argtuple is\r | |
1524 | supposed to reproduce the original object, or at least get it started.\r | |
1525 | If the __reduce__ method returns a 3-tuple, the last component is an\r | |
1526 | argument to be passed to the object's __setstate__, and then the REDUCE\r | |
1527 | opcode is followed by code to create setstate's argument, and then a\r | |
1528 | BUILD opcode to apply __setstate__ to that argument.\r | |
1529 | \r | |
1530 | If type(callable) is not ClassType, REDUCE complains unless the\r | |
1531 | callable has been registered with the copy_reg module's\r | |
1532 | safe_constructors dict, or the callable has a magic\r | |
1533 | '__safe_for_unpickling__' attribute with a true value. I'm not sure\r | |
1534 | why it does this, but I've sure seen this complaint often enough when\r | |
1535 | I didn't want to <wink>.\r | |
1536 | """),\r | |
1537 | \r | |
1538 | I(name='BUILD',\r | |
1539 | code='b',\r | |
1540 | arg=None,\r | |
1541 | stack_before=[anyobject, anyobject],\r | |
1542 | stack_after=[anyobject],\r | |
1543 | proto=0,\r | |
1544 | doc="""Finish building an object, via __setstate__ or dict update.\r | |
1545 | \r | |
1546 | Stack before: ... anyobject argument\r | |
1547 | Stack after: ... anyobject\r | |
1548 | \r | |
1549 | where anyobject may have been mutated, as follows:\r | |
1550 | \r | |
1551 | If the object has a __setstate__ method,\r | |
1552 | \r | |
1553 | anyobject.__setstate__(argument)\r | |
1554 | \r | |
1555 | is called.\r | |
1556 | \r | |
1557 | Else the argument must be a dict, the object must have a __dict__, and\r | |
1558 | the object is updated via\r | |
1559 | \r | |
1560 | anyobject.__dict__.update(argument)\r | |
1561 | \r | |
1562 | This may raise RuntimeError in restricted execution mode (which\r | |
1563 | disallows access to __dict__ directly); in that case, the object\r | |
1564 | is updated instead via\r | |
1565 | \r | |
1566 | for k, v in argument.items():\r | |
1567 | setattr(anyobject, k, v)\r | |
1568 | """),\r | |
1569 | \r | |
1570 | I(name='INST',\r | |
1571 | code='i',\r | |
1572 | arg=stringnl_noescape_pair,\r | |
1573 | stack_before=[markobject, stackslice],\r | |
1574 | stack_after=[anyobject],\r | |
1575 | proto=0,\r | |
1576 | doc="""Build a class instance.\r | |
1577 | \r | |
1578 | This is the protocol 0 version of protocol 1's OBJ opcode.\r | |
1579 | INST is followed by two newline-terminated strings, giving a\r | |
1580 | module and class name, just as for the GLOBAL opcode (and see\r | |
1581 | GLOBAL for more details about that). self.find_class(module, name)\r | |
1582 | is used to get a class object.\r | |
1583 | \r | |
1584 | In addition, all the objects on the stack following the topmost\r | |
1585 | markobject are gathered into a tuple and popped (along with the\r | |
1586 | topmost markobject), just as for the TUPLE opcode.\r | |
1587 | \r | |
1588 | Now it gets complicated. If all of these are true:\r | |
1589 | \r | |
1590 | + The argtuple is empty (markobject was at the top of the stack\r | |
1591 | at the start).\r | |
1592 | \r | |
1593 | + It's an old-style class object (the type of the class object is\r | |
1594 | ClassType).\r | |
1595 | \r | |
1596 | + The class object does not have a __getinitargs__ attribute.\r | |
1597 | \r | |
1598 | then we want to create an old-style class instance without invoking\r | |
1599 | its __init__() method (pickle has waffled on this over the years; not\r | |
1600 | calling __init__() is current wisdom). In this case, an instance of\r | |
1601 | an old-style dummy class is created, and then we try to rebind its\r | |
1602 | __class__ attribute to the desired class object. If this succeeds,\r | |
1603 | the new instance object is pushed on the stack, and we're done. In\r | |
1604 | restricted execution mode it can fail (assignment to __class__ is\r | |
1605 | disallowed), and I'm not really sure what happens then -- it looks\r | |
1606 | like the code ends up calling the class object's __init__ anyway,\r | |
1607 | via falling into the next case.\r | |
1608 | \r | |
1609 | Else (the argtuple is not empty, it's not an old-style class object,\r | |
1610 | or the class object does have a __getinitargs__ attribute), the code\r | |
1611 | first insists that the class object have a __safe_for_unpickling__\r | |
1612 | attribute. Unlike the __safe_for_unpickling__ check in REDUCE,\r | |
1613 | it doesn't matter whether this attribute has a true or false value; it\r | |
1614 | only matters whether it exists (XXX this is a bug; cPickle\r | |
1615 | requires the attribute to be true). If __safe_for_unpickling__\r | |
1616 | doesn't exist, UnpicklingError is raised.\r | |
1617 | \r | |
1618 | Else (the class object does have a __safe_for_unpickling__ attr),\r | |
1619 | the class object obtained from INST's arguments is applied to the\r | |
1620 | argtuple obtained from the stack, and the resulting instance object\r | |
1621 | is pushed on the stack.\r | |
1622 | \r | |
1623 | NOTE: checks for __safe_for_unpickling__ went away in Python 2.3.\r | |
1624 | """),\r | |
1625 | \r | |
1626 | I(name='OBJ',\r | |
1627 | code='o',\r | |
1628 | arg=None,\r | |
1629 | stack_before=[markobject, anyobject, stackslice],\r | |
1630 | stack_after=[anyobject],\r | |
1631 | proto=1,\r | |
1632 | doc="""Build a class instance.\r | |
1633 | \r | |
1634 | This is the protocol 1 version of protocol 0's INST opcode, and is\r | |
1635 | very much like it. The major difference is that the class object\r | |
1636 | is taken off the stack, allowing it to be retrieved from the memo\r | |
1637 | repeatedly if several instances of the same class are created. This\r | |
1638 | can be much more efficient (in both time and space) than repeatedly\r | |
1639 | embedding the module and class names in INST opcodes.\r | |
1640 | \r | |
1641 | Unlike INST, OBJ takes no arguments from the opcode stream. Instead\r | |
1642 | the class object is taken off the stack, immediately above the\r | |
1643 | topmost markobject:\r | |
1644 | \r | |
1645 | Stack before: ... markobject classobject stackslice\r | |
1646 | Stack after: ... new_instance_object\r | |
1647 | \r | |
1648 | As for INST, the remainder of the stack above the markobject is\r | |
1649 | gathered into an argument tuple, and then the logic seems identical,\r | |
1650 | except that no __safe_for_unpickling__ check is done (XXX this is\r | |
1651 | a bug; cPickle does test __safe_for_unpickling__). See INST for\r | |
1652 | the gory details.\r | |
1653 | \r | |
1654 | NOTE: In Python 2.3, INST and OBJ are identical except for how they\r | |
1655 | get the class object. That was always the intent; the implementations\r | |
1656 | had diverged for accidental reasons.\r | |
1657 | """),\r | |
1658 | \r | |
1659 | I(name='NEWOBJ',\r | |
1660 | code='\x81',\r | |
1661 | arg=None,\r | |
1662 | stack_before=[anyobject, anyobject],\r | |
1663 | stack_after=[anyobject],\r | |
1664 | proto=2,\r | |
1665 | doc="""Build an object instance.\r | |
1666 | \r | |
1667 | The stack before should be thought of as containing a class\r | |
1668 | object followed by an argument tuple (the tuple being the stack\r | |
1669 | top). Call these cls and args. They are popped off the stack,\r | |
1670 | and the value returned by cls.__new__(cls, *args) is pushed back\r | |
1671 | onto the stack.\r | |
1672 | """),\r | |
1673 | \r | |
1674 | # Machine control.\r | |
1675 | \r | |
1676 | I(name='PROTO',\r | |
1677 | code='\x80',\r | |
1678 | arg=uint1,\r | |
1679 | stack_before=[],\r | |
1680 | stack_after=[],\r | |
1681 | proto=2,\r | |
1682 | doc="""Protocol version indicator.\r | |
1683 | \r | |
1684 | For protocol 2 and above, a pickle must start with this opcode.\r | |
1685 | The argument is the protocol version, an int in range(2, 256).\r | |
1686 | """),\r | |
1687 | \r | |
1688 | I(name='STOP',\r | |
1689 | code='.',\r | |
1690 | arg=None,\r | |
1691 | stack_before=[anyobject],\r | |
1692 | stack_after=[],\r | |
1693 | proto=0,\r | |
1694 | doc="""Stop the unpickling machine.\r | |
1695 | \r | |
1696 | Every pickle ends with this opcode. The object at the top of the stack\r | |
1697 | is popped, and that's the result of unpickling. The stack should be\r | |
1698 | empty then.\r | |
1699 | """),\r | |
1700 | \r | |
1701 | # Ways to deal with persistent IDs.\r | |
1702 | \r | |
1703 | I(name='PERSID',\r | |
1704 | code='P',\r | |
1705 | arg=stringnl_noescape,\r | |
1706 | stack_before=[],\r | |
1707 | stack_after=[anyobject],\r | |
1708 | proto=0,\r | |
1709 | doc="""Push an object identified by a persistent ID.\r | |
1710 | \r | |
1711 | The pickle module doesn't define what a persistent ID means. PERSID's\r | |
1712 | argument is a newline-terminated str-style (no embedded escapes, no\r | |
1713 | bracketing quote characters) string, which *is* "the persistent ID".\r | |
1714 | The unpickler passes this string to self.persistent_load(). Whatever\r | |
1715 | object that returns is pushed on the stack. There is no implementation\r | |
1716 | of persistent_load() in Python's unpickler: it must be supplied by an\r | |
1717 | unpickler subclass.\r | |
1718 | """),\r | |
1719 | \r | |
1720 | I(name='BINPERSID',\r | |
1721 | code='Q',\r | |
1722 | arg=None,\r | |
1723 | stack_before=[anyobject],\r | |
1724 | stack_after=[anyobject],\r | |
1725 | proto=1,\r | |
1726 | doc="""Push an object identified by a persistent ID.\r | |
1727 | \r | |
1728 | Like PERSID, except the persistent ID is popped off the stack (instead\r | |
1729 | of being a string embedded in the opcode bytestream). The persistent\r | |
1730 | ID is passed to self.persistent_load(), and whatever object that\r | |
1731 | returns is pushed on the stack. See PERSID for more detail.\r | |
1732 | """),\r | |
1733 | ]\r | |
1734 | del I\r | |
1735 | \r | |
1736 | # Verify uniqueness of .name and .code members.\r | |
1737 | name2i = {}\r | |
1738 | code2i = {}\r | |
1739 | \r | |
1740 | for i, d in enumerate(opcodes):\r | |
1741 | if d.name in name2i:\r | |
1742 | raise ValueError("repeated name %r at indices %d and %d" %\r | |
1743 | (d.name, name2i[d.name], i))\r | |
1744 | if d.code in code2i:\r | |
1745 | raise ValueError("repeated code %r at indices %d and %d" %\r | |
1746 | (d.code, code2i[d.code], i))\r | |
1747 | \r | |
1748 | name2i[d.name] = i\r | |
1749 | code2i[d.code] = i\r | |
1750 | \r | |
1751 | del name2i, code2i, i, d\r | |
1752 | \r | |
1753 | ##############################################################################\r | |
1754 | # Build a code2op dict, mapping opcode characters to OpcodeInfo records.\r | |
1755 | # Also ensure we've got the same stuff as pickle.py, although the\r | |
1756 | # introspection here is dicey.\r | |
1757 | \r | |
1758 | code2op = {}\r | |
1759 | for d in opcodes:\r | |
1760 | code2op[d.code] = d\r | |
1761 | del d\r | |
1762 | \r | |
1763 | def assure_pickle_consistency(verbose=False):\r | |
1764 | import pickle, re\r | |
1765 | \r | |
1766 | copy = code2op.copy()\r | |
1767 | for name in pickle.__all__:\r | |
1768 | if not re.match("[A-Z][A-Z0-9_]+$", name):\r | |
1769 | if verbose:\r | |
1770 | print "skipping %r: it doesn't look like an opcode name" % name\r | |
1771 | continue\r | |
1772 | picklecode = getattr(pickle, name)\r | |
1773 | if not isinstance(picklecode, str) or len(picklecode) != 1:\r | |
1774 | if verbose:\r | |
1775 | print ("skipping %r: value %r doesn't look like a pickle "\r | |
1776 | "code" % (name, picklecode))\r | |
1777 | continue\r | |
1778 | if picklecode in copy:\r | |
1779 | if verbose:\r | |
1780 | print "checking name %r w/ code %r for consistency" % (\r | |
1781 | name, picklecode)\r | |
1782 | d = copy[picklecode]\r | |
1783 | if d.name != name:\r | |
1784 | raise ValueError("for pickle code %r, pickle.py uses name %r "\r | |
1785 | "but we're using name %r" % (picklecode,\r | |
1786 | name,\r | |
1787 | d.name))\r | |
1788 | # Forget this one. Any left over in copy at the end are a problem\r | |
1789 | # of a different kind.\r | |
1790 | del copy[picklecode]\r | |
1791 | else:\r | |
1792 | raise ValueError("pickle.py appears to have a pickle opcode with "\r | |
1793 | "name %r and code %r, but we don't" %\r | |
1794 | (name, picklecode))\r | |
1795 | if copy:\r | |
1796 | msg = ["we appear to have pickle opcodes that pickle.py doesn't have:"]\r | |
1797 | for code, d in copy.items():\r | |
1798 | msg.append(" name %r with code %r" % (d.name, code))\r | |
1799 | raise ValueError("\n".join(msg))\r | |
1800 | \r | |
1801 | assure_pickle_consistency()\r | |
1802 | del assure_pickle_consistency\r | |
1803 | \r | |
1804 | ##############################################################################\r | |
1805 | # A pickle opcode generator.\r | |
1806 | \r | |
1807 | def genops(pickle):\r | |
1808 | """Generate all the opcodes in a pickle.\r | |
1809 | \r | |
1810 | 'pickle' is a file-like object, or string, containing the pickle.\r | |
1811 | \r | |
1812 | Each opcode in the pickle is generated, from the current pickle position,\r | |
1813 | stopping after a STOP opcode is delivered. A triple is generated for\r | |
1814 | each opcode:\r | |
1815 | \r | |
1816 | opcode, arg, pos\r | |
1817 | \r | |
1818 | opcode is an OpcodeInfo record, describing the current opcode.\r | |
1819 | \r | |
1820 | If the opcode has an argument embedded in the pickle, arg is its decoded\r | |
1821 | value, as a Python object. If the opcode doesn't have an argument, arg\r | |
1822 | is None.\r | |
1823 | \r | |
1824 | If the pickle has a tell() method, pos was the value of pickle.tell()\r | |
1825 | before reading the current opcode. If the pickle is a string object,\r | |
1826 | it's wrapped in a StringIO object, and the latter's tell() result is\r | |
1827 | used. Else (the pickle doesn't have a tell(), and it's not obvious how\r | |
1828 | to query its current position) pos is None.\r | |
1829 | """\r | |
1830 | \r | |
1831 | import cStringIO as StringIO\r | |
1832 | \r | |
1833 | if isinstance(pickle, str):\r | |
1834 | pickle = StringIO.StringIO(pickle)\r | |
1835 | \r | |
1836 | if hasattr(pickle, "tell"):\r | |
1837 | getpos = pickle.tell\r | |
1838 | else:\r | |
1839 | getpos = lambda: None\r | |
1840 | \r | |
1841 | while True:\r | |
1842 | pos = getpos()\r | |
1843 | code = pickle.read(1)\r | |
1844 | opcode = code2op.get(code)\r | |
1845 | if opcode is None:\r | |
1846 | if code == "":\r | |
1847 | raise ValueError("pickle exhausted before seeing STOP")\r | |
1848 | else:\r | |
1849 | raise ValueError("at position %s, opcode %r unknown" % (\r | |
1850 | pos is None and "<unknown>" or pos,\r | |
1851 | code))\r | |
1852 | if opcode.arg is None:\r | |
1853 | arg = None\r | |
1854 | else:\r | |
1855 | arg = opcode.arg.reader(pickle)\r | |
1856 | yield opcode, arg, pos\r | |
1857 | if code == '.':\r | |
1858 | assert opcode.name == 'STOP'\r | |
1859 | break\r | |
1860 | \r | |
1861 | ##############################################################################\r | |
1862 | # A pickle optimizer.\r | |
1863 | \r | |
1864 | def optimize(p):\r | |
1865 | 'Optimize a pickle string by removing unused PUT opcodes'\r | |
1866 | gets = set() # set of args used by a GET opcode\r | |
1867 | puts = [] # (arg, startpos, stoppos) for the PUT opcodes\r | |
1868 | prevpos = None # set to pos if previous opcode was a PUT\r | |
1869 | for opcode, arg, pos in genops(p):\r | |
1870 | if prevpos is not None:\r | |
1871 | puts.append((prevarg, prevpos, pos))\r | |
1872 | prevpos = None\r | |
1873 | if 'PUT' in opcode.name:\r | |
1874 | prevarg, prevpos = arg, pos\r | |
1875 | elif 'GET' in opcode.name:\r | |
1876 | gets.add(arg)\r | |
1877 | \r | |
1878 | # Copy the pickle string except for PUTs without a corresponding GET\r | |
1879 | s = []\r | |
1880 | i = 0\r | |
1881 | for arg, start, stop in puts:\r | |
1882 | j = stop if (arg in gets) else start\r | |
1883 | s.append(p[i:j])\r | |
1884 | i = stop\r | |
1885 | s.append(p[i:])\r | |
1886 | return ''.join(s)\r | |
1887 | \r | |
1888 | ##############################################################################\r | |
1889 | # A symbolic pickle disassembler.\r | |
1890 | \r | |
1891 | def dis(pickle, out=None, memo=None, indentlevel=4):\r | |
1892 | """Produce a symbolic disassembly of a pickle.\r | |
1893 | \r | |
1894 | 'pickle' is a file-like object, or string, containing at least one\r | |
1895 | pickle. The pickle is disassembled from the current position, through\r | |
1896 | the first STOP opcode encountered.\r | |
1897 | \r | |
1898 | Optional arg 'out' is a file-like object to which the disassembly is\r | |
1899 | printed. It defaults to sys.stdout.\r | |
1900 | \r | |
1901 | Optional arg 'memo' is a Python dict, used as the pickle's memo. It\r | |
1902 | may be mutated by dis(), if the pickle contains PUT, BINPUT, or LONG_BINPUT opcodes.\r | |
1903 | Passing the same memo object to another dis() call then allows disassembly\r | |
1904 | to proceed across multiple pickles that were all created by the same\r | |
1905 | pickler with the same memo. Ordinarily you don't need to worry about this.\r | |
1906 | \r | |
1907 | Optional arg indentlevel is the number of blanks by which to indent\r | |
1908 | a new MARK level. It defaults to 4.\r | |
1909 | \r | |
1910 | In addition to printing the disassembly, some sanity checks are made:\r | |
1911 | \r | |
1912 | + All embedded opcode arguments "make sense".\r | |
1913 | \r | |
1914 | + Explicit and implicit pop operations have enough items on the stack.\r | |
1915 | \r | |
1916 | + When an opcode implicitly refers to a markobject, a markobject is\r | |
1917 | actually on the stack.\r | |
1918 | \r | |
1919 | + A memo entry isn't referenced before it's defined.\r | |
1920 | \r | |
1921 | + The markobject isn't stored in the memo.\r | |
1922 | \r | |
1923 | + A memo entry isn't redefined.\r | |
1924 | """\r | |
1925 | \r | |
1926 | # Most of the hair here is for sanity checks, but most of it is needed\r | |
1927 | # anyway to detect when a protocol 0 POP takes a MARK off the stack\r | |
1928 | # (which in turn is needed to indent MARK blocks correctly).\r | |
1929 | \r | |
1930 | stack = [] # crude emulation of unpickler stack\r | |
1931 | if memo is None:\r | |
1932 | memo = {} # crude emulation of unpickler memo\r | |
1933 | maxproto = -1 # max protocol number seen\r | |
1934 | markstack = [] # bytecode positions of MARK opcodes\r | |
1935 | indentchunk = ' ' * indentlevel\r | |
1936 | errormsg = None\r | |
1937 | for opcode, arg, pos in genops(pickle):\r | |
1938 | if pos is not None:\r | |
1939 | print >> out, "%5d:" % pos,\r | |
1940 | \r | |
1941 | line = "%-4s %s%s" % (repr(opcode.code)[1:-1],\r | |
1942 | indentchunk * len(markstack),\r | |
1943 | opcode.name)\r | |
1944 | \r | |
1945 | maxproto = max(maxproto, opcode.proto)\r | |
1946 | before = opcode.stack_before # don't mutate\r | |
1947 | after = opcode.stack_after # don't mutate\r | |
1948 | numtopop = len(before)\r | |
1949 | \r | |
1950 | # See whether a MARK should be popped.\r | |
1951 | markmsg = None\r | |
1952 | if markobject in before or (opcode.name == "POP" and\r | |
1953 | stack and\r | |
1954 | stack[-1] is markobject):\r | |
1955 | assert markobject not in after\r | |
1956 | if __debug__:\r | |
1957 | if markobject in before:\r | |
1958 | assert before[-1] is stackslice\r | |
1959 | if markstack:\r | |
1960 | markpos = markstack.pop()\r | |
1961 | if markpos is None:\r | |
1962 | markmsg = "(MARK at unknown opcode offset)"\r | |
1963 | else:\r | |
1964 | markmsg = "(MARK at %d)" % markpos\r | |
1965 | # Pop everything at and after the topmost markobject.\r | |
1966 | while stack[-1] is not markobject:\r | |
1967 | stack.pop()\r | |
1968 | stack.pop()\r | |
1969 | # Stop later code from popping too much.\r | |
1970 | try:\r | |
1971 | numtopop = before.index(markobject)\r | |
1972 | except ValueError:\r | |
1973 | assert opcode.name == "POP"\r | |
1974 | numtopop = 0\r | |
1975 | else:\r | |
1976 | errormsg = markmsg = "no MARK exists on stack"\r | |
1977 | \r | |
1978 | # Check for correct memo usage.\r | |
1979 | if opcode.name in ("PUT", "BINPUT", "LONG_BINPUT"):\r | |
1980 | assert arg is not None\r | |
1981 | if arg in memo:\r | |
1982 | errormsg = "memo key %r already defined" % arg\r | |
1983 | elif not stack:\r | |
1984 | errormsg = "stack is empty -- can't store into memo"\r | |
1985 | elif stack[-1] is markobject:\r | |
1986 | errormsg = "can't store markobject in the memo"\r | |
1987 | else:\r | |
1988 | memo[arg] = stack[-1]\r | |
1989 | \r | |
1990 | elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):\r | |
1991 | if arg in memo:\r | |
1992 | assert len(after) == 1\r | |
1993 | after = [memo[arg]] # for better stack emulation\r | |
1994 | else:\r | |
1995 | errormsg = "memo key %r has never been stored into" % arg\r | |
1996 | \r | |
1997 | if arg is not None or markmsg:\r | |
1998 | # make a mild effort to align arguments\r | |
1999 | line += ' ' * (10 - len(opcode.name))\r | |
2000 | if arg is not None:\r | |
2001 | line += ' ' + repr(arg)\r | |
2002 | if markmsg:\r | |
2003 | line += ' ' + markmsg\r | |
2004 | print >> out, line\r | |
2005 | \r | |
2006 | if errormsg:\r | |
2007 | # Note that we delayed complaining until the offending opcode\r | |
2008 | # was printed.\r | |
2009 | raise ValueError(errormsg)\r | |
2010 | \r | |
2011 | # Emulate the stack effects.\r | |
2012 | if len(stack) < numtopop:\r | |
2013 | raise ValueError("tries to pop %d items from stack with "\r | |
2014 | "only %d items" % (numtopop, len(stack)))\r | |
2015 | if numtopop:\r | |
2016 | del stack[-numtopop:]\r | |
2017 | if markobject in after:\r | |
2018 | assert markobject not in before\r | |
2019 | markstack.append(pos)\r | |
2020 | \r | |
2021 | stack.extend(after)\r | |
2022 | \r | |
2023 | print >> out, "highest protocol among opcodes =", maxproto\r | |
2024 | if stack:\r | |
2025 | raise ValueError("stack not empty after STOP: %r" % stack)\r | |
2026 | \r | |
2027 | # For use in the doctest, simply as an example of a class to pickle.\r | |
2028 | class _Example:\r | |
2029 | def __init__(self, value):\r | |
2030 | self.value = value\r | |
2031 | \r | |
2032 | _dis_test = r"""\r | |
2033 | >>> import pickle\r | |
2034 | >>> x = [1, 2, (3, 4), {'abc': u"def"}]\r | |
2035 | >>> pkl = pickle.dumps(x, 0)\r | |
2036 | >>> dis(pkl)\r | |
2037 | 0: ( MARK\r | |
2038 | 1: l LIST (MARK at 0)\r | |
2039 | 2: p PUT 0\r | |
2040 | 5: I INT 1\r | |
2041 | 8: a APPEND\r | |
2042 | 9: I INT 2\r | |
2043 | 12: a APPEND\r | |
2044 | 13: ( MARK\r | |
2045 | 14: I INT 3\r | |
2046 | 17: I INT 4\r | |
2047 | 20: t TUPLE (MARK at 13)\r | |
2048 | 21: p PUT 1\r | |
2049 | 24: a APPEND\r | |
2050 | 25: ( MARK\r | |
2051 | 26: d DICT (MARK at 25)\r | |
2052 | 27: p PUT 2\r | |
2053 | 30: S STRING 'abc'\r | |
2054 | 37: p PUT 3\r | |
2055 | 40: V UNICODE u'def'\r | |
2056 | 45: p PUT 4\r | |
2057 | 48: s SETITEM\r | |
2058 | 49: a APPEND\r | |
2059 | 50: . STOP\r | |
2060 | highest protocol among opcodes = 0\r | |
2061 | \r | |
2062 | Try again with a "binary" pickle.\r | |
2063 | \r | |
2064 | >>> pkl = pickle.dumps(x, 1)\r | |
2065 | >>> dis(pkl)\r | |
2066 | 0: ] EMPTY_LIST\r | |
2067 | 1: q BINPUT 0\r | |
2068 | 3: ( MARK\r | |
2069 | 4: K BININT1 1\r | |
2070 | 6: K BININT1 2\r | |
2071 | 8: ( MARK\r | |
2072 | 9: K BININT1 3\r | |
2073 | 11: K BININT1 4\r | |
2074 | 13: t TUPLE (MARK at 8)\r | |
2075 | 14: q BINPUT 1\r | |
2076 | 16: } EMPTY_DICT\r | |
2077 | 17: q BINPUT 2\r | |
2078 | 19: U SHORT_BINSTRING 'abc'\r | |
2079 | 24: q BINPUT 3\r | |
2080 | 26: X BINUNICODE u'def'\r | |
2081 | 34: q BINPUT 4\r | |
2082 | 36: s SETITEM\r | |
2083 | 37: e APPENDS (MARK at 3)\r | |
2084 | 38: . STOP\r | |
2085 | highest protocol among opcodes = 1\r | |
2086 | \r | |
2087 | Exercise the INST/OBJ/BUILD family.\r | |
2088 | \r | |
2089 | >>> import pickletools\r | |
2090 | >>> dis(pickle.dumps(pickletools.dis, 0))\r | |
2091 | 0: c GLOBAL 'pickletools dis'\r | |
2092 | 17: p PUT 0\r | |
2093 | 20: . STOP\r | |
2094 | highest protocol among opcodes = 0\r | |
2095 | \r | |
2096 | >>> from pickletools import _Example\r | |
2097 | >>> x = [_Example(42)] * 2\r | |
2098 | >>> dis(pickle.dumps(x, 0))\r | |
2099 | 0: ( MARK\r | |
2100 | 1: l LIST (MARK at 0)\r | |
2101 | 2: p PUT 0\r | |
2102 | 5: ( MARK\r | |
2103 | 6: i INST 'pickletools _Example' (MARK at 5)\r | |
2104 | 28: p PUT 1\r | |
2105 | 31: ( MARK\r | |
2106 | 32: d DICT (MARK at 31)\r | |
2107 | 33: p PUT 2\r | |
2108 | 36: S STRING 'value'\r | |
2109 | 45: p PUT 3\r | |
2110 | 48: I INT 42\r | |
2111 | 52: s SETITEM\r | |
2112 | 53: b BUILD\r | |
2113 | 54: a APPEND\r | |
2114 | 55: g GET 1\r | |
2115 | 58: a APPEND\r | |
2116 | 59: . STOP\r | |
2117 | highest protocol among opcodes = 0\r | |
2118 | \r | |
2119 | >>> dis(pickle.dumps(x, 1))\r | |
2120 | 0: ] EMPTY_LIST\r | |
2121 | 1: q BINPUT 0\r | |
2122 | 3: ( MARK\r | |
2123 | 4: ( MARK\r | |
2124 | 5: c GLOBAL 'pickletools _Example'\r | |
2125 | 27: q BINPUT 1\r | |
2126 | 29: o OBJ (MARK at 4)\r | |
2127 | 30: q BINPUT 2\r | |
2128 | 32: } EMPTY_DICT\r | |
2129 | 33: q BINPUT 3\r | |
2130 | 35: U SHORT_BINSTRING 'value'\r | |
2131 | 42: q BINPUT 4\r | |
2132 | 44: K BININT1 42\r | |
2133 | 46: s SETITEM\r | |
2134 | 47: b BUILD\r | |
2135 | 48: h BINGET 2\r | |
2136 | 50: e APPENDS (MARK at 3)\r | |
2137 | 51: . STOP\r | |
2138 | highest protocol among opcodes = 1\r | |
2139 | \r | |
2140 | Try "the canonical" recursive-object test.\r | |
2141 | \r | |
2142 | >>> L = []\r | |
2143 | >>> T = L,\r | |
2144 | >>> L.append(T)\r | |
2145 | >>> L[0] is T\r | |
2146 | True\r | |
2147 | >>> T[0] is L\r | |
2148 | True\r | |
2149 | >>> L[0][0] is L\r | |
2150 | True\r | |
2151 | >>> T[0][0] is T\r | |
2152 | True\r | |
2153 | >>> dis(pickle.dumps(L, 0))\r | |
2154 | 0: ( MARK\r | |
2155 | 1: l LIST (MARK at 0)\r | |
2156 | 2: p PUT 0\r | |
2157 | 5: ( MARK\r | |
2158 | 6: g GET 0\r | |
2159 | 9: t TUPLE (MARK at 5)\r | |
2160 | 10: p PUT 1\r | |
2161 | 13: a APPEND\r | |
2162 | 14: . STOP\r | |
2163 | highest protocol among opcodes = 0\r | |
2164 | \r | |
2165 | >>> dis(pickle.dumps(L, 1))\r | |
2166 | 0: ] EMPTY_LIST\r | |
2167 | 1: q BINPUT 0\r | |
2168 | 3: ( MARK\r | |
2169 | 4: h BINGET 0\r | |
2170 | 6: t TUPLE (MARK at 3)\r | |
2171 | 7: q BINPUT 1\r | |
2172 | 9: a APPEND\r | |
2173 | 10: . STOP\r | |
2174 | highest protocol among opcodes = 1\r | |
2175 | \r | |
2176 | Note that, in the protocol 0 pickle of the recursive tuple, the disassembler\r | |
2177 | has to emulate the stack in order to realize that the POP opcode at 16 gets\r | |
2178 | rid of the MARK at 0.\r | |
2179 | \r | |
2180 | >>> dis(pickle.dumps(T, 0))\r | |
2181 | 0: ( MARK\r | |
2182 | 1: ( MARK\r | |
2183 | 2: l LIST (MARK at 1)\r | |
2184 | 3: p PUT 0\r | |
2185 | 6: ( MARK\r | |
2186 | 7: g GET 0\r | |
2187 | 10: t TUPLE (MARK at 6)\r | |
2188 | 11: p PUT 1\r | |
2189 | 14: a APPEND\r | |
2190 | 15: 0 POP\r | |
2191 | 16: 0 POP (MARK at 0)\r | |
2192 | 17: g GET 1\r | |
2193 | 20: . STOP\r | |
2194 | highest protocol among opcodes = 0\r | |
2195 | \r | |
2196 | >>> dis(pickle.dumps(T, 1))\r | |
2197 | 0: ( MARK\r | |
2198 | 1: ] EMPTY_LIST\r | |
2199 | 2: q BINPUT 0\r | |
2200 | 4: ( MARK\r | |
2201 | 5: h BINGET 0\r | |
2202 | 7: t TUPLE (MARK at 4)\r | |
2203 | 8: q BINPUT 1\r | |
2204 | 10: a APPEND\r | |
2205 | 11: 1 POP_MARK (MARK at 0)\r | |
2206 | 12: h BINGET 1\r | |
2207 | 14: . STOP\r | |
2208 | highest protocol among opcodes = 1\r | |
2209 | \r | |
2210 | Try protocol 2.\r | |
2211 | \r | |
2212 | >>> dis(pickle.dumps(L, 2))\r | |
2213 | 0: \x80 PROTO 2\r | |
2214 | 2: ] EMPTY_LIST\r | |
2215 | 3: q BINPUT 0\r | |
2216 | 5: h BINGET 0\r | |
2217 | 7: \x85 TUPLE1\r | |
2218 | 8: q BINPUT 1\r | |
2219 | 10: a APPEND\r | |
2220 | 11: . STOP\r | |
2221 | highest protocol among opcodes = 2\r | |
2222 | \r | |
2223 | >>> dis(pickle.dumps(T, 2))\r | |
2224 | 0: \x80 PROTO 2\r | |
2225 | 2: ] EMPTY_LIST\r | |
2226 | 3: q BINPUT 0\r | |
2227 | 5: h BINGET 0\r | |
2228 | 7: \x85 TUPLE1\r | |
2229 | 8: q BINPUT 1\r | |
2230 | 10: a APPEND\r | |
2231 | 11: 0 POP\r | |
2232 | 12: h BINGET 1\r | |
2233 | 14: . STOP\r | |
2234 | highest protocol among opcodes = 2\r | |
2235 | """\r | |
2236 | \r | |
2237 | _memo_test = r"""\r | |
2238 | >>> import pickle\r | |
2239 | >>> from StringIO import StringIO\r | |
2240 | >>> f = StringIO()\r | |
2241 | >>> p = pickle.Pickler(f, 2)\r | |
2242 | >>> x = [1, 2, 3]\r | |
2243 | >>> p.dump(x)\r | |
2244 | >>> p.dump(x)\r | |
2245 | >>> f.seek(0)\r | |
2246 | >>> memo = {}\r | |
2247 | >>> dis(f, memo=memo)\r | |
2248 | 0: \x80 PROTO 2\r | |
2249 | 2: ] EMPTY_LIST\r | |
2250 | 3: q BINPUT 0\r | |
2251 | 5: ( MARK\r | |
2252 | 6: K BININT1 1\r | |
2253 | 8: K BININT1 2\r | |
2254 | 10: K BININT1 3\r | |
2255 | 12: e APPENDS (MARK at 5)\r | |
2256 | 13: . STOP\r | |
2257 | highest protocol among opcodes = 2\r | |
2258 | >>> dis(f, memo=memo)\r | |
2259 | 14: \x80 PROTO 2\r | |
2260 | 16: h BINGET 0\r | |
2261 | 18: . STOP\r | |
2262 | highest protocol among opcodes = 2\r | |
2263 | """\r | |
2264 | \r | |
2265 | __test__ = {'disassembler_test': _dis_test,\r | |
2266 | 'disassembler_memo_test': _memo_test,\r | |
2267 | }\r | |
2268 | \r | |
2269 | def _test():\r | |
2270 | import doctest\r | |
2271 | return doctest.testmod()\r | |
2272 | \r | |
2273 | if __name__ == "__main__":\r | |
2274 | _test()\r |
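
The three public functions listed in `__all__` (`genops`, `optimize`, `dis`) can be exercised together in a few lines. The sketch below is an illustration, not part of the module: it assumes a Python 3 interpreter, whose standard-library `pickletools` exposes functions with the same names, whereas the listing above is Python 2 code. The sample object and the variable names (`data`, `slim`, `listing`) are made up for the example.

```python
import io
import pickle
import pickletools

# A sample object with no shared sub-objects, so the pickler never emits a GET.
data = [1, 2, (3, 4), {"abc": "def"}]
pkl = pickle.dumps(data, protocol=2)

# genops() yields (opcode, arg, pos) triples and stops after STOP.
names = [opcode.name for opcode, arg, pos in pickletools.genops(pkl)]

# optimize() drops PUT-family opcodes whose memo slot is never read by a GET.
slim = pickletools.optimize(pkl)
slim_names = [opcode.name for opcode, arg, pos in pickletools.genops(slim)]

# dis() writes its listing to any file-like object passed as 'out'.
buf = io.StringIO()
pickletools.dis(slim, out=buf)
listing = buf.getvalue()

print(names[0], names[-1])
print(len(pkl), len(slim))
print(listing)
```

Because nothing in `data` is referenced twice, every memo store in the original pickle is dead weight, which is the case `optimize` is designed for. The optimized pickle still unpickles to an equal object.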