Commit | Line | Data |
---|---|---|
4710c53d | 1 | Intro\r |
2 | =====\r | |
3 | \r | |
4 | The basic rule for dealing with weakref callbacks (and __del__ methods too,\r | |
5 | for that matter) during cyclic gc:\r | |
6 | \r | |
7 | Once gc has computed the set of unreachable objects, no Python-level\r | |
8 | code can be allowed to access an unreachable object.\r | |
9 | \r | |
10 | If that can happen, then the Python code can resurrect unreachable objects\r | |
11 | too, and gc can't detect that without starting over. Since gc eventually\r | |
12 | runs tp_clear on all unreachable objects, if an unreachable object is\r | |
13 | resurrected then tp_clear will eventually be called on it (or may already\r | |
14 | have been called before resurrection). At best (and this has been an\r | |
15 | historically common bug), tp_clear empties an instance's __dict__, and\r | |
16 | "impossible" AttributeErrors result. At worst, tp_clear leaves behind an\r | |
17 | insane object at the C level, and segfaults result (historically, most\r | |
18 | often by setting a new-style class's mro pointer to NULL, after which\r | |
19 | attribute lookups performed by the class can segfault).\r | |
20 | \r | |
21 | OTOH, it's OK to run Python-level code that can't access unreachable\r | |
22 | objects, and sometimes that's necessary. The chief example is the callback\r | |
23 | attached to a reachable weakref W to an unreachable object O. Since O is\r | |
24 | going away, and W is still alive, the callback must be invoked. Because W\r | |
25 | is still alive, everything reachable from its callback is also reachable,\r | |
26 | so it's also safe to invoke the callback (although that's trickier than it\r | |
27 | sounds, since other reachable weakrefs to other unreachable objects may\r | |
28 | still exist, and be accessible to the callback -- there are lots of painful\r | |
29 | details like this covered in the rest of this file).\r | |
30 | \r | |
31 | Python 2.4/2.3.5\r | |
32 | ================\r | |
33 | \r | |
34 | The "Before 2.3.3" section below turned out to be wrong in some ways, but\r | |
35 | I'm leaving it as-is because it's more right than wrong, and serves as a\r | |
36 | wonderful example of how painful analysis can miss not only the forest for\r | |
37 | the trees, but also miss the trees for the aphids sucking the trees\r | |
38 | dry <wink>.\r | |
39 | \r | |
40 | The primary thing it missed is that when a weakref to a piece of cyclic\r | |
41 | trash (CT) exists, then any call to any Python code whatsoever can end up\r | |
42 | materializing a strong reference to that weakref's CT referent, and so\r | |
43 | possibly resurrect an insane object (one for which cyclic gc has called-- or\r | |
44 | will call before it's done --tp_clear()). It's not even necessarily that a\r | |
45 | weakref callback or __del__ method does something nasty on purpose: as\r | |
46 | soon as we execute Python code, threads other than the gc thread can run\r | |
47 | too, and they can do ordinary things with weakrefs that end up resurrecting\r | |
48 | CT while gc is running.\r | |
49 | \r | |
50 | http://www.python.org/sf/1055820\r | |
51 | \r | |
52 | shows how innocent it can be, and also how nasty. Variants of the three\r | |
53 | focussed test cases attached to that bug report are now part of Python's\r | |
54 | standard Lib/test/test_gc.py.\r | |
55 | \r | |
56 | Jim Fulton gave the best nutshell summary of the new (in 2.4 and 2.3.5)\r | |
57 | approach:\r | |
58 | \r | |
59 | Clearing cyclic trash can call Python code. If there are weakrefs to\r | |
60 | any of the cyclic trash, then those weakrefs can be used to resurrect\r | |
61 | the objects. Therefore, *before* clearing cyclic trash, we need to\r | |
62 | remove any weakrefs. If any of the weakrefs being removed have\r | |
63 | callbacks, then we need to save the callbacks and call them *after* all\r | |
64 | of the weakrefs have been cleared.\r | |
65 | \r | |
66 | Alas, doing just that much doesn't work, because it overlooks what turned\r | |
67 | out to be the much subtler problems that were fixed earlier, and described\r | |
68 | below. We do clear all weakrefs to CT now before breaking cycles, but not\r | |
69 | all callbacks encountered can be run later. That's explained in horrid\r | |
70 | detail below.\r | |
71 | \r | |
72 | Older text follows, with some later comments in [] brackets:\r | |
73 | \r | |
74 | Before 2.3.3\r | |
75 | ============\r | |
76 | \r | |
77 | Before 2.3.3, Python's cyclic gc didn't pay any attention to weakrefs.\r | |
78 | Segfaults in Zope3 resulted.\r | |
79 | \r | |
80 | weakrefs in Python are designed to, at worst, let *other* objects learn\r | |
81 | that a given object has died, via a callback function. The weakly\r | |
82 | referenced object itself is not passed to the callback, and the presumption\r | |
83 | is that the weakly referenced object is unreachable trash at the time the\r | |
84 | callback is invoked.\r | |
85 | \r | |
86 | That's usually true, but not always. Suppose a weakly referenced object\r | |
87 | becomes part of a clump of cyclic trash. When enough cycles are broken by\r | |
88 | cyclic gc that the object is reclaimed, the callback is invoked. If it's\r | |
89 | possible for the callback to get at objects in the cycle(s), then it may be\r | |
90 | possible for those objects to access (via strong references in the cycle)\r | |
91 | the weakly referenced object being torn down, or other objects in the cycle\r | |
92 | that have already suffered a tp_clear() call. There's no guarantee that an\r | |
93 | object is in a sane state after tp_clear(). Bad things (including\r | |
94 | segfaults) can happen right then, during the callback's execution, or can\r | |
95 | happen at any later time if the callback manages to resurrect an insane\r | |
96 | object.\r | |
97 | \r | |
98 | [That missed that, in addition, a weakref to CT can exist outside CT, and\r | |
99 | any callback into Python can use such a non-CT weakref to resurrect its CT\r | |
100 | referent. The same bad kinds of things can happen then.]\r | |
101 | \r | |
102 | Note that if it's possible for the callback to get at objects in the trash\r | |
103 | cycles, it must also be the case that the callback itself is part of the\r | |
104 | trash cycles. Else the callback would have acted as an external root to\r | |
105 | the current collection, and nothing reachable from it would be in cyclic\r | |
106 | trash either.\r | |
107 | \r | |
108 | [Except that a non-CT callback can also use a non-CT weakref to get at\r | |
109 | CT objects.]\r | |
110 | \r | |
111 | More, if the callback itself is in cyclic trash, then the weakref to which\r | |
112 | the callback is attached must also be trash, and for the same kind of\r | |
113 | reason: if the weakref acted as an external root, then the callback could\r | |
114 | not have been cyclic trash.\r | |
115 | \r | |
116 | So a problem here requires that a weakref, that weakref's callback, and the\r | |
117 | weakly referenced object, all be in cyclic trash at the same time. This\r | |
118 | isn't easy to stumble into by accident while Python is running, and, indeed,\r | |
119 | it took quite a while to dream up failing test cases. Zope3 saw segfaults\r | |
120 | during shutdown, during the second call of gc in Py_Finalize, after most\r | |
121 | modules had been torn down. That creates many trash cycles (esp. those\r | |
122 | involving new-style classes), making the problem much more likely. Once you\r | |
123 | know what's required to provoke the problem, though, it's easy to create\r | |
124 | tests that segfault before shutdown.\r | |
125 | \r | |
126 | In 2.3.3, before breaking cycles, we first clear all the weakrefs with\r | |
127 | callbacks in cyclic trash. Since the weakrefs *are* trash, and there's no\r | |
128 | defined-- or even predictable --order in which tp_clear() gets called on\r | |
129 | cyclic trash, it's defensible to first clear weakrefs with callbacks. It's\r | |
130 | a feature of Python's weakrefs too that when a weakref goes away, the\r | |
131 | callback (if any) associated with it is thrown away too, unexecuted.\r | |
132 | \r | |
133 | [In 2.4/2.3.5, we first clear all weakrefs to CT objects, whether or not\r | |
134 | those weakrefs are themselves CT, and whether or not they have callbacks.\r | |
135 | The callbacks (if any) on non-CT weakrefs (if any) are invoked later,\r | |
136 | after all weakrefs-to-CT have been cleared. The callbacks (if any) on CT\r | |
137 | weakrefs (if any) are never invoked, for the excruciating reasons\r | |
138 | explained here.]\r | |
139 | \r | |
140 | Just that much is almost enough to prevent problems, by throwing away\r | |
141 | *almost* all the weakref callbacks that could get triggered by gc. The\r | |
142 | problem remaining is that clearing a weakref with a callback decrefs the\r | |
143 | callback object, and the callback object may *itself* be weakly referenced,\r | |
144 | via another weakref with another callback. So the process of clearing\r | |
145 | weakrefs can trigger callbacks attached to other weakrefs, and those\r | |
146 | latter weakrefs may or may not be part of cyclic trash.\r | |
147 | \r | |
148 | So, to prevent any Python code from running while gc is invoking tp_clear()\r | |
149 | on all the objects in cyclic trash,\r | |
150 | \r | |
151 | [That was always wrong: we can't stop Python code from running when gc\r | |
152 | is breaking cycles. If an object with a __del__ method is not itself in\r | |
153 | a cycle, but is reachable only from CT, then breaking cycles will, as a\r | |
154 | matter of course, drop the refcount on that object to 0, and its __del__\r | |
155 | will run right then. What we can and must stop is running any Python\r | |
156 | code that could access CT.]\r | |
157 | it's not quite enough just to invoke\r | |
158 | tp_clear() on weakrefs with callbacks first. Instead the weakref module\r | |
159 | grew a new private function (_PyWeakref_ClearRef) that does only part of\r | |
160 | tp_clear(): it removes the weakref from the weakly-referenced object's list\r | |
161 | of weakrefs, but does not decref the callback object. So calling\r | |
162 | _PyWeakref_ClearRef(wr) ensures that wr's callback object will never\r | |
163 | trigger, and (unlike weakref's tp_clear()) also prevents any callback\r | |
164 | associated *with* wr's callback object from triggering.\r | |
165 | \r | |
166 | [Although we may trigger such callbacks later, as explained below.]\r | |
167 | \r | |
168 | Then we can call tp_clear on all the cyclic objects and never trigger\r | |
169 | Python code.\r | |
170 | \r | |
171 | [As above, not so: it means never trigger Python code that can access CT.]\r | |
172 | \r | |
173 | After we do that, the callback objects still need to be decref'ed. Callbacks\r | |
174 | (if any) *on* the callback objects that were also part of cyclic trash won't\r | |
175 | get invoked, because we cleared all trash weakrefs with callbacks at the\r | |
176 | start. Callbacks on the callback objects that were not part of cyclic trash\r | |
177 | acted as external roots to everything reachable from them, so nothing\r | |
178 | reachable from them was part of cyclic trash, so gc didn't do any damage to\r | |
179 | objects reachable from them, and it's safe to call them at the end of gc.\r | |
180 | \r | |
181 | [That's so. In addition, now we also invoke (if any) the callbacks on\r | |
182 | non-CT weakrefs to CT objects, during the same pass that decrefs the\r | |
183 | callback objects.]\r | |
184 | \r | |
185 | An alternative would have been to treat objects with callbacks like objects\r | |
186 | with __del__ methods, refusing to collect them, appending them to gc.garbage\r | |
187 | instead. That would have been much easier. Jim Fulton gave a strong\r | |
188 | argument against that (on Python-Dev):\r | |
189 | \r | |
190 | There's a big difference between __del__ and weakref callbacks.\r | |
191 | The __del__ method is "internal" to a design. When you design a\r | |
192 | class with a del method, you know you have to avoid including the\r | |
193 | class in cycles.\r | |
194 | \r | |
195 | Now, suppose you have a design that has no __del__ methods but\r | |
196 | that does use cyclic data structures. You reason about the design,\r | |
197 | run tests, and convince yourself you don't have a leak.\r | |
198 | \r | |
199 | Now, suppose some external code creates a weakref to one of your\r | |
200 | objects. All of a sudden, you start leaking. You can look at your\r | |
201 | code all you want and you won't find a reason for the leak.\r | |
202 | \r | |
203 | IOW, a class designer can out-think __del__ problems, but has no control\r | |
204 | over who creates weakrefs to his classes or class instances. The class\r | |
205 | user has little chance either of predicting when the weakrefs he creates\r | |
206 | may end up in cycles.\r | |
207 | \r | |
208 | Callbacks on weakref callbacks are executed in an arbitrary order, and\r | |
209 | that's not good (a primary reason not to collect cycles with objects with\r | |
210 | __del__ methods is to avoid running finalizers in an arbitrary order).\r | |
211 | However, a weakref callback on a weakref callback has got to be rare.\r | |
212 | It's possible to do such a thing, so gc has to be robust against it, but\r | |
213 | I doubt anyone has done it outside the test case I wrote for it.\r | |
214 | \r | |
215 | [The callbacks (if any) on non-CT weakrefs to CT objects are also executed\r | |
216 | in an arbitrary order now. But they were before too, depending on the\r | |
217 | vagaries of when tp_clear() happened to break enough cycles to trigger\r | |
218 | them. People simply shouldn't try to use __del__ or weakref callbacks to\r | |
219 | do fancy stuff.]\r |