]>
Commit | Line | Data |
---|---|---|
85aaf69f SL |
1 | ================================= |
2 | MergeFunctions pass, how it works | |
3 | ================================= | |
4 | ||
5 | .. contents:: | |
6 | :local: | |
7 | ||
8 | Introduction | |
9 | ============ | |
10 | Sometimes code contains equal functions, or functions that does exactly the same | |
11 | thing even though they are non-equal on the IR level (e.g.: multiplication on 2 | |
12 | and 'shl 1'). It could happen due to several reasons: mainly, the usage of | |
13 | templates and automatic code generators. Though, sometimes user itself could | |
14 | write the same thing twice :-) | |
15 | ||
16 | The main purpose of this pass is to recognize such functions and merge them. | |
17 | ||
18 | Why would I want to read this document? | |
19 | --------------------------------------- | |
20 | Document is the extension to pass comments and describes the pass logic. It | |
21 | describes algorithm that is used in order to compare functions, it also | |
22 | explains how we could combine equal functions correctly, keeping module valid. | |
23 | ||
24 | Material is brought in top-down form, so reader could start learn pass from | |
25 | ideas and end up with low-level algorithm details, thus preparing him for | |
26 | reading the sources. | |
27 | ||
28 | So main goal is do describe algorithm and logic here; the concept. This document | |
29 | is good for you, if you *don't want* to read the source code, but want to | |
30 | understand pass algorithms. Author tried not to repeat the source-code and | |
31 | cover only common cases, and thus avoid cases when after minor code changes we | |
32 | need to update this document. | |
33 | ||
34 | ||
35 | What should I know to be able to follow along with this document? | |
36 | ----------------------------------------------------------------- | |
37 | ||
38 | Reader should be familiar with common compile-engineering principles and LLVM | |
39 | code fundamentals. In this article we suppose reader is familiar with | |
40 | `Single Static Assingment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ | |
41 | concepts. Understanding of | |
42 | `IR structure <http://llvm.org/docs/LangRef.html#high-level-structure>`_ is | |
43 | also important. | |
44 | ||
45 | We will use such terms as | |
46 | "`module <http://llvm.org/docs/LangRef.html#high-level-structure>`_", | |
47 | "`function <http://llvm.org/docs/ProgrammersManual.html#the-function-class>`_", | |
48 | "`basic block <http://en.wikipedia.org/wiki/Basic_block>`_", | |
49 | "`user <http://llvm.org/docs/ProgrammersManual.html#the-user-class>`_", | |
50 | "`value <http://llvm.org/docs/ProgrammersManual.html#the-value-class>`_", | |
51 | "`instruction <http://llvm.org/docs/ProgrammersManual.html#the-instruction-class>`_". | |
52 | ||
53 | As a good start point, Kaleidoscope tutorial could be used: | |
54 | ||
55 | :doc:`tutorial/index` | |
56 | ||
57 | Especially it's important to understand chapter 3 of tutorial: | |
58 | ||
59 | :doc:`tutorial/LangImpl3` | |
60 | ||
61 | Reader also should know how passes work in LLVM, he could use next article as a | |
62 | reference and start point here: | |
63 | ||
64 | :doc:`WritingAnLLVMPass` | |
65 | ||
66 | What else? Well perhaps reader also should have some experience in LLVM pass | |
67 | debugging and bug-fixing. | |
68 | ||
69 | What I gain by reading this document? | |
70 | ------------------------------------- | |
71 | Main purpose is to provide reader with comfortable form of algorithms | |
72 | description, namely the human reading text. Since it could be hard to | |
73 | understand algorithm straight from the source code: pass uses some principles | |
74 | that have to be explained first. | |
75 | ||
76 | Author wishes to everybody to avoid case, when you read code from top to bottom | |
77 | again and again, and yet you don't understand why we implemented it that way. | |
78 | ||
79 | We hope that after this article reader could easily debug and improve | |
80 | MergeFunctions pass and thus help LLVM project. | |
81 | ||
82 | Narrative structure | |
83 | ------------------- | |
84 | Article consists of three parts. First part explains pass functionality on the | |
85 | top-level. Second part describes the comparison procedure itself. The third | |
86 | part describes the merging process. | |
87 | ||
88 | In every part author also tried to put the contents into the top-down form. | |
89 | First, the top-level methods will be described, while the terminal ones will be | |
90 | at the end, in the tail of each part. If reader will see the reference to the | |
91 | method that wasn't described yet, he will find its description a bit below. | |
92 | ||
93 | Basics | |
94 | ====== | |
95 | ||
96 | How to do it? | |
97 | ------------- | |
98 | Do we need to merge functions? Obvious thing is: yes that's a quite possible | |
99 | case, since usually we *do* have duplicates. And it would be good to get rid of | |
100 | them. But how to detect such a duplicates? The idea is next: we split functions | |
101 | onto small bricks (parts), then we compare "bricks" amount, and if it equal, | |
102 | compare "bricks" themselves, and then do our conclusions about functions | |
103 | themselves. | |
104 | ||
105 | What the difference it could be? For example, on machine with 64-bit pointers | |
106 | (let's assume we have only one address space), one function stores 64-bit | |
107 | integer, while another one stores a pointer. So if the target is a machine | |
108 | mentioned above, and if functions are identical, except the parameter type (we | |
109 | could consider it as a part of function type), then we can treat ``uint64_t`` | |
110 | and``void*`` as equal. | |
111 | ||
112 | It was just an example; possible details are described a bit below. | |
113 | ||
114 | As another example reader may imagine two more functions. First function | |
115 | performs multiplication on 2, while the second one performs arithmetic right | |
116 | shift on 1. | |
117 | ||
118 | Possible solutions | |
119 | ^^^^^^^^^^^^^^^^^^ | |
120 | Let's briefly consider possible options about how and what we have to implement | |
121 | in order to create full-featured functions merging, and also what it would | |
122 | meant for us. | |
123 | ||
124 | Equal functions detection, obviously supposes "detector" method to be | |
125 | implemented, latter should answer the question "whether functions are equal". | |
126 | This "detector" method consists of tiny "sub-detectors", each of them answers | |
127 | exactly the same question, but for function parts. | |
128 | ||
129 | As the second step, we should merge equal functions. So it should be a "merger" | |
130 | method. "Merger" accepts two functions *F1* and *F2*, and produces *F1F2* | |
131 | function, the result of merging. | |
132 | ||
133 | Having such a routines in our hands, we can process whole module, and merge all | |
134 | equal functions. | |
135 | ||
136 | In this case, we have to compare every function with every another function. As | |
137 | reader could notice, this way seems to be quite expensive. Of course we could | |
138 | introduce hashing and other helpers, but it is still just an optimization, and | |
139 | thus the level of O(N*N) complexity. | |
140 | ||
141 | Can we reach another level? Could we introduce logarithmical search, or random | |
142 | access lookup? The answer is: "yes". | |
143 | ||
144 | Random-access | |
145 | """"""""""""" | |
146 | How it could be done? Just convert each function to number, and gather all of | |
147 | them in special hash-table. Functions with equal hash are equal. Good hashing | |
148 | means, that every function part must be taken into account. That means we have | |
149 | to convert every function part into some number, and then add it into hash. | |
150 | Lookup-up time would be small, but such approach adds some delay due to hashing | |
151 | routine. | |
152 | ||
153 | Logarithmical search | |
154 | """""""""""""""""""" | |
155 | We could introduce total ordering among the functions set, once we had it we | |
156 | could then implement a logarithmical search. Lookup time still depends on N, | |
157 | but adds a little of delay (*log(N)*). | |
158 | ||
159 | Present state | |
160 | """"""""""""" | |
161 | Both of approaches (random-access and logarithmical) has been implemented and | |
162 | tested. And both of them gave a very good improvement. And what was most | |
163 | surprising, logarithmical search was faster; sometimes up to 15%. Hashing needs | |
164 | some extra CPU time, and it is the main reason why it works slower; in most of | |
165 | cases total "hashing" time was greater than total "logarithmical-search" time. | |
166 | ||
167 | So, preference has been granted to the "logarithmical search". | |
168 | ||
169 | Though in the case of need, *logarithmical-search* (read "total-ordering") could | |
170 | be used as a milestone on our way to the *random-access* implementation. | |
171 | ||
172 | Every comparison is based either on the numbers or on flags comparison. In | |
173 | *random-access* approach we could use the same comparison algorithm. During | |
174 | comparison we exit once we find the difference, but here we might have to scan | |
175 | whole function body every time (note, it could be slower). Like in | |
176 | "total-ordering", we will track every numbers and flags, but instead of | |
177 | comparison, we should get numbers sequence and then create the hash number. So, | |
178 | once again, *total-ordering* could be considered as a milestone for even faster | |
179 | (in theory) random-access approach. | |
180 | ||
181 | MergeFunctions, main fields and runOnModule | |
182 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
183 | There are two most important fields in class: | |
184 | ||
185 | ``FnTree`` – the set of all unique functions. It keeps items that couldn't be | |
186 | merged with each other. It is defined as: | |
187 | ||
188 | ``std::set<FunctionNode> FnTree;`` | |
189 | ||
190 | Here ``FunctionNode`` is a wrapper for ``llvm::Function`` class, with | |
191 | implemented “<” operator among the functions set (below we explain how it works | |
192 | exactly; this is a key point in fast functions comparison). | |
193 | ||
194 | ``Deferred`` – merging process can affect bodies of functions that are in | |
195 | ``FnTree`` already. Obviously such functions should be rechecked again. In this | |
196 | case we remove them from ``FnTree``, and mark them as to be rescanned, namely | |
197 | put them into ``Deferred`` list. | |
198 | ||
199 | runOnModule | |
200 | """"""""""" | |
201 | The algorithm is pretty simple: | |
202 | ||
203 | 1. Put all module's functions into the *worklist*. | |
204 | ||
205 | 2. Scan *worklist*'s functions twice: first enumerate only strong functions and | |
206 | then only weak ones: | |
207 | ||
208 | 2.1. Loop body: take function from *worklist* (call it *FCur*) and try to | |
209 | insert it into *FnTree*: check whether *FCur* is equal to one of functions | |
210 | in *FnTree*. If there *is* equal function in *FnTree* (call it *FExists*): | |
211 | merge function *FCur* with *FExists*. Otherwise add function from *worklist* | |
212 | to *FnTree*. | |
213 | ||
214 | 3. Once *worklist* scanning and merging operations is complete, check *Deferred* | |
215 | list. If it is not empty: refill *worklist* contents with *Deferred* list and | |
216 | do step 2 again, if *Deferred* is empty, then exit from method. | |
217 | ||
218 | Comparison and logarithmical search | |
219 | """"""""""""""""""""""""""""""""""" | |
220 | Let's recall our task: for every function *F* from module *M*, we have to find | |
221 | equal functions *F`* in shortest time, and merge them into the single function. | |
222 | ||
223 | Defining total ordering among the functions set allows to organize functions | |
224 | into the binary tree. The lookup procedure complexity would be estimated as | |
225 | O(log(N)) in this case. But how to define *total-ordering*? | |
226 | ||
227 | We have to introduce a single rule applicable to every pair of functions, and | |
228 | following this rule then evaluate which of them is greater. What kind of rule | |
229 | it could be? Let's declare it as "compare" method, that returns one of 3 | |
230 | possible values: | |
231 | ||
232 | -1, left is *less* than right, | |
233 | ||
234 | 0, left and right are *equal*, | |
235 | ||
236 | 1, left is *greater* than right. | |
237 | ||
238 | Of course it means, that we have to maintain | |
239 | *strict and non-strict order relation properties*: | |
240 | ||
241 | * reflexivity (``a <= a``, ``a == a``, ``a >= a``), | |
242 | * antisymmetry (if ``a <= b`` and ``b <= a`` then ``a == b``), | |
243 | * transitivity (``a <= b`` and ``b <= c``, then ``a <= c``) | |
244 | * asymmetry (if ``a < b``, then ``a > b`` or ``a == b``). | |
245 | ||
246 | As it was mentioned before, comparison routine consists of | |
247 | "sub-comparison-routines", each of them also consists | |
248 | "sub-comparison-routines", and so on, finally it ends up with a primitives | |
249 | comparison. | |
250 | ||
251 | Below, we will use the next operations: | |
252 | ||
253 | #. ``cmpNumbers(number1, number2)`` is method that returns -1 if left is less | |
254 | than right; 0, if left and right are equal; and 1 otherwise. | |
255 | ||
256 | #. ``cmpFlags(flag1, flag2)`` is hypothetical method that compares two flags. | |
257 | The logic is the same as in ``cmpNumbers``, where ``true`` is 1, and | |
258 | ``false`` is 0. | |
259 | ||
260 | The rest of article is based on *MergeFunctions.cpp* source code | |
261 | (*<llvm_dir>/lib/Transforms/IPO/MergeFunctions.cpp*). We would like to ask | |
262 | reader to keep this file open nearby, so we could use it as a reference for | |
263 | further explanations. | |
264 | ||
265 | Now we're ready to proceed to the next chapter and see how it works. | |
266 | ||
267 | Functions comparison | |
268 | ==================== | |
269 | At first, let's define how exactly we compare complex objects. | |
270 | ||
271 | Complex objects comparison (function, basic-block, etc) is mostly based on its | |
272 | sub-objects comparison results. So it is similar to the next "tree" objects | |
273 | comparison: | |
274 | ||
275 | #. For two trees *T1* and *T2* we perform *depth-first-traversal* and have | |
276 | two sequences as a product: "*T1Items*" and "*T2Items*". | |
277 | ||
278 | #. Then compare chains "*T1Items*" and "*T2Items*" in | |
279 | most-significant-item-first order. Result of items comparison would be the | |
280 | result of *T1* and *T2* comparison itself. | |
281 | ||
282 | FunctionComparator::compare(void) | |
283 | --------------------------------- | |
284 | Brief look at the source code tells us, that comparison starts in | |
285 | “``int FunctionComparator::compare(void)``” method. | |
286 | ||
287 | 1. First parts to be compared are function's attributes and some properties that | |
288 | outsides “attributes” term, but still could make function different without | |
289 | changing its body. This part of comparison is usually done within simple | |
290 | *cmpNumbers* or *cmpFlags* operations (e.g. | |
291 | ``cmpFlags(F1->hasGC(), F2->hasGC())``). Below is full list of function's | |
292 | properties to be compared on this stage: | |
293 | ||
294 | * *Attributes* (those are returned by ``Function::getAttributes()`` | |
295 | method). | |
296 | ||
297 | * *GC*, for equivalence, *RHS* and *LHS* should be both either without | |
298 | *GC* or with the same one. | |
299 | ||
300 | * *Section*, just like a *GC*: *RHS* and *LHS* should be defined in the | |
301 | same section. | |
302 | ||
303 | * *Variable arguments*. *LHS* and *RHS* should be both either with or | |
304 | without *var-args*. | |
305 | ||
306 | * *Calling convention* should be the same. | |
307 | ||
308 | 2. Function type. Checked by ``FunctionComparator::cmpType(Type*, Type*)`` | |
309 | method. It checks return type and parameters type; the method itself will be | |
310 | described later. | |
311 | ||
312 | 3. Associate function formal parameters with each other. Then comparing function | |
313 | bodies, if we see the usage of *LHS*'s *i*-th argument in *LHS*'s body, then, | |
314 | we want to see usage of *RHS*'s *i*-th argument at the same place in *RHS*'s | |
315 | body, otherwise functions are different. On this stage we grant the preference | |
316 | to those we met later in function body (value we met first would be *less*). | |
317 | This is done by “``FunctionComparator::cmpValues(const Value*, const Value*)``” | |
318 | method (will be described a bit later). | |
319 | ||
320 | 4. Function body comparison. As it written in method comments: | |
321 | ||
322 | “We do a CFG-ordered walk since the actual ordering of the blocks in the linked | |
323 | list is immaterial. Our walk starts at the entry block for both functions, then | |
324 | takes each block from each terminator in order. As an artifact, this also means | |
325 | that unreachable blocks are ignored.” | |
326 | ||
327 | So, using this walk we get BBs from *left* and *right* in the same order, and | |
328 | compare them by “``FunctionComparator::compare(const BasicBlock*, const | |
329 | BasicBlock*)``” method. | |
330 | ||
331 | We also associate BBs with each other, like we did it with function formal | |
332 | arguments (see ``cmpValues`` method below). | |
333 | ||
334 | FunctionComparator::cmpType | |
335 | --------------------------- | |
336 | Consider how types comparison works. | |
337 | ||
338 | 1. Coerce pointer to integer. If left type is a pointer, try to coerce it to the | |
339 | integer type. It could be done if its address space is 0, or if address spaces | |
340 | are ignored at all. Do the same thing for the right type. | |
341 | ||
342 | 2. If left and right types are equal, return 0. Otherwise we need to give | |
343 | preference to one of them. So proceed to the next step. | |
344 | ||
345 | 3. If types are of different kind (different type IDs). Return result of type | |
346 | IDs comparison, treating them as a numbers (use ``cmpNumbers`` operation). | |
347 | ||
348 | 4. If types are vectors or integers, return result of their pointers comparison, | |
349 | comparing them as numbers. | |
350 | ||
351 | 5. Check whether type ID belongs to the next group (call it equivalent-group): | |
352 | ||
353 | * Void | |
354 | ||
355 | * Float | |
356 | ||
357 | * Double | |
358 | ||
359 | * X86_FP80 | |
360 | ||
361 | * FP128 | |
362 | ||
363 | * PPC_FP128 | |
364 | ||
365 | * Label | |
366 | ||
367 | * Metadata. | |
368 | ||
369 | If ID belongs to group above, return 0. Since it's enough to see that | |
370 | types has the same ``TypeID``. No additional information is required. | |
371 | ||
372 | 6. Left and right are pointers. Return result of address space comparison | |
373 | (numbers comparison). | |
374 | ||
375 | 7. Complex types (structures, arrays, etc.). Follow complex objects comparison | |
376 | technique (see the very first paragraph of this chapter). Both *left* and | |
377 | *right* are to be expanded and their element types will be checked the same | |
378 | way. If we get -1 or 1 on some stage, return it. Otherwise return 0. | |
379 | ||
380 | 8. Steps 1-6 describe all the possible cases, if we passed steps 1-6 and didn't | |
381 | get any conclusions, then invoke ``llvm_unreachable``, since it's quite | |
382 | unexpectable case. | |
383 | ||
384 | cmpValues(const Value*, const Value*) | |
385 | ------------------------------------- | |
386 | Method that compares local values. | |
387 | ||
388 | This method gives us an answer on a very curious quesion: whether we could treat | |
389 | local values as equal, and which value is greater otherwise. It's better to | |
390 | start from example: | |
391 | ||
392 | Consider situation when we're looking at the same place in left function "*FL*" | |
393 | and in right function "*FR*". And every part of *left* place is equal to the | |
394 | corresponding part of *right* place, and (!) both parts use *Value* instances, | |
395 | for example: | |
396 | ||
397 | .. code-block:: llvm | |
398 | ||
399 | instr0 i32 %LV ; left side, function FL | |
400 | instr0 i32 %RV ; right side, function FR | |
401 | ||
402 | So, now our conclusion depends on *Value* instances comparison. | |
403 | ||
404 | Main purpose of this method is to determine relation between such values. | |
405 | ||
406 | What we expect from equal functions? At the same place, in functions "*FL*" and | |
407 | "*FR*" we expect to see *equal* values, or values *defined* at the same place | |
408 | in "*FL*" and "*FR*". | |
409 | ||
410 | Consider small example here: | |
411 | ||
412 | .. code-block:: llvm | |
413 | ||
414 | define void %f(i32 %pf0, i32 %pf1) { | |
415 | instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123 | |
416 | } | |
417 | ||
418 | .. code-block:: llvm | |
419 | ||
420 | define void %g(i32 %pg0, i32 %pg1) { | |
421 | instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123 | |
422 | } | |
423 | ||
424 | In this example, *pf0* is associated with *pg0*, *pf1* is associated with *pg1*, | |
425 | and we also declare that *pf0* < *pf1*, and thus *pg0* < *pf1*. | |
426 | ||
427 | Instructions with opcode "*instr0*" would be *equal*, since their types and | |
428 | opcodes are equal, and values are *associated*. | |
429 | ||
430 | Instruction with opcode "*instr1*" from *f* is *greater* than instruction with | |
431 | opcode "*instr1*" from *g*; here we have equal types and opcodes, but "*pf1* is | |
432 | greater than "*pg0*". | |
433 | ||
434 | And instructions with opcode "*instr2*" are equal, because their opcodes and | |
435 | types are equal, and the same constant is used as a value. | |
436 | ||
437 | What we assiciate in cmpValues? | |
438 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
439 | * Function arguments. *i*-th argument from left function associated with | |
440 | *i*-th argument from right function. | |
441 | * BasicBlock instances. In basic-block enumeration loop we associate *i*-th | |
442 | BasicBlock from the left function with *i*-th BasicBlock from the right | |
443 | function. | |
444 | * Instructions. | |
445 | * Instruction operands. Note, we can meet *Value* here we have never seen | |
446 | before. In this case it is not a function argument, nor *BasicBlock*, nor | |
447 | *Instruction*. It is global value. It is constant, since its the only | |
448 | supposed global here. Method also compares: | |
449 | * Constants that are of the same type. | |
450 | * If right constant could be losslessly bit-casted to the left one, then we | |
451 | also compare them. | |
452 | ||
453 | How to implement cmpValues? | |
454 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
455 | *Association* is a case of equality for us. We just treat such values as equal. | |
456 | But, in general, we need to implement antisymmetric relation. As it was | |
457 | mentioned above, to understand what is *less*, we can use order in which we | |
458 | meet values. If both of values has the same order in function (met at the same | |
459 | time), then treat values as *associated*. Otherwise – it depends on who was | |
460 | first. | |
461 | ||
462 | Every time we run top-level compare method, we initialize two identical maps | |
463 | (one for the left side, another one for the right side): | |
464 | ||
465 | ``map<Value, int> sn_mapL, sn_mapR;`` | |
466 | ||
467 | The key of the map is the *Value* itself, the *value* – is its order (call it | |
468 | *serial number*). | |
469 | ||
470 | To add value *V* we need to perform the next procedure: | |
471 | ||
472 | ``sn_map.insert(std::make_pair(V, sn_map.size()));`` | |
473 | ||
474 | For the first *Value*, map will return *0*, for second *Value* map will return | |
475 | *1*, and so on. | |
476 | ||
477 | Then we can check whether left and right values met at the same time with simple | |
478 | comparison: | |
479 | ||
480 | ``cmpNumbers(sn_mapL[Left], sn_mapR[Right]);`` | |
481 | ||
482 | Of course, we can combine insertion and comparison: | |
483 | ||
484 | .. code-block:: c++ | |
485 | ||
486 | std::pair<iterator, bool> | |
487 | LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), RightRes | |
488 | = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); | |
489 | return cmpNumbers(LeftRes.first->second, RightRes.first->second); | |
490 | ||
491 | Let's look, how whole method could be implemented. | |
492 | ||
493 | 1. we have to start from the bad news. Consider function self and | |
494 | cross-referencing cases: | |
495 | ||
496 | .. code-block:: c++ | |
497 | ||
498 | // self-reference unsigned fact0(unsigned n) { return n > 1 ? n | |
499 | * fact0(n-1) : 1; } unsigned fact1(unsigned n) { return n > 1 ? n * | |
500 | fact1(n-1) : 1; } | |
501 | ||
502 | // cross-reference unsigned ping(unsigned n) { return n!= 0 ? pong(n-1) : 0; | |
503 | } unsigned pong(unsigned n) { return n!= 0 ? ping(n-1) : 0; } | |
504 | ||
505 | .. | |
506 | ||
507 | This comparison has been implemented in initial *MergeFunctions* pass | |
508 | version. But, unfortunately, it is not transitive. And this is the only case | |
509 | we can't convert to less-equal-greater comparison. It is a seldom case, 4-5 | |
510 | functions of 10000 (checked on test-suite), and, we hope, reader would | |
511 | forgive us for such a sacrifice in order to get the O(log(N)) pass time. | |
512 | ||
513 | 2. If left/right *Value* is a constant, we have to compare them. Return 0 if it | |
514 | is the same constant, or use ``cmpConstants`` method otherwise. | |
515 | ||
516 | 3. If left/right is *InlineAsm* instance. Return result of *Value* pointers | |
517 | comparison. | |
518 | ||
519 | 4. Explicit association of *L* (left value) and *R* (right value). We need to | |
520 | find out whether values met at the same time, and thus are *associated*. Or we | |
521 | need to put the rule: when we treat *L* < *R*. Now it is easy: just return | |
522 | result of numbers comparison: | |
523 | ||
524 | .. code-block:: c++ | |
525 | ||
526 | std::pair<iterator, bool> | |
527 | LeftRes = sn_mapL.insert(std::make_pair(Left, sn_mapL.size())), | |
528 | RightRes = sn_mapR.insert(std::make_pair(Right, sn_mapR.size())); | |
529 | if (LeftRes.first->second == RightRes.first->second) return 0; | |
530 | if (LeftRes.first->second < RightRes.first->second) return -1; | |
531 | return 1; | |
532 | ||
533 | Now when *cmpValues* returns 0, we can proceed comparison procedure. Otherwise, | |
534 | if we get (-1 or 1), we need to pass this result to the top level, and finish | |
535 | comparison procedure. | |
536 | ||
537 | cmpConstants | |
538 | ------------ | |
539 | Performs constants comparison as follows: | |
540 | ||
541 | 1. Compare constant types using ``cmpType`` method. If result is -1 or 1, goto | |
542 | step 2, otherwise proceed to step 3. | |
543 | ||
544 | 2. If types are different, we still can check whether constants could be | |
545 | losslessly bitcasted to each other. The further explanation is modification of | |
546 | ``canLosslesslyBitCastTo`` method. | |
547 | ||
548 | 2.1 Check whether constants are of the first class types | |
549 | (``isFirstClassType`` check): | |
550 | ||
551 | 2.1.1. If both constants are *not* of the first class type: return result | |
552 | of ``cmpType``. | |
553 | ||
554 | 2.1.2. Otherwise, if left type is not of the first class, return -1. If | |
555 | right type is not of the first class, return 1. | |
556 | ||
557 | 2.1.3. If both types are of the first class type, proceed to the next step | |
558 | (2.1.3.1). | |
559 | ||
560 | 2.1.3.1. If types are vectors, compare their bitwidth using the | |
561 | *cmpNumbers*. If result is not 0, return it. | |
562 | ||
563 | 2.1.3.2. Different types, but not a vectors: | |
564 | ||
565 | * if both of them are pointers, good for us, we can proceed to step 3. | |
566 | * if one of types is pointer, return result of *isPointer* flags | |
567 | comparison (*cmpFlags* operation). | |
568 | * otherwise we have no methods to prove bitcastability, and thus return | |
569 | result of types comparison (-1 or 1). | |
570 | ||
571 | Steps below are for the case when types are equal, or case when constants are | |
572 | bitcastable: | |
573 | ||
574 | 3. One of constants is a "*null*" value. Return the result of | |
575 | ``cmpFlags(L->isNullValue, R->isNullValue)`` comparison. | |
576 | ||
577 | 4. Compare value IDs, and return result if it is not 0: | |
578 | ||
579 | .. code-block:: c++ | |
580 | ||
581 | if (int Res = cmpNumbers(L->getValueID(), R->getValueID())) | |
582 | return Res; | |
583 | ||
584 | 5. Compare the contents of constants. The comparison depends on kind of | |
585 | constants, but on this stage it is just a lexicographical comparison. Just see | |
586 | how it was described in the beginning of "*Functions comparison*" paragraph. | |
587 | Mathematically it is equal to the next case: we encode left constant and right | |
588 | constant (with similar way *bitcode-writer* does). Then compare left code | |
589 | sequence and right code sequence. | |
590 | ||
591 | compare(const BasicBlock*, const BasicBlock*) | |
592 | --------------------------------------------- | |
593 | Compares two *BasicBlock* instances. | |
594 | ||
595 | It enumerates instructions from left *BB* and right *BB*. | |
596 | ||
597 | 1. It assigns serial numbers to the left and right instructions, using | |
598 | ``cmpValues`` method. | |
599 | ||
600 | 2. If one of left or right is *GEP* (``GetElementPtr``), then treat *GEP* as | |
601 | greater than other instructions, if both instructions are *GEPs* use ``cmpGEP`` | |
602 | method for comparison. If result is -1 or 1, pass it to the top-level | |
603 | comparison (return it). | |
604 | ||
605 | 3.1. Compare operations. Call ``cmpOperation`` method. If result is -1 or | |
606 | 1, return it. | |
607 | ||
608 | 3.2. Compare number of operands, if result is -1 or 1, return it. | |
609 | ||
610 | 3.3. Compare operands themselves, use ``cmpValues`` method. Return result | |
611 | if it is -1 or 1. | |
612 | ||
613 | 3.4. Compare type of operands, using ``cmpType`` method. Return result if | |
614 | it is -1 or 1. | |
615 | ||
616 | 3.5. Proceed to the next instruction. | |
617 | ||
618 | 4. We can finish instruction enumeration in 3 cases: | |
619 | ||
620 | 4.1. We reached the end of both left and right basic-blocks. We didn't | |
621 | exit on steps 1-3, so contents is equal, return 0. | |
622 | ||
623 | 4.2. We have reached the end of the left basic-block. Return -1. | |
624 | ||
625 | 4.3. Return 1 (the end of the right basic block). | |
626 | ||
627 | cmpGEP | |
628 | ------ | |
629 | Compares two GEPs (``getelementptr`` instructions). | |
630 | ||
631 | It differs from regular operations comparison with the only thing: possibility | |
632 | to use ``accumulateConstantOffset`` method. | |
633 | ||
634 | So, if we get constant offset for both left and right *GEPs*, then compare it as | |
635 | numbers, and return comparison result. | |
636 | ||
637 | Otherwise treat it like a regular operation (see previous paragraph). | |
638 | ||
639 | cmpOperation | |
640 | ------------ | |
641 | Compares instruction opcodes and some important operation properties. | |
642 | ||
643 | 1. Compare opcodes, if it differs return the result. | |
644 | ||
645 | 2. Compare number of operands. If it differs – return the result. | |
646 | ||
647 | 3. Compare operation types, use *cmpType*. All the same – if types are | |
648 | different, return result. | |
649 | ||
650 | 4. Compare *subclassOptionalData*, get it with ``getRawSubclassOptionalData`` | |
651 | method, and compare it like a numbers. | |
652 | ||
653 | 5. Compare operand types. | |
654 | ||
655 | 6. For some particular instructions check equivalence (relation in our case) of | |
656 | some significant attributes. For example we have to compare alignment for | |
657 | ``load`` instructions. | |
658 | ||
659 | O(log(N)) | |
660 | --------- | |
661 | Methods described above implement order relationship. And latter, could be used | |
662 | for nodes comparison in a binary tree. So we can organize functions set into | |
663 | the binary tree and reduce the cost of lookup procedure from | |
664 | O(N*N) to O(log(N)). | |
665 | ||
666 | Merging process, mergeTwoFunctions | |
667 | ================================== | |
668 | Once *MergeFunctions* detected that current function (*G*) is equal to one that | |
669 | were analyzed before (function *F*) it calls ``mergeTwoFunctions(Function*, | |
670 | Function*)``. | |
671 | ||
672 | Operation affects ``FnTree`` contents with next way: *F* will stay in | |
673 | ``FnTree``. *G* being equal to *F* will not be added to ``FnTree``. Calls of | |
674 | *G* would be replaced with something else. It changes bodies of callers. So, | |
675 | functions that calls *G* would be put into ``Deferred`` set and removed from | |
676 | ``FnTree``, and analyzed again. | |
677 | ||
678 | The approach is next: | |
679 | ||
680 | 1. Most wished case: when we can use alias and both of *F* and *G* are weak. We | |
681 | make both of them with aliases to the third strong function *H*. Actually *H* | |
682 | is *F*. See below how it's made (but it's better to look straight into the | |
683 | source code). Well, this is a case when we can just replace *G* with *F* | |
684 | everywhere, we use ``replaceAllUsesWith`` operation here (*RAUW*). | |
685 | ||
686 | 2. *F* could not be overridden, while *G* could. It would be good to do the | |
687 | next: after merging the places where overridable function were used, still use | |
688 | overridable stub. So try to make *G* alias to *F*, or create overridable tail | |
689 | call wrapper around *F* and replace *G* with that call. | |
690 | ||
691 | 3. Neither *F* nor *G* could be overridden. We can't use *RAUW*. We can just | |
692 | change the callers: call *F* instead of *G*. That's what | |
693 | ``replaceDirectCallers`` does. | |
694 | ||
695 | Below is detailed body description. | |
696 | ||
697 | If “F” may be overridden | |
698 | ------------------------ | |
699 | As follows from ``mayBeOverridden`` comments: “whether the definition of this | |
700 | global may be replaced by something non-equivalent at link time”. If so, thats | |
701 | ok: we can use alias to *F* instead of *G* or change call instructions itself. | |
702 | ||
703 | HasGlobalAliases, removeUsers | |
704 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
705 | First consider the case when we have global aliases of one function name to | |
706 | another. Our purpose is make both of them with aliases to the third strong | |
707 | function. Though if we keep *F* alive and without major changes we can leave it | |
708 | in ``FnTree``. Try to combine these two goals. | |
709 | ||
710 | Do stub replacement of *F* itself with an alias to *F*. | |
711 | ||
712 | 1. Create stub function *H*, with the same name and attributes like function | |
713 | *F*. It takes maximum alignment of *F* and *G*. | |
714 | ||
715 | 2. Replace all uses of function *F* with uses of function *H*. It is the two | |
716 | steps procedure instead. First of all, we must take into account, all functions | |
717 | from whom *F* is called would be changed: since we change the call argument | |
718 | (from *F* to *H*). If so we must to review these caller functions again after | |
719 | this procedure. We remove callers from ``FnTree``, method with name | |
720 | ``removeUsers(F)`` does that (don't confuse with ``replaceAllUsesWith``): | |
721 | ||
722 | 2.1. ``Inside removeUsers(Value* | |
723 | V)`` we go through the all values that use value *V* (or *F* in our context). | |
724 | If value is instruction, we go to function that holds this instruction and | |
725 | mark it as to-be-analyzed-again (put to ``Deferred`` set), we also remove | |
726 | caller from ``FnTree``. | |
727 | ||
728 | 2.2. Now we can do the replacement: call ``F->replaceAllUsesWith(H)``. | |
729 | ||
730 | 3. *H* (that now "officially" plays *F*'s role) is replaced with alias to *F*. | |
731 | Do the same with *G*: replace it with alias to *F*. So finally everywhere *F* | |
732 | was used, we use *H* and it is alias to *F*, and everywhere *G* was used we | |
733 | also have alias to *F*. | |
734 | ||
735 | 4. Set *F* linkage to private. Make it strong :-) | |
736 | ||
737 | No global aliases, replaceDirectCallers | |
738 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
739 | If global aliases are not supported. We call ``replaceDirectCallers`` then. Just | |
740 | go through all calls of *G* and replace it with calls of *F*. If you look into | |
741 | method you will see that it scans all uses of *G* too, and if use is callee (if | |
742 | user is call instruction and *G* is used as what to be called), we replace it | |
743 | with use of *F*. | |
744 | ||
745 | If “F” could not be overridden, fix it! | |
746 | """"""""""""""""""""""""""""""""""""""" | |
747 | ||
748 | We call ``writeThunkOrAlias(Function *F, Function *G)``. Here we try to replace | |
749 | *G* with alias to *F* first. Next conditions are essential: | |
750 | ||
751 | * target should support global aliases, | |
752 | * the address itself of *G* should be not significant, not named and not | |
753 | referenced anywhere, | |
754 | * function should come with external, local or weak linkage. | |
755 | ||
756 | Otherwise we write thunk: some wrapper that has *G's* interface and calls *F*, | |
757 | so *G* could be replaced with this wrapper. | |
758 | ||
759 | *writeAlias* | |
760 | ||
761 | As follows from *llvm* reference: | |
762 | ||
763 | “Aliases act as *second name* for the aliasee value”. So we just want to create | |
764 | second name for *F* and use it instead of *G*: | |
765 | ||
766 | 1. create global alias itself (*GA*), | |
767 | ||
768 | 2. adjust alignment of *F* so it must be maximum of current and *G's* alignment; | |
769 | ||
770 | 3. replace uses of *G*: | |
771 | ||
772 | 3.1. first mark all callers of *G* as to-be-analyzed-again, using | |
773 | ``removeUsers`` method (see chapter above), | |
774 | ||
775 | 3.2. call ``G->replaceAllUsesWith(GA)``. | |
776 | ||
777 | 4. Get rid of *G*. | |
778 | ||
779 | *writeThunk* | |
780 | ||
781 | As it written in method comments: | |
782 | ||
783 | “Replace G with a simple tail call to bitcast(F). Also replace direct uses of G | |
784 | with bitcast(F). Deletes G.” | |
785 | ||
786 | In general it does the same as usual when we want to replace callee, except the | |
787 | first point: | |
788 | ||
789 | 1. We generate tail call wrapper around *F*, but with interface that allows use | |
790 | it instead of *G*. | |
791 | ||
792 | 2. “As-usual”: ``removeUsers`` and ``replaceAllUsesWith`` then. | |
793 | ||
794 | 3. Get rid of *G*. | |
795 | ||
796 | That's it. | |
797 | ========== | |
798 | We have described how to detect equal functions, and how to merge them, and in | |
799 | first chapter we have described how it works all-together. Author hopes, reader | |
800 | have some picture from now, and it helps him improve and debug this pass. | |
801 | ||
802 | Reader is welcomed to send us any questions and proposals ;-) |