NOTES ON OPTIMIZING DICTIONARIES
================================


Principal Use Cases for Dictionaries
------------------------------------

Passing keyword arguments
    Typically, one read and one write for 1 to 3 elements.
    Occurs frequently in normal Python code.

Class method lookup
    Dictionaries vary in size with 8 to 16 elements being common.
    Usually written once with many lookups.
    When base classes are used, there are many failed lookups
    followed by a lookup in a base class.

Instance attribute lookup and Global variables
    Dictionaries vary in size.  4 to 10 elements are common.
    Both reads and writes are common.

Builtins
    Frequent reads.  Almost never written.
    Size: 126 interned strings (as of Py2.3b1).
    A few keys are accessed much more frequently than others.

Uniquification
    Dictionaries of any size.  Bulk of work is in creation.
    Repeated writes to a smaller set of keys.
    Single read of each key.
    Some use cases have two consecutive accesses to the same key.

    * Removing duplicates from a sequence.
        dict.fromkeys(seqn).keys()

    * Counting elements in a sequence.
        for e in seqn:
            d[e] = d.get(e, 0) + 1

    * Accumulating references in a dictionary of lists:
        for pagenumber, page in enumerate(pages):
            for word in page:
                d.setdefault(word, []).append(pagenumber)

    Note that the second example is a use case characterized by a get
    and set to the same key.  There are similar use cases with a
    __contains__ followed by a get, set, or del to the same key.  Part
    of the justification for d.setdefault is combining the two lookups
    into one.

Membership Testing
    Dictionaries of any size.  Created once and then rarely changed.
    Single write to each key.
    Many calls to __contains__() or has_key().
    Similar access patterns occur with replacement dictionaries
    such as with the % formatting operator.

Dynamic Mappings
    Characterized by deletions interspersed with adds and replacements.
    Performance benefits greatly from the re-use of dummy entries.


Data Layout (assuming a 32-bit box with 64 bytes per cache line)
----------------------------------------------------------------

Small dicts (8 entries) are attached to the dictobject structure
and the whole group nearly fills two consecutive cache lines.

Larger dicts use the first half of the dictobject structure (one cache
line) and a separate, contiguous block of entries (at 12 bytes each
for a total of 5.333 entries per cache line).


Tunable Dictionary Parameters
-----------------------------

* PyDict_MINSIZE.  Currently set to 8.
    Must be a power of two.  New dicts have to zero-out every cell.
    Each additional 8 entries consumes 1.5 cache lines.  Increasing it
    improves the sparseness of small dictionaries but costs time to
    read in the additional cache lines if they are not already in
    cache.  That case is common when keyword arguments are passed.

* Maximum dictionary load in PyDict_SetItem.  Currently set to 2/3.
    Increasing this ratio makes dictionaries more dense, resulting
    in more collisions.  Decreasing it improves sparseness at the
    expense of spreading entries over more cache lines and at the
    cost of total memory consumed.

    The load test occurs in highly time-sensitive code.  Efforts
    to make the test more complex (for example, varying the load
    for different sizes) have degraded performance.

* Growth rate upon hitting maximum load.  Currently set to *2.
    Raising this to *4 results in half the number of resizes, less
    effort to resize, and better sparseness for some (but not all)
    dict sizes, but potentially doubles memory consumption depending
    on the size of the dictionary.  Setting it to *4 eliminates every
    other resize step.  (A sketch of how these parameters interact
    follows this list.)

* Maximum sparseness (minimum dictionary load).  What percentage
    of entries can be unused before the dictionary shrinks to
    free up memory and speed up iteration?  (The current CPython
    code does not represent this parameter directly.)

* Shrinkage rate upon exceeding maximum sparseness.  The current
    CPython code never even checks sparseness when deleting a
    key.  When a new key is added, it resizes based on the number
    of active keys, so that the addition may trigger shrinkage
    rather than growth.
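A rough Python sketch of how PyDict_MINSIZE, the 2/3 maximum load,
and the *2 growth rate interact.  This is an illustration only, not
the CPython implementation (the real resize logic lives in
dictobject.c and differs in detail); the helper name and the exact
growth rule are assumptions made for the example.

    PYDICT_MINSIZE = 8
    MAX_LOAD_NUM, MAX_LOAD_DEN = 2, 3     # maximum load of 2/3
    GROWTH = 2                            # *2 growth on hitting max load

    def simulate_inserts(n):
        """Table size and resize count after n distinct insertions."""
        size, used, resizes = PYDICT_MINSIZE, 0, 0
        for _ in range(n):
            used += 1
            # Grow whenever the load factor would exceed 2/3.
            if used * MAX_LOAD_DEN > size * MAX_LOAD_NUM:
                size *= GROWTH
                resizes += 1
        return size, resizes

    # Under these assumptions, simulate_inserts(100) returns (256, 5):
    # five resizes to hold 100 keys at no more than 2/3 load.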
Tune-ups should be measured across a broad range of applications and
use cases.  A change to any parameter will help in some situations and
hurt in others.  The key is to find settings that help the most common
cases and do the least damage to the less common cases.  Results will
vary dramatically depending on the exact number of keys, whether the
keys are all strings, whether reads or writes dominate, and the exact
hash values of the keys (some sets of values have fewer collisions than
others).  Any one test or benchmark is likely to prove misleading.

While making a dictionary more sparse reduces collisions, it impairs
iteration and key listing.  Those methods loop over every potential
entry.  Doubling the size of a dictionary results in twice as many
non-overlapping memory accesses for keys(), items(), values(),
__iter__(), iterkeys(), iteritems(), itervalues(), and update().
Also, every dictionary is traversed at least twice: once by memset()
when it is created and once by dealloc().

Dictionary operations involving only a single key can be O(1) unless
resizing is possible.  By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced.  In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which would have
been wasted effort anyway because the dictionary is eventually
discarded entirely.


Results of Cache Locality Experiments
-------------------------------------

When an entry is retrieved from memory, 4.333 adjacent entries are also
retrieved into a cache line.  Since accessing items in cache is *much*
cheaper than a cache miss, an enticing idea is to probe the adjacent
entries as a first step in collision resolution.  Unfortunately, the
introduction of any regularity into collision searches results in more
collisions than the current random chaining approach.

Exploiting cache locality at the expense of additional collisions fails
to pay off when the entries are already loaded in cache (the expense
is paid with no compensating benefit).  This occurs in small dictionaries
where the whole dictionary fits into a pair of cache lines.  It also
occurs frequently in large dictionaries which have a common access pattern
where some keys are accessed much more frequently than others.  The
more popular entries *and* their collision chains tend to remain in cache.

To exploit cache locality, change the collision resolution section
in lookdict() and lookdict_string().  Set i ^= 1 at the top of the
loop and move the i = (i << 2) + i + perturb + 1 to an unrolled
version of the loop.
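For reference, here is the probe recurrence rendered as Python.  The
loop body mirrors the perturb-based recurrence used by lookdict(), but
the sketch itself (the helper name, and the adjacency variant noted in
the comment) is illustrative rather than the C implementation.

    PERTURB_SHIFT = 5

    def probe_sequence(h, mask, n):
        """First n table slots probed for a key with hash h."""
        i = h & mask
        perturb = h
        slots = [i]
        for _ in range(n - 1):
            # The recurrence scatters later probes across the table.
            i = (i << 2) + i + perturb + 1
            perturb >>= PERTURB_SHIFT
            slots.append(i & mask)
        return slots

    # probe_sequence(123456, 31, 4) lists the first four slots probed
    # in a 32-cell table.  The proposed locality variant would first
    # try the cache-line neighbor, slot i ^ 1, before taking the
    # scattering jump above.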
This optimization strategy can be leveraged in several ways:

* If the dictionary is kept sparse (through the tunable parameters),
then the occurrence of additional collisions is lessened.

* If lookdict() and lookdict_string() are specialized for small dicts
and for large dicts, then the versions for large dicts can be given
an alternate search strategy without increasing collisions in small dicts
which already have the maximum benefit of cache locality.

* If the use case for a dictionary is known to have a random key
access pattern (as opposed to a more common pattern with a Zipf's law
distribution), then there will be more benefit for large dictionaries
because any given key is no more likely than another to already be
in cache.

* In use cases with paired accesses to the same key, the second access
is always in cache and gets no benefit from efforts to further improve
cache locality.


Optimizing the Search of Small Dictionaries
-------------------------------------------

If lookdict() and lookdict_string() are specialized for smaller dictionaries,
then a custom search approach can be implemented that exploits the small
search space and cache locality.

* The simplest example is a linear search of contiguous entries.  This is
  simple to implement, guaranteed to terminate rapidly, never searches
  the same entry twice, and precludes the need to check for dummy entries.

* A more advanced example is a self-organizing search so that the most
  frequently accessed entries get probed first.  The organization
  adapts if the access pattern changes over time.  Treaps are ideally
  suited for self-organization, with the most common entries at the
  top of the heap and a rapid binary search pattern.  Most probes and
  results are located at the top of the tree, allowing them all to
  fit in one or two cache lines.

* Also, small dictionaries may be made more dense, perhaps filling all
  eight cells to take the maximum advantage of two cache lines.


Strategy Pattern
----------------

Consider allowing the user to set the tunable parameters or to select a
particular search method.  Since some dictionary use cases have known
sizes and access patterns, the user may be able to provide useful hints.

1) For example, if membership testing or lookups dominate runtime and memory
   is not at a premium, the user may benefit from setting the maximum load
   ratio at 5% or 10% instead of the usual 66.7%.  This will sharply
   curtail the number of collisions but will increase iteration time.
   The builtin namespace is a prime example of a dictionary that can
   benefit from being highly sparse.

2) Dictionary creation time can be shortened in cases where the ultimate
   size of the dictionary is known in advance.  The dictionary can be
   pre-sized so that no resize operations are required during creation.
   Not only does this save resizes, but key insertion will go more
   quickly because the first half of the keys will be inserted into
   a sparser environment than they otherwise would be.  The
   preconditions for this strategy arise whenever a dictionary is
   created from a key or item sequence and the number of *unique* keys
   is known.

3) If the key space is large and the access pattern is known to be random,
   then search strategies exploiting cache locality can be fruitful.
   The preconditions for this strategy arise in simulations and
   numerical analysis.

4) If the keys are fixed and the access pattern strongly favors some of
   the keys, then the entries can be stored contiguously and accessed
   with a linear search or treap (a move-to-front sketch follows this
   list).  This exploits knowledge of the data, cache locality, and a
   simplified search routine.  It also eliminates the need to test for
   dummy entries on each probe.  The preconditions for this strategy
   arise in symbol tables and in the builtin dictionary.
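A minimal Python sketch of the contiguous, self-organizing idea from
item 4, using move-to-front in place of a treap for brevity.  The
class and method names are hypothetical; a real version would live in
C alongside lookdict().

    class SmallMap(object):
        """Contiguous (key, value) pairs searched linearly."""

        def __init__(self, items):
            self._entries = list(items)       # e.g. [('self', 0), ...]

        def lookup(self, key):
            entries = self._entries
            for pos, (k, v) in enumerate(entries):
                if k == key:
                    if pos:
                        # Self-organize: hot keys migrate to the front,
                        # so popular probes stay in the first cache line.
                        entries.insert(0, entries.pop(pos))
                    return v
            raise KeyError(key)

    # m = SmallMap([('a', 1), ('b', 2), ('c', 3)])
    # m.lookup('c'); m.lookup('c')    # the second call hits position 0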
Readonly Dictionaries
---------------------
Some dictionary use cases pass through a build stage and then move to a
more heavily exercised lookup stage with no further changes to the
dictionary.

One idea that emerged on python-dev is to allow converting a dictionary
to a read-only state.  This can help prevent programming errors and also
provide knowledge that can be exploited for lookup optimization.

The dictionary can be immediately rebuilt (eliminating dummy entries),
resized (to an appropriate level of sparseness), and the keys can be
jostled (to minimize collisions).  The lookdict() routine can then
eliminate the test for dummy entries (saving about 1/4 of the time
spent in the collision resolution loop).

An additional possibility is to insert links into the empty spaces
so that dictionary iteration can proceed in len(d) steps instead of
(mp->mask + 1) steps.  Alternatively, a separate tuple of keys can be
kept just for iteration.


Caching Lookups
---------------
The idea is to exploit key access patterns by anticipating future lookups
based on previous lookups.

The simplest incarnation is to save the most recently accessed entry.
This gives optimal performance for use cases where every get is followed
by a set or del to the same key.
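A pure-Python sketch of that simplest incarnation.  At Python level the
wrapper costs more than it saves (the caching would have to live in the
C lookup routine to pay off); it is shown only to make the access
pattern concrete, and the class name is hypothetical.

    class CachingDict(dict):
        """dict that remembers the most recently accessed entry."""

        def __init__(self, *args, **kwargs):
            dict.__init__(self, *args, **kwargs)
            self._last = None                 # (key, value) or None

        def __getitem__(self, key):
            last = self._last
            if last is not None and last[0] == key:
                return last[1]                # reuse the previous lookup
            value = dict.__getitem__(self, key)
            self._last = (key, value)
            return value

        def __setitem__(self, key, value):
            dict.__setitem__(self, key, value)
            self._last = (key, value)         # a get-then-set hits here

        def __delitem__(self, key):
            dict.__delitem__(self, key)
            self._last = None                 # invalidate on deletion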