________________________________________________________________________

PYBENCH - A Python Benchmark Suite
________________________________________________________________________

An extendable suite of low-level benchmarks for measuring
the performance of a Python implementation
(interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to
measure the performance of Python implementations. It takes a very
close look at different aspects of Python programs and lets you
decide which factors are more important to you than others, rather
than wrapping everything up in one number, as other performance
tests do (e.g. pystone, which is included in the Python Standard
Library).

pybench has been used in the past by several Python developers to
track down performance bottlenecks or to demonstrate the impact of
optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run
this script with option '--help' to get a listing of the possible
options. Without options, pybench will simply execute the benchmark
and then print out a report to stdout.


Micro-Manual
------------

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run
the benchmark suite using default settings and 'pybench.py -f <file>'
to have it store the results in a file too.

It is usually a good idea to run pybench.py multiple times to see
whether the environment, timers and benchmark run-times are stable
enough for reliable benchmarking.

You can use the comparison feature of pybench.py ('pybench.py -c
<file>') to check how well the system behaves in comparison to a
reference run.
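
For example, first save a reference run with -f and then compare a
later run against it with -c (the file name 'ref.pybench' is only an
illustrative choice):

   python pybench.py -f ref.pybench
   python pybench.py -c ref.pybench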

If the differences are well below 10% for each test, then you have a
system that is suitable for benchmarking. If you get random
differences of more than 10%, or significant differences between the
values for minimum and average time, then you likely have some
background processes running which cause the readings to become
inconsistent. Examples include: web browsers, email clients, RSS
readers, music players, backup programs, etc.

If you are only interested in a few tests of the whole suite, you can
use the filtering option, e.g. 'pybench.py -t string' will only
run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""
------------------------------------------------------------------------
PYBENCH - a benchmark test suite for Python interpreters/compilers.
------------------------------------------------------------------------

Synopsis:
 pybench.py [option] files...

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version:
 2.0

The normal operation is to run the suite and display the
results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""

License
-------

See LICENSE file.


Sample output
-------------

"""
-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using Python 2.4.2
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

Calibrating tests. Please wait...

Running 10 round(s) of the suite at warp factor 10:

* Round 1 done in 6.388 seconds.
* Round 2 done in 6.485 seconds.
* Round 3 done in 6.786 seconds.
...
* Round 10 done in 6.546 seconds.

-------------------------------------------------------------------------------
Benchmark: 2006-06-12 12:09:25
-------------------------------------------------------------------------------

Rounds: 10
Warp:   10
Timer:  time.time

Machine Details:
   Platform ID: Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
   Processor:   x86_64

Python:
   Executable: /usr/local/bin/python
   Version:    2.4.2
   Compiler:   GCC 3.3.4 (pre 3.3.5 20040809)
   Bits:       64bit
   Build:      Oct 1 2005 15:24:35 (#1)
   Unicode:    UCS2


Test                         minimum  average  operation    overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:          126ms    145ms     0.28us     0.274ms
BuiltinMethodLookup:           124ms    130ms     0.12us     0.316ms
CompareFloats:                 109ms    110ms     0.09us     0.361ms
CompareFloatsIntegers:         100ms    104ms     0.12us     0.271ms
CompareIntegers:               137ms    138ms     0.08us     0.542ms
CompareInternedStrings:        124ms    127ms     0.08us     1.367ms
CompareLongs:                  100ms    104ms     0.10us     0.316ms
CompareStrings:                111ms    115ms     0.12us     0.929ms
CompareUnicode:                108ms    128ms     0.17us     0.693ms
ConcatStrings:                 142ms    155ms     0.31us     0.562ms
ConcatUnicode:                 119ms    127ms     0.42us     0.384ms
CreateInstances:               123ms    128ms     1.14us     0.367ms
CreateNewInstances:            121ms    126ms     1.49us     0.335ms
CreateStringsWithConcat:       130ms    135ms     0.14us     0.916ms
CreateUnicodeWithConcat:       130ms    135ms     0.34us     0.361ms
DictCreation:                  108ms    109ms     0.27us     0.361ms
DictWithFloatKeys:             149ms    153ms     0.17us     0.678ms
DictWithIntegerKeys:           124ms    126ms     0.11us     0.915ms
DictWithStringKeys:            114ms    117ms     0.10us     0.905ms
ForLoops:                      110ms    111ms     4.46us     0.063ms
IfThenElse:                    118ms    119ms     0.09us     0.685ms
ListSlicing:                   116ms    120ms     8.59us     0.103ms
NestedForLoops:                125ms    137ms     0.09us     0.019ms
NormalClassAttribute:          124ms    136ms     0.11us     0.457ms
NormalInstanceAttribute:       110ms    117ms     0.10us     0.454ms
PythonFunctionCalls:           107ms    113ms     0.34us     0.271ms
PythonMethodCalls:             140ms    149ms     0.66us     0.141ms
Recursion:                     156ms    166ms     3.32us     0.452ms
SecondImport:                  112ms    118ms     1.18us     0.180ms
SecondPackageImport:           118ms    127ms     1.27us     0.180ms
SecondSubmoduleImport:         140ms    151ms     1.51us     0.180ms
SimpleComplexArithmetic:       128ms    139ms     0.16us     0.361ms
SimpleDictManipulation:        134ms    136ms     0.11us     0.452ms
SimpleFloatArithmetic:         110ms    113ms     0.09us     0.571ms
SimpleIntFloatArithmetic:      106ms    111ms     0.08us     0.548ms
SimpleIntegerArithmetic:       106ms    109ms     0.08us     0.544ms
SimpleListManipulation:        103ms    113ms     0.10us     0.587ms
SimpleLongArithmetic:          112ms    118ms     0.18us     0.271ms
SmallLists:                    105ms    116ms     0.17us     0.366ms
SmallTuples:                   108ms    128ms     0.24us     0.406ms
SpecialClassAttribute:         119ms    136ms     0.11us     0.453ms
SpecialInstanceAttribute:      143ms    155ms     0.13us     0.454ms
StringMappings:                115ms    121ms     0.48us     0.405ms
StringPredicates:              120ms    129ms     0.18us     2.064ms
StringSlicing:                 111ms    127ms     0.23us     0.781ms
TryExcept:                     125ms    126ms     0.06us     0.681ms
TryRaiseExcept:                133ms    137ms     2.14us     0.361ms
TupleSlicing:                  117ms    120ms     0.46us     0.066ms
UnicodeMappings:               156ms    160ms     4.44us     0.429ms
UnicodePredicates:             117ms    121ms     0.22us     2.487ms
UnicodeProperties:             115ms    153ms     0.38us     2.070ms
UnicodeSlicing:                126ms    129ms     0.26us     0.689ms
-------------------------------------------------------------------------------
Totals:                       6283ms   6673ms
"""
________________________________________________________________________

Writing New Tests
________________________________________________________________________

pybench tests are simple modules defining one or more pybench.Test
subclasses.

Writing a test essentially boils down to providing two methods:
.test(), which runs .rounds rounds of .operations test operations
each, and .calibrate(), which does the same except that it doesn't
actually execute the operations.


Here's an example:
------------------

from pybench import Test

class IntegerCounting(Test):

    # Version number of the test as float (x.yy); this is important
    # for comparisons of benchmark runs - tests with unequal version
    # numbers will not get compared.
    version = 1.0

    # The number of abstract operations done in each round of the
    # test. An operation is the basic unit of what you want to
    # measure. The benchmark will output the amount of run-time per
    # operation. Note that in order to raise the measured timings
    # significantly above noise level, it is often required to repeat
    # sets of operations more than once per test round. The measured
    # overhead per test round should be less than 1 second.
    operations = 20

    # Number of rounds to execute per test run. This should be
    # adjusted to a figure that results in a test run-time of between
    # 1-2 seconds (at warp 1).
    rounds = 100000

    def test(self):

        """ Run the test.

            The test needs to run self.rounds times, executing
            self.operations operations each time.

        """
        # Init the test
        a = 1

        # Run test rounds
        #
        # NOTE: Use xrange() for all test loops unless you want to
        # face a 20MB process!
        #
        for i in xrange(self.rounds):

            # Repeat the operations per round to raise the run-time
            # per operation significantly above the noise level of
            # the for-loop overhead.

            # Execute 20 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):

        """ Calibrate the test.

            This method should execute everything that is needed to
            set up and run the test - except for the actual operations
            that you intend to measure. pybench uses this method to
            measure the test implementation overhead.

        """
        # Init the test
        a = 1

        # Run test rounds (without actually doing any operation)
        for i in xrange(self.rounds):

            # Skip the actual execution of the operations, since we
            # only want to measure the test's administration overhead.
            pass

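As a rough sketch of the arithmetic behind the report (pybench itself
also subtracts the calibration overhead and accounts for the warp
factor, so this is only an approximation), the per-operation figure is
the measured test time divided by the total number of operations:

# Illustrative arithmetic only; the timing value is made up.
rounds = 100000
operations = 20
test_time = 0.2                   # measured .test() run-time in seconds
total_ops = rounds * operations   # 2,000,000 operations per .test() call
per_op = test_time / total_ops    # 1e-07 s, i.e. 0.1us per operation
print "%.2fus per operation" % (per_op * 1e6)
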
Registering a new test module
-----------------------------

To register a test module with pybench, the classes need to be
imported into the pybench.Setup module. pybench will then scan all the
symbols defined in that module for subclasses of pybench.Test and
automatically add them to the benchmark suite.
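
A minimal sketch of such a registration, assuming the example test
above lives in a hypothetical module MyTests.py:

# In pybench's Setup module; 'MyTests' is a made-up module name
# used only for illustration:
from MyTests import IntegerCounting

# The import alone is enough: pybench scans the Setup module's
# symbols for pybench.Test subclasses and picks the test up
# automatically.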


Breaking Comparability
----------------------

If a change is made to any individual test that means it is no
longer strictly comparable with previous runs, the '.version' class
variable should be updated. Thereafter, comparisons with previous
versions of the test will be listed as "n/a" to reflect the change.
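
For example, if the loop body of the IntegerCounting test above were
changed, its version attribute could be bumped like this (the new
number itself is just an illustrative choice):

class IntegerCounting(Test):

    # Bumped from 1.0 because the test body changed and timings are
    # no longer comparable with earlier runs:
    version = 1.1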


Version History
---------------

2.0: rewrote parts of pybench which resulted in more repeatable
     timings:
     - made timer a parameter
     - changed the platform default timer to use high-resolution
       timers rather than process timers (which have a much lower
       resolution)
     - added option to select timer
     - added process time timer (using systimes.py)
     - changed to use min() as timing estimator (average
       is still taken as well to provide an idea of the difference)
     - garbage collection is turned off by default
     - sys check interval is set to the highest possible value
     - calibration is now a separate step and done using
       a different strategy that allows measuring the test
       overhead more accurately
     - modified the tests to each give a run-time of between
       100-200ms using warp 10
     - changed default warp factor to 10 (from 20)
     - compared results with timeit.py and confirmed measurements
     - bumped all test versions to 2.0
     - updated platform.py to the latest version
     - changed the output format a bit to make it look nicer
     - refactored the APIs somewhat
1.3+: Steve Holden added the NewInstances test and the filtering
     option during the NeedForSpeed sprint; this also triggered a
     long discussion on how to improve benchmark timing and finally
     resulted in the release of 2.0
1.3: initial checkin into the Python SVN repository


Have fun,
--
Marc-Andre Lemburg
mal@lemburg.com