]>
<?xml version="1.0"?> <!-- -*- sgml -*- -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"[

<!-- various strings, dates etc. common to all docs -->
<!ENTITY % common-ents SYSTEM "entities.xml"> %common-ents;
]>

<book lang="en" id="userman" xreflabel="bzip2 Manual">

<bookinfo>
<title>bzip2 and libbzip2, version &bz-version;</title>
<subtitle>A program and library for data compression</subtitle>
<copyright>
<year>&bz-lifespan;</year>
<holder>Julian Seward</holder>
</copyright>
<releaseinfo>Version &bz-version; of &bz-date;</releaseinfo>

<authorgroup>
<author>
<firstname>Julian</firstname>
<surname>Seward</surname>
<affiliation>
<orgname>&bz-url;</orgname>
</affiliation>
</author>
</authorgroup>

<legalnotice id="legal">

<para>This program, <computeroutput>bzip2</computeroutput>, the
associated library <computeroutput>libbzip2</computeroutput>, and
all documentation, are copyright © &bz-lifespan; Julian Seward.
All rights reserved.</para>

<para>Redistribution and use in source and binary forms, with
or without modification, are permitted provided that the
following conditions are met:</para>

<itemizedlist mark='bullet'>

<listitem><para>Redistributions of source code must retain the
above copyright notice, this list of conditions and the
following disclaimer.</para></listitem>

<listitem><para>The origin of this software must not be
misrepresented; you must not claim that you wrote the original
software. If you use this software in a product, an
acknowledgment in the product documentation would be
appreciated but is not required.</para></listitem>

<listitem><para>Altered source versions must be plainly marked
as such, and must not be misrepresented as being the original
software.</para></listitem>

<listitem><para>The name of the author may not be used to
endorse or promote products derived from this software without
specific prior written permission.</para></listitem>

</itemizedlist>

<para>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.</para>

<para>PATENTS: To the best of my knowledge,
<computeroutput>bzip2</computeroutput> and
<computeroutput>libbzip2</computeroutput> do not use any patented
algorithms. However, I do not have the resources to carry
out a patent search. Therefore I cannot give any guarantee of
the above statement.
</para>

</legalnotice>

</bookinfo>


<chapter id="intro" xreflabel="Introduction">
<title>Introduction</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block-sorting text compression
algorithm, and Huffman coding. Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para><computeroutput>bzip2</computeroutput> is built on top of
<computeroutput>libbzip2</computeroutput>, a flexible library for
handling compressed data in the
<computeroutput>bzip2</computeroutput> format. This manual
describes both how to use the program and how to work with the
library interface. Most of the manual is devoted to this
library, not the program, which is good news if your interest is
only in the program.</para>

<itemizedlist mark='bullet'>

<listitem><para><xref linkend="using"/> describes how to use
<computeroutput>bzip2</computeroutput>; this is the only part
you need to read if you just want to know how to operate the
program.</para></listitem>

<listitem><para><xref linkend="libprog"/> describes the
programming interfaces in detail, and</para></listitem>

<listitem><para><xref linkend="misc"/> records some
miscellaneous notes which I thought ought to be recorded
somewhere.</para></listitem>

</itemizedlist>

</chapter>


<chapter id="using" xreflabel="How to use bzip2">
<title>How to use bzip2</title>

<para>This chapter contains a copy of the
<computeroutput>bzip2</computeroutput> man page, and nothing
else.</para>

<sect1 id="name" xreflabel="NAME">
<title>NAME</title>

<itemizedlist mark='bullet'>

<listitem><para><computeroutput>bzip2</computeroutput>,
<computeroutput>bunzip2</computeroutput> - a block-sorting file
compressor, v&bz-version;</para></listitem>

<listitem><para><computeroutput>bzcat</computeroutput> -
decompresses files to stdout</para></listitem>

<listitem><para><computeroutput>bzip2recover</computeroutput> -
recovers data from damaged bzip2 files</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="synopsis" xreflabel="SYNOPSIS">
<title>SYNOPSIS</title>

<itemizedlist mark='bullet'>

<listitem><para><computeroutput>bzip2</computeroutput> [
-cdfkqstvzVL123456789 ] [ filenames ... ]</para></listitem>

<listitem><para><computeroutput>bunzip2</computeroutput> [
-fkvsVL ] [ filenames ... ]</para></listitem>

<listitem><para><computeroutput>bzcat</computeroutput> [ -s ] [
filenames ... ]</para></listitem>

<listitem><para><computeroutput>bzip2recover</computeroutput>
filename</para></listitem>

</itemizedlist>

</sect1>


<sect1 id="description" xreflabel="DESCRIPTION">
<title>DESCRIPTION</title>

<para><computeroutput>bzip2</computeroutput> compresses files
using the Burrows-Wheeler block-sorting text compression
algorithm, and Huffman coding. Compression is generally
considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of
the PPM family of statistical compressors.</para>

<para>The command-line options are deliberately very similar to
those of GNU <computeroutput>gzip</computeroutput>, but they are
not identical.</para>

<para><computeroutput>bzip2</computeroutput> expects a list of
file names to accompany the command-line flags. Each file is
replaced by a compressed version of itself, with the name
<computeroutput>original_name.bz2</computeroutput>. Each
compressed file has the same modification date, permissions, and,
when possible, ownership as the corresponding original, so that
these properties can be correctly restored at decompression time.
File name handling is naive in the sense that there is no
mechanism for preserving original file names, permissions,
ownerships or dates in filesystems which lack these concepts, or
have serious file name length restrictions, such as
MS-DOS.</para>

<para><computeroutput>bzip2</computeroutput> and
<computeroutput>bunzip2</computeroutput> will by default not
overwrite existing files. If you want this to happen, specify
the <computeroutput>-f</computeroutput> flag.</para>

<para>If no file names are specified,
<computeroutput>bzip2</computeroutput> compresses from standard
input to standard output. In this case,
<computeroutput>bzip2</computeroutput> will decline to write
compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.</para>

<para><computeroutput>bunzip2</computeroutput> (or
<computeroutput>bzip2 -d</computeroutput>) decompresses all
specified files. Files which were not created by
<computeroutput>bzip2</computeroutput> will be detected and
ignored, and a warning issued.
<computeroutput>bzip2</computeroutput> attempts to guess the
filename for the decompressed file from that of the compressed
file as follows:</para>

<itemizedlist mark='bullet'>

<listitem><para><computeroutput>filename.bz2</computeroutput>
becomes
<computeroutput>filename</computeroutput></para></listitem>

<listitem><para><computeroutput>filename.bz</computeroutput>
becomes
<computeroutput>filename</computeroutput></para></listitem>

<listitem><para><computeroutput>filename.tbz2</computeroutput>
becomes
<computeroutput>filename.tar</computeroutput></para></listitem>

<listitem><para><computeroutput>filename.tbz</computeroutput>
becomes
<computeroutput>filename.tar</computeroutput></para></listitem>

<listitem><para><computeroutput>anyothername</computeroutput>
becomes
<computeroutput>anyothername.out</computeroutput></para></listitem>

</itemizedlist>

<para>If the file does not end in one of the recognised endings,
<computeroutput>.bz2</computeroutput>,
<computeroutput>.bz</computeroutput>,
<computeroutput>.tbz2</computeroutput> or
<computeroutput>.tbz</computeroutput>,
<computeroutput>bzip2</computeroutput> complains that it cannot
guess the name of the original file, and uses the original name
with <computeroutput>.out</computeroutput> appended.</para>
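
<para>The suffix rules above amount to a small lookup table. The C
fragment below is a minimal sketch written for this manual, not an
excerpt from the <computeroutput>bzip2</computeroutput> sources; the
function name <computeroutput>guess_output_name</computeroutput> is
hypothetical, and rejecting a name that consists of nothing but a
suffix is an assumption of the sketch.</para>

<programlisting><![CDATA[
```c
#include <stdio.h>
#include <string.h>

/* Recognised compressed-file suffixes and their replacements (from the list above). */
static const char *const z_suffix[]   = { ".bz2", ".bz", ".tbz2", ".tbz" };
static const char *const unz_suffix[] = { "",     "",    ".tar",  ".tar" };

/* Hypothetical helper: write the guessed decompressed name of `in` into
   `out` (capacity `cap`). Returns 1 if a recognised suffix was replaced,
   0 if ".out" was appended instead, or -1 if the result did not fit. */
int guess_output_name(const char *in, char *out, size_t cap)
{
    size_t n = strlen(in);
    size_t count = sizeof z_suffix / sizeof z_suffix[0];
    int w;

    for (size_t i = 0; i < count; i++) {
        size_t k = strlen(z_suffix[i]);
        /* Assumption of this sketch: the name must be longer than the suffix. */
        if (n > k && strcmp(in + n - k, z_suffix[i]) == 0) {
            w = snprintf(out, cap, "%.*s%s", (int)(n - k), in, unz_suffix[i]);
            return (w >= 0 && (size_t)w < cap) ? 1 : -1;
        }
    }
    /* No recognised ending: keep the whole name and append ".out". */
    w = snprintf(out, cap, "%s.out", in);
    return (w >= 0 && (size_t)w < cap) ? 0 : -1;
}
```
]]></programlisting>

<para>For example, calling <computeroutput>guess_output_name</computeroutput>
on <computeroutput>backup.tbz2</computeroutput> stores
<computeroutput>backup.tar</computeroutput> and returns 1. Because each
suffix starts with a dot, none of them is a tail of another
(<computeroutput>.bz2</computeroutput> is not a tail of
<computeroutput>.tbz2</computeroutput>), so the order of the table does not
affect the result.</para>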

<para>As with compression, supplying no filenames causes
decompression from standard input to standard output.</para>

<para><computeroutput>bunzip2</computeroutput> will correctly
decompress a file which is the concatenation of two or more
compressed files. The result is the concatenation of the
corresponding uncompressed files. Integrity testing
(<computeroutput>-t</computeroutput>) of concatenated compressed
files is also supported.</para>

<para>You can also compress or decompress files to the standard
output by giving the <computeroutput>-c</computeroutput> flag.
Multiple files may be compressed and decompressed like this. The
resulting outputs are fed sequentially to stdout. Compression of
multiple files in this manner generates a stream containing
multiple compressed file representations. Such a stream can be
decompressed correctly only by
<computeroutput>bzip2</computeroutput> version 0.9.0 or later.
Earlier versions of <computeroutput>bzip2</computeroutput> will
stop after decompressing the first file in the stream.</para>

<para><computeroutput>bzcat</computeroutput> (or
<computeroutput>bzip2 -dc</computeroutput>) decompresses all
specified files to the standard output.</para>

<para><computeroutput>bzip2</computeroutput> will read arguments
from the environment variables
<computeroutput>BZIP2</computeroutput> and
<computeroutput>BZIP</computeroutput>, in that order, and will
process them before any arguments read from the command line.
This gives a convenient way to supply default arguments.</para>
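
<para>One way to picture this rule is as an argument list assembled
before option parsing begins. The sketch below is an illustration, not
<computeroutput>bzip2</computeroutput>'s own code: the helper names are
hypothetical, and it assumes that the variables' values are split into
words at whitespace while command-line arguments are taken whole.</para>

<programlisting><![CDATA[
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes of `s` into a new NUL-terminated heap string. */
static char *copy_word(const char *s, size_t len)
{
    char *w = malloc(len + 1);
    if (w != NULL) {
        memcpy(w, s, len);
        w[len] = '\0';
    }
    return w;
}

/* Append the whitespace-separated words of `s` (which may be NULL) to out[]. */
static void append_words(const char *s, char **out, int *n, int max)
{
    while (s != NULL && *s != '\0' && *n < max) {
        while (isspace((unsigned char)*s))
            s++;
        const char *start = s;
        while (*s != '\0' && !isspace((unsigned char)*s))
            s++;
        if (s > start) {
            char *w = copy_word(start, (size_t)(s - start));
            if (w == NULL)
                return;
            out[(*n)++] = w;
        }
    }
}

/* Hypothetical helper: build the effective argument list, with words from
   BZIP2 first, then BZIP, then the real command-line arguments (argv[0],
   the program name, is skipped). Returns the number of entries stored. */
int build_arg_list(const char *bzip2_env, const char *bzip_env,
                   int argc, char **argv, char **out, int max)
{
    int n = 0;
    append_words(bzip2_env, out, &n, max);
    append_words(bzip_env, out, &n, max);
    for (int i = 1; i < argc && n < max; i++) {
        /* Command-line arguments are kept whole, so names with spaces survive. */
        char *w = copy_word(argv[i], strlen(argv[i]));
        if (w == NULL)
            break;
        out[n++] = w;
    }
    return n;
}
```
]]></programlisting>

<para>In a real program the first two arguments would come from
<computeroutput>getenv("BZIP2")</computeroutput> and
<computeroutput>getenv("BZIP")</computeroutput>. Because the command line
is appended last, an explicit flag on the command line is parsed after
any default supplied through the environment.</para>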

<para>Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less than
about one hundred bytes tend to get larger, since the compression
mechanism has a constant overhead in the region of 50 bytes.
Random data (including the output of most file compressors) is
coded at about 8.05 bits per byte, giving an expansion of around
0.6%.</para>

<para>As a self-check for your protection,
<computeroutput>bzip2</computeroutput> uses 32-bit CRCs to make
sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data,
and against undetected bugs in
<computeroutput>bzip2</computeroutput> (hopefully very unlikely).
The chance of data corruption going undetected is microscopic,
about one chance in four billion for each file processed. Be
aware, though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help you
recover the original uncompressed data. You can use
<computeroutput>bzip2recover</computeroutput> to try to recover
data from damaged files.</para>
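
<para>The CRC in question is the CRC-32 polynomial
<computeroutput>0x04C11DB7</computeroutput> processed most-significant-bit
first, starting from <computeroutput>0xFFFFFFFF</computeroutput> and
complementing the final value (the parameter set often catalogued as
CRC-32/BZIP2). The bitwise sketch below shows that computation. It is
written for clarity rather than speed (<computeroutput>libbzip2</computeroutput>
itself uses a 256-entry lookup table), and the function name is
hypothetical.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise (table-free) CRC-32 in the MSB-first form
   used by bzip2: polynomial 0x04C11DB7, initial value 0xFFFFFFFF, and the
   result complemented at the end. */
uint32_t crc32_bzip2(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;   /* feed the next byte into the top bits */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return ~crc;
}
```
]]></programlisting>

<para>As a check, the CRC of the nine ASCII bytes
<computeroutput>123456789</computeroutput> under these parameters is
<computeroutput>0xFC891918</computeroutput>.</para>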

<para>Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.), 2
to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused
<computeroutput>bzip2</computeroutput> to panic.</para>

</sect1>


<sect1 id="options" xreflabel="OPTIONS">
<title>OPTIONS</title>

<variablelist>

<varlistentry>
<term><computeroutput>-c --stdout</computeroutput></term>
<listitem><para>Compress or decompress to standard
output.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-d --decompress</computeroutput></term>
<listitem><para>Force decompression.
<computeroutput>bzip2</computeroutput>,
<computeroutput>bunzip2</computeroutput> and
<computeroutput>bzcat</computeroutput> are really the same
program, and the decision about what actions to take is made on
the basis of which name is used. This flag overrides that
mechanism, and forces bzip2 to decompress.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-z --compress</computeroutput></term>
<listitem><para>The complement to
<computeroutput>-d</computeroutput>: forces compression,
regardless of the invocation name.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-t --test</computeroutput></term>
<listitem><para>Check integrity of the specified file(s), but
don't decompress them. This really performs a trial
decompression and throws away the result.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-f --force</computeroutput></term>
<listitem><para>Force overwrite of output files. Normally,
<computeroutput>bzip2</computeroutput> will not overwrite
existing output files. Also forces
<computeroutput>bzip2</computeroutput> to break hard links to
files, which it otherwise wouldn't do.</para>
<para><computeroutput>bzip2</computeroutput> normally declines
to decompress files which don't have the correct magic header
bytes. If forced (<computeroutput>-f</computeroutput>),
however, it will pass such files through unmodified. This is
how GNU <computeroutput>gzip</computeroutput> behaves.</para>
</listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-k --keep</computeroutput></term>
<listitem><para>Keep (don't delete) input files during
compression or decompression.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-s --small</computeroutput></term>
<listitem><para>Reduce memory usage, for compression,
decompression and testing. Files are decompressed and tested
using a modified algorithm which only requires 2.5 bytes per
block byte. This means any file can be decompressed in 2300k
of memory, albeit at about half the normal speed.</para>
<para>During compression, <computeroutput>-s</computeroutput>
selects a block size of 200k, which limits memory use to around
the same figure, at the expense of your compression ratio. In
short, if your machine is low on memory (8 megabytes or less),
use <computeroutput>-s</computeroutput> for everything. See
<xref linkend="memory-management"/> below.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-q --quiet</computeroutput></term>
<listitem><para>Suppress non-essential warning messages.
Messages pertaining to I/O errors and other critical events
will not be suppressed.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-v --verbose</computeroutput></term>
<listitem><para>Verbose mode -- show the compression ratio for
each file processed. Further
<computeroutput>-v</computeroutput>'s increase the verbosity
level, spewing out lots of information which is primarily of
interest for diagnostic purposes.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-L --license -V --version</computeroutput></term>
<listitem><para>Display the software version, license terms and
conditions.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>-1</computeroutput> (or
<computeroutput>--fast</computeroutput>) to
<computeroutput>-9</computeroutput> (or
<computeroutput>--best</computeroutput>)</term>
<listitem><para>Set the block size to 100k, 200k ... 900k
when compressing. Has no effect when decompressing. See <xref
linkend="memory-management" /> below. The
<computeroutput>--fast</computeroutput> and
<computeroutput>--best</computeroutput> aliases are primarily
for GNU <computeroutput>gzip</computeroutput> compatibility.
In particular, <computeroutput>--fast</computeroutput> doesn't
make things significantly faster. And
<computeroutput>--best</computeroutput> merely selects the
default behaviour.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>--</computeroutput></term>
<listitem><para>Treats all subsequent arguments as file names,
even if they start with a dash. This is so you can handle
files with names beginning with a dash, for example:
<computeroutput>bzip2 --
-myfilename</computeroutput>.</para></listitem>
</varlistentry>

<varlistentry>
<term><computeroutput>--repetitive-fast</computeroutput></term>
<term><computeroutput>--repetitive-best</computeroutput></term>
<listitem><para>These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the behaviour of
the sorting algorithm in earlier versions, which was sometimes
useful. 0.9.5 and above have an improved algorithm which
renders these flags irrelevant.</para></listitem>
</varlistentry>

</variablelist>
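
<para>The interplay between the invocation name and the
<computeroutput>-d</computeroutput>, <computeroutput>-z</computeroutput>,
<computeroutput>-t</computeroutput> and <computeroutput>-c</computeroutput>
flags can be modelled as below. This is a simplified sketch written for
this manual, not <computeroutput>bzip2</computeroutput>'s actual option
parser: the type and function names are hypothetical, it compares only
the exact names <computeroutput>bunzip2</computeroutput> and
<computeroutput>bzcat</computeroutput>, it assumes the last mode flag given
wins, and it ignores every other option.</para>

<programlisting><![CDATA[
```c
#include <string.h>

typedef enum { MODE_COMPRESS, MODE_DECOMPRESS, MODE_TEST } op_mode;

/* Return the final path component of `path`. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Hypothetical model: pick the default operation from argv[0], then let
   mode flags override it. Sets *to_stdout when output goes to stdout. */
op_mode choose_mode(int argc, char **argv, int *to_stdout)
{
    const char *name = base_name(argv[0]);
    op_mode mode = MODE_COMPRESS;
    *to_stdout = 0;

    /* Step 1: the default depends on the name the program was invoked as. */
    if (strcmp(name, "bunzip2") == 0) {
        mode = MODE_DECOMPRESS;
    } else if (strcmp(name, "bzcat") == 0) {
        mode = MODE_DECOMPRESS;
        *to_stdout = 1;
    }

    /* Step 2: flags override the name; in this model the last one wins. */
    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strcmp(a, "--") == 0)
            break;                                /* the rest are file names */
        if (strcmp(a, "--decompress") == 0)      mode = MODE_DECOMPRESS;
        else if (strcmp(a, "--compress") == 0)   mode = MODE_COMPRESS;
        else if (strcmp(a, "--test") == 0)       mode = MODE_TEST;
        else if (strcmp(a, "--stdout") == 0)     *to_stdout = 1;
        else if (a[0] == '-' && a[1] != '-') {
            /* Grouped short flags such as -dc. */
            for (const char *c = a + 1; *c != '\0'; c++) {
                if (*c == 'd')      mode = MODE_DECOMPRESS;
                else if (*c == 'z') mode = MODE_COMPRESS;
                else if (*c == 't') mode = MODE_TEST;
                else if (*c == 'c') *to_stdout = 1;
            }
        }
    }
    return mode;
}
```
]]></programlisting>

<para>In this model, <computeroutput>bzip2 -dc file.bz2</computeroutput>
selects the same mode as <computeroutput>bzcat file.bz2</computeroutput>,
and an argument such as <computeroutput>-d</computeroutput> appearing after
<computeroutput>--</computeroutput> is left alone as a file name.</para>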

</sect1>


<sect1 id="memory-management" xreflabel="MEMORY MANAGEMENT">
<title>MEMORY MANAGEMENT</title>

<para><computeroutput>bzip2</computeroutput> compresses large
files in blocks. The block size affects both the compression
ratio achieved, and the amount of memory needed for compression
and decompression. The flags <computeroutput>-1</computeroutput>
through <computeroutput>-9</computeroutput> specify the block
size to be 100,000 bytes through 900,000 bytes (the default)
respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
<computeroutput>bunzip2</computeroutput> then allocates itself
just enough memory to decompress the file. Since block sizes are
stored in compressed files, it follows that the flags
<computeroutput>-1</computeroutput> to
<computeroutput>-9</computeroutput> are irrelevant to and so
ignored during decompression.</para>

<para>Compression and decompression requirements, in bytes, can be
estimated as follows; the second decompression figure applies when
<computeroutput>-s</computeroutput> is used:</para>
<programlisting>
Compression:   400k + ( 8 x block size )

Decompression: 100k + ( 4 x block size ), or
               100k + ( 2.5 x block size )
</programlisting>
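
<para>These formulas are easy to tabulate. The sketch below evaluates
them for a given flag level; the type and function names are
hypothetical, and, as in the text, k is taken as 1000 bytes so that it
matches the 100,000-byte block unit.</para>

<programlisting><![CDATA[
```c
/* Estimated memory, in kbytes (1k = 1000 bytes), for the block size
   chosen by flag -level (1..9), using the formulas above. */
typedef struct {
    long compress_k;          /* 400k + 8 x block size        */
    long decompress_k;        /* 100k + 4 x block size        */
    long decompress_small_k;  /* 100k + 2.5 x block size (-s) */
} mem_estimate;

mem_estimate estimate_memory(int level)
{
    long block_k = 100L * level;   /* -1 .. -9 select 100,000 .. 900,000 bytes */
    mem_estimate m;
    m.compress_k = 400 + 8 * block_k;
    m.decompress_k = 100 + 4 * block_k;
    /* 2.5 x block, computed exactly in integers (block_k is a multiple of 100). */
    m.decompress_small_k = 100 + (5 * block_k) / 2;
    return m;
}
```
]]></programlisting>

<para>For <computeroutput>-9</computeroutput> this gives 7600k, 3700k and
2350k, the figures in the <computeroutput>-9</computeroutput> row of the
table below.</para>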

<para>Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two or
three hundred k of block size, a fact worth bearing in mind when
using <computeroutput>bzip2</computeroutput> on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block
size.</para>

<para>For files compressed with the default 900k block size,
<computeroutput>bunzip2</computeroutput> will require about 3700
kbytes to decompress. To support decompression of any file on a
4 megabyte machine, <computeroutput>bunzip2</computeroutput> has
an option to decompress using approximately half this amount of
memory, about 2300 kbytes. Decompression speed is also halved,
so you should use this option only where necessary. The relevant
flag is <computeroutput>-s</computeroutput>.</para>

<para>In general, try to use the largest block size memory
constraints allow, since that maximises the compression achieved.
Compression and decompression speed are virtually unaffected by
block size.</para>

<para>Another significant point applies to files which fit in a
single block -- that means most files you'd encounter using a
large block size. The amount of real memory touched is
proportional to the size of the file, since the file is smaller
than a block. For example, compressing a file 20,000 bytes long
with the flag <computeroutput>-9</computeroutput> will cause the
compressor to allocate around 7600k of memory, but only touch
400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor
will allocate 3700k but only touch 100k + 20000 * 4 = 180
kbytes.</para>
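
<para>The same arithmetic, with the per-block term capped at the size of
the file, gives the amount of memory actually touched. A minimal sketch,
with hypothetical function names and all sizes in bytes:</para>

<programlisting><![CDATA[
```c
/* Only min(file size, block size) bytes of block storage are used, so a
   file smaller than one block touches far less memory than is allocated. */
static long used_block_bytes(long file_bytes, long block_bytes)
{
    return file_bytes < block_bytes ? file_bytes : block_bytes;
}

/* Approximate bytes touched while compressing: 400k + 8 x used block bytes. */
long touched_compress(long file_bytes, long block_bytes)
{
    return 400000L + 8L * used_block_bytes(file_bytes, block_bytes);
}

/* Approximate bytes touched while decompressing: 100k + 4 x used block bytes. */
long touched_decompress(long file_bytes, long block_bytes)
{
    return 100000L + 4L * used_block_bytes(file_bytes, block_bytes);
}
```
]]></programlisting>

<para>With a 20,000-byte file and the 900,000-byte block selected by
<computeroutput>-9</computeroutput>, these return 560,000 and 180,000
bytes, the two figures worked out above; for files of a block or more
they reach the full 7600k and 3700k allocations.</para>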

<para>Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total compressed
size for 14 files of the Calgary Text Compression Corpus
totalling 3,141,622 bytes. This column gives some feel for how
compression varies with block size. These figures tend to
understate the advantage of larger block sizes for larger files,
since the Corpus is dominated by smaller files.</para>

<programlisting>
Flag   Compress   Decompress   Decompress   Corpus
       usage      usage        -s usage     size (bytes)

-1     1200k      500k         350k         914704
-2     2000k      900k         600k         877703
-3     2800k      1300k        850k         860338
-4     3600k      1700k        1100k        846899
-5     4400k      2100k        1350k        845160
-6     5200k      2500k        1600k        838626
-7     6100k      2900k        1850k        834096
-8     6800k      3300k        2100k        828642
-9     7600k      3700k        2350k        828642
</programlisting>

</sect1>


<sect1 id="recovering" xreflabel="RECOVERING DATA FROM DAMAGED FILES">
<title>RECOVERING DATA FROM DAMAGED FILES</title>

<para><computeroutput>bzip2</computeroutput> compresses files in
blocks, usually 900kbytes long. Each block is handled
independently. If a media or transmission error causes a
multi-block <computeroutput>.bz2</computeroutput> file to become
damaged, it may be possible to recover data from the undamaged
blocks in the file.</para>

<para>The compressed representation of each block is delimited by
a 48-bit pattern, which makes it possible to find the block
boundaries with reasonable certainty. Each block also carries
its own 32-bit CRC, so damaged blocks can be distinguished from
undamaged ones.</para>
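
<para>In the bzip2 format each block begins with the 48-bit value
<computeroutput>0x314159265359</computeroutput> (its hex digits spell the
first digits of pi), and the stream ends with
<computeroutput>0x177245385090</computeroutput> (the square root of pi).
Because blocks are packed bit by bit rather than on byte boundaries,
finding a block start means sliding a 48-bit window over the data one bit
at a time. The sketch below shows only that search; it is a hypothetical
helper, not the <computeroutput>bzip2recover</computeroutput> source, and
it neither checks CRCs nor writes out blocks.</para>

<programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* 48-bit block-start marker of the bzip2 format. */
#define BZ_BLOCK_MAGIC 0x314159265359ULL
#define MASK_48        0xFFFFFFFFFFFFULL

/* Hypothetical helper: return the bit offset (counting from the most
   significant bit of buf[0]) at which BZ_BLOCK_MAGIC first begins,
   or -1 if it does not occur. */
long find_block_magic(const unsigned char *buf, size_t len)
{
    uint64_t window = 0;
    for (size_t i = 0; i < len * 8; i++) {
        unsigned bit = (buf[i / 8] >> (7 - i % 8)) & 1u;
        window = ((window << 1) | bit) & MASK_48;
        /* Compare only once 48 real bits have been read: the marker starts
           with two zero bits, so a shorter prefix could match spuriously. */
        if (i >= 47 && window == BZ_BLOCK_MAGIC)
            return (long)(i - 47);
    }
    return -1;
}
```
]]></programlisting>

<para>A real recovery tool would continue scanning after each match,
note where each block ends, and copy the bits in between into a new
file.</para>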

<para><computeroutput>bzip2recover</computeroutput> is a simple
program whose purpose is to search for blocks in
<computeroutput>.bz2</computeroutput> files, and write each block
out into its own <computeroutput>.bz2</computeroutput> file. You
can then use <computeroutput>bzip2 -t</computeroutput> to test
the integrity of the resulting files, and decompress those which
are undamaged.</para>

<para><computeroutput>bzip2recover</computeroutput> takes a
single argument, the name of the damaged file, and writes a
number of files <computeroutput>rec0001file.bz2</computeroutput>,
<computeroutput>rec0002file.bz2</computeroutput>, etc., containing
the extracted blocks. The output filenames are designed so that
the use of wildcards in subsequent processing -- for example,
<computeroutput>bzip2 -dc rec*file.bz2 >
recovered_data</computeroutput> -- lists the files in the correct
order.</para>

<para><computeroutput>bzip2recover</computeroutput> should be of
most use in dealing with large <computeroutput>.bz2</computeroutput>
files, as these will contain many blocks. It is clearly futile
to use it on damaged single-block files, since a damaged block
cannot be recovered. If you wish to minimise any potential data
loss through media or transmission errors, you might consider
compressing with a smaller block size.</para>

</sect1>


<sect1 id="performance" xreflabel="PERFORMANCE NOTES">
<title>PERFORMANCE NOTES</title>

<para>The sorting phase of compression gathers together similar
strings in the file. Because of this, files containing very long
runs of repeated symbols, like "aabaabaabaab ..." (repeated
several hundred times) may compress more slowly than normal.
Versions 0.9.5 and above fare much better than previous versions
in this respect. The ratio between worst-case and average-case
compression time is in the region of 10:1. For previous
versions, this figure was more like 100:1. You can use the
<computeroutput>-vvvv</computeroutput> option to monitor progress
in great detail, if you want.</para>

<para>Decompression speed is unaffected by these
phenomena.</para>

<para><computeroutput>bzip2</computeroutput> usually allocates
several megabytes of memory to operate in, and then charges all
over it in a fairly random fashion. This means that performance,
both for compressing and decompressing, is largely determined by
the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss
rate have been observed to give disproportionately large
performance improvements. I imagine
<computeroutput>bzip2</computeroutput> will perform best on
machines with very large caches.</para>

</sect1>



<sect1 id="caveats" xreflabel="CAVEATS">
<title>CAVEATS</title>

<para>I/O error messages are not as helpful as they could be.
<computeroutput>bzip2</computeroutput> tries hard to detect I/O
errors and exit cleanly, but the details of what the problem is
sometimes seem rather misleading.</para>

<para>This manual page pertains to version &bz-version; of
<computeroutput>bzip2</computeroutput>. Compressed data created by
this version is entirely forwards and backwards compatible with the
previous public releases, versions 0.1pl2, 0.9.0 and 0.9.5, 1.0.0,
1.0.1, 1.0.2 and 1.0.3, but with the following exception: 0.9.0 and
above can correctly decompress multiple concatenated compressed files.
0.1pl2 cannot do this; it will stop after decompressing just the first
file in the stream.</para>

<para><computeroutput>bzip2recover</computeroutput> versions
prior to 1.0.2 used 32-bit integers to represent bit positions in
compressed files, so they could not handle compressed files more
than 512 megabytes long. Versions 1.0.2 and above use 64-bit integers
on some platforms which support them (GNU supported targets, and
Windows). To establish whether or not
<computeroutput>bzip2recover</computeroutput> was built with such
a limitation, run it without arguments. In any event, you can
build an unlimited version yourself by recompiling it with
<computeroutput>MaybeUInt64</computeroutput> set to be an
unsigned 64-bit integer.</para>
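
<para>The 512-megabyte limit follows from simple arithmetic: a 32-bit
counter can hold 2^32 distinct bit positions, and 2^32 / 8 is
536,870,912 bytes. The sketch below shows that arithmetic together with
the kind of 64-bit definition the paragraph describes. The typedef and
the helper functions are illustrative only, not copied from the
<computeroutput>bzip2recover</computeroutput> source.</para>

<programlisting><![CDATA[
```c
#include <stdint.h>

/* Illustrative only: the recompilation described above amounts to making
   the bit-position type an unsigned 64-bit integer. */
typedef uint64_t MaybeUInt64;

/* Bytes addressable when bit positions are held in a counter of
   `counter_bits` bits (valid for 3..64): 2^counter_bits bits / 8. */
uint64_t max_addressable_bytes(unsigned counter_bits)
{
    return (uint64_t)1 << (counter_bits - 3);
}

/* Bit position of the first bit of byte `byte_offset`. */
MaybeUInt64 bit_position(MaybeUInt64 byte_offset)
{
    return byte_offset * 8;
}
```
]]></programlisting>

<para>For instance, byte 600,000,000 of a file starts at bit
4,800,000,000, which no longer fits in 32 bits but fits easily in
64.</para>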

</sect1>



<sect1 id="author" xreflabel="AUTHOR">
<title>AUTHOR</title>

<para>Julian Seward,
<computeroutput>&bz-author;</computeroutput></para>

<para>The ideas embodied in
<computeroutput>bzip2</computeroutput> are due to (at least) the
following people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for the
Huffman coder), Peter Fenwick (for the structured coding model in
the original <computeroutput>bzip</computeroutput>, and many
refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
<computeroutput>bzip</computeroutput>). I am much indebted for
their help, support and advice. See the manual in the source
distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms,
so as to speed up compression. Bela Lubkin encouraged me to
improve the worst-case compression performance.
Donna Robinson XMLised the documentation.
Many people sent
patches, helped with portability problems, lent machines, gave
advice and were generally helpful.</para>

</sect1>

</chapter>



683 | <chapter id="libprog" xreflabel="Programming with libbzip2"> | |
684 | <title> | |
685 | Programming with <computeroutput>libbzip2</computeroutput> | |
686 | </title> | |
687 | ||
688 | <para>This chapter describes the programming interface to | |
689 | <computeroutput>libbzip2</computeroutput>.</para> | |
690 | ||
691 | <para>For general background information, particularly about | |
692 | memory use and performance aspects, you'd be well advised to read | |
693 | <xref linkend="using"/> as well.</para> | |
694 | ||
695 | ||
696 | <sect1 id="top-level" xreflabel="Top-level structure"> | |
697 | <title>Top-level structure</title> | |
698 | ||
699 | <para><computeroutput>libbzip2</computeroutput> is a flexible | |
700 | library for compressing and decompressing data in the | |
701 | <computeroutput>bzip2</computeroutput> data format. Although | |
702 | packaged as a single entity, it helps to regard the library as | |
703 | three separate parts: the low-level interface, the high-level | |
704 | interface, and some utility functions.</para> | |
705 | ||
706 | <para>The structure of | |
707 | <computeroutput>libbzip2</computeroutput>'s interfaces is similar | |
708 | to that of Jean-loup Gailly's and Mark Adler's excellent | |
709 | <computeroutput>zlib</computeroutput> library.</para> | |
710 | ||
711 | <para>All externally visible symbols have names beginning with | |
712 | <computeroutput>BZ2_</computeroutput>. This is new in version | |
713 | 1.0. The intention is to minimise pollution of the namespaces of | |
714 | library clients.</para> | |
715 | ||
716 | <para>To use any part of the library, you need to | |
717 | <computeroutput>#include <bzlib.h></computeroutput> | |
718 | into your sources.</para> | |
719 | ||
720 | ||
721 | ||
722 | <sect2 id="ll-summary" xreflabel="Low-level summary"> | |
723 | <title>Low-level summary</title> | |
724 | ||
725 | <para>This interface provides services for compressing and | |
726 | decompressing data in memory. There's no provision for dealing | |
727 | with files, streams or any other I/O mechanisms, just straight | |
728 | memory-to-memory work. In fact, this part of the library can be | |
729 | compiled without inclusion of | |
730 | <computeroutput>stdio.h</computeroutput>, which may be helpful | |
731 | for embedded applications.</para> | |
732 | ||
733 | <para>The low-level part of the library has no global variables | |
734 | and is therefore thread-safe.</para> | |
735 | ||
736 | <para>Six routines make up the low level interface: | |
737 | <computeroutput>BZ2_bzCompressInit</computeroutput>, | |
738 | <computeroutput>BZ2_bzCompress</computeroutput>, and | |
739 | <computeroutput>BZ2_bzCompressEnd</computeroutput> for | |
740 | compression, and a corresponding trio | |
741 | <computeroutput>BZ2_bzDecompressInit</computeroutput>, | |
742 | <computeroutput>BZ2_bzDecompress</computeroutput> and | |
743 | <computeroutput>BZ2_bzDecompressEnd</computeroutput> for | |
744 | decompression. The <computeroutput>*Init</computeroutput> | |
745 | functions allocate memory for compression/decompression and do | |
746 | other initialisations, whilst the | |
747 | <computeroutput>*End</computeroutput> functions close down | |
748 | operations and release memory.</para> | |
749 | ||
750 | <para>The real work is done by | |
751 | <computeroutput>BZ2_bzCompress</computeroutput> and | |
752 | <computeroutput>BZ2_bzDecompress</computeroutput>. These | |
753 | compress and decompress data from a user-supplied input buffer to | |
754 | a user-supplied output buffer. These buffers can be any size; | |
755 | arbitrary quantities of data are handled by making repeated calls | |
756 | to these functions. This is a flexible mechanism allowing a | |
757 | consumer-pull style of activity, or producer-push, or a mixture | |
758 | of both.</para> | |
759 | ||
760 | </sect2> | |
761 | ||
762 | ||
763 | <sect2 id="hl-summary" xreflabel="High-level summary"> | |
764 | <title>High-level summary</title> | |
765 | ||
766 | <para>This interface provides some handy wrappers around the | |
767 | low-level interface to facilitate reading and writing | |
768 | <computeroutput>bzip2</computeroutput> format files | |
769 | (<computeroutput>.bz2</computeroutput> files). The routines | |
770 | provide hooks to facilitate reading files in which the | |
771 | <computeroutput>bzip2</computeroutput> data stream is embedded | |
772 | within some larger-scale file structure, or where there are | |
773 | multiple <computeroutput>bzip2</computeroutput> data streams | |
774 | concatenated end-to-end.</para> | |
775 | ||
776 | <para>For reading files, | |
777 | <computeroutput>BZ2_bzReadOpen</computeroutput>, | |
778 | <computeroutput>BZ2_bzRead</computeroutput>, | |
779 | <computeroutput>BZ2_bzReadClose</computeroutput> and | |
780 | <computeroutput>BZ2_bzReadGetUnused</computeroutput> are | |
781 | supplied. For writing files, | |
782 | <computeroutput>BZ2_bzWriteOpen</computeroutput>, | |
783 | <computeroutput>BZ2_bzWrite</computeroutput> and | |
784 | <computeroutput>BZ2_bzWriteFinish</computeroutput> are | |
785 | available.</para> | |
786 | ||
787 | <para>As with the low-level library, no global variables are used | |
788 | so the library is per se thread-safe. However, if I/O errors | |
789 | occur whilst reading or writing the underlying compressed files, | |
790 | you may have to consult <computeroutput>errno</computeroutput> to | |
791 | determine the cause of the error. In that case, you'd need a C | |
792 | library which correctly supports | |
793 | <computeroutput>errno</computeroutput> in a multithreaded | |
794 | environment.</para> | |
795 | ||
796 | <para>To make the library a little simpler and more portable, | |
797 | <computeroutput>BZ2_bzReadOpen</computeroutput> and | |
798 | <computeroutput>BZ2_bzWriteOpen</computeroutput> require you to | |
799 | pass them file handles (<computeroutput>FILE*</computeroutput>s) | |
800 | which have previously been opened for reading or writing | |
801 | respectively. That avoids portability problems associated with | |
802 | file operations and file attributes, whilst not being much of an | |
803 | imposition on the programmer.</para> | |
804 | ||
805 | </sect2> | |
806 | ||
807 | ||
808 | <sect2 id="util-fns-summary" xreflabel="Utility functions summary"> | |
809 | <title>Utility functions summary</title> | |
810 | ||
811 | <para>For very simple needs, | |
812 | <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and | |
813 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> are | |
814 | provided. These compress data in memory from one buffer to | |
815 | another buffer in a single function call. You should assess | |
816 | whether these functions fulfill your memory-to-memory | |
817 | compression/decompression requirements before investing effort in | |
818 | understanding the more general but more complex low-level | |
819 | interface.</para> | |
820 | ||
821 | <para>Yoshioka Tsuneo | |
822 | (<computeroutput>tsuneo@rr.iij4u.or.jp</computeroutput>) has | |
823 | contributed some functions to give better | |
824 | <computeroutput>zlib</computeroutput> compatibility. These | |
825 | functions are <computeroutput>BZ2_bzopen</computeroutput>, | |
826 | <computeroutput>BZ2_bzread</computeroutput>, | |
827 | <computeroutput>BZ2_bzwrite</computeroutput>, | |
828 | <computeroutput>BZ2_bzflush</computeroutput>, | |
829 | <computeroutput>BZ2_bzclose</computeroutput>, | |
830 | <computeroutput>BZ2_bzerror</computeroutput> and | |
831 | <computeroutput>BZ2_bzlibVersion</computeroutput>. You may find | |
832 | these functions more convenient for simple file reading and | |
833 | writing, than those in the high-level interface. These functions | |
834 | are not (yet) officially part of the library, and are minimally | |
835 | documented here. If they break, you get to keep all the pieces. | |
836 | I hope to document them properly when time permits.</para> | |
837 | ||
838 | <para>Yoshioka also contributed modifications to allow the | |
839 | library to be built as a Windows DLL.</para> | |
840 | ||
841 | </sect2> | |
842 | ||
843 | </sect1> | |
844 | ||
845 | ||
846 | <sect1 id="err-handling" xreflabel="Error handling"> | |
847 | <title>Error handling</title> | |
848 | ||
849 | <para>The library is designed to recover cleanly in all | |
850 | situations, including the worst-case situation of decompressing | |
851 | random data. I'm not 100% sure that it can always do this, so | |
852 | you might want to add a signal handler to catch segmentation | |
853 | violations during decompression if you are feeling especially | |
854 | paranoid. I would be interested in hearing more about the | |
855 | robustness of the library to corrupted compressed data.</para> | |
856 | ||
857 | <para>Version 1.0.3 is more robust in this respect than any | |
858 | previous version. Investigations with Valgrind (a tool for detecting | |
859 | problems with memory management) indicate | |
860 | that, at least for the few files I tested, all single-bit errors | |
861 | in the compressed input are caught properly, with no | |
862 | segmentation faults, no uses of uninitialised data, no out-of-range | |
863 | reads or writes, and no infinite looping in the decompressor. | |
864 | So it's certainly pretty robust, although | |
865 | I wouldn't claim it to be totally bombproof.</para> | |
866 | ||
867 | <para>The file <computeroutput>bzlib.h</computeroutput> contains | |
868 | all definitions needed to use the library. In particular, you | |
869 | should definitely not include | |
870 | <computeroutput>bzlib_private.h</computeroutput>.</para> | |
871 | ||
872 | <para>In <computeroutput>bzlib.h</computeroutput>, the various | |
873 | return values are defined. The following list is not intended as | |
874 | an exhaustive description of the circumstances in which a given | |
875 | value may be returned -- those descriptions are given later. | |
876 | Rather, it is intended to convey the rough meaning of each return | |
877 | value. The first five return values are normal and do not | |
878 | denote an error situation.</para> | |
879 | ||
880 | <variablelist> | |
881 | ||
882 | <varlistentry> | |
883 | <term><computeroutput>BZ_OK</computeroutput></term> | |
884 | <listitem><para>The requested action was completed | |
885 | successfully.</para></listitem> | |
886 | </varlistentry> | |
887 | ||
888 | <varlistentry> | |
889 | <term><computeroutput>BZ_RUN_OK, BZ_FLUSH_OK, | |
890 | BZ_FINISH_OK</computeroutput></term> | |
891 | <listitem><para>In | |
892 | <computeroutput>BZ2_bzCompress</computeroutput>, the requested | |
893 | nothing-special/flush/finish action (respectively) was completed | |
894 | successfully.</para></listitem> | |
895 | </varlistentry> | |
896 | ||
897 | <varlistentry> | |
898 | <term><computeroutput>BZ_STREAM_END</computeroutput></term> | |
899 | <listitem><para>Compression of data was completed, or the | |
900 | logical stream end was detected during | |
901 | decompression.</para></listitem> | |
902 | </varlistentry> | |
903 | ||
904 | </variablelist> | |
905 | ||
906 | <para>The following return values indicate an error of some | |
907 | kind.</para> | |
908 | ||
909 | <variablelist> | |
910 | ||
911 | <varlistentry> | |
912 | <term><computeroutput>BZ_CONFIG_ERROR</computeroutput></term> | |
913 | <listitem><para>Indicates that the library has been improperly | |
914 | compiled on your platform -- a major configuration error. | |
915 | Specifically, it means that | |
916 | <computeroutput>sizeof(char)</computeroutput>, | |
917 | <computeroutput>sizeof(short)</computeroutput> and | |
918 | <computeroutput>sizeof(int)</computeroutput> are not 1, 2 and | |
919 | 4 respectively, as they should be. Note that the library | |
920 | should still work properly on 64-bit platforms which follow | |
921 | the LP64 programming model -- that is, where | |
922 | <computeroutput>sizeof(long)</computeroutput> and | |
923 | <computeroutput>sizeof(void*)</computeroutput> are 8. Under | |
924 | LP64, <computeroutput>sizeof(int)</computeroutput> is still 4, | |
925 | so <computeroutput>libbzip2</computeroutput>, which doesn't | |
926 | use the <computeroutput>long</computeroutput> type, is | |
927 | OK.</para></listitem> | |
928 | </varlistentry> | |
929 | ||
930 | <varlistentry> | |
931 | <term><computeroutput>BZ_SEQUENCE_ERROR</computeroutput></term> | |
932 | <listitem><para>When using the library, it is important to call | |
933 | the functions in the correct sequence and with data structures | |
934 | (buffers etc) in the correct states. | |
935 | <computeroutput>libbzip2</computeroutput> checks as much as it | |
936 | can to ensure this is happening, and returns | |
937 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> if not. | |
938 | Code which complies precisely with the function semantics, as | |
939 | detailed below, should never receive this value; such an event | |
940 | denotes buggy code which you should | |
941 | investigate.</para></listitem> | |
942 | </varlistentry> | |
943 | ||
944 | <varlistentry> | |
945 | <term><computeroutput>BZ_PARAM_ERROR</computeroutput></term> | |
946 | <listitem><para>Returned when a parameter to a function call is | |
947 | out of range or otherwise manifestly incorrect. As with | |
948 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, this | |
949 | denotes a bug in the client code. The distinction between | |
950 | <computeroutput>BZ_PARAM_ERROR</computeroutput> and | |
951 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput> is a bit | |
952 | hazy, but still worth making.</para></listitem> | |
953 | </varlistentry> | |
954 | ||
955 | <varlistentry> | |
956 | <term><computeroutput>BZ_MEM_ERROR</computeroutput></term> | |
957 | <listitem><para>Returned when a request to allocate memory | |
958 | failed. Note that the quantity of memory needed to decompress | |
959 | a stream cannot be determined until the stream's header has | |
960 | been read. So | |
961 | <computeroutput>BZ2_bzDecompress</computeroutput> and | |
962 | <computeroutput>BZ2_bzRead</computeroutput> may return | |
963 | <computeroutput>BZ_MEM_ERROR</computeroutput> even though some | |
964 | of the compressed data has been read. The same is not true | |
965 | for compression; once | |
966 | <computeroutput>BZ2_bzCompressInit</computeroutput> or | |
967 | <computeroutput>BZ2_bzWriteOpen</computeroutput> has | |
968 | successfully completed, | |
969 | <computeroutput>BZ_MEM_ERROR</computeroutput> cannot | |
970 | occur.</para></listitem> | |
971 | </varlistentry> | |
972 | ||
973 | <varlistentry> | |
974 | <term><computeroutput>BZ_DATA_ERROR</computeroutput></term> | |
975 | <listitem><para>Returned when a data integrity error is | |
976 | detected during decompression. Most importantly, this means | |
977 | when stored and computed CRCs for the data do not match. This | |
978 | value is also returned upon detection of any other anomaly in | |
979 | the compressed data.</para></listitem> | |
980 | </varlistentry> | |
981 | ||
982 | <varlistentry> | |
983 | <term><computeroutput>BZ_DATA_ERROR_MAGIC</computeroutput></term> | |
984 | <listitem><para>A special case of | |
985 | <computeroutput>BZ_DATA_ERROR</computeroutput>, returned when | |
986 | the compressed stream does not start with the correct magic | |
987 | bytes (<computeroutput>'B' 'Z' | |
988 | 'h'</computeroutput>). It is sometimes useful to know about this case separately.</para></listitem> | |
989 | </varlistentry> | |
990 | ||
991 | <varlistentry> | |
992 | <term><computeroutput>BZ_IO_ERROR</computeroutput></term> | |
993 | <listitem><para>Returned by | |
994 | <computeroutput>BZ2_bzRead</computeroutput> and | |
995 | <computeroutput>BZ2_bzWrite</computeroutput> when there is an | |
996 | error reading or writing in the compressed file, and by | |
997 | <computeroutput>BZ2_bzReadOpen</computeroutput> and | |
998 | <computeroutput>BZ2_bzWriteOpen</computeroutput> for attempts | |
999 | to use a file for which the error indicator (viz, | |
1000 | <computeroutput>ferror(f)</computeroutput>) is set. On | |
1001 | receipt of <computeroutput>BZ_IO_ERROR</computeroutput>, the | |
1002 | caller should consult <computeroutput>errno</computeroutput> | |
1003 | and/or <computeroutput>perror</computeroutput> to acquire | |
1004 | operating-system specific information about the | |
1005 | problem.</para></listitem> | |
1006 | </varlistentry> | |
1007 | ||
1008 | <varlistentry> | |
1009 | <term><computeroutput>BZ_UNEXPECTED_EOF</computeroutput></term> | |
1010 | <listitem><para>Returned by | |
1011 | <computeroutput>BZ2_bzRead</computeroutput> when the | |
1012 | compressed file finishes before the logical end of stream is | |
1013 | detected.</para></listitem> | |
1014 | </varlistentry> | |
1015 | ||
1016 | <varlistentry> | |
1017 | <term><computeroutput>BZ_OUTBUFF_FULL</computeroutput></term> | |
1018 | <listitem><para>Returned by | |
1019 | <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput> and | |
1020 | <computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> to | |
1021 | indicate that the output data will not fit into the output | |
1022 | buffer provided.</para></listitem> | |
1023 | </varlistentry> | |
1024 | ||
1025 | </variablelist> | |
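<para>In <computeroutput>bzlib.h</computeroutput>, every normal outcome has a value of zero or greater and every error has a negative value. A caller can therefore separate the two groups with a single comparison. The C sketch below also maps each value to its name for logging. The helper names are hypothetical, and the numeric values are copied from <computeroutput>bzlib.h</computeroutput> only to keep the example self-contained. Real code should include <computeroutput>bzlib.h</computeroutput> and use its macros.</para>

```c
#include <stdbool.h>
#include <stddef.h>

/* Normal outcomes, indexed by return value 0..4
   (BZ_OK, BZ_RUN_OK, BZ_FLUSH_OK, BZ_FINISH_OK, BZ_STREAM_END). */
static const char *const bz_normal_names[] = {
    "BZ_OK", "BZ_RUN_OK", "BZ_FLUSH_OK", "BZ_FINISH_OK", "BZ_STREAM_END"
};

/* Errors, indexed by -rc for rc in -1..-9 (index 0 is unused). */
static const char *const bz_error_names[] = {
    NULL,
    "BZ_SEQUENCE_ERROR", "BZ_PARAM_ERROR", "BZ_MEM_ERROR",
    "BZ_DATA_ERROR", "BZ_DATA_ERROR_MAGIC", "BZ_IO_ERROR",
    "BZ_UNEXPECTED_EOF", "BZ_OUTBUFF_FULL", "BZ_CONFIG_ERROR"
};

/* Every error code is negative; every normal outcome is >= 0. */
static bool bz_is_error(int rc)
{
    return rc < 0;
}

/* Return the symbolic name of a libbzip2 return value, or "unknown". */
static const char *bz_rc_name(int rc)
{
    if (rc >= 0 && rc <= 4)
        return bz_normal_names[rc];
    if (rc >= -9 && rc <= -1)
        return bz_error_names[-rc];
    return "unknown";
}
```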
1026 | ||
1027 | </sect1> | |
1028 | ||
1029 | ||
1030 | ||
1031 | <sect1 id="low-level" xreflabel="Low-level interface"> | |
1032 | <title>Low-level interface</title> | |
1033 | ||
1034 | ||
1035 | <sect2 id="bzcompress-init" xreflabel="BZ2_bzCompressInit"> | |
1036 | <title>BZ2_bzCompressInit</title> | |
1037 | ||
1038 | <programlisting> | |
1039 | typedef struct { | |
1040 | char *next_in; | |
1041 | unsigned int avail_in; | |
1042 | unsigned int total_in_lo32; | |
1043 | unsigned int total_in_hi32; | |
1044 | ||
1045 | char *next_out; | |
1046 | unsigned int avail_out; | |
1047 | unsigned int total_out_lo32; | |
1048 | unsigned int total_out_hi32; | |
1049 | ||
1050 | void *state; | |
1051 | ||
1052 | void *(*bzalloc)(void *,int,int); | |
1053 | void (*bzfree)(void *,void *); | |
1054 | void *opaque; | |
1055 | } bz_stream; | |
1056 | ||
1057 | int BZ2_bzCompressInit ( bz_stream *strm, | |
1058 | int blockSize100k, | |
1059 | int verbosity, | |
1060 | int workFactor ); | |
1061 | </programlisting> | |
1062 | ||
1063 | <para>Prepares for compression. The | |
1064 | <computeroutput>bz_stream</computeroutput> structure holds all | |
1065 | data pertaining to the compression activity. A | |
1066 | <computeroutput>bz_stream</computeroutput> structure should be | |
1067 | allocated and initialised prior to the call. The fields of | |
1068 | <computeroutput>bz_stream</computeroutput> comprise the entirety | |
1069 | of the user-visible data. <computeroutput>state</computeroutput> | |
1070 | is a pointer to the private data structures required for | |
1071 | compression.</para> | |
1072 | ||
1073 | <para>Custom memory allocators are supported, via fields | |
1074 | <computeroutput>bzalloc</computeroutput>, | |
1075 | <computeroutput>bzfree</computeroutput>, and | |
1076 | <computeroutput>opaque</computeroutput>. The value | |
1077 | <computeroutput>opaque</computeroutput> is passed as the first | |
1078 | argument to all calls to <computeroutput>bzalloc</computeroutput> | |
1079 | and <computeroutput>bzfree</computeroutput>, but is otherwise | |
1080 | ignored by the library. The call <computeroutput>bzalloc ( | |
1081 | opaque, n, m )</computeroutput> is expected to return a pointer | |
1082 | <computeroutput>p</computeroutput> to <computeroutput>n * | |
1083 | m</computeroutput> bytes of memory, and <computeroutput>bzfree ( | |
1084 | opaque, p )</computeroutput> should free that memory.</para> | |
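<para>To illustrate the <computeroutput>bzalloc</computeroutput>/<computeroutput>bzfree</computeroutput>/<computeroutput>opaque</computeroutput> contract, here is a minimal C sketch of an allocator pair that keeps simple statistics in the structure that <computeroutput>opaque</computeroutput> points to. The names <computeroutput>alloc_stats</computeroutput>, <computeroutput>counting_alloc</computeroutput> and <computeroutput>counting_free</computeroutput> are hypothetical. Only their parameter and return types are taken from the <computeroutput>bz_stream</computeroutput> declaration above.</para>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, reached through the `opaque` pointer. */
struct alloc_stats {
    size_t live_blocks;  /* blocks currently allocated */
    size_t total_calls;  /* successful allocation requests so far */
};

/* Same type as the bzalloc field: void *(*)(void *, int, int).
   Returns a pointer to n * m bytes, or NULL on failure; this sketch
   treats non-positive sizes as a failed request. */
static void *counting_alloc(void *opaque, int n, int m)
{
    struct alloc_stats *st = opaque;
    if (n <= 0 || m <= 0)
        return NULL;
    void *p = malloc((size_t)n * (size_t)m);
    if (p != NULL) {
        st->live_blocks++;
        st->total_calls++;
    }
    return p;
}

/* Same type as the bzfree field: void (*)(void *, void *). */
static void counting_free(void *opaque, void *p)
{
    struct alloc_stats *st = opaque;
    if (p != NULL) {
        free(p);
        st->live_blocks--;
    }
}
```

<para>A client would set <computeroutput>strm.bzalloc = counting_alloc</computeroutput>, <computeroutput>strm.bzfree = counting_free</computeroutput> and <computeroutput>strm.opaque = &amp;stats</computeroutput> before calling <computeroutput>BZ2_bzCompressInit</computeroutput>. After the matching <computeroutput>*End</computeroutput> call, <computeroutput>stats.live_blocks</computeroutput> should be back to zero if every block was released.</para>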
1085 | ||
1086 | <para>If you don't want to use a custom memory allocator, set | |
1087 | <computeroutput>bzalloc</computeroutput>, | |
1088 | <computeroutput>bzfree</computeroutput> and | |
1089 | <computeroutput>opaque</computeroutput> to | |
1090 | <computeroutput>NULL</computeroutput>, and the library will then | |
1091 | use the standard <computeroutput>malloc</computeroutput> / | |
1092 | <computeroutput>free</computeroutput> routines.</para> | |
1093 | ||
1094 | <para>Before calling | |
1095 | <computeroutput>BZ2_bzCompressInit</computeroutput>, fields | |
1096 | <computeroutput>bzalloc</computeroutput>, | |
1097 | <computeroutput>bzfree</computeroutput> and | |
1098 | <computeroutput>opaque</computeroutput> should be filled | |
1099 | appropriately, as just described. Upon return, the internal | |
1100 | state will have been allocated and initialised, and | |
1101 | <computeroutput>total_in_lo32</computeroutput>, | |
1102 | <computeroutput>total_in_hi32</computeroutput>, | |
1103 | <computeroutput>total_out_lo32</computeroutput> and | |
1104 | <computeroutput>total_out_hi32</computeroutput> will have been | |
1105 | set to zero. These four fields are used by the library to inform | |
1106 | the caller of the total amount of data passed into and out of the | |
1107 | library, respectively. You should not try to change them. As of | |
1108 | version 1.0, 64-bit counts are maintained, even on 32-bit | |
1109 | platforms, using the <computeroutput>_hi32</computeroutput> | |
1110 | fields to store the upper 32 bits of the count. So, for example, | |
1111 | the total amount of data passed in is <computeroutput>(total_in_hi32 | |
1112 | << 32) + total_in_lo32</computeroutput>, evaluated in a 64-bit type.</para> | |
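<para>In C the high half must be widened before it is shifted, because shifting a 32-bit <computeroutput>unsigned int</computeroutput> by 32 places is undefined behaviour. Here is a minimal sketch with a hypothetical helper name:</para>

```c
#include <stdint.h>

/* Combine a split counter such as total_in_lo32/total_in_hi32 into one
   64-bit value. hi32 is converted to uint64_t before the shift, so the
   shift by 32 is well defined. */
static uint64_t bz_total64(unsigned int lo32, unsigned int hi32)
{
    return ((uint64_t)hi32 << 32) + lo32;
}
```

<para>For example, <computeroutput>bz_total64(strm.total_in_lo32, strm.total_in_hi32)</computeroutput> gives the number of bytes passed in so far.</para>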
1113 | ||
1114 | <para>Parameter <computeroutput>blockSize100k</computeroutput> | |
1115 | specifies the block size to be used for compression. It should | |
1116 | be a value between 1 and 9 inclusive, and the actual block size | |
1117 | used is 100000 times this figure. 9 gives the best compression but | |
1118 | takes the most memory.</para> | |
1119 | ||
1120 | <para>Parameter <computeroutput>verbosity</computeroutput> should | |
1121 | be set to a number between 0 and 4 inclusive. 0 is silent, and | |
1122 | greater numbers give increasingly verbose monitoring/debugging | |
1123 | output. If the library has been compiled with | |
1124 | <computeroutput>-DBZ_NO_STDIO</computeroutput>, no such output | |
1125 | will appear for any verbosity setting.</para> | |
1126 | ||
1127 | <para>Parameter <computeroutput>workFactor</computeroutput> | |
1128 | controls how the compression phase behaves when presented with | |
1129 | worst case, highly repetitive, input data. If compression runs | |
1130 | into difficulties caused by repetitive data, the library switches | |
1131 | from the standard sorting algorithm to a fallback algorithm. The | |
1132 | fallback is slower than the standard algorithm by perhaps a | |
1133 | factor of three, but always behaves reasonably, no matter how bad | |
1134 | the input.</para> | |
1135 | ||
1136 | <para>Lower values of <computeroutput>workFactor</computeroutput> | |
1137 | reduce the amount of effort the standard algorithm will expend | |
1138 | before resorting to the fallback. You should set this parameter | |
1139 | carefully; too low, and many inputs will be handled by the | |
1140 | fallback algorithm and so compress rather slowly; too high, and | |
1141 | your average-to-worst-case compression times can become very | |
1142 | large. The default value of 30 gives reasonable behaviour over a | |
1143 | wide range of circumstances.</para> | |
1144 | ||
1145 | <para>Allowable values range from 0 to 250 inclusive. 0 is a | |
1146 | special case, equivalent to using the default value of 30.</para> | |
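<para>The rules for <computeroutput>workFactor</computeroutput> fit in a few lines of C. The helper below is hypothetical and is not a library function. It returns -1 where the library would report <computeroutput>BZ_PARAM_ERROR</computeroutput>.</para>

```c
/* Hypothetical helper: apply the documented workFactor rules.
   Values outside 0..250 are rejected (-1 here), and 0 selects the
   default of 30. Any other in-range value is used as given. */
static int effective_work_factor(int workFactor)
{
    if (workFactor < 0 || workFactor > 250)
        return -1;
    return workFactor == 0 ? 30 : workFactor;
}
```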
1147 | ||
1148 | <para>Note that the compressed output generated is the same | |
1149 | regardless of whether or not the fallback algorithm is | |
1150 | used.</para> | |
1151 | ||
1152 | <para>Be aware also that this parameter may disappear entirely in | |
1153 | future versions of the library. In principle it should be | |
1154 | possible to devise a good way to automatically choose which | |
1155 | algorithm to use. Such a mechanism would render the parameter | |
1156 | obsolete.</para> | |
1157 | ||
1158 | <para>Possible return values:</para> | |
1159 | ||
1160 | <programlisting> | |
1161 | BZ_CONFIG_ERROR | |
1162 | if the library has been mis-compiled | |
1163 | BZ_PARAM_ERROR | |
1164 | if strm is NULL | |
1165 | or blockSize100k < 1 or blockSize100k > 9 | |
1166 | or verbosity < 0 or verbosity > 4 | |
1167 | or workFactor < 0 or workFactor > 250 | |
1168 | BZ_MEM_ERROR | |
1169 | if not enough memory is available | |
1170 | BZ_OK | |
1171 | otherwise | |
1172 | </programlisting> | |
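<para>The checks listed above can be written as a C sketch. The function and the <computeroutput>SK_*</computeroutput> constants are hypothetical stand-ins. The constants reuse the numeric values of <computeroutput>BZ_OK</computeroutput>, <computeroutput>BZ_PARAM_ERROR</computeroutput> and <computeroutput>BZ_CONFIG_ERROR</computeroutput> from <computeroutput>bzlib.h</computeroutput>. <computeroutput>BZ2_bzCompressInit</computeroutput> reports these conditions itself, so client code normally has no need to duplicate them.</para>

```c
#include <stddef.h>

/* Hypothetical stand-ins for the bzlib.h codes (same numeric values). */
enum { SK_OK = 0, SK_PARAM_ERROR = -2, SK_CONFIG_ERROR = -9 };

/* The documented configuration test: char, short and int must be
   1, 2 and 4 bytes wide respectively. */
static int sketch_config_ok(void)
{
    return sizeof(char) == 1 && sizeof(short) == 2 && sizeof(int) == 4;
}

/* Sketch of the argument checks for BZ2_bzCompressInit. Nothing is
   allocated here, so the BZ_MEM_ERROR case does not arise. */
static int sketch_check_compress_args(const void *strm, int blockSize100k,
                                      int verbosity, int workFactor)
{
    if (!sketch_config_ok())
        return SK_CONFIG_ERROR;
    if (strm == NULL
        || blockSize100k < 1 || blockSize100k > 9
        || verbosity < 0 || verbosity > 4
        || workFactor < 0 || workFactor > 250)
        return SK_PARAM_ERROR;
    return SK_OK;
}
```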
1173 | ||
1174 | <para>Allowable next actions:</para> | |
1175 | ||
1176 | <programlisting> | |
1177 | BZ2_bzCompress | |
1178 | if BZ_OK is returned | |
1179 | no specific action needed in case of error | |
1180 | </programlisting> | |
1181 | ||
1182 | </sect2> | |
1183 | ||
1184 | ||
1185 | <sect2 id="bzCompress" xreflabel="BZ2_bzCompress"> | |
1186 | <title>BZ2_bzCompress</title> | |
1187 | ||
1188 | <programlisting> | |
1189 | int BZ2_bzCompress ( bz_stream *strm, int action ); | |
1190 | </programlisting> | |
1191 | ||
1192 | <para>Provides more input and/or output buffer space for the | |
1193 | library. The caller maintains input and output buffers, and | |
1194 | calls <computeroutput>BZ2_bzCompress</computeroutput> to transfer | |
1195 | data between them.</para> | |
1196 | ||
1197 | <para>Before each call to | |
1198 | <computeroutput>BZ2_bzCompress</computeroutput>, | |
1199 | <computeroutput>next_in</computeroutput> should point at the data | |
1200 | to be compressed, and <computeroutput>avail_in</computeroutput> | |
1201 | should indicate how many bytes the library may read. | |
1202 | <computeroutput>BZ2_bzCompress</computeroutput> updates | |
1203 | <computeroutput>next_in</computeroutput>, | |
1204 | <computeroutput>avail_in</computeroutput> and the | |
1205 | <computeroutput>total_in_lo32</computeroutput>/<computeroutput>total_in_hi32</computeroutput> counters | |
1206 | to reflect the number of bytes it has read.</para> | |
1207 | ||
1208 | <para>Similarly, <computeroutput>next_out</computeroutput> should | |
1209 | point to a buffer in which the compressed data is to be placed, | |
1210 | with <computeroutput>avail_out</computeroutput> indicating how | |
1211 | much output space is available. | |
1212 | <computeroutput>BZ2_bzCompress</computeroutput> updates | |
1213 | <computeroutput>next_out</computeroutput>, | |
1214 | <computeroutput>avail_out</computeroutput> and the | |
1215 | <computeroutput>total_out_lo32</computeroutput>/<computeroutput>total_out_hi32</computeroutput> counters | |
1216 | to reflect the number of bytes output.</para> | |
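<para>The pointer and counter bookkeeping described for <computeroutput>BZ2_bzCompress</computeroutput> can be shown without doing any compression. In the C sketch below, a hypothetical <computeroutput>toy_stream</computeroutput> keeps only the buffer fields, with single 64-bit totals in place of the <computeroutput>_lo32</computeroutput>/<computeroutput>_hi32</computeroutput> pairs. A hypothetical <computeroutput>toy_step</computeroutput> copies bytes unchanged and updates the pointers, counts and totals in the way described above.</para>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for bz_stream: buffer fields only. */
struct toy_stream {
    const char  *next_in;
    unsigned int avail_in;
    uint64_t     total_in;
    char        *next_out;
    unsigned int avail_out;
    uint64_t     total_out;
};

/* Identity "compressor": move as many bytes as both buffers allow, then
   advance the pointers, shrink the avail counts and grow the totals.
   Returns the number of bytes moved. */
static unsigned int toy_step(struct toy_stream *s)
{
    unsigned int n = s->avail_in < s->avail_out ? s->avail_in : s->avail_out;
    memcpy(s->next_out, s->next_in, n);

    s->next_in  += n;
    s->avail_in -= n;
    s->total_in += n;

    s->next_out  += n;
    s->avail_out -= n;
    s->total_out += n;
    return n;
}
```

<para>A real compressor usually writes a different number of bytes than it reads, so the two totals diverge. The caller's side of the loop stays the same: refill <computeroutput>next_in</computeroutput>/<computeroutput>avail_in</computeroutput> when input runs out, and drain and reset <computeroutput>next_out</computeroutput>/<computeroutput>avail_out</computeroutput> when output space runs out.</para>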
1217 | ||
1218 | <para>You may provide and remove as little or as much data as you | |
1219 | like on each call of | |
1220 | <computeroutput>BZ2_bzCompress</computeroutput>. In the limit, | |
1221 | it is acceptable to supply and remove data one byte at a time, | |
1222 | although this would be terribly inefficient. You should always | |
1223 | ensure that at least one byte of output space is available at | |
1224 | each call.</para> | |
1225 | ||
1226 | <para>A second purpose of | |
1227 | <computeroutput>BZ2_bzCompress</computeroutput> is to request a | |
1228 | change of mode of the compressed stream.</para> | |
1229 | ||
1230 | <para>Conceptually, a compressed stream can be in one of four | |
1231 | states: IDLE, RUNNING, FLUSHING and FINISHING. Before | |
1232 | initialisation | |
1233 | (<computeroutput>BZ2_bzCompressInit</computeroutput>) and after | |
1234 | termination (<computeroutput>BZ2_bzCompressEnd</computeroutput>), | |
1235 | a stream is regarded as IDLE.</para> | |
1236 | ||
1237 | <para>Upon initialisation | |
1238 | (<computeroutput>BZ2_bzCompressInit</computeroutput>), the stream | |
1239 | is placed in the RUNNING state. Subsequent calls to | |
1240 | <computeroutput>BZ2_bzCompress</computeroutput> should pass | |
1241 | <computeroutput>BZ_RUN</computeroutput> as the requested action; | |
1242 | other actions are illegal and will result in | |
1243 | <computeroutput>BZ_SEQUENCE_ERROR</computeroutput>.</para> | |
1244 | ||
1245 | <para>At some point, the calling program will have provided all | |
1246 | the input data it wants to. It will then want to finish up -- in | |
1247 | effect, asking the library to process any data it might have | |
1248 | buffered internally. In this state, | |
1249 | <computeroutput>BZ2_bzCompress</computeroutput> will no longer | |
1250 | attempt to read data from | |
1251 | <computeroutput>next_in</computeroutput>, but it will want to | |
1252 | write data to <computeroutput>next_out</computeroutput>. Because | |
1253 | the output buffer supplied by the user can be arbitrarily small, | |
1254 | the finishing-up operation cannot necessarily be done with a | |
1255 | single call of | |
1256 | <computeroutput>BZ2_bzCompress</computeroutput>.</para> | |
1257 | ||
1258 | <para>Instead, the calling program passes | |
1259 | <computeroutput>BZ_FINISH</computeroutput> as an action to | |
<computeroutput>BZ2_bzCompress</computeroutput>. This changes
the stream's state to FINISHING. Any remaining input (ie,
<computeroutput>next_in[0 .. avail_in-1]</computeroutput>) is
compressed and transferred to the output buffer. To do this,
<computeroutput>BZ2_bzCompress</computeroutput> must be called
repeatedly until all the output has been consumed. At that
point, <computeroutput>BZ2_bzCompress</computeroutput> returns
<computeroutput>BZ_STREAM_END</computeroutput>, and the stream's
state is set back to IDLE.
<computeroutput>BZ2_bzCompressEnd</computeroutput> should then be
called.</para>

<para>Just to make sure the calling program does not cheat, the
library makes a note of <computeroutput>avail_in</computeroutput>
at the time of the first call to
<computeroutput>BZ2_bzCompress</computeroutput> which has
<computeroutput>BZ_FINISH</computeroutput> as an action (ie, at
the time the program announces that it will supply no more
input). By comparing this value with that of
<computeroutput>avail_in</computeroutput> over subsequent calls
to <computeroutput>BZ2_bzCompress</computeroutput>, the library
can detect any attempts to slip in more data to compress. Any
calls for which this is detected will return
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>. This
indicates a programming mistake which should be corrected.</para>

<para>Instead of asking to finish, the calling program may ask
<computeroutput>BZ2_bzCompress</computeroutput> to take all the
remaining input, compress it and terminate the current
(Burrows-Wheeler) compression block. This could be useful for
error control purposes. The mechanism is analogous to that for
finishing: call <computeroutput>BZ2_bzCompress</computeroutput>
with an action of <computeroutput>BZ_FLUSH</computeroutput>,
remove output data, and persist with the
<computeroutput>BZ_FLUSH</computeroutput> action until the value
<computeroutput>BZ_RUN_OK</computeroutput> is returned. As with
finishing, <computeroutput>BZ2_bzCompress</computeroutput>
detects any attempt to provide more input data once the flush has
begun.</para>

<para>Once the flush is complete, the stream returns to the
normal RUNNING state.</para>
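
<para>As an illustration of the finishing and flushing loops just
described, here is a minimal sketch of a helper that keeps calling
<computeroutput>BZ2_bzCompress</computeroutput> with the same action,
writing out each chunk of compressed data, until the expected
completion code appears (<computeroutput>BZ_RUN_OK</computeroutput>
for <computeroutput>BZ_FLUSH</computeroutput>,
<computeroutput>BZ_STREAM_END</computeroutput> for
<computeroutput>BZ_FINISH</computeroutput>). The name
<computeroutput>bz_drain</computeroutput>, the 4096-byte scratch
buffer and the use of <computeroutput>BZ_IO_ERROR</computeroutput>
to report a failed <computeroutput>fwrite</computeroutput> are
assumptions of this example, not part of the library.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper (not part of libbzip2): repeatedly calls
   BZ2_bzCompress with `action` (BZ_FLUSH or BZ_FINISH), writing each
   chunk of compressed output to `out`, until the library reports that
   the flush (BZ_RUN_OK) or the finish (BZ_STREAM_END) is complete.
   It never changes strm->next_in / strm->avail_in itself, as required
   once a flush or finish has begun. Returns the final BZ_* code, or a
   negative error code. */
static int bz_drain(bz_stream *strm, int action, FILE *out)
{
    char buf[4096];
    int target = (action == BZ_FLUSH) ? BZ_RUN_OK : BZ_STREAM_END;
    int ret;

    do {
        strm->next_out = buf;              /* fresh output space each call */
        strm->avail_out = sizeof buf;
        ret = BZ2_bzCompress(strm, action);
        if (ret < 0)
            return ret;                    /* e.g. BZ_SEQUENCE_ERROR */
        size_t produced = sizeof buf - strm->avail_out;
        if (produced > 0 && fwrite(buf, 1, produced, out) != produced)
            return BZ_IO_ERROR;            /* this sketch's choice of code */
    } while (ret != target);               /* BZ_FLUSH_OK / BZ_FINISH_OK: go again */

    return ret;
}
]]></programlisting>

<para>A caller would set <computeroutput>next_in</computeroutput> and
<computeroutput>avail_in</computeroutput>, call
<computeroutput>bz_drain(&amp;strm, BZ_FLUSH, out)</computeroutput> to
end the current block, and later
<computeroutput>bz_drain(&amp;strm, BZ_FINISH, out)</computeroutput>
followed by <computeroutput>BZ2_bzCompressEnd</computeroutput>.</para>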

<para>This all sounds pretty complex, but isn't really. Here's a
table which shows which actions are allowable in each state, what
action will be taken, what the next state is, and what the
non-error return values are. Note that you can't explicitly ask
what state the stream is in, but you do not need to: it can be
inferred from the values returned by
<computeroutput>BZ2_bzCompress</computeroutput>.</para>

<programlisting>
IDLE/any
  Illegal.  A stream is IDLE once BZ2_bzCompress has returned
  BZ_STREAM_END; only BZ2_bzCompressEnd should be called then.
  Return value = BZ_SEQUENCE_ERROR

RUNNING/BZ_RUN
  Compress from next_in to next_out as much as possible.
  Next state = RUNNING
  Return value = BZ_RUN_OK

RUNNING/BZ_FLUSH
  Remember current value of avail_in. Compress from next_in
  to next_out as much as possible, but do not accept any more input.
  Next state = FLUSHING
  Return value = BZ_FLUSH_OK

RUNNING/BZ_FINISH
  Remember current value of avail_in. Compress from next_in
  to next_out as much as possible, but do not accept any more input.
  Next state = FINISHING
  Return value = BZ_FINISH_OK

FLUSHING/BZ_FLUSH
  Compress from next_in to next_out as much as possible,
  but do not accept any more input.
  If all the existing input has been used up and all compressed
  output has been removed
    Next state = RUNNING; Return value = BZ_RUN_OK
  else
    Next state = FLUSHING; Return value = BZ_FLUSH_OK

FLUSHING/other
  Illegal.
  Return value = BZ_SEQUENCE_ERROR

FINISHING/BZ_FINISH
  Compress from next_in to next_out as much as possible,
  but do not accept any more input.
  If all the existing input has been used up and all compressed
  output has been removed
    Next state = IDLE; Return value = BZ_STREAM_END
  else
    Next state = FINISHING; Return value = BZ_FINISH_OK

FINISHING/other
  Illegal.
  Return value = BZ_SEQUENCE_ERROR
</programlisting>
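
<para>The table can also be read as a small state machine. The
following sketch models it in plain C so that the transitions can be
checked without the library. The enum names,
<computeroutput>bz_step</computeroutput> and the
<computeroutput>drained</computeroutput> flag (standing for "all the
existing input has been used up and all compressed output has been
removed") are hypothetical and do not appear in
<computeroutput>bzlib.h</computeroutput>; only the numeric return
values copy the corresponding <computeroutput>BZ_*</computeroutput>
constants.</para>

<programlisting><![CDATA[
#include <stdbool.h>

/* Hypothetical model of the BZ2_bzCompress state table; none of these
   names exist in bzlib.h. The return values copy those of the
   corresponding BZ_* constants. */
enum bz_state  { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING };
enum bz_action { ACT_RUN, ACT_FLUSH, ACT_FINISH };
enum bz_ret    { R_SEQUENCE_ERROR = -1, R_RUN_OK = 1, R_FLUSH_OK = 2,
                 R_FINISH_OK = 3, R_STREAM_END = 4 };

/* Applies one call with action `act` in state *st. `drained` means
   "all existing input has been used up and all compressed output has
   been removed". Updates *st and returns the table's return value. */
static enum bz_ret bz_step(enum bz_state *st, enum bz_action act, bool drained)
{
    switch (*st) {
    case ST_RUNNING:
        if (act == ACT_RUN)
            return R_RUN_OK;                       /* stay RUNNING */
        if (act == ACT_FLUSH) {
            *st = ST_FLUSHING;
            return R_FLUSH_OK;
        }
        *st = ST_FINISHING;                        /* ACT_FINISH */
        return R_FINISH_OK;
    case ST_FLUSHING:
        if (act != ACT_FLUSH)
            return R_SEQUENCE_ERROR;               /* FLUSHING/other */
        if (drained) {
            *st = ST_RUNNING;
            return R_RUN_OK;
        }
        return R_FLUSH_OK;
    case ST_FINISHING:
        if (act != ACT_FINISH)
            return R_SEQUENCE_ERROR;               /* FINISHING/other */
        if (drained) {
            *st = ST_IDLE;
            return R_STREAM_END;
        }
        return R_FINISH_OK;
    case ST_IDLE:
    default:
        return R_SEQUENCE_ERROR;                   /* IDLE/any */
    }
}
]]></programlisting>

<para>This model follows the table literally. The real library may
complete a short flush or finish within the very first call,
returning <computeroutput>BZ_RUN_OK</computeroutput> or
<computeroutput>BZ_STREAM_END</computeroutput> straight away, so
callers should loop on the returned value rather than count
calls.</para>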

<para>That still looks complicated? Well, fair enough. The
usual sequence of calls for compressing a load of data is:</para>

<orderedlist>

<listitem><para>Get started with
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para></listitem>

<listitem><para>Shovel data in and shlurp out its compressed form
using zero or more calls of
<computeroutput>BZ2_bzCompress</computeroutput> with action =
<computeroutput>BZ_RUN</computeroutput>.</para></listitem>

<listitem><para>Finish up. Repeatedly call
<computeroutput>BZ2_bzCompress</computeroutput> with action =
<computeroutput>BZ_FINISH</computeroutput>, copying out the
compressed output, until
<computeroutput>BZ_STREAM_END</computeroutput> is
returned.</para></listitem>

<listitem><para>Close up and go home. Call
<computeroutput>BZ2_bzCompressEnd</computeroutput>.</para></listitem>

</orderedlist>

<para>If the data you want to compress fits into your input
buffer all at once, you can skip the calls of
<computeroutput>BZ2_bzCompress ( ..., BZ_RUN )</computeroutput>
and just do the <computeroutput>BZ2_bzCompress ( ..., BZ_FINISH
)</computeroutput> calls.</para>
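
<para>For the all-at-once case, a minimal sketch might look like the
following. It assumes the caller supplies a destination buffer of
fixed size; the function name <computeroutput>compress_all</computeroutput>,
the block size of 9, and the use of
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> to signal a
too-small destination are choices made for this example. The
library's <computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>
offers similar functionality ready-made.</para>

<programlisting><![CDATA[
#include <bzlib.h>

/* Hypothetical helper: compresses src[0..srclen-1] into dst, whose
   capacity is *dstlen, using only BZ_FINISH calls (no BZ_RUN phase,
   since all input is available up front). On success stores the
   compressed size in *dstlen and returns BZ_OK. */
static int compress_all(char *dst, unsigned int *dstlen,
                        char *src, unsigned int srclen)
{
    bz_stream strm = {0};        /* NULL bzalloc/bzfree/opaque: use malloc/free */
    int ret = BZ2_bzCompressInit(&strm, 9, 0, 0);
    if (ret != BZ_OK)
        return ret;

    strm.next_in = src;
    strm.avail_in = srclen;
    strm.next_out = dst;
    strm.avail_out = *dstlen;

    /* BZ_FINISH_OK means "call again"; BZ_STREAM_END means done. */
    do {
        if (strm.avail_out == 0) {   /* dst is full but the stream is not done */
            BZ2_bzCompressEnd(&strm);
            return BZ_OUTBUFF_FULL;
        }
        ret = BZ2_bzCompress(&strm, BZ_FINISH);
    } while (ret == BZ_FINISH_OK);

    if (ret == BZ_STREAM_END) {
        *dstlen -= strm.avail_out;   /* bytes actually written */
        ret = BZ_OK;
    }
    BZ2_bzCompressEnd(&strm);        /* always release the stream's memory */
    return ret;
}
]]></programlisting>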

<para>All required memory is allocated by
<computeroutput>BZ2_bzCompressInit</computeroutput>. The
compression library can accept any data at all (obviously). So
you shouldn't get any error return values from the
<computeroutput>BZ2_bzCompress</computeroutput> calls. If you
do, they will be
<computeroutput>BZ_SEQUENCE_ERROR</computeroutput>, and indicate
a bug in your programming.</para>

<para>Trivial other possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL, or strm-&gt;s is NULL
</programlisting>

</sect2>


<sect2 id="bzCompress-end" xreflabel="BZ2_bzCompressEnd">
<title>BZ2_bzCompressEnd</title>

<programlisting>
int BZ2_bzCompressEnd ( bz_stream *strm );
</programlisting>

<para>Releases all memory associated with a compression
stream.</para>

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL or strm-&gt;s is NULL
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzDecompress-init" xreflabel="BZ2_bzDecompressInit">
<title>BZ2_bzDecompressInit</title>

<programlisting>
int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );
</programlisting>

<para>Prepares for decompression. As with
<computeroutput>BZ2_bzCompressInit</computeroutput>, a
<computeroutput>bz_stream</computeroutput> record should be
allocated and initialised before the call. Fields
<computeroutput>bzalloc</computeroutput>,
<computeroutput>bzfree</computeroutput> and
<computeroutput>opaque</computeroutput> should be set if a custom
memory allocator is required, or made
<computeroutput>NULL</computeroutput> for the normal
<computeroutput>malloc</computeroutput> /
<computeroutput>free</computeroutput> routines. Upon return, the
internal state will have been initialised, and
<computeroutput>total_in</computeroutput> and
<computeroutput>total_out</computeroutput> will be zero.</para>

<para>For the meaning of parameter
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>If <computeroutput>small</computeroutput> is nonzero, the
library will use an alternative decompression algorithm which
uses less memory but at the cost of decompressing more slowly
(roughly speaking, half the speed, but the maximum memory
requirement drops to around 2300k). See <xref linkend="using"/>
for more information on memory management.</para>

<para>Note that the amount of memory needed to decompress a
stream cannot be determined until the stream's header has been
read, so even if
<computeroutput>BZ2_bzDecompressInit</computeroutput> succeeds, a
subsequent <computeroutput>BZ2_bzDecompress</computeroutput>
could fail with
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if ( small != 0 &amp;&amp; small != 1 )
  or (verbosity &lt; 0 || verbosity &gt; 4)
BZ_MEM_ERROR
  if insufficient memory is available
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzDecompress
  if BZ_OK was returned
  no specific action required in case of error
</programlisting>

</sect2>


<sect2 id="bzDecompress" xreflabel="BZ2_bzDecompress">
<title>BZ2_bzDecompress</title>

<programlisting>
int BZ2_bzDecompress ( bz_stream *strm );
</programlisting>

<para>Provides more input and/or output buffer space for the
library. The caller maintains input and output buffers, and uses
<computeroutput>BZ2_bzDecompress</computeroutput> to transfer
data between them.</para>

<para>Before each call to
<computeroutput>BZ2_bzDecompress</computeroutput>,
<computeroutput>next_in</computeroutput> should point at the
compressed data, and <computeroutput>avail_in</computeroutput>
should indicate how many bytes the library may read.
<computeroutput>BZ2_bzDecompress</computeroutput> updates
<computeroutput>next_in</computeroutput>,
<computeroutput>avail_in</computeroutput> and
<computeroutput>total_in</computeroutput> to reflect the number
of bytes it has read.</para>

<para>Similarly, <computeroutput>next_out</computeroutput> should
point to a buffer in which the uncompressed output is to be
placed, with <computeroutput>avail_out</computeroutput>
indicating how much output space is available.
<computeroutput>BZ2_bzDecompress</computeroutput> updates
<computeroutput>next_out</computeroutput>,
<computeroutput>avail_out</computeroutput> and
<computeroutput>total_out</computeroutput> to reflect the number
of bytes output.</para>

<para>You may provide and remove as little or as much data as you
like on each call of
<computeroutput>BZ2_bzDecompress</computeroutput>. In the limit,
it is acceptable to supply and remove data one byte at a time,
although this would be terribly inefficient. You should always
ensure that at least one byte of output space is available at
each call.</para>

<para>Use of <computeroutput>BZ2_bzDecompress</computeroutput> is
simpler than
<computeroutput>BZ2_bzCompress</computeroutput>.</para>

<para>You should provide input and remove output as described
above, and repeatedly call
<computeroutput>BZ2_bzDecompress</computeroutput> until
<computeroutput>BZ_STREAM_END</computeroutput> is returned.
Appearance of <computeroutput>BZ_STREAM_END</computeroutput>
denotes that <computeroutput>BZ2_bzDecompress</computeroutput>
has detected the logical end of the compressed stream.
<computeroutput>BZ2_bzDecompress</computeroutput> will not
produce <computeroutput>BZ_STREAM_END</computeroutput> until all
output data has been placed into the output buffer, so once
<computeroutput>BZ_STREAM_END</computeroutput> appears, you are
guaranteed to have available all the decompressed output, and
<computeroutput>BZ2_bzDecompressEnd</computeroutput> can safely
be called.</para>

<para>In case of an error return value, you should call
<computeroutput>BZ2_bzDecompressEnd</computeroutput> to clean up
and release memory.</para>
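
<para>A minimal sketch of this decompression loop follows. It
assumes the whole compressed stream is already in memory and writes
the output through a deliberately small buffer so that
<computeroutput>BZ2_bzDecompress</computeroutput> is called
repeatedly. The name <computeroutput>decompress_to_file</computeroutput>
is hypothetical, as is its convention of reporting truncated input
with <computeroutput>BZ_UNEXPECTED_EOF</computeroutput> and a failed
<computeroutput>fwrite</computeroutput> with
<computeroutput>BZ_IO_ERROR</computeroutput>; neither code is among
the return values listed for
<computeroutput>BZ2_bzDecompress</computeroutput> itself.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper: decompresses the complete bzip2 image
   src[0..srclen-1] and writes the result to `out`. Returns BZ_OK on
   success, otherwise the first error seen. */
static int decompress_to_file(char *src, unsigned int srclen, FILE *out)
{
    bz_stream strm = {0};             /* NULL allocators: malloc/free */
    char buf[256];                    /* deliberately small: forces many calls */
    int ret = BZ2_bzDecompressInit(&strm, 0, 0);
    if (ret != BZ_OK)
        return ret;                   /* nothing to clean up after a failed init */

    strm.next_in = src;
    strm.avail_in = srclen;
    do {
        strm.next_out = buf;
        strm.avail_out = sizeof buf;  /* at least one byte of space per call */
        ret = BZ2_bzDecompress(&strm);
        if (ret != BZ_OK && ret != BZ_STREAM_END)
            break;                    /* BZ_DATA_ERROR, BZ_MEM_ERROR, ... */
        size_t n = sizeof buf - strm.avail_out;
        if (n > 0 && fwrite(buf, 1, n, out) != n) {
            ret = BZ_IO_ERROR;        /* this sketch's choice of code */
            break;
        }
        if (ret == BZ_OK && strm.avail_in == 0 && n == 0) {
            ret = BZ_UNEXPECTED_EOF;  /* input used up, no end-of-stream yet */
            break;
        }
    } while (ret != BZ_STREAM_END);

    BZ2_bzDecompressEnd(&strm);       /* needed after success and after errors */
    return ret == BZ_STREAM_END ? BZ_OK : ret;
}
]]></programlisting>

<para>The truncation test relies on the fact that when
<computeroutput>BZ2_bzDecompress</computeroutput> returns
<computeroutput>BZ_OK</computeroutput> with no input left and no
output produced, no further progress is possible without more
data.</para>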

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL or strm-&gt;s is NULL
  or strm-&gt;avail_out &lt; 1
BZ_DATA_ERROR
  if a data integrity error is detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the compressed stream doesn't begin with the right magic bytes
BZ_MEM_ERROR
  if there wasn't enough memory available
BZ_STREAM_END
  if the logical end of the data stream was detected and all
  output has been consumed, eg strm-&gt;avail_out &gt; 0
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzDecompress
  if BZ_OK was returned
BZ2_bzDecompressEnd
  otherwise
</programlisting>

</sect2>


<sect2 id="bzDecompress-end" xreflabel="BZ2_bzDecompressEnd">
<title>BZ2_bzDecompressEnd</title>

<programlisting>
int BZ2_bzDecompressEnd ( bz_stream *strm );
</programlisting>

<para>Releases all memory associated with a decompression
stream.</para>

<para>Possible return values:</para>

<programlisting>
BZ_PARAM_ERROR
  if strm is NULL or strm-&gt;s is NULL
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
None.
</programlisting>

</sect2>

</sect1>


<sect1 id="hl-interface" xreflabel="High-level interface">
<title>High-level interface</title>

<para>This interface provides functions for reading and writing
<computeroutput>bzip2</computeroutput> format files. First, some
general points.</para>

<itemizedlist mark='bullet'>

<listitem><para>All of the functions take an
<computeroutput>int*</computeroutput> first argument,
<computeroutput>bzerror</computeroutput>. After each call,
<computeroutput>bzerror</computeroutput> should be consulted
first to determine the outcome of the call. If
<computeroutput>bzerror</computeroutput> is
<computeroutput>BZ_OK</computeroutput>, the call completed
successfully, and only then should the return value of the
function (if any) be consulted. If
<computeroutput>bzerror</computeroutput> is
<computeroutput>BZ_IO_ERROR</computeroutput>, there was an
error reading/writing the underlying compressed file, and you
should then consult <computeroutput>errno</computeroutput> /
<computeroutput>perror</computeroutput> to determine the cause
of the difficulty. <computeroutput>bzerror</computeroutput>
may also be set to various other values; precise details are
given on a per-function basis below.</para></listitem>

<listitem><para>If <computeroutput>bzerror</computeroutput> indicates
an error (ie, anything except
<computeroutput>BZ_OK</computeroutput> and
<computeroutput>BZ_STREAM_END</computeroutput>), you should
immediately call
<computeroutput>BZ2_bzReadClose</computeroutput> (or
<computeroutput>BZ2_bzWriteClose</computeroutput>, depending on
whether you are attempting to read or to write) to free up all
resources associated with the stream. Once an error has been
indicated, behaviour of all calls except
<computeroutput>BZ2_bzReadClose</computeroutput>
(<computeroutput>BZ2_bzWriteClose</computeroutput>) is
undefined. The implication is that (1)
<computeroutput>bzerror</computeroutput> should be checked
after each call, and (2) if
<computeroutput>bzerror</computeroutput> indicates an error,
<computeroutput>BZ2_bzReadClose</computeroutput>
(<computeroutput>BZ2_bzWriteClose</computeroutput>) should then
be called to clean up.</para></listitem>

<listitem><para>The <computeroutput>FILE*</computeroutput> arguments
passed to <computeroutput>BZ2_bzReadOpen</computeroutput> /
<computeroutput>BZ2_bzWriteOpen</computeroutput> should refer to
files opened in binary mode. Most Unix systems will do this by
default, but other platforms, including Windows and Mac, will
not. If you omit this, you may encounter problems when moving
code to new platforms.</para></listitem>

<listitem><para>Memory allocation requests are handled by
<computeroutput>malloc</computeroutput> /
<computeroutput>free</computeroutput>. At present there is no
facility for user-defined memory allocators in the file I/O
functions (could easily be added, though).</para></listitem>

</itemizedlist>


<sect2 id="bzreadopen" xreflabel="BZ2_bzReadOpen">
<title>BZ2_bzReadOpen</title>

<programlisting>
typedef void BZFILE;

BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f,
                        int verbosity, int small,
                        void *unused, int nUnused );
</programlisting>

<para>Prepare to read compressed data from file handle
<computeroutput>f</computeroutput>.
<computeroutput>f</computeroutput> should refer to a file which
has been opened for reading, and for which the error indicator
(<computeroutput>ferror(f)</computeroutput>) is not set. If
<computeroutput>small</computeroutput> is 1, the library will try
to decompress using less memory, at the expense of speed.</para>

<para>For reasons explained below,
<computeroutput>BZ2_bzRead</computeroutput> will decompress the
<computeroutput>nUnused</computeroutput> bytes starting at
<computeroutput>unused</computeroutput>, before starting to read
from the file <computeroutput>f</computeroutput>. At most
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes may be
supplied like this. If this facility is not required, you should
pass <computeroutput>NULL</computeroutput> and
<computeroutput>0</computeroutput> for
<computeroutput>unused</computeroutput> and
<computeroutput>nUnused</computeroutput> respectively.</para>

<para>For the meaning of parameters
<computeroutput>small</computeroutput> and
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>

<para>The amount of memory needed to decompress a file cannot be
determined until the file's header has been read. So it is
possible that <computeroutput>BZ2_bzReadOpen</computeroutput>
returns <computeroutput>BZ_OK</computeroutput> but a subsequent
call of <computeroutput>BZ2_bzRead</computeroutput> will return
<computeroutput>BZ_MEM_ERROR</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or small is neither 0 nor 1
  or ( unused == NULL &amp;&amp; nUnused != 0 )
  or ( unused != NULL &amp;&amp; !(0 &lt;= nUnused &lt;= BZ_MAX_UNUSED) )
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise.
</programlisting>

<para>Possible return values:</para>

<programlisting>
Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzRead
  if bzerror is BZ_OK
BZ2_bzReadClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzread" xreflabel="BZ2_bzRead">
<title>BZ2_bzRead</title>

<programlisting>
int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );
</programlisting>

<para>Reads up to <computeroutput>len</computeroutput>
(uncompressed) bytes from the compressed file
<computeroutput>b</computeroutput> into the buffer
<computeroutput>buf</computeroutput>. If the read was
successful, <computeroutput>bzerror</computeroutput> is set to
<computeroutput>BZ_OK</computeroutput> and the number of bytes
read is returned. If the logical end-of-stream was detected,
<computeroutput>bzerror</computeroutput> will be set to
<computeroutput>BZ_STREAM_END</computeroutput>, and the number of
bytes read is returned. All other
<computeroutput>bzerror</computeroutput> values denote an
error.</para>

<para><computeroutput>BZ2_bzRead</computeroutput> will supply
<computeroutput>len</computeroutput> bytes, unless the logical
stream end is detected or an error occurs. Because of this, it
is possible to detect the stream end by observing when the number
of bytes returned is less than the number requested.
Nevertheless, this is regarded as inadvisable; you should instead
check <computeroutput>bzerror</computeroutput> after every call
and watch out for
<computeroutput>BZ_STREAM_END</computeroutput>.</para>
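
<para>A read loop that follows this advice might look like the sketch
below: it stops as soon as <computeroutput>bzerror</computeroutput> is
anything other than <computeroutput>BZ_OK</computeroutput>, uses the
byte count only for <computeroutput>BZ_OK</computeroutput> and
<computeroutput>BZ_STREAM_END</computeroutput>, and always finishes
with <computeroutput>BZ2_bzReadClose</computeroutput>. The function
name <computeroutput>bz_copy_out</computeroutput> and the use of
<computeroutput>BZ_IO_ERROR</computeroutput> for a failed
<computeroutput>fwrite</computeroutput> are assumptions of this
example; <computeroutput>in</computeroutput> is expected to have
been opened in binary mode, for instance with
<computeroutput>fopen(path, "rb")</computeroutput>.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <bzlib.h>

/* Hypothetical helper: copies the decompressed contents of the bzip2
   stream on `in` to `out`. Returns BZ_OK on success, or the bzerror
   value that stopped the loop. */
static int bz_copy_out(FILE *in, FILE *out)
{
    int bzerror, closeerr;
    char buf[4096];
    BZFILE *b = BZ2_bzReadOpen(&bzerror, in, 0, 0, NULL, 0);
    if (bzerror != BZ_OK) {
        BZ2_bzReadClose(&closeerr, b);   /* clean up even when open failed */
        return bzerror;
    }
    do {
        int n = BZ2_bzRead(&bzerror, b, buf, (int)sizeof buf);
        /* n is only meaningful for BZ_OK and BZ_STREAM_END */
        if ((bzerror == BZ_OK || bzerror == BZ_STREAM_END) && n > 0
            && fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            bzerror = BZ_IO_ERROR;
    } while (bzerror == BZ_OK);
    BZ2_bzReadClose(&closeerr, b);       /* does not fclose(in) */
    return bzerror == BZ_STREAM_END ? BZ_OK : bzerror;
}
]]></programlisting>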

<para>Internally, <computeroutput>BZ2_bzRead</computeroutput>
copies data from the compressed file in chunks of size
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes before
decompressing it. If the file contains more bytes than strictly
needed to reach the logical end-of-stream,
<computeroutput>BZ2_bzRead</computeroutput> will almost certainly
read some of the trailing data before signalling
<computeroutput>BZ_STREAM_END</computeroutput>. To collect the
read but unused data once
<computeroutput>BZ_STREAM_END</computeroutput> has appeared,
call <computeroutput>BZ2_bzReadGetUnused</computeroutput>
immediately before
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len &lt; 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_IO_ERROR
  if there is an error reading from the compressed file
BZ_UNEXPECTED_EOF
  if the compressed file ended before
  the logical end-of-stream was detected
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed stream
BZ_DATA_ERROR_MAGIC
  if the stream does not begin with the requisite header bytes
  (ie, is not a bzip2 data file).  This is really
  a special case of BZ_DATA_ERROR.
BZ_MEM_ERROR
  if insufficient memory was available
BZ_STREAM_END
  if the logical end of stream was detected.
BZ_OK
  otherwise.
</programlisting>

<para>Possible return values:</para>

<programlisting>
number of bytes read
  if bzerror is BZ_OK or BZ_STREAM_END
undefined
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
  if bzerror is BZ_OK
collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
  if bzerror is BZ_STREAM_END
BZ2_bzReadClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzreadgetunused" xreflabel="BZ2_bzReadGetUnused">
<title>BZ2_bzReadGetUnused</title>

<programlisting>
void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b,
                          void** unused, int* nUnused );
</programlisting>

<para>Returns data which was read from the compressed file but
was not needed to get to the logical end-of-stream.
<computeroutput>*unused</computeroutput> is set to the address of
the data, and <computeroutput>*nUnused</computeroutput> to the
number of bytes. <computeroutput>*nUnused</computeroutput> will
be set to a value between <computeroutput>0</computeroutput> and
<computeroutput>BZ_MAX_UNUSED</computeroutput> inclusive.</para>

<para>This function may only be called once
<computeroutput>BZ2_bzRead</computeroutput> has signalled
<computeroutput>BZ_STREAM_END</computeroutput> but before
<computeroutput>BZ2_bzReadClose</computeroutput>.</para>
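
<para>A common use of this function is reading a file that holds
several bzip2 streams one after another, such as the result of
concatenating two <computeroutput>.bz2</computeroutput> files. The
sketch below is an illustration under stated assumptions, not the
<computeroutput>bzip2</computeroutput> program's own code. It copies
the unused bytes out before calling
<computeroutput>BZ2_bzReadClose</computeroutput>, on the assumption
that <computeroutput>*unused</computeroutput> points into memory
owned by <computeroutput>b</computeroutput>, and hands them to the
next <computeroutput>BZ2_bzReadOpen</computeroutput>. The name
<computeroutput>bz_copy_all_streams</computeroutput> is hypothetical,
and trailing non-bzip2 data after the last stream is treated as an
error here.</para>

<programlisting><![CDATA[
#include <stdio.h>
#include <string.h>
#include <bzlib.h>

/* Hypothetical helper: decompresses every bzip2 stream stored back to
   back on `in` (opened in binary mode) and writes the output to `out`.
   Returns BZ_OK, or the first bzerror that is neither BZ_OK nor
   BZ_STREAM_END (BZ_IO_ERROR also covers a failed fwrite here). */
static int bz_copy_all_streams(FILE *in, FILE *out)
{
    char unused[BZ_MAX_UNUSED], buf[4096];
    int nUnused = 0, bzerror, ignored;

    for (;;) {
        /* Bytes left over from the previous stream are decompressed first. */
        BZFILE *b = BZ2_bzReadOpen(&bzerror, in, 0, 0, unused, nUnused);
        if (bzerror != BZ_OK) {
            BZ2_bzReadClose(&ignored, b);
            return bzerror;
        }
        do {
            int n = BZ2_bzRead(&bzerror, b, buf, (int)sizeof buf);
            if ((bzerror == BZ_OK || bzerror == BZ_STREAM_END) && n > 0
                && fwrite(buf, 1, (size_t)n, out) != (size_t)n)
                bzerror = BZ_IO_ERROR;
        } while (bzerror == BZ_OK);
        if (bzerror != BZ_STREAM_END) {
            BZ2_bzReadClose(&ignored, b);
            return bzerror;
        }

        /* Copy the leftover bytes before b (and its buffer) is released. */
        void *rest;
        BZ2_bzReadGetUnused(&bzerror, b, &rest, &nUnused);
        if (bzerror != BZ_OK) {
            BZ2_bzReadClose(&ignored, b);
            return bzerror;
        }
        memcpy(unused, rest, (size_t)nUnused);
        BZ2_bzReadClose(&ignored, b);

        /* Done when nothing is left over and the file is at EOF. */
        if (nUnused == 0) {
            int c = fgetc(in);
            if (c == EOF)
                return BZ_OK;
            ungetc(c, in);
        }
    }
}
]]></programlisting>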

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL
  or unused is NULL or nUnused is NULL
BZ_SEQUENCE_ERROR
  if BZ_STREAM_END has not been signalled
  or if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzReadClose
</programlisting>

</sect2>


<sect2 id="bzreadclose" xreflabel="BZ2_bzReadClose">
<title>BZ2_bzReadClose</title>

<programlisting>
void BZ2_bzReadClose ( int *bzerror, BZFILE *b );
</programlisting>

<para>Releases all memory pertaining to the compressed file
<computeroutput>b</computeroutput>.
<computeroutput>BZ2_bzReadClose</computeroutput> does not call
<computeroutput>fclose</computeroutput> on the underlying file
handle, so you should do that yourself if appropriate.
<computeroutput>BZ2_bzReadClose</computeroutput> should be called
to clean up after all error situations.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzWriteOpen
BZ_OK
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
none
</programlisting>

</sect2>


<sect2 id="bzwriteopen" xreflabel="BZ2_bzWriteOpen">
<title>BZ2_bzWriteOpen</title>

<programlisting>
BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f,
                         int blockSize100k, int verbosity,
                         int workFactor );
</programlisting>

<para>Prepare to write compressed data to file handle
<computeroutput>f</computeroutput>.
<computeroutput>f</computeroutput> should refer to a file which
has been opened for writing, and for which the error indicator
(<computeroutput>ferror(f)</computeroutput>) is not set.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>All required memory is allocated at this stage, so if the
call completes successfully,
<computeroutput>BZ_MEM_ERROR</computeroutput> cannot be signalled
by a subsequent call to
<computeroutput>BZ2_bzWrite</computeroutput>.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if f is NULL
  or blockSize100k &lt; 1 or blockSize100k &gt; 9
BZ_IO_ERROR
  if ferror(f) is nonzero
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OK
  otherwise
</programlisting>

<para>Possible return values:</para>

<programlisting>
Pointer to an abstract BZFILE
  if bzerror is BZ_OK
NULL
  otherwise
</programlisting>

<para>Allowable next actions:</para>

<programlisting>
BZ2_bzWrite
  if bzerror is BZ_OK
  (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
BZ2_bzWriteClose
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwrite" xreflabel="BZ2_bzWrite">
<title>BZ2_bzWrite</title>

<programlisting>
void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );
</programlisting>

<para>Absorbs <computeroutput>len</computeroutput> bytes from the
buffer <computeroutput>buf</computeroutput>, eventually to be
compressed and written to the file.</para>

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_PARAM_ERROR
  if b is NULL or buf is NULL or len &lt; 0
BZ_SEQUENCE_ERROR
  if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
  if there is an error writing the compressed file.
BZ_OK
  otherwise
</programlisting>

</sect2>


<sect2 id="bzwriteclose" xreflabel="BZ2_bzWriteClose">
<title>BZ2_bzWriteClose</title>

<programlisting>
void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );
</programlisting>

<para>Compresses and flushes to the compressed file all data so
far supplied by <computeroutput>BZ2_bzWrite</computeroutput>.
The logical end-of-stream markers are also written, so subsequent
calls to <computeroutput>BZ2_bzWrite</computeroutput> are
illegal. All memory associated with the compressed file
<computeroutput>b</computeroutput> is released.
<computeroutput>fflush</computeroutput> is called on the
compressed file, but it is not
<computeroutput>fclose</computeroutput>'d.</para>

<para>If <computeroutput>BZ2_bzWriteClose</computeroutput> is
called to clean up after an error, the only action is to release
the memory. The library records the error codes issued by
previous calls, so this situation will be detected automatically.
There is no attempt to complete the compression operation, nor to
<computeroutput>fflush</computeroutput> the compressed file. You
can force this behaviour even when no error has occurred, by
passing a nonzero value to
<computeroutput>abandon</computeroutput>.</para>
2069 | ||
2070 | <para>If <computeroutput>nbytes_in</computeroutput> is non-null, | |
2071 | <computeroutput>*nbytes_in</computeroutput> will be set to be the | |
2072 | total volume of uncompressed data handled. Similarly, | |
2073 | <computeroutput>nbytes_out</computeroutput> will be set to the | |
2074 | total volume of compressed data written. For compatibility with | |
2075 | older versions of the library, | |
2076 | <computeroutput>BZ2_bzWriteClose</computeroutput> only yields the | |
2077 | lower 32 bits of these counts. Use | |
2078 | <computeroutput>BZ2_bzWriteClose64</computeroutput> if you want | |
2079 | the full 64 bit counts. These two functions are otherwise | |
2080 | absolutely identical.</para> | |
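
<para>As a sketch of how the split counts from
<computeroutput>BZ2_bzWriteClose64</computeroutput> might be
recombined, the helper below joins each low/high pair into one
64-bit value. The name <computeroutput>bz_count64</computeroutput>
is hypothetical and not part of <computeroutput>libbzip2</computeroutput>,
and the code assumes <computeroutput>unsigned int</computeroutput>
is 32 bits wide, as it is on common platforms.</para>

<programlisting>
#include <stdint.h>

/* Hypothetical helper: join the low and high 32-bit halves that
   BZ2_bzWriteClose64 stores (for example nbytes_in_lo32 and
   nbytes_in_hi32) into one 64-bit byte count. */
static uint64_t bz_count64 ( unsigned int lo32, unsigned int hi32 )
{
   return ((uint64_t)hi32 << 32) | (uint64_t)lo32;
}

/* Intended use, after the last BZ2_bzWrite on b:

      unsigned int in_lo, in_hi, out_lo, out_hi;
      BZ2_bzWriteClose64 ( &bzerror, b, 0,
                           &in_lo, &in_hi, &out_lo, &out_hi );
      if (bzerror == BZ_OK) {
         uint64_t total_in  = bz_count64 ( in_lo, in_hi );
         uint64_t total_out = bz_count64 ( out_lo, out_hi );
      }
*/
</programlisting>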

<para>Possible assignments to
<computeroutput>bzerror</computeroutput>:</para>

<programlisting>
BZ_SEQUENCE_ERROR
   if b was opened with BZ2_bzReadOpen
BZ_IO_ERROR
   if there is an error writing the compressed file
BZ_OK
   otherwise
</programlisting>

</sect2>


<sect2 id="embed" xreflabel="Handling embedded compressed data streams">
<title>Handling embedded compressed data streams</title>

<para>The high-level library facilitates use of
<computeroutput>bzip2</computeroutput> data streams which form
some part of a surrounding, larger data stream.</para>

<itemizedlist mark='bullet'>

<listitem><para>For writing, the library takes an open file handle,
writes compressed data to it,
<computeroutput>fflush</computeroutput>es it but does not
<computeroutput>fclose</computeroutput> it. The calling
application can write its own data before and after the
compressed data stream, using that same file handle.</para></listitem>

<listitem><para>Reading is more complex, and the facilities are not as
general as they could be, since generality is hard to reconcile
with efficiency. <computeroutput>BZ2_bzRead</computeroutput>
reads from the compressed file in blocks of size
<computeroutput>BZ_MAX_UNUSED</computeroutput> bytes, and in
doing so will probably overshoot the logical end of the
compressed stream. To recover this data once decompression has
ended, call <computeroutput>BZ2_bzReadGetUnused</computeroutput>
after the last call of <computeroutput>BZ2_bzRead</computeroutput>
(the one returning
<computeroutput>BZ_STREAM_END</computeroutput>) but before
calling
<computeroutput>BZ2_bzReadClose</computeroutput>.</para></listitem>

</itemizedlist>

<para>This mechanism makes it easy to decompress multiple
<computeroutput>bzip2</computeroutput> streams placed end-to-end.
At the end of one stream, when
<computeroutput>BZ2_bzRead</computeroutput> returns
<computeroutput>BZ_STREAM_END</computeroutput>, call
<computeroutput>BZ2_bzReadGetUnused</computeroutput> to collect
the unused data, and copy it into a buffer of your own: the
pointer it returns refers to memory that
<computeroutput>BZ2_bzReadClose</computeroutput> releases. That
data forms the start of the next compressed stream. To start
uncompressing that next stream, call
<computeroutput>BZ2_bzReadOpen</computeroutput> again, feeding in
the unused data via the <computeroutput>unused</computeroutput> /
<computeroutput>nUnused</computeroutput> parameters. Keep doing
this until a <computeroutput>BZ_STREAM_END</computeroutput> return
coincides with the physical end of the file
(<computeroutput>feof(f)</computeroutput>). In this situation
<computeroutput>BZ2_bzReadGetUnused</computeroutput> will of
course return no data.</para>
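
<para>The loop just described might look like the following
sketch, which decompresses every stream in
<computeroutput>f</computeroutput> into a caller-supplied buffer.
The helper names <computeroutput>read_all_streams</computeroutput>,
<computeroutput>at_eof</computeroutput> and
<computeroutput>write_stream</computeroutput> are hypothetical;
<computeroutput>write_stream</computeroutput> is only the matching
write side, appending one complete stream per call. Instead of
<computeroutput>feof</computeroutput>, the sketch peeks one byte,
because a <computeroutput>FILE</computeroutput>'s end-of-file
indicator is set only after a read has actually hit the end, so a
stream ending exactly at a block boundary could otherwise look
unfinished.</para>

<programlisting>
#include <stdio.h>
#include <string.h>
#include <bzlib.h>

/* Hypothetical writer: append one complete bzip2 stream holding
   data[0 .. len-1] to f. Returns BZ_OK or the first error seen. */
static int write_stream ( FILE* f, const char* data, int len )
{
   int bzerror, err;
   BZFILE* b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 );
   if (bzerror == BZ_OK)
      BZ2_bzWrite ( &bzerror, b, (void*)data, len );
   err = bzerror;
   /* On error abandon (only frees memory); otherwise finish the stream. */
   BZ2_bzWriteClose ( &bzerror, b, err != BZ_OK, NULL, NULL );
   return err != BZ_OK ? err : bzerror;
}

/* Hypothetical helper: 1 if no bytes remain in f, without consuming any. */
static int at_eof ( FILE* f )
{
   int c = fgetc ( f );
   if (c == EOF) return 1;
   ungetc ( c, f );
   return 0;
}

/* Hypothetical reader: decompress every stream stored end-to-end in f
   into out[0 .. cap-1]. Returns BZ_OK and sets *outLen on success. */
static int read_all_streams ( FILE* f, char* out, int cap, int* outLen )
{
   char unused[BZ_MAX_UNUSED];   /* overshoot carried between streams */
   int  nUnused = 0;
   char buf[4096];

   *outLen = 0;
   for (;;) {
      int bzerror, err, n;
      void* rest;
      BZFILE* b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, unused, nUnused );

      /* Read until this stream ends (BZ_STREAM_END) or fails. */
      while (bzerror == BZ_OK) {
         n = BZ2_bzRead ( &bzerror, b, buf, sizeof buf );
         if (bzerror != BZ_OK && bzerror != BZ_STREAM_END) break;
         if (n > cap - *outLen) { bzerror = BZ_OUTBUFF_FULL; break; }
         memcpy ( out + *outLen, buf, n );
         *outLen += n;
      }
      /* Only legal after BZ_STREAM_END; sets bzerror to BZ_OK. */
      if (bzerror == BZ_STREAM_END)
         BZ2_bzReadGetUnused ( &bzerror, b, &rest, &nUnused );
      if (bzerror != BZ_OK) {
         err = bzerror;
         BZ2_bzReadClose ( &bzerror, b );
         return err;
      }

      /* Keep the overshoot: BZ2_bzReadClose frees the memory at rest. */
      memcpy ( unused, rest, nUnused );
      BZ2_bzReadClose ( &bzerror, b );

      /* Done once nothing is left in memory or in the file. */
      if (nUnused == 0 && at_eof ( f )) return BZ_OK;
   }
}
</programlisting>

<para>Non-<computeroutput>bzip2</computeroutput> data following the
last stream is reported as an error by this sketch, since the next
<computeroutput>BZ2_bzRead</computeroutput> will not find the
expected magic bytes.</para>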

<para>This should give some feel for how the high-level interface
can be used. If you require extra flexibility, you'll have to
bite the bullet and get to grips with the low-level
interface.</para>

</sect2>


<sect2 id="std-rdwr" xreflabel="Standard file-reading/writing code">
<title>Standard file-reading/writing code</title>

<para>Here's how you'd write data to a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[4096];   /* whatever size you like */
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );
if ( !f ) {
   /* handle error */
}
b = BZ2_bzWriteOpen ( &bzerror, f, 9, 0, 0 );
if ( bzerror != BZ_OK ) {
   BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
   /* handle error */
}

while ( /* condition */ ) {
   /* get data to write into buf, and set nBuf appropriately */
   BZ2_bzWrite ( &bzerror, b, buf, nBuf );
   if ( bzerror != BZ_OK ) {
      BZ2_bzWriteClose ( &bzerror, b, 1, NULL, NULL );   /* abandon */
      /* handle error; do not use b again */
   }
}

BZ2_bzWriteClose ( &bzerror, b, 0, NULL, NULL );
if ( bzerror == BZ_IO_ERROR ) {
   /* handle error */
}
fclose ( f );   /* BZ2_bzWriteClose flushes f but does not close it */
</programlisting>

<para>And to read from a compressed file:</para>

<programlisting>
FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[4096];   /* whatever size you like */
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if ( !f ) {
   /* handle error */
}
b = BZ2_bzReadOpen ( &bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK && /* arbitrary other conditions */ ) {
   nBuf = BZ2_bzRead ( &bzerror, b, buf, sizeof buf );
   /* the final call, returning BZ_STREAM_END, may also deliver data */
   if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
      /* do something with buf[0 .. nBuf-1] */
   }
}
if ( bzerror != BZ_STREAM_END ) {
   BZ2_bzReadClose ( &bzerror, b );
   /* handle error */
} else {
   BZ2_bzReadClose ( &bzerror, b );
}
fclose ( f );   /* BZ2_bzReadClose does not close f */
</programlisting>

</sect2>

</sect1>


<sect1 id="util-fns" xreflabel="Utility functions">
<title>Utility functions</title>


<sect2 id="bzbufftobuffcompress" xreflabel="BZ2_bzBuffToBuffCompress">
<title>BZ2_bzBuffToBuffCompress</title>

<programlisting>
int BZ2_bzBuffToBuffCompress( char*         dest,
                              unsigned int* destLen,
                              char*         source,
                              unsigned int  sourceLen,
                              int           blockSize100k,
                              int           verbosity,
                              int           workFactor );
</programlisting>

<para>Attempts to compress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the compressed data, and <computeroutput>BZ_OK</computeroutput>
is returned. If the compressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para>Compression in this manner is a one-shot event, done with a
single call to this function. The resulting compressed data is a
complete <computeroutput>bzip2</computeroutput> format data
stream. There is no mechanism for making additional calls to
provide extra input data. If you want that kind of mechanism,
use the low-level interface.</para>

<para>For the meaning of parameters
<computeroutput>blockSize100k</computeroutput>,
<computeroutput>verbosity</computeroutput> and
<computeroutput>workFactor</computeroutput>, see
<computeroutput>BZ2_bzCompressInit</computeroutput>.</para>

<para>To guarantee that the compressed data will fit in its
buffer, allocate an output buffer 1% larger than the
uncompressed data, plus six hundred extra bytes.</para>
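
<para>In C, that sizing rule might be written as follows. The
helper name <computeroutput>bz_compress_bound</computeroutput> is
hypothetical, and rounding the 1% up is this sketch's own choice,
made so that small inputs never get less than the stated
margin.</para>

<programlisting>
#include <stddef.h>

/* Hypothetical helper: an output size for BZ2_bzBuffToBuffCompress
   following the rule above, i.e. the input size plus 1% (rounded
   up) plus 600 bytes. */
static size_t bz_compress_bound ( size_t sourceLen )
{
   return sourceLen + (sourceLen + 99) / 100 + 600;
}

/* Intended use (destLen is an unsigned int in the API, so very large
   inputs need a range check before the cast):

      unsigned int destLen = (unsigned int) bz_compress_bound ( n );
      char* dest = malloc ( destLen );
      int rc = BZ2_bzBuffToBuffCompress ( dest, &destLen, src, n,
                                          9, 0, 0 );

   On BZ_OK, dest[0 .. destLen-1] holds the compressed stream. */
</programlisting>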

<para><computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
   if the library has been mis-compiled
BZ_PARAM_ERROR
   if dest is NULL or destLen is NULL or source is NULL
   or blockSize100k < 1 or blockSize100k > 9
   or verbosity < 0 or verbosity > 4
   or workFactor < 0 or workFactor > 250
BZ_MEM_ERROR
   if insufficient memory is available
BZ_OUTBUFF_FULL
   if the size of the compressed data exceeds *destLen
BZ_OK
   otherwise
</programlisting>

</sect2>


<sect2 id="bzbufftobuffdecompress" xreflabel="BZ2_bzBuffToBuffDecompress">
<title>BZ2_bzBuffToBuffDecompress</title>

<programlisting>
int BZ2_bzBuffToBuffDecompress( char*         dest,
                                unsigned int* destLen,
                                char*         source,
                                unsigned int  sourceLen,
                                int           small,
                                int           verbosity );
</programlisting>

<para>Attempts to decompress the data in <computeroutput>source[0
.. sourceLen-1]</computeroutput> into the destination buffer,
<computeroutput>dest[0 .. *destLen-1]</computeroutput>. If the
destination buffer is big enough,
<computeroutput>*destLen</computeroutput> is set to the size of
the uncompressed data, and <computeroutput>BZ_OK</computeroutput>
is returned. If the uncompressed data won't fit,
<computeroutput>*destLen</computeroutput> is unchanged, and
<computeroutput>BZ_OUTBUFF_FULL</computeroutput> is
returned.</para>

<para><computeroutput>source</computeroutput> is assumed to hold
a complete <computeroutput>bzip2</computeroutput> format data
stream.
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput> tries
to decompress the entirety of the stream into the output
buffer.</para>

<para>For the meaning of parameters
<computeroutput>small</computeroutput> and
<computeroutput>verbosity</computeroutput>, see
<computeroutput>BZ2_bzDecompressInit</computeroutput>.</para>

<para>Because the compression ratio of the compressed data cannot
be known in advance, there is no easy way to guarantee that the
output buffer will be big enough. You may of course make
arrangements in your code to record the size of the uncompressed
data, but such a mechanism is beyond the scope of this
library.</para>
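
<para>One such arrangement, sketched here purely as an
illustration, is to store the uncompressed length in a small
header of your own in front of the compressed stream, so that the
reader can allocate exactly that much. The 4-byte big-endian
layout and the names <computeroutput>put_len32</computeroutput>
and <computeroutput>get_len32</computeroutput> are hypothetical.
The header is not part of the <computeroutput>bzip2</computeroutput>
format, so it must be stripped before other
<computeroutput>bzip2</computeroutput> tools can read the
data.</para>

<programlisting>
#include <stdint.h>

/* Hypothetical application-level framing (not part of the bzip2
   format): the uncompressed length as 4 big-endian bytes, written
   before the output of BZ2_bzBuffToBuffCompress. */
static void put_len32 ( unsigned char hdr[4], uint32_t len )
{
   hdr[0] = (unsigned char)(len >> 24);
   hdr[1] = (unsigned char)(len >> 16);
   hdr[2] = (unsigned char)(len >> 8);
   hdr[3] = (unsigned char)len;
}

static uint32_t get_len32 ( const unsigned char hdr[4] )
{
   return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16)
        | ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/* Reader side, given frame[0 .. frameLen-1] = header + bzip2 stream:

      unsigned int destLen = get_len32 ( frame );
      char* dest = malloc ( destLen > 0 ? destLen : 1 );
      int rc = BZ2_bzBuffToBuffDecompress ( dest, &destLen,
                                            (char*)frame + 4,
                                            frameLen - 4, 0, 0 );
*/
</programlisting>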

<para><computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput>
will not write data at or beyond
<computeroutput>dest[*destLen]</computeroutput>, even in case of
buffer overflow.</para>

<para>Possible return values:</para>

<programlisting>
BZ_CONFIG_ERROR
   if the library has been mis-compiled
BZ_PARAM_ERROR
   if dest is NULL or destLen is NULL or source is NULL
   or small != 0 && small != 1
   or verbosity < 0 or verbosity > 4
BZ_MEM_ERROR
   if insufficient memory is available
BZ_OUTBUFF_FULL
   if the size of the uncompressed data exceeds *destLen
BZ_DATA_ERROR
   if a data integrity error was detected in the compressed data
BZ_DATA_ERROR_MAGIC
   if the compressed data doesn't begin with the right magic bytes
BZ_UNEXPECTED_EOF
   if the compressed data ends unexpectedly
BZ_OK
   otherwise
</programlisting>

</sect2>

</sect1>


<sect1 id="zlib-compat" xreflabel="zlib compatibility functions">
<title>zlib compatibility functions</title>

<para>Yoshioka Tsuneo has contributed some functions to give
better <computeroutput>zlib</computeroutput> compatibility.
These functions are <computeroutput>BZ2_bzopen</computeroutput>,
<computeroutput>BZ2_bzdopen</computeroutput>,
<computeroutput>BZ2_bzread</computeroutput>,
<computeroutput>BZ2_bzwrite</computeroutput>,
<computeroutput>BZ2_bzflush</computeroutput>,
<computeroutput>BZ2_bzclose</computeroutput>,
<computeroutput>BZ2_bzerror</computeroutput> and
<computeroutput>BZ2_bzlibVersion</computeroutput>. These
functions are not (yet) officially part of the library. If they
break, you get to keep all the pieces. Nevertheless, I think
they work ok.</para>

<programlisting>
typedef void BZFILE;

const char * BZ2_bzlibVersion ( void );
</programlisting>

<para>Returns a string indicating the library version.</para>

<programlisting>
BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
BZFILE * BZ2_bzdopen ( int fd,           const char *mode );
</programlisting>

<para>Opens a <computeroutput>.bz2</computeroutput> file for
reading or writing, using either its name or a pre-existing file
descriptor. Analogous to <computeroutput>fopen</computeroutput>
and <computeroutput>fdopen</computeroutput>.</para>

<programlisting>
int BZ2_bzread  ( BZFILE* b, void* buf, int len );
int BZ2_bzwrite ( BZFILE* b, void* buf, int len );
</programlisting>

<para>Reads/writes data from/to a previously opened
<computeroutput>BZFILE</computeroutput>. Analogous to
<computeroutput>fread</computeroutput> and
<computeroutput>fwrite</computeroutput>.</para>

<programlisting>
int  BZ2_bzflush ( BZFILE* b );
void BZ2_bzclose ( BZFILE* b );
</programlisting>

<para>Flushes/closes a <computeroutput>BZFILE</computeroutput>.
<computeroutput>BZ2_bzflush</computeroutput> doesn't actually do
anything. Analogous to <computeroutput>fflush</computeroutput>
and <computeroutput>fclose</computeroutput>.</para>

<programlisting>
const char * BZ2_bzerror ( BZFILE *b, int *errnum );
</programlisting>

<para>Returns a string describing the most recent error status of
<computeroutput>b</computeroutput>, and also sets
<computeroutput>*errnum</computeroutput> to its numerical
value.</para>
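
<para>A minimal round trip through these functions might look like
the sketch below: it writes a buffer to a new
<computeroutput>.bz2</computeroutput> file and reads it back. The
function name <computeroutput>bz_roundtrip_file</computeroutput> is
hypothetical, and error handling is reduced to returning -1. The
checks rely on <computeroutput>BZ2_bzwrite</computeroutput>
returning <computeroutput>len</computeroutput> on success and -1 on
failure, and on <computeroutput>BZ2_bzread</computeroutput>
returning the number of bytes read, 0 at the end of the stream and
-1 on error, which is how these wrappers behave in
<computeroutput>bzlib.c</computeroutput>.</para>

<programlisting>
#include <stdio.h>
#include <string.h>
#include <bzlib.h>

/* Hypothetical helper: write data[0 .. len-1] to a new .bz2 file at
   path, then read it back into out[0 .. cap-1].
   Returns the number of bytes read back, or -1 on error. */
static int bz_roundtrip_file ( const char* path, const char* data, int len,
                               char* out, int cap )
{
   int total = 0, n = 0, errnum;
   BZFILE* b = BZ2_bzopen ( path, "wb" );     /* like fopen(path, "wb") */
   if (b == NULL) return -1;
   if (BZ2_bzwrite ( b, (void*)data, len ) != len) {
      fprintf ( stderr, "bzwrite: %s\n", BZ2_bzerror ( b, &errnum ) );
      BZ2_bzclose ( b );
      return -1;
   }
   BZ2_bzclose ( b );                         /* finishes the stream */

   b = BZ2_bzopen ( path, "rb" );
   if (b == NULL) return -1;
   /* Keep reading until end of stream (0), error (-1) or out is full. */
   while (total < cap
          && (n = BZ2_bzread ( b, out + total, cap - total )) > 0)
      total += n;
   if (n < 0)
      fprintf ( stderr, "bzread: %s\n", BZ2_bzerror ( b, &errnum ) );
   BZ2_bzclose ( b );
   return n < 0 ? -1 : total;
}
</programlisting>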

</sect1>


<sect1 id="stdio-free"
       xreflabel="Using the library in a stdio-free environment">
<title>Using the library in a stdio-free environment</title>


<sect2 id="stdio-bye" xreflabel="Getting rid of stdio">
<title>Getting rid of stdio</title>

<para>In a deeply embedded application, you might want to use
just the memory-to-memory functions. You can do this
conveniently by compiling the library with the preprocessor symbol
<computeroutput>BZ_NO_STDIO</computeroutput> defined. Doing this
gives you a library containing only the following eight
functions:</para>

<para><computeroutput>BZ2_bzCompressInit</computeroutput>,
<computeroutput>BZ2_bzCompress</computeroutput>,
<computeroutput>BZ2_bzCompressEnd</computeroutput>,
<computeroutput>BZ2_bzDecompressInit</computeroutput>,
<computeroutput>BZ2_bzDecompress</computeroutput>,
<computeroutput>BZ2_bzDecompressEnd</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffCompress</computeroutput>,
<computeroutput>BZ2_bzBuffToBuffDecompress</computeroutput></para>

<para>When compiled like this, all functions will ignore
<computeroutput>verbosity</computeroutput> settings.</para>

</sect2>


<sect2 id="critical-error" xreflabel="Critical error handling">
<title>Critical error handling</title>

<para><computeroutput>libbzip2</computeroutput> contains a number
of internal assertion checks which should, needless to say, never
be activated. Nevertheless, if an assertion should fail,
behaviour depends on whether or not the library was compiled with
<computeroutput>BZ_NO_STDIO</computeroutput> set.</para>

<para>For a normal compile, an assertion failure yields the
message:</para>

<blockquote>
<para>bzip2/libbzip2: internal error number N.</para>
<para>This is a bug in bzip2/libbzip2, &bz-version; of &bz-date;.
Please report it to: &bz-email;. If this happened
when you were using some program which uses libbzip2 as a
component, you should also report this bug to the author(s)
of that program. Please make an effort to report this bug;
timely and accurate bug reports eventually lead to higher
quality software. Thanks.
</para></blockquote>

<para>where <computeroutput>N</computeroutput> is some error code
number. If <computeroutput>N == 1007</computeroutput>, it also
prints some extra text advising the reader that unreliable memory
is often associated with internal error 1007. (This is a
frequently observed phenomenon with versions 1.0.0/1.0.1.)</para>

<para><computeroutput>exit(3)</computeroutput> is then called, so
the program terminates with exit status 3.</para>

<para>For a <computeroutput>stdio</computeroutput>-free library,
assertion failures result in a call to a function declared
as:</para>

<programlisting>
extern void bz_internal_error ( int errcode );
</programlisting>

<para>The relevant error code is passed as the
<computeroutput>errcode</computeroutput> parameter. You must
supply such a function yourself.</para>
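
<para>What the handler does is up to you. One possibility,
sketched below on the assumption that your platform provides
<computeroutput>setjmp</computeroutput>/<computeroutput>longjmp</computeroutput>,
is to record the code and jump back to a recovery point set up
around each library call, so that the failing
<computeroutput>bz_stream</computeroutput> can be abandoned. The
variable names <computeroutput>bz_recover</computeroutput> and
<computeroutput>bz_last_internal_error</computeroutput> are
hypothetical. Jumping out skips whatever the library would have
done next, so memory held by the abandoned stream may be
lost.</para>

<programlisting>
#include <setjmp.h>

/* Hypothetical recovery state, owned by the application. */
static jmp_buf bz_recover;
static int     bz_last_internal_error = 0;

/* Called by a BZ_NO_STDIO build of libbzip2 when an internal
   assertion fails. Instead of returning into the library, it records
   the code and jumps back to the most recent setjmp(bz_recover). */
void bz_internal_error ( int errcode )
{
   bz_last_internal_error = errcode;
   longjmp ( bz_recover, 1 );
}

/* Intended use around each library call:

      if (setjmp ( bz_recover ) == 0) {
         ret = BZ2_bzCompress ( &strm, BZ_RUN );
      } else {
         strm is now invalid: stop using it and report
         bz_last_internal_error
      }
*/
</programlisting>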

<para>In either case, once an assertion failure has occurred, any
<computeroutput>bz_stream</computeroutput> records involved can
be regarded as invalid. You should not attempt to resume normal
operation with them.</para>

<para>You may, of course, change critical error handling to suit
your needs. As I said above, critical errors indicate bugs in
the library and should not occur. All "normal" error situations
are indicated via error return codes from functions, and can be
recovered from.</para>

</sect2>

</sect1>


<sect1 id="win-dll" xreflabel="Making a Windows DLL">
<title>Making a Windows DLL</title>

<para>Everything related to Windows has been contributed by
Yoshioka Tsuneo
(<computeroutput>tsuneo@rr.iij4u.or.jp</computeroutput>), so
you should send your queries to him (but please Cc:
<computeroutput>&bz-email;</computeroutput>).</para>

<para>My vague understanding of what to do is: using Visual C++
5.0, open the project file
<computeroutput>libbz2.dsp</computeroutput>, and build. That's
all.</para>

<para>If you can't open the project file for some reason, make a
new one, naming these files:
<computeroutput>blocksort.c</computeroutput>,
<computeroutput>bzlib.c</computeroutput>,
<computeroutput>compress.c</computeroutput>,
<computeroutput>crctable.c</computeroutput>,
<computeroutput>decompress.c</computeroutput>,
<computeroutput>huffman.c</computeroutput>,
<computeroutput>randtable.c</computeroutput> and
<computeroutput>libbz2.def</computeroutput>. You will also need
to name the header files <computeroutput>bzlib.h</computeroutput>
and <computeroutput>bzlib_private.h</computeroutput>.</para>

<para>If you don't use VC++, you may need to define the
preprocessor symbol
<computeroutput>_WIN32</computeroutput>.</para>

<para>Finally, <computeroutput>dlltest.c</computeroutput> is a
sample program using the DLL. It has a project file,
<computeroutput>dlltest.dsp</computeroutput>.</para>

<para>If you just want a makefile for Visual C, have a look at
<computeroutput>makefile.msc</computeroutput>.</para>

<para>Be aware that if you compile
<computeroutput>bzip2</computeroutput> itself on Win32, you must
set <computeroutput>BZ_UNIX</computeroutput> to 0 and
<computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the file
<computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>

<para>I haven't tried any of this stuff myself, but it all looks
plausible.</para>

</sect1>

</chapter>



<chapter id="misc" xreflabel="Miscellanea">
<title>Miscellanea</title>

<para>These are just some random thoughts of mine. Your mileage
may vary.</para>


<sect1 id="limits" xreflabel="Limitations of the compressed file format">
<title>Limitations of the compressed file format</title>

<para><computeroutput>bzip2-1.0.X</computeroutput>,
<computeroutput>0.9.5</computeroutput> and
<computeroutput>0.9.0</computeroutput> use exactly the same file
format as the original version,
<computeroutput>bzip2-0.1</computeroutput>. This decision was
made in the interests of stability. Creating yet another
incompatible compressed file format would create further
confusion and disruption for users.</para>

<para>Nevertheless, this is not a painless decision. Development
work since the release of
<computeroutput>bzip2-0.1</computeroutput> in August 1997 has
shown complexities in the file format which slow down
decompression and, in retrospect, are unnecessary. These
are:</para>

<itemizedlist mark='bullet'>

<listitem><para>The run-length encoder, which is the first of the
compression transformations, is entirely irrelevant. The
original purpose was to protect the sorting algorithm from the
very worst-case input: a string of repeated symbols. But
algorithm steps Q6a and Q6b in the original Burrows-Wheeler
technical report (SRC-124) show how repeats can be handled
without difficulty in block sorting.</para></listitem>

<listitem><para>The randomisation mechanism doesn't really need to be
there. Udi Manber and Gene Myers published a suffix array
construction algorithm a few years back, which can be employed
to sort any block, no matter how repetitive, in O(N log N)
time. Subsequent work by Kunihiko Sadakane has produced a
derivative O(N (log N)^2) algorithm which usually outperforms
the Manber-Myers algorithm.</para>

<para>I could have changed to Sadakane's algorithm, but I find
it to be slower than <computeroutput>bzip2</computeroutput>'s
existing algorithm for most inputs, and the randomisation
mechanism protects adequately against bad cases. I didn't
think it was a good tradeoff to make. Partly this is due to
the fact that I was not flooded with email complaints about
<computeroutput>bzip2-0.1</computeroutput>'s performance on
repetitive data, so perhaps it isn't a problem for real
inputs.</para>

<para>Probably the best long-term solution, and the one I have
incorporated into 0.9.5 and above, is to use the existing
sorting algorithm initially, and fall back to an O(N (log N)^2)
algorithm if the standard algorithm gets into
difficulties.</para></listitem>

<listitem><para>The compressed file format was never designed to be
handled by a library, and I have had to jump through some hoops
to produce an efficient implementation of decompression. It's
a bit hairy. Try passing
<computeroutput>decompress.c</computeroutput> through the C
preprocessor and you'll see what I mean. Much of this
complexity could have been avoided if the compressed size of
each block of data was recorded in the data stream.</para></listitem>

<listitem><para>An Adler-32 checksum, rather than a CRC32 checksum,
would be faster to compute.</para></listitem>

</itemizedlist>

<para>It would be fair to say that the
<computeroutput>bzip2</computeroutput> format was frozen before I
properly and fully understood the performance consequences of
doing so.</para>

<para>Improvements which I was able to incorporate into 0.9.0,
despite using the same file format, are:</para>

<itemizedlist mark='bullet'>

<listitem><para>Single array implementation of the inverse BWT. This
significantly speeds up decompression, presumably because it
reduces the number of cache misses.</para></listitem>

<listitem><para>Faster inverse MTF transform for large MTF values.
The new implementation is based on the notion of sliding blocks
of values.</para></listitem>

<listitem><para><computeroutput>bzip2-0.9.0</computeroutput> now reads
and writes files with <computeroutput>fread</computeroutput>
and <computeroutput>fwrite</computeroutput>; version 0.1 used
<computeroutput>putc</computeroutput> and
<computeroutput>getc</computeroutput>. Duh! Well, you live
and learn.</para></listitem>

</itemizedlist>

<para>Further ahead, it would be nice to be able to do random
access into files. This will require some careful design of
compressed file formats.</para>

</sect1>
2695 | ||
2696 | <sect1 id="port-issues" xreflabel="Portability issues"> | |
2697 | <title>Portability issues</title> | |
2698 | ||
2699 | <para>After some consideration, I have decided not to use GNU | |
2700 | <computeroutput>autoconf</computeroutput> to configure 0.9.5 or | |
2701 | 1.0.</para> | |
2702 | ||
2703 | <para><computeroutput>autoconf</computeroutput>, admirable and | |
2704 | wonderful though it is, mainly assists with portability problems | |
2705 | between Unix-like platforms. But | |
2706 | <computeroutput>bzip2</computeroutput> doesn't have much in the | |
2707 | way of portability problems on Unix; most of the difficulties | |
2708 | appear when porting to the Mac, or to Microsoft's operating | |
2709 | systems. <computeroutput>autoconf</computeroutput> doesn't help | |
2710 | in those cases, and brings in a whole load of new | |
2711 | complexity.</para> | |
2712 | ||
2713 | <para>Most people should be able to compile the library and | |
2714 | program under Unix straight out-of-the-box, so to speak, | |
2715 | especially if you have a version of GNU C available.</para> | |
2716 | ||
2717 | <para>There are a couple of | |
2718 | <computeroutput>__inline__</computeroutput> directives in the | |
2719 | code. GNU C (<computeroutput>gcc</computeroutput>) should be | |
2720 | able to handle them. If you're not using GNU C, your C compiler | |
2721 | shouldn't see them at all. If your compiler does, for some | |
2722 | reason, see them and doesn't like them, just | |
2723 | <computeroutput>#define</computeroutput> | |
2724 | <computeroutput>__inline__</computeroutput> to be | |
2725 | <computeroutput>/* */</computeroutput>. One easy way to do this | |
2726 | is to compile with the flag | |
2727 | <computeroutput>-D__inline__=</computeroutput>, which should be | |
2728 | understood by most Unix compilers.</para> | |
2729 | ||
2730 | <para>If you still have difficulties, try compiling with the | |
2731 | macro <computeroutput>BZ_STRICT_ANSI</computeroutput> defined. | |
2732 | This should enable you to build the library in a strictly ANSI | |
2733 | compliant environment. Building the program itself like this is | |
2734 | dangerous and not supported, since you remove | |
2735 | <computeroutput>bzip2</computeroutput>'s checks against | |
2736 | compressing directories, symbolic links, devices, and other | |
2737 | not-really-a-file entities. This could cause filesystem | |
2738 | corruption!</para> | |

<para>One other thing: if you create a
<computeroutput>bzip2</computeroutput> binary for public distribution,
please consider linking it statically (<computeroutput>gcc
-static</computeroutput>). This avoids all sorts of library-version
issues that others may encounter later on.</para>

<para>If you build <computeroutput>bzip2</computeroutput> on
Win32, you must set <computeroutput>BZ_UNIX</computeroutput> to 0
and <computeroutput>BZ_LCCWIN32</computeroutput> to 1, in the
file <computeroutput>bzip2.c</computeroutput>, before compiling.
Otherwise the resulting binary won't work correctly.</para>
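
<para>Since the two switches must always change together, one option is
to derive them from the compiler's predefined macros instead of editing
them by hand. The fragment below sketches that approach, assuming your
Win32 compiler predefines <computeroutput>_WIN32</computeroutput>; it is
not copied from <computeroutput>bzip2.c</computeroutput>, and
<computeroutput>bz_platform_name</computeroutput> is a hypothetical
helper. Compare it with the definitions in your copy of
<computeroutput>bzip2.c</computeroutput> before changing anything.</para>

<programlisting><![CDATA[
/* Sketch: select exactly one platform switch automatically.
   Cygwin provides a Unix-like environment, so it is treated as Unix. */
#if defined(_WIN32) && !defined(__CYGWIN__)
#define BZ_UNIX      0
#define BZ_LCCWIN32  1
#else
#define BZ_UNIX      1
#define BZ_LCCWIN32  0
#endif

/* If the definitions above are later edited by hand, refuse to
   compile an inconsistent pair. */
#if BZ_UNIX + BZ_LCCWIN32 != 1
#error "exactly one of BZ_UNIX and BZ_LCCWIN32 must be 1"
#endif

/* Hypothetical helper reporting which code path was selected. */
const char *bz_platform_name(void)
{
#if BZ_LCCWIN32
   return "Win32";
#else
   return "Unix";
#endif
}
]]></programlisting>

<para>The <computeroutput>#error</computeroutput> line turns a
half-finished manual edit into a compile-time failure rather than a
binary that misbehaves.</para>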

</sect1>


<sect1 id="bugs" xreflabel="Reporting bugs">
<title>Reporting bugs</title>

<para>I tried pretty hard to make sure
<computeroutput>bzip2</computeroutput> is bug-free, both by
design and by testing. Hopefully you'll never need to read this
section for real.</para>

<para>Nevertheless, if <computeroutput>bzip2</computeroutput> dies
with a segmentation fault, a bus error or an internal assertion
failure, it will ask you to email me a bug report. Years of
feedback from bzip2 users indicate that almost all these
problems can be traced to either compiler bugs or hardware
problems, so before reporting, check the following:</para>

<itemizedlist mark='bullet'>

<listitem><para>Recompile the program with no optimisation, and
see if it works. And/or try a different compiler. I have heard all
sorts of stories about various flavours of GNU C (and other
compilers) generating bad code for
<computeroutput>bzip2</computeroutput>, and I've run across two
such examples myself.</para>

<para>2.7.X versions of GNU C are known to generate bad code
from time to time, at high optimisation levels. If you have
problems, try using the flags
<computeroutput>-O2</computeroutput>
<computeroutput>-fomit-frame-pointer</computeroutput>
<computeroutput>-fno-strength-reduce</computeroutput>. You
should specifically <emphasis>not</emphasis> use
<computeroutput>-funroll-loops</computeroutput>.</para>

<para>You may notice that the Makefile runs six tests as part
of the build process. If the program passes all of these, it's
a pretty good (but not 100%) indication that the compiler has
done its job correctly.</para></listitem>

<listitem><para>If <computeroutput>bzip2</computeroutput>
crashes randomly, and the crashes are not repeatable, you may
have a flaky memory subsystem.
<computeroutput>bzip2</computeroutput> really hammers your
memory hierarchy, and if it's a bit marginal, you may get these
problems. Ditto if your disk or I/O subsystem is slowly
failing. Yup, this really does happen.</para>

<para>Try using a different machine of the same type, and see
if you can repeat the problem.</para></listitem>

<listitem><para>This isn't really a bug, but: if
<computeroutput>bzip2</computeroutput> tells you your file is
corrupted on decompression, and you obtained the file via FTP,
there is a possibility that you forgot to tell FTP to do a
binary mode transfer. That absolutely will cause the file to
be non-decompressible. You'll have to transfer it
again.</para></listitem>

</itemizedlist>

<para>If you've incorporated
<computeroutput>libbzip2</computeroutput> into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are correct
and in accordance with what the documentation says is allowable.
I have tried to make the library robust against such problems,
but I'm sure I haven't succeeded.</para>
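
<para>One cheap defence is to check arguments before they reach the
library. The sketch below validates the three tuning parameters of
<computeroutput>BZ2_bzCompressInit</computeroutput> against the ranges
documented for it: <computeroutput>blockSize100k</computeroutput> from 1
to 9, <computeroutput>verbosity</computeroutput> from 0 to 4, and
<computeroutput>workFactor</computeroutput> from 0 to 250, where 0
selects the default. The helper
<computeroutput>check_compress_params</computeroutput> is hypothetical
and deliberately does not include <computeroutput>bzlib.h</computeroutput>;
the library performs its own checks and returns
<computeroutput>BZ_PARAM_ERROR</computeroutput> for out-of-range
values.</para>

<programlisting><![CDATA[
/* Hypothetical pre-flight check for the tuning arguments of
   BZ2_bzCompressInit. Returns 0 if all three lie within the
   documented ranges, -1 otherwise (cases where the library itself
   would report BZ_PARAM_ERROR). */
int check_compress_params(int blockSize100k, int verbosity, int workFactor)
{
   if (blockSize100k < 1 || blockSize100k > 9)
      return -1;
   if (verbosity < 0 || verbosity > 4)
      return -1;
   if (workFactor < 0 || workFactor > 250)   /* 0 selects the default */
      return -1;
   return 0;
}
]]></programlisting>

<para>For example, <computeroutput>check_compress_params(9, 0, 30)</computeroutput>
returns 0, while a block size of 0 or a work factor of 251 returns
-1.</para>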

<para>Finally, if the above comments don't help, you'll have to
send me a bug report. Now, it's just amazing how many people
will send me a bug report saying something like:</para>

<programlisting>
bzip2 crashed with segmentation fault on my machine
</programlisting>

<para>and absolutely nothing else. Needless to say, such a
report is <emphasis>totally, utterly, completely and
comprehensively 100% useless; a waste of your time, my time, and
net bandwidth</emphasis>. With no details at all, there's no way
I can possibly begin to figure out what the problem is.</para>

<para>The rules of the game are: facts, facts, facts. Don't omit
them because "oh, they won't be relevant". At the bare
minimum:</para>

<programlisting>
Machine type. Operating system version.
Exact version of bzip2 (do bzip2 -V).
Exact version of the compiler used.
Flags passed to the compiler.
</programlisting>
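
<para>Some of these facts can be collected by the compiler itself. The
sketch below writes the compiler's version string and the C standard
level into a buffer that can be pasted into a report. It assumes GNU C
or a compiler that imitates it, which predefines
<computeroutput>__VERSION__</computeroutput>; other compilers get a
placeholder. The function <computeroutput>describe_compiler</computeroutput>
is hypothetical, and the machine type, operating system,
<computeroutput>bzip2 -V</computeroutput> output and compiler flags
still have to be added by hand.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the version of the compiler that built
   this file, plus the C standard level, into buf (size bytes long). */
void describe_compiler(char *buf, size_t size)
{
#ifdef __VERSION__
   const char *version = __VERSION__;   /* predefined by GNU C */
#else
   const char *version = "unknown (no __VERSION__ macro)";
#endif
#ifdef __STDC_VERSION__
   long stdc = (long)__STDC_VERSION__;  /* e.g. 199901 for C99 */
#else
   long stdc = 0L;                      /* not defined in C89/C90 */
#endif

   snprintf(buf, size, "compiler: %s; __STDC_VERSION__: %ld",
            version, stdc);
}
]]></programlisting>

<para>The result only describes the compiler that built this file, so
compile it with the same compiler you used for
<computeroutput>bzip2</computeroutput>.</para>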

<para>However, the most important single thing that will help me
is the file that you were trying to compress or decompress at the
time the problem happened. Without that, my ability to do
anything more than speculate about the cause is limited.</para>

</sect1>


<sect1 id="package" xreflabel="Did you get the right package?">
<title>Did you get the right package?</title>

<para><computeroutput>bzip2</computeroutput> is a resource hog.
It soaks up large amounts of CPU cycles and memory. It also has
very high latency. In the worst case, you can feed many
megabytes of uncompressed data into the library before getting
any compressed output, so this probably rules out applications
requiring interactive behaviour.</para>
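
<para>A rough calculation shows where the "many megabytes" can come
from. It assumes two details of the compressor that this section does
not spell out: a block holds up to
<computeroutput>blockSize100k</computeroutput> times 100,000 bytes, and
an initial run-length stage stores a run of up to 255 identical bytes in
5 bytes before the block is filled. Since a block is compressed only
once it is full (or the caller ends it early), the worst case is a block
of maximally repetitive input. The function names are hypothetical and
the results are approximate, ignoring small per-block margins.</para>

<programlisting><![CDATA[
/* Approximate number of input bytes absorbed before the first block
   is compressed, for blockSize100k in 1..9 (hypothetical helpers). */

/* Ordinary, non-repetitive data: about one block byte per input byte. */
long typical_input_per_block(int blockSize100k)
{
   return 100000L * blockSize100k;
}

/* Worst case: input made only of runs of 255 identical bytes, each
   assumed to occupy 5 bytes after the initial run-length stage. */
long worst_input_per_block(int blockSize100k)
{
   long block_bytes = 100000L * blockSize100k;
   return (block_bytes / 5) * 255;
}
]]></programlisting>

<para>With the largest block size, 9, the estimates are 900,000 bytes
for ordinary data and about 45.9 million bytes for long runs of a
single byte value.</para>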

<para>These costs aren't faults of my implementation, I hope, but
rather an intrinsic property of the Burrows-Wheeler transform
(unfortunately). Maybe this isn't what you want.</para>
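
<para>To see why the latency is intrinsic, look at the transform itself:
each output byte depends on the sorted order of every rotation of the
block, so nothing can be emitted until the whole block has been read.
The sketch below is a deliberately naive forward transform that sorts
rotation offsets with <computeroutput>qsort</computeroutput>, taking
O(n² log n) time in the worst case. It is not the block-sorting code in
<computeroutput>bzip2</computeroutput>, which relies on the much faster
techniques listed under <xref linkend="reading"/>, and
<computeroutput>naive_bwt</computeroutput> is a hypothetical name.</para>

<programlisting><![CDATA[
#include <stdlib.h>

/* qsort() has no context argument, so the block being sorted reaches
   the comparator through file-scope variables (not thread-safe;
   acceptable for an illustration). */
static const unsigned char *bwt_block;
static size_t bwt_len;

/* Compare the rotations starting at two offsets, byte by byte. */
static int cmp_rotations(const void *pa, const void *pb)
{
   size_t a = *(const size_t *)pa;
   size_t b = *(const size_t *)pb;
   size_t k;

   for (k = 0; k < bwt_len; k++) {
      unsigned char ca = bwt_block[(a + k) % bwt_len];
      unsigned char cb = bwt_block[(b + k) % bwt_len];
      if (ca != cb)
         return ca < cb ? -1 : 1;
   }
   return 0;
}

/* Naive forward BWT of in[0..n-1] (illustration only). Writes the last
   column of the sorted rotation matrix to out[0..n-1] and returns the
   row holding the original block (needed to invert), or -1 on error. */
long naive_bwt(const unsigned char *in, size_t n, unsigned char *out)
{
   size_t *rot;
   size_t i;
   long primary = -1;

   if (n == 0)
      return -1;
   rot = malloc(n * sizeof *rot);
   if (rot == NULL)
      return -1;
   for (i = 0; i < n; i++)
      rot[i] = i;

   bwt_block = in;
   bwt_len = n;
   qsort(rot, n, sizeof *rot, cmp_rotations);

   for (i = 0; i < n; i++) {
      /* Last byte of the rotation starting at rot[i]. */
      out[i] = in[(rot[i] + n - 1) % n];
      if (rot[i] == 0)
         primary = (long)i;
   }
   free(rot);
   return primary;
}
]]></programlisting>

<para>For the six bytes <computeroutput>banana</computeroutput>, the
output is <computeroutput>nnbaaa</computeroutput> with primary index 3;
the runs of equal bytes in the output are what make the transformed
block easier to compress.</para>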

<para>If you want a compressor and/or library which is faster,
uses less memory but gets pretty good compression, and has
minimal latency, consider Jean-loup Gailly's and Mark Adler's
work, <computeroutput>zlib-1.2.1</computeroutput> and
<computeroutput>gzip-1.2.4</computeroutput>. Look for them at
<ulink url="http://www.zlib.org">http://www.zlib.org</ulink> and
<ulink url="http://www.gzip.org">http://www.gzip.org</ulink>
respectively.</para>

<para>For something faster and lighter still, you might try Markus F
X J Oberhumer's <computeroutput>LZO</computeroutput> real-time
compression/decompression library, at
<ulink url="http://www.oberhumer.com/opensource">http://www.oberhumer.com/opensource</ulink>.</para>

</sect1>



<sect1 id="reading" xreflabel="Further Reading">
<title>Further Reading</title>

<para><computeroutput>bzip2</computeroutput> is not research
work, in the sense that it doesn't present any new ideas.
Rather, it's an engineering exercise based on existing
ideas.</para>

<para>Four documents describe essentially all the ideas behind
<computeroutput>bzip2</computeroutput>:</para>

<literallayout>Michael Burrows and D. J. Wheeler:
  "A block-sorting lossless data compression algorithm"
  10th May 1994.
  Digital SRC Research Report 124.
  ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz
  If you have trouble finding it, try searching at the
  New Zealand Digital Library, http://www.nzdl.org.

Daniel S. Hirschberg and Debra A. Lelewer:
  "Efficient Decoding of Prefix Codes"
  Communications of the ACM, April 1990, Vol 33, Number 4.
  You might be able to get an electronic copy of this
  from the ACM Digital Library.

David J. Wheeler:
  Program bred3.c and accompanying document bred3.ps.
  This contains the idea behind the multi-table Huffman coding scheme.
  ftp://ftp.cl.cam.ac.uk/users/djw3/

Jon L. Bentley and Robert Sedgewick:
  "Fast Algorithms for Sorting and Searching Strings"
  Available from Sedgewick's web page,
  www.cs.princeton.edu/~rs
</literallayout>
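
<para>The Bentley and Sedgewick paper describes multikey (three-way
radix) quicksort, which partitions on one character position at a time
so that long common prefixes are not compared over and over. Here is a
minimal sketch for NUL-terminated strings in standard C; it illustrates
the published technique and is not the block-sorting code in
<computeroutput>bzip2</computeroutput>. The name
<computeroutput>multikey_sort</computeroutput> is hypothetical.</para>

<programlisting><![CDATA[
#include <stddef.h>

static void swap_str(const char **x, const char **y)
{
   const char *t = *x;
   *x = *y;
   *y = t;
}

/* Sort v[0..n-1], all of which agree on their first d characters. */
static void mkq(const char **v, size_t n, size_t d)
{
   size_t lt = 0, i = 0, gt = n;
   int pivot;

   if (n < 2)
      return;
   pivot = (unsigned char)v[n / 2][d];

   /* Three-way partition on character d:
      v[0..lt) < pivot, v[lt..i) == pivot, v[gt..n) > pivot. */
   while (i < gt) {
      int c = (unsigned char)v[i][d];
      if (c < pivot)
         swap_str(&v[lt++], &v[i++]);
      else if (c > pivot)
         swap_str(&v[i], &v[--gt]);
      else
         i++;
   }

   mkq(v, lt, d);                  /* smaller at position d */
   if (pivot != 0)                 /* equal group: move to next char */
      mkq(v + lt, gt - lt, d + 1);
   mkq(v + gt, n - gt, d);         /* larger at position d */
}

/* Hypothetical entry point: sorts n string pointers in place. */
void multikey_sort(const char **v, size_t n)
{
   mkq(v, n, 0);
}
]]></programlisting>

<para>Given the strings <computeroutput>she</computeroutput>,
<computeroutput>sells</computeroutput>, <computeroutput>sea</computeroutput>,
<computeroutput>shells</computeroutput>, <computeroutput>by</computeroutput>
and <computeroutput>the</computeroutput>, it rearranges the pointers into
the order by, sea, sells, she, shells, the.</para>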

<para>The following paper gives valuable additional insights into
the algorithm, but is not directly the basis of any code used
in <computeroutput>bzip2</computeroutput>.</para>

<literallayout>Peter Fenwick:
  "Block Sorting Text Compression"
  Proceedings of the 19th Australasian Computer Science Conference,
  Melbourne, Australia. Jan 31 - Feb 2, 1996.
  ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</literallayout>

<para>Kunihiko Sadakane's sorting algorithm, mentioned above, is
available from:</para>

<literallayout>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz
</literallayout>

<para>The Manber-Myers suffix array construction algorithm is
described in a paper available from:</para>

<literallayout>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps
</literallayout>

<para>Finally, the following papers document some
investigations I made into the performance of sorting
and decompression algorithms:</para>

<literallayout>Julian Seward:
  "On the Performance of BWT Sorting Algorithms"
  Proceedings of the IEEE Data Compression Conference 2000,
  Snowbird, Utah. 28-30 March 2000.

Julian Seward:
  "Space-time Tradeoffs in the Inverse B-W Transform"
  Proceedings of the IEEE Data Compression Conference 2001,
  Snowbird, Utah. 27-29 March 2001.
</literallayout>
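
<para>As background for the inverse-transform paper, here is a minimal
sketch of the textbook inverse based on the "LF mapping": counting the
bytes of the last column tells you where each occurrence sits in the
sorted first column, which lets the block be rebuilt back to front
starting from the primary index. It needs one
<computeroutput>size_t</computeroutput> of working memory per byte of
block. This is not the decompression code in
<computeroutput>bzip2</computeroutput>, and
<computeroutput>naive_inverse_bwt</computeroutput> is a hypothetical
name.</para>

<programlisting><![CDATA[
#include <stddef.h>
#include <stdlib.h>

/* Inverse BWT (illustration only). last[0..n-1] is the last column of
   the sorted rotation matrix and primary is the row holding the
   original block. Writes that block to out[0..n-1]; returns 0 on
   success, -1 on error. */
int naive_inverse_bwt(const unsigned char *last, size_t n, long primary,
                      unsigned char *out)
{
   size_t count[256] = {0};
   size_t next[256];
   size_t *lf;
   size_t i, sum = 0, row;

   if (n == 0 || primary < 0 || (size_t)primary >= n)
      return -1;
   lf = malloc(n * sizeof *lf);
   if (lf == NULL)
      return -1;

   /* The first column is the last column sorted, so byte value c
      starts at the row equal to the number of bytes smaller than c. */
   for (i = 0; i < n; i++)
      count[last[i]]++;
   for (i = 0; i < 256; i++) {
      next[i] = sum;
      sum += count[i];
   }

   /* LF mapping: the k-th occurrence of a byte in the last column is
      the k-th occurrence of that byte in the first column. */
   for (i = 0; i < n; i++)
      lf[i] = next[last[i]]++;

   /* last[row] is the byte preceding that row's first byte in the
      original block, so walk backwards from the primary row. */
   row = (size_t)primary;
   for (i = n; i > 0; i--) {
      out[i - 1] = last[row];
      row = lf[row];
   }
   free(lf);
   return 0;
}
]]></programlisting>

<para>Given <computeroutput>nnbaaa</computeroutput> with primary index 3,
it reconstructs <computeroutput>banana</computeroutput>.</para>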

</sect1>

</chapter>

</book>