LZMA SDK 4.65
-------------

LZMA SDK provides the documentation, samples, header files, libraries,
and tools you need to develop applications that use LZMA compression.

LZMA is the default and general compression method of the 7z format
in the 7-Zip compression program (www.7-zip.org). LZMA provides a high
compression ratio and very fast decompression.

LZMA is an improved version of the well-known LZ77 compression algorithm.
It was improved to maximize the compression ratio while keeping
decompression fast and the memory requirements for decompression low.


LICENSE
-------

LZMA SDK is written and placed in the public domain by Igor Pavlov.


LZMA SDK Contents
-----------------

LZMA SDK includes:

  - ANSI-C/C++/C#/Java source code for LZMA compressing and decompressing
  - Compiled file->file LZMA compressing/decompressing program for Windows system


UNIX/Linux version
------------------
To compile the C++ version of the file->file LZMA encoder, go to the directory
CPP/7zip/Compress/LZMA_Alone
and call make to recompile it:
  make -f makefile.gcc clean all

On some UNIX/Linux versions you must compile LZMA with static libraries.
To compile with static libraries, you can use
LIB = -lm -static


Files
-----
lzma.txt      - LZMA SDK description (this file)
7zFormat.txt  - 7z Format description
7zC.txt       - 7z ANSI-C Decoder description
methods.txt   - Compression method IDs for .7z
lzma.exe      - Compiled file->file LZMA encoder/decoder for Windows
history.txt   - history of the LZMA SDK


Source code structure
---------------------

C/  - C files
  7zCrc*.*   - CRC code
  Alloc.*    - Memory allocation functions
  Bra*.*     - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code
  LzFind.*   - Match finder for LZ (LZMA) encoders
  LzFindMt.* - Match finder for LZ (LZMA) encoders for multithreading encoding
  LzHash.h   - Additional file for LZ match finder
  LzmaDec.*  - LZMA decoding
  LzmaEnc.*  - LZMA encoding
  LzmaLib.*  - LZMA Library for DLL calling
  Types.h    - Basic types for other .c files
  Threads.*  - The code for multithreading.

  LzmaLib    - LZMA Library (.DLL for Windows)

  LzmaUtil   - LZMA Utility (file->file LZMA encoder/decoder).

  Archive    - files related to archiving
    7z       - 7z ANSI-C Decoder

CPP/ -- CPP files

  Common  - common files for C++ projects
  Windows - common files for Windows related code

  7zip    - files related to 7-Zip Project

    Common   - common files for 7-Zip

    Compress - files related to compression/decompression

      Copy         - Copy coder
      RangeCoder   - Range Coder (special code of compression/decompression)
      LZMA         - LZMA compression/decompression on C++
      LZMA_Alone   - file->file LZMA compression/decompression
      Branch       - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code

    Archive  - files related to archiving

      Common   - common files for archive handling
      7z       - 7z C++ Encoder/Decoder

    Bundles  - Modules that are bundles of other modules

      Alone7z           - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2
      Format7zR         - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2
      Format7zExtractR  - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2.

    UI       - User Interface files

      Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll
      Common   - Common UI files
      Console  - Code for console archiver



CS/ - C# files
  7zip
    Common   - some common files for 7-Zip
    Compress - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      LzmaAlone    - file->file LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)

Java/ - Java files
  SevenZip
    Compression    - files related to compression/decompression
      LZ           - files related to LZ (Lempel-Ziv) compression algorithm
      LZMA         - LZMA compression/decompression
      RangeCoder   - Range Coder (special code of compression/decompression)


C/C++ source code of LZMA SDK is part of the 7-Zip project.
7-Zip source code can be downloaded from 7-Zip's SourceForge page:

  http://sourceforge.net/projects/sevenzip/



LZMA features
-------------
  - Variable dictionary size (up to 1 GB)
  - Estimated compressing speed: about 2 MB/s on 2 GHz CPU
  - Estimated decompressing speed:
      - 20-30 MB/s on 2 GHz Core 2 or AMD Athlon 64
      - 1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC
  - Small memory requirements for decompressing (16 KB + DictionarySize)
  - Small code size for decompressing: 5-8 KB

The LZMA decoder uses only integer operations and can be
implemented on any modern 32-bit CPU (or on a 16-bit CPU with some conditions).

Some critical operations that affect the speed of LZMA decompression:
  1) 32*16 bit integer multiply
  2) Mispredicted branches (penalty mostly depends on pipeline length)
  3) 32-bit shift and arithmetic operations

The speed of LZMA decompression depends mostly on CPU speed.
Memory speed matters little. But if your CPU has a small data cache,
the overall weight of memory speed will slightly increase.


How To Use
----------

Using LZMA encoder/decoder executable
--------------------------------------

Usage:  LZMA <e|d> inputFile outputFile [<switches>...]

  e: encode file

  d: decode file

  b: Benchmark. There are two tests: compressing and decompressing
     with LZMA method. Benchmark shows rating in MIPS (million
     instructions per second). Rating value is calculated from
     measured speed and it is normalized with Intel's Core 2 results.
     Also Benchmark checks possible hardware errors (RAM
     errors in most cases). Benchmark uses these settings:
     (-a1, -d21, -fb32, -mfbt4). You can change only -d parameter.
     Also you can change the number of iterations. Example for 30 iterations:
       LZMA b 30
     Default number of iterations is 10.

<Switches>


  -a{N}:  set compression mode 0 = fast, 1 = normal
          default: 1 (normal)

  -d{N}:  set dictionary size - [0, 30], default: 23 (8MB)
          The maximum value for dictionary size is 1 GB = 2^30 bytes.
          Dictionary size is calculated as DictionarySize = 2^N bytes.
          To decompress a file compressed by the LZMA method with dictionary
          size D = 2^N you need about D bytes of memory (RAM).

  -fb{N}: set number of fast bytes - [5, 273], default: 128
          Usually a bigger number gives a slightly better compression ratio
          and a slower compression process.

  -lc{N}: set number of literal context bits - [0, 8], default: 3
          Sometimes lc=4 gives gain for big files.

  -lp{N}: set number of literal pos bits - [0, 4], default: 0
          lp switch is intended for periodical data when the period is
          equal to 2^N. For example, for 32-bit (4 bytes)
          periodical data you can use lp=2. Often it's better to set lc0,
          if you change lp switch.

  -pb{N}: set number of pos bits - [0, 4], default: 2
          pb switch is intended for periodical data
          when the period is equal to 2^N.

  -mf{MF_ID}: set Match Finder. Default: bt4.
              Algorithms from the hc* group don't provide a good compression
              ratio, but they often work pretty fast in combination with
              fast mode (-a0).

              Memory requirements depend on dictionary size
              (parameter "d" in table below).

               MF_ID     Memory                   Description

                bt2    d *  9.5 + 4MB  Binary Tree with 2 bytes hashing.
                bt3    d * 11.5 + 4MB  Binary Tree with 3 bytes hashing.
                bt4    d * 11.5 + 4MB  Binary Tree with 4 bytes hashing.
                hc4    d *  7.5 + 4MB  Hash Chain with 4 bytes hashing.
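
The dictionary-size rule (DictionarySize = 2^N) and the match finder memory
table above can be turned into a small estimator. The following is only a
sketch: the function names dict_size_from_switch and match_finder_memory are
hypothetical helpers, not part of the LZMA SDK, and the result is the rough
estimate from the table, not a measurement.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Dictionary size in bytes for switch -d{N}: DictionarySize = 2^N, N in [0, 30].
   Hypothetical helper, not an SDK function. */
static uint64_t dict_size_from_switch(unsigned n)
{
    assert(n <= 30);
    return (uint64_t)1 << n;
}

/* Estimated match finder memory in bytes, following the table above:
   d * factor + 4 MB. Returns 0 for an unknown MF_ID.
   Factors are stored doubled (19 = 9.5 * 2) to stay in integer arithmetic. */
static uint64_t match_finder_memory(const char *mf_id, uint64_t d)
{
    static const struct { const char *id; unsigned factor_x2; } table[] = {
        { "bt2", 19 }, { "bt3", 23 }, { "bt4", 23 }, { "hc4", 15 }
    };
    size_t i;

    assert(mf_id != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(mf_id, table[i].id) == 0)
            return d * table[i].factor_x2 / 2 + ((uint64_t)4 << 20);
    return 0;
}
```

For the defaults (-d23, -mfbt4) this gives 8 MB * 11.5 + 4 MB = 96 MB for the
match finder.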

  -eos:   write End Of Stream marker. By default LZMA doesn't write
          eos marker, since LZMA decoder knows uncompressed size
          stored in .lzma file header.

  -si:    Read data from stdin (it will write End Of Stream marker).
  -so:    Write data to stdout


Examples:

1) LZMA e file.bin file.lzma -d16 -lc0

compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K)
and 0 literal context bits. -lc0 reduces the memory requirements
for decompression.


2) LZMA e file.bin file.lzma -lc0 -lp2

compresses file.bin to file.lzma with settings suitable
for 32-bit periodical data (for example, ARM or MIPS code).

3) LZMA d file.lzma file.bin

decompresses file.lzma to file.bin.


Compression ratio hints
-----------------------

Recommendations
---------------

To increase the compression ratio for LZMA compression, it's desirable
to have aligned data (if possible), and also to arrange the data so that
code is grouped in one place and data is grouped in another (this is
better than interleaving: code, data, code, data, ...).


Filters
-------
You can increase the compression ratio for some data types by applying
special filters before compressing. For example, it's possible to
increase the compression ratio by 5-10% for code for these CPU ISAs:
x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC.

You can find C source code of such filters in C/Bra*.* files

You can check the compression ratio gain of these filters with the following
7-Zip commands (example for ARM code):
No filter:
  7z a a1.7z a.bin -m0=lzma

With filter for little-endian ARM code:
  7z a a2.7z a.bin -m0=arm -m1=lzma

It works as follows:
Compressing   = Filter_encoding + LZMA_encoding
Decompressing = LZMA_decoding + Filter_decoding

The compressing and decompressing speed of such filters is very high,
so they will not increase decompressing time too much.
Moreover, filtering reduces the time spent in LZMA_decoding,
since the compression ratio with filtering is higher.

These filters convert CALL (calling procedure) instructions
from relative offsets to absolute addresses, so such data becomes more
compressible.
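
To illustrate the idea only, here is a toy filter for x86 that treats every
0xE8 byte as a CALL opcode followed by a 32-bit little-endian relative offset.
It is NOT the code from C/Bra86.c, which applies additional checks to avoid
converting bytes that are not real CALL instructions. The name
toy_call_filter and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value. */
static void set_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Toy CALL filter (simplified sketch, not the SDK filter).
   buf      - code to convert in place
   size     - number of bytes in buf
   ip       - address assumed for buf[0]
   encoding - nonzero: relative -> absolute, zero: absolute -> relative
   The opcode byte is never changed and the 4 operand bytes are skipped,
   so encoding and decoding visit exactly the same positions. */
static void toy_call_filter(uint8_t *buf, size_t size, uint32_t ip, int encoding)
{
    size_t i = 0;

    assert(buf != NULL || size == 0);
    while (size >= 5 && i <= size - 5)
    {
        if (buf[i] != 0xE8)
        {
            i++;
            continue;
        }
        {
            /* address of the instruction that follows the CALL */
            uint32_t next = ip + (uint32_t)i + 5;
            uint32_t v = get_le32(buf + i + 1);
            set_le32(buf + i + 1, encoding ? v + next : v - next);
        }
        i += 5;
    }
}
```

After encoding, two CALLs to the same procedure from different places carry
the same 4 operand bytes, which gives the LZ match finder longer repeats.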

For some ISAs (for example, for MIPS) it's impossible to get gain from such a filter.


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data
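
The 13-byte header above can be parsed with a few lines of C. This is a
sketch, not SDK code: ParseLzmaHeader and LzmaHeaderInfo are hypothetical
names, and the encoding of the properties byte as (pb * 5 + lp) * 9 + lc is
stated here as an assumption about the "encoded form" mentioned in the table
(it matches the common first byte 0x5D for lc=3, lp=0, pb=2).

```c
#include <assert.h>
#include <stdint.h>

/* Header layout from the table above: 1 + 4 + 8 = 13 bytes. */
#define LZMA_HEADER_SIZE 13

typedef struct
{
    unsigned lc, lp, pb;        /* decoded from the properties byte */
    uint32_t dictSize;          /* dictionary size in bytes */
    uint64_t uncompressedSize;  /* UINT64_MAX (all bits set, i.e. -1) = unknown */
} LzmaHeaderInfo;               /* hypothetical name, not an SDK type */

/* Parses LZMA_HEADER_SIZE bytes at h.
   Returns 0 on success, -1 if the properties byte is out of range. */
static int ParseLzmaHeader(const uint8_t *h, LzmaHeaderInfo *out)
{
    unsigned d;
    int i;

    assert(h != 0 && out != 0);

    /* Assumed encoding: props = (pb * 5 + lp) * 9 + lc */
    d = h[0];
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Dictionary size: 4 bytes, little endian */
    out->dictSize = 0;
    for (i = 0; i < 4; i++)
        out->dictSize |= (uint32_t)h[1 + i] << (8 * i);

    /* Uncompressed size: 8 bytes, little endian */
    out->uncompressedSize = 0;
    for (i = 0; i < 8; i++)
        out->uncompressedSize |= (uint64_t)h[5 + i] << (8 * i);

    return 0;
}
```

When uncompressedSize is UINT64_MAX the size is unknown, and the decoder has
to rely on the End Of Stream marker instead.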


ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that interfaces for ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces you can download a previous version of LZMA SDK
from the sourceforge.net site.

To use ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + Types.h
LzmaUtil/LzmaUtil.c is an example application that uses these files.


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of LZMA decoding function for local variables is not
larger than 200-400 bytes.

LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
With the default settings (lc=3, lp=0), state_size = 16 KB.
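
The state_size formula can be evaluated in integer arithmetic, since
1.5 KB = 1536 bytes. The helper names below are hypothetical (they are not
SDK functions); the second helper adds the dictionary buffer to give the
"state + dictionary" estimate for decoding.

```c
#include <assert.h>
#include <stddef.h>

/* state_size = (4 + (1.5 << (lc + lp))) KB, returned in bytes:
   4096 + 1536 * 2^(lc + lp). Hypothetical helper, not an SDK function. */
static size_t lzma_state_size(unsigned lc, unsigned lp)
{
    assert(lc <= 8 && lp <= 4);  /* switch ranges: -lc [0, 8], -lp [0, 4] */
    return (size_t)4096 + ((size_t)1536 << (lc + lp));
}

/* Rough decoder memory estimate: internal state plus dictionary buffer. */
static size_t lzma_decoder_memory(unsigned lc, unsigned lp, size_t dict_size)
{
    return lzma_state_size(lc, lp) + dict_size;
}
```

For example, lzma_state_size(3, 0) is 16384 bytes (16 KB), while with -lc0 it
drops to 5632 bytes, which is why -lc0 is useful on small-memory targets.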


How To decompress data
----------------------

LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator:
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

You can use the p = p; statement to suppress compiler warnings about the unused parameter.


Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END, when you know that
                           current output buffer covers last bytes of stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If LZMA decoder sees end_marker before reaching output limit, it returns OK result,
  and output value of destLen will be less than output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check Result and "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know real compressedSize.
       You must use correct finish mode in that case.


Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + Types.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init LzmaDec structure before any new LZMA stream. And call LzmaDec_DecodeToBuf in loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode, ELzmaStatus *status);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

For full code example, look at C/LzmaUtil/LzmaUtil.c code.


How To compress data
--------------------

Compile files: LzmaEnc.h + LzmaEnc.c + Types.h +
LzFind.c + LzFind.h + LzFindMt.c + LzFindMt.h + LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

Lzma Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


Single-call Compression with callbacks
--------------------------------------

Check C/LzmaUtil/LzmaUtil.c as an example.

When to use: file->file compressing

1) you must implement callback structures for interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) {  p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create CLzmaEncHandle object;

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) initialize CLzmaEncProps properties;

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode also returns that code.


Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
  CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
  ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - output buffer overflow
  SZ_ERROR_THREAD     - errors in multithreading functions (only for Mt version)



LZMA Defines
------------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
C++ LZMA code uses COM-like interfaces. So if you want to use it,
you can study the basics of COM/OLE.
C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an "exception" from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines the "new" operator:
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
If you use an MSVC version that throws an exception from the "new" operator, you can compile without
"NewHandler.cpp". So the standard exception will be used. Actually some code of
7-Zip catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException, if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html