v2.30 Intel Intelligent Storage Acceleration Library Release Notes
==================================================================

RELEASE NOTE CONTENTS
1. KNOWN ISSUES
2. FIXED ISSUES
3. CHANGE LOG & FEATURES ADDED

1. KNOWN ISSUES
----------------

* Perf tests do not run in the Windows environment.

* The 32-bit library is not supported on Windows.

2. FIXED ISSUES
---------------
v2.30

* Intel CET support.
* Windows nasm support fix.

v2.28

* Fix documentation on gf_vect_mad().  The minimum length was listed as 32
  bytes instead of the required minimum of 64 bytes.

v2.27

* Fix missing install of pkg-config files.

v2.26

* Fixes for sanitizer warnings.

v2.25

* Fix for nasm on Mac OS X/darwin.

v2.24

* Fix for crc32_iscsi().  Potential read-over for small buffers.  For an input
  buffer shorter than 8 bytes and aligned to an 8-byte boundary, the function
  could read past the given length.  This could previously cause a seg fault
  only when length was 0 and an invalid buffer was passed.  The calculated CRC
  is unchanged.

* Fix for compression/decompression of > 4GB files.  For streaming compression
  of extremely large files, the total_out parameter would wrap and could
  potentially flag an otherwise valid lookback distance as invalid.  The
  total_out parameter remains 32-bit for zlib compatibility.  The issue did not
  generate any inconsistent compressed buffers.
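As a rough illustration of the wrap, assuming nothing about igzip internals beyond a 32-bit counter: an unsigned 32-bit value restarts at zero after 2^32 bytes (4 GiB), so a naive check that compares a lookback distance against the wrapped count can reject a distance that is actually valid.  wrapped_total_out() and distance_looks_valid() are hypothetical helpers for this sketch, not ISA-L functions.

```c
#include <stdint.h>

/* Hypothetical helper (not ISA-L code): models a 32-bit total_out
 * counter after `bytes_out` bytes have been produced.  Converting to
 * uint32_t reduces the value modulo 2^32, so positions past 4 GiB
 * map back to small numbers. */
static uint32_t wrapped_total_out(uint64_t bytes_out)
{
    return (uint32_t)bytes_out;
}

/* Hypothetical check (not ISA-L code): a lookback distance is only
 * plausible if it does not reach before the start of the stream.
 * Fed a wrapped 32-bit counter, this rejects valid distances just
 * after the wrap point. */
static int distance_looks_valid(uint32_t total_out, uint32_t dist)
{
    return dist <= total_out;
}
```

With 4 GiB + 100 bytes already produced, a 1000-byte lookback is valid, yet the wrapped counter reads 100 and the naive check rejects it.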

v2.23

* Fix for histogram generation base function.
* Fix library build warnings on macOS.
* Fix igzip to use bsf instruction when tzcnt is not available.

v2.22

* Fix ISA-L builds for other architectures.  Base functions and examples are
  sanitized for non-IA builds.

* Fix fuzz test script to work with the llvm 6.0 builtin libFuzzer.

v2.20

* Inflate total_out behavior corrected for in-progress decompression.
  Previously total_out represented the total bytes decompressed into either the
  output buffer or the temporary internal buffer.  It now counts only the bytes
  written to the output buffer.

* Fixed issue with isal_create_hufftables_subset().  Affects the semi-dynamic
  compression use case when explicitly creating hufftables from a histogram.
  The _hufftables_subset function could fail to generate length symbols for any
  lengths that were never seen.

v2.19

* Fix erasure code test that violates rs matrix bounds.

* Fix 0 length file and looping errors in igzip_inflate_test.

v2.18

* Mac OS X/darwin systems no longer require the --target=darwin config option.
  The autoconf canonical build should detect darwin automatically.

v2.17

* Fix igzip when using a 32K window and a shared object.

* Fix igzip undefined instruction error on Nehalem.

* Fixed issue in crc performance tests where OS optimizations turned cold cache
  tests into warm tests.

v2.15

* Fix for Windows register save in gf_6vect_mad_avx2.asm.  Only affects Windows
  versions of ec_encode_data_update() running with AVX2.  A GP register was not
  properly restored, resulting in corruption on return.

v2.14

* Building in unit directories is no longer supported, which removes the issue
  of leftover object files causing the top-level make build to fail.

v2.10

* Fix for Windows register save overlap in gf_{3-6}vect_dot_prod_sse.asm.  Only
  affects Windows versions of erasure code.  GP register saves/restores were
  pushed to the same stack area as the XMM registers.

3. CHANGE LOG & FEATURES ADDED
------------------------------
v2.30

* Igzip compression enhancements.
  - New functions for dictionary acceleration. Split dictionary processing and
    resetting can greatly accelerate the performance of compressing many small
    files with a dictionary.
  - New static level 0 header decode tables. Accelerates decompressing small
    files that are level 0 compressed by skipping the known header parsing.
  - New feature for igzip cli tool: support for concatenated .gz files. On
    decompression, igzip will process a series of independent, concatenated .gz
    files into one output stream.

* CRC Improvements
  - New vclmul version of crc32_iscsi().
  - Updates for aarch64.
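A slow bitwise CRC-32C (Castagnoli, the iSCSI CRC) is a handy oracle when cross-checking an accelerated crc32_iscsi() build, for example on short or unaligned buffers.  The sketch below uses the common CRC-32C parameter set (reflected polynomial 0x82F63B78, initial value and final XOR 0xFFFFFFFF).  crc32c_ref() is a hypothetical name, and ISA-L's own init_crc convention is not assumed to match, so adjust for it before comparing results.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise reference for CRC-32C (not ISA-L code).
 * Reflected polynomial 0x82F63B78, init 0xFFFFFFFF, final XOR
 * 0xFFFFFFFF.  One bit per inner iteration: slow but easy to audit,
 * and it never touches bytes beyond buf[len - 1]. */
static uint32_t crc32c_ref(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* 0u - (crc & 1u) is all ones when the low bit is set */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For the standard check string "123456789" this parameter set yields 0xE3069283.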

v2.29

* CRC Improvements
  - New AVX512 vclmul versions of crc16_t10dif(), crc32_ieee(), crc32_gzip_refl().

* Erasure code improvements
  - Added AVX512 ec functions with 5 and 6 outputs.  Can improve performance
    for codes with 5 or more parity outputs by processing them in batches of
    up to 6 at a time.

v2.28

* New next-arch versions of 64-bit CRC.  All normal and reflected 64-bit
  polynomials are expanded to use vpclmulqdq.

v2.27

* New multi-threaded compression option for igzip cli tool

v2.26

* Adler32 added to external API.
* Multi-arch improvements.
* Performance test improvements.
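For reference, Adler-32 as defined in RFC 1950 keeps two running sums modulo 65521, the largest prime below 2^16.  The following is a minimal unoptimized sketch; adler32_ref() is a hypothetical name and does not reproduce the signature of the function ISA-L exports.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference Adler-32 (RFC 1950), not ISA-L code.
 * `a` starts at 1 and sums the bytes; `b` starts at 0 and sums the
 * successive values of `a`.  Both are kept modulo 65521, and the
 * result packs b into the high 16 bits and a into the low 16 bits. */
static uint32_t adler32_ref(const uint8_t *buf, size_t len)
{
    const uint32_t MOD_ADLER = 65521u;
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    return (b << 16) | a;
}
```

For example, the string "Wikipedia" gives a = 920 and b = 4582, i.e. 0x11E60398; an empty buffer gives 1.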

v2.25

* Igzip performance improvements and features.
  - Performance improvements for incompressible files.  Random or
    incompressible files can be up to 3x faster in level 1 or 2 compression.
  - Additional small file performance improvements.
  - New options in igzip cli: use name from header or not, test compressed file.

* Multi-arch autoconf script.
  - Autoconf should detect architecture and run base functions at minimum.

v2.24

* Igzip small file performance improvements and new features.
  - Better performance on small files.
  - New gzip/zlib header and trailer handling.
  - New gzip/zlib header parsing helper functions.
  - New user-space compression/decompression tool igzip.

* New mem unit added with first function isal_zero_detect().
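The contract of isal_zero_detect() is assumed here to be "return 0 when the region is all zeros, nonzero otherwise".  Under that assumption, a scalar reference simply ORs every byte into an accumulator.  zero_detect_ref() is a hypothetical helper, not the optimized ISA-L routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not ISA-L code), assuming the
 * convention: 0 = region is all zeros, nonzero = some byte is set. */
static int zero_detect_ref(const void *mem, size_t len)
{
    const uint8_t *p = mem;
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc |= p[i];  /* any set bit anywhere makes acc nonzero */
    return acc != 0;
}
```

An optimized version would typically OR whole vector words at a time, but the result should agree with this byte loop.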

v2.23

* Igzip inflate (decompression) performance improvements.
  - Implemented multi-byte decode for inflate.  Decode can pack up to three
    symbols into the decode table making some compressed streams decompress much
    faster depending on the prevalence of short codes.

v2.22

* Igzip: AVX2 version of level 3 compression added.

* Erasure code examples
  - New examples for standard EC encode and decode.
  - Example of piggyback EC encode and decode.

v2.21

* Igzip improvements
  - New compression levels added.  ISA-L fast deflate now has more levels to
    balance speed vs. target compression level.  Levels 0 and 1 are the same as
    in previous generations.  New levels 2 and 3 target higher compression,
    roughly comparable to zlib levels 2-3.  Level 3 is currently optimized only
    for processors with AVX512 instructions.

* New T10dif & copy function - crc16_t10dif_copy()
  - CRC and copy was added to emulate T10dif operations such as DIF insert and
    strip.  This function stitches together CRC and memcpy operations,
    eliminating an extra data read.

* CRC32 iscsi performance improvements
  - Fixes an issue on some distributions where warm cache performance was
    reduced.
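The benefit of stitching CRC and copy together is that each source byte is read only once.  A bitwise sketch of that idea follows, assuming the CRC-16/T10-DIF parameters (polynomial 0x8BB7, MSB-first, no reflection, no final XOR) and a caller-supplied initial CRC.  crc16_t10dif_copy_ref() is hypothetical and only mirrors the idea, not ISA-L's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-pass "CRC and copy" (not ISA-L code).
 * Each byte is loaded once, stored to dst, and folded into a
 * CRC-16/T10-DIF (poly 0x8BB7, MSB-first, no final XOR).  A separate
 * memcpy() followed by a CRC pass would read the data twice. */
static uint16_t crc16_t10dif_copy_ref(uint16_t crc, uint8_t *dst,
                                      const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = src[i];
        dst[i] = byte;                 /* copy */
        crc ^= (uint16_t)(byte << 8);  /* feed byte into the high bits */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x8BB7u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}
```

Starting from 0, copying "123456789" should return the catalogued CRC-16/T10-DIF check value 0xD0DB and leave an identical copy in dst.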

v2.20

* Igzip improvements
  - Optimized deflate_hash in compression functions.  Improves performance
    when using a preset dictionary.
  - Removed alignment restrictions on input structure.

v2.19

* Igzip improvements

  - Add optimized Adler-32 checksum.

  - Implement zlib compression format.

  - Add stateful dictionary support.

  - Add struct reset functions for both deflate and inflate.

* Reflected IEEE format CRC32 released.  The function interface is named
  crc32_gzip_refl().

* The exact limits within which the erasure code Reed-Solomon matrix works are
  determined by the newly added program gen_rs_matrix_limits.
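Reflected IEEE CRC-32 is the checksum carried in gzip trailers.  A bitwise sketch assuming the standard gzip/zlib parameters (reflected polynomial 0xEDB88320, pre- and post-inversion) is shown below.  crc32_refl_ref() is a hypothetical name, and treating its crc argument as the previous result so calls can be chained is an assumption modeled on zlib's crc32().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise reflected IEEE CRC-32 (not ISA-L code).
 * `crc` is the previous result (0 to start); inverting on entry and
 * exit lets calls over consecutive buffers be chained. */
static uint32_t crc32_refl_ref(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Checksumming "1234" and then "56789" through chained calls gives the same value as one call over "123456789", namely 0xCBF43926.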

v2.18

* New 2-pass fully-dynamic deflate compression (level 1).  ISA-L fast deflate
  now has two levels.  Level 0 (default) is the same as previous generations.
  Setting the level to 1 switches to the fully-dynamic compression, which
  typically reaches higher compression ratios.

* RAID AVX512 functions.

v2.17

* New fast decompression (inflate)

* Compression improvements (deflate)
  - Speed and compression ratio improvements.
  - Fast custom Huffman code generation.
  - New features:
    * Run-time option of gzip crc calculation and headers/trailer.
    * Choice of static header (BTYPE 01) blocks.
    * LARGE_WINDOW, 32K history, now default.
    * Stateless full flush mode.

* CRC64
  - Six new 64-bit polynomials supported. Normal and reflected versions of ECMA,
    ISO and Jones polynomials.
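The normal (MSB-first) and reflected variants differ in bit order and in how the polynomial is written; for ECMA, the reflected form of 0x42F0E1EBA9EA3693 is 0xC96C5795D7870F42.  Below is a bitwise sketch of the normal ECMA case, assuming the catalogued CRC-64/ECMA-182 parameters (initial value 0, no final XOR).  crc64_ecma_norm_ref() is hypothetical, and ISA-L's crc64 functions may apply a different init/final-XOR convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitwise MSB-first CRC-64 with the ECMA-182 polynomial
 * (not ISA-L code).  Each byte is XORed into the top 8 bits, then
 * the register is shifted left, reducing by POLY whenever the
 * outgoing bit is set.  No final XOR is applied. */
static uint64_t crc64_ecma_norm_ref(uint64_t crc, const uint8_t *buf,
                                    size_t len)
{
    const uint64_t POLY = 0x42F0E1EBA9EA3693ULL;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint64_t)buf[i] << 56;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000000000000000ULL) ? (crc << 1) ^ POLY
                                                : crc << 1;
    }
    return crc;
}
```

A reflected variant would instead shift right and XOR with the bit-reversed polynomial, as in the reflected CRC-32 loop.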

v2.16

* Units added: crc, raid, igzip (deflate compression).

v2.15

* Erasure code updates. New AVX512 versions.

* Nasm support.  ISA-L ported to build with nasm or yasm assembler.

* Windows DLL support.  Windows builds DLL by default.

v2.14

* Autoconf and autotools build allows easier porting to additional systems.
  Previous make system still available to embedded users with Makefile.unx.

* Includes update for building on Mac OS X/darwin systems. Add --target=darwin
  to ./configure step.

v2.13

* Erasure code improvements
  - 32-bit port of optimized gf_vect_dot_prod() functions.  This makes
    ec_encode_data() functions much faster on 32-bit processors.
  - Avoton performance improvements.  Performance on Avoton for
    gf_vect_dot_prod() and ec_encode_data() can improve by as much as 20%.

v2.11

* Incremental erasure code.  New functions added to erasure code to handle
  single source update of code blocks.  The function ec_encode_data_update()
  works with parameters similar to ec_encode_data() but is called incrementally
  with each source block.  These versions are useful when source blocks are not
  all available at once.
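Incremental update works because addition in GF(2^8) is XOR: each source block contributes coef * source to every parity byte independently, so contributions can be folded in one at a time and in any order.  A minimal sketch follows, assuming the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D).  gf_mul_ref() and ec_update_ref() are hypothetical helpers; they do not reproduce the ec_encode_data_update() signature or its precomputed tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GF(2^8) multiply (shift-and-add), assuming the
 * reduction polynomial 0x11D. */
static uint8_t gf_mul_ref(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                 /* add (XOR) current multiple of a */
        /* multiply a by x, reducing when bit 7 shifts out */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Hypothetical incremental update: fold one source block into the
 * existing parity, parity[i] ^= coef * src[i].  Starting from zeroed
 * parity and applying this once per source equals a full encode. */
static void ec_update_ref(size_t len, uint8_t coef, const uint8_t *src,
                          uint8_t *parity)
{
    for (size_t i = 0; i < len; i++)
        parity[i] ^= gf_mul_ref(coef, src[i]);
}
```

Folding two sources into zeroed parity in either order produces identical parity bytes, which is what makes per-block updates safe when sources arrive separately.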

v2.10

* Erasure code updates
  - New AVX and AVX2 support functions.
  - Changes min len requirement on gf_vect_dot_prod() to 32 from 16.
  - Tests include both source and parity recovery with ec_encode_data().
  - New encoding examples with Vandermonde or Cauchy matrix.
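Conceptually, each parity row produced by gf_vect_dot_prod() and ec_encode_data() is a dot product over GF(2^8): dest[i] is the XOR over sources j of coef[j] * src[j][i].  The scalar sketch below again assumes the 0x11D field polynomial.  gf_mul_ref() and gf_dot_prod_ref() are hypothetical and take plain coefficients rather than the precomputed tables the ISA-L functions use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GF(2^8) multiply, assuming reduction polynomial 0x11D. */
static uint8_t gf_mul_ref(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Hypothetical scalar dot product for one parity row:
 * dest[i] = XOR over j < nsrc of coef[j] * src[j][i]. */
static void gf_dot_prod_ref(size_t len, int nsrc, const uint8_t *coef,
                            const uint8_t *const *src, uint8_t *dest)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t acc = 0;
        for (int j = 0; j < nsrc; j++)
            acc ^= gf_mul_ref(coef[j], src[j][i]);
        dest[i] = acc;
    }
}
```

With coefficients {2, 3} and source bytes {0x80, 3}, the terms are 0x1D and 0x05, so the parity byte is 0x18.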

v2.8

* First open release of erasure code unit that is part of ISA-L.