1 ========================================================================
2 Release Notes for Intel(R) Multi-Buffer Crypto for IPsec Library
3
4 v0.49 March 2018
5 ========================================================================
6
7 21 Mar, 2018
8
9 General
10 - AES-CMAC support added (AES-CMAC-128 and AES-CMAC-96)
11 - 3DES support added
12 - Library compiles to SO/DLL by default
13 - Install/uninstall targets added to makefiles
14 - Multiple API header files consolidated into one (intel-ipsec-mb.h)
15 - Unhalted cycles support added to LibPerfApp (Linux at the moment)
16 - ELF stack execute protection added for assembly files
17 - VZEROUPPER instruction issued after AVX2/AVX512 code to avoid
18 expensive SSE<->AVX transitions
19 - MAN page added
20 - README documentation extensions and updates
21 - AVX512 DES performance smoothed out
22 - Multi-buffer manager instance allocate and free API's added
23 - Core affinity support added in LibPerfApp
24
25 v0.48 December 2017
26 ========================================================================
27
28 12 Dec, 2017
29
30 General
31 - Linux SO compilation option added
32 - Windows DLL compilation option added
33 - AES CCM 128 support added
34 - Multithread command line option added to LibPerfApp
35 - Coding style fixes
36 - Coding style target added to Makefile
37
38 v0.47 October 2017
39 ========================================================================
40
41 Oct 5, 2017
42
43 Intel(R) AVX-512 Instructions
44 - DES CBC AVX512 implementation
45 - DOCSIS DES AVX512 implementation
46 General
47 - DES CBC cipher added (generic x86 implementation)
48 - DOCSIS DES cipher added (generic x86 implementation)
49 - DES and DOCSIS DES tests added
50 - RPM SPEC file created
51
52 v0.46 June 2017
53 ========================================================================
54
55 Jun 27, 2017
56
57 General
58 - AES GCM optimizations for AVX2
59 - Change of AES GCM API: renamed and expanded keys separated from the context
60 - New AES GCM API via job structure and API's
61 - use of the interface may simplify application design at the expense of
62 slightly lower performance vs direct AES GCM API's
63 - AES GCM IV automatically padded with block counter (no need for application to do it)
64 - IV in AES CTR mode can be 12 bytes (no block counter); 16 byte format still allowed
65 - Macros added to ease access to job API for specific architecture
66 - use of these macros can simplify application design but it may produce worse
67 performance than calling architecture job API's directly
68 - Submit_job_nocheck() API added to gain some cycles by not validating job structure
69 - Result stability improvements in LibPerfApp
70
71 v0.45 March 2017
72 ========================================================================
73
74 Mar 29, 2017
75
76 Intel(R) AVX-512 Instructions
77 - Added optimized HMAC-SHA224 and HMAC-SHA256
78 - Added optimized HMAC-SHA384 and HMAC-SHA512
79 General
80 - Windows x64 compilation target
81 - New DOCSIS SEC BPI V3.1 cipher
82 - GCM128 and GCM256 updates (with new API that is scatter gather list friendly)
83 - GCM192 added
84 - Added library API benchmark tool 'ipsec_perf' and
85 script to compare results 'ipsec_diff_tool.py'
86 Bug Fixes (vs v0.44)
87 - AES CTR mode fix to allow message size not to be multiple of AES block size
88 - RSI and RDI registers clobbered when running HMAC-SHA224 or HMAC-SHA256
89 on Windows using SHA extensions
90
91 v0.44 November 2016
92 ========================================================================
93
94 Nov 21, 2016
95
96 Intel(R) AVX-512 Instructions
97 - AVX512 multi buffer manager added (uses AVX2 implementations by default)
98 - Optimized SHA1 implementation added
99 Intel(R) SHA Extensions
100 - SHA1, SHA224 and SHA256 implementations added for Intel(R) SSE
101 General
102 - NULL cipher added
103 - NULL hash added
104 - NASM tool chain compilation added (default)
105
106 =======================================
107 Feb 11, 2015
108
109 Fixed, so that the job auth_tag_output_len_in_bytes takes a different
110 value for different MAC types. In particular, the valid values are (in bytes):
111 SHA1 - 12
112 SHA224 - 14
113 SHA256 - 16
114 SHA384 - 24
115 SHA512 - 32
116 XCBC - 12
117 MD5 - 12
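
The table above can be captured in a small lookup, sketched below. The
enum and function names are hypothetical and exist only for this
illustration; the library's own hash_alg identifiers may differ.

```c
#include <stddef.h>

/* Hypothetical identifiers for illustration only; they are not the
 * library's hash_alg enumerators. */
enum mac_type {
    MAC_SHA1,
    MAC_SHA224,
    MAC_SHA256,
    MAC_SHA384,
    MAC_SHA512,
    MAC_XCBC,
    MAC_MD5,
    MAC_TYPE_COUNT
};

/* Tag lengths in bytes, copied from the list above. */
static const size_t mac_tag_len[MAC_TYPE_COUNT] = {
    [MAC_SHA1]   = 12,
    [MAC_SHA224] = 14,
    [MAC_SHA256] = 16,
    [MAC_SHA384] = 24,
    [MAC_SHA512] = 32,
    [MAC_XCBC]   = 12,
    [MAC_MD5]    = 12
};

/* Returns the value to store in auth_tag_output_len_in_bytes,
 * or 0 if the MAC type is not one of the listed ones. */
size_t tag_len_for_mac(enum mac_type type)
{
    if ((unsigned)type >= MAC_TYPE_COUNT)
        return 0;
    return mac_tag_len[type];
}
```

An application could call tag_len_for_mac() while filling in a job so
that the tag length always matches the selected MAC.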
118
119 =======================================
120 Oct 24, 2011
121
122 SHA_256 added to multibuffer
123 ------------------------
124 12 Aug 2011
125
126 API
127
128 The GCM API is distinct from the Multi-buffer API. This is because
129 the GCM code is an optimized single-buffer implementation. By
130 packaging them separately, the application has the option of where,
131 when, and how to call the GCM code, independent of how it is calling
132 the multi-buffer code.
133
134 For example, the application might be enqueuing multi-buffer requests
135 for a separate thread to process. In this scenario, if a particular
136 packet used GCM, then the application could choose whether to call
137 the GCM routines directly, or whether to enqueue those requests and
138 have the compute thread call the GCM routines.
139
140 GCM API
141
142 The GCM functions are defined as described in the header
143 files. They are simple computational routines, with no state
144 associated with them.
145
146 Multi-Buffer API: Two Sets of Functions
147
148 There are two parallel interfaces, one suffixed with "_sse" and one
149 suffixed with "_avx". These are functionally equivalent. The "_sse"
150 functions work on WSM (Westmere) and later processors. The "_avx" functions
151 offer better performance, but only run on SNB (Sandy Bridge) and later.
152
153 The same interface object structures are used for both sets of
154 interfaces, although one cannot mix the two interfaces on the same
155 initialized object (e.g. it would be wrong to initialize with
156 init_mb_mgr_sse() and then to pass that to submit_job_avx() ). After
157 the MB_MGR structure has been initialized with one of the two
158 initialization functions (init_mb_mgr_sse() or init_mb_mgr_avx()),
159 only the corresponding functions should be used on it.
160
161 There are several ways in which an application could use these
162 interfaces.
163
164 1) Direct
165 If an application is only going to be run on a post-WSM machine,
166 it can just call the "_avx" functions directly. Conversely, if it
167 is just going to be run on WSM machines, it can call the "_sse"
168 functions directly.
169
170 2) Via Branches
171 If an application can run on both WSM and SNB and wants the
172 improved performance on SNB, then it can use some method to
173 determine if it is on SNB, and then use a conditional branch to
174 determine which function to call. E.g. this could be wrapped in a
175 macro along the lines of:
176 #define submit_job(mb_mgr) \
177 (_use_avx ? submit_job_avx(mb_mgr) \
178 : submit_job_sse(mb_mgr))
179
180 3) Via a Function Table
181 One can embed the function addresses into a structure, call them
182 through this structure, and change the structure based on which
183 set of functions one wishes to use, e.g.
184
185 struct funcs_t {
186 init_mb_mgr_t init_mb_mgr;
187 get_next_job_t get_next_job;
188 submit_job_t submit_job;
189 get_completed_job_t get_completed_job;
190 flush_job_t flush_job;
191 };
192
193 struct funcs_t funcs_sse = {
194 init_mb_mgr_sse,
195 get_next_job_sse,
196 submit_job_sse,
197 get_completed_job_sse,
198 flush_job_sse
199 };
200 struct funcs_t funcs_avx = {
201 init_mb_mgr_avx,
202 get_next_job_avx,
203 submit_job_avx,
204 get_completed_job_avx,
205 flush_job_avx
206 };
207 struct funcs_t *funcs = &funcs_sse;
208 ...
209 if (do_avx)
210 funcs = &funcs_avx;
211 ...
212 funcs->init_mb_mgr(&mb_mgr);
213
214 For simplicity in the rest of this document, the functions will be
215 referred to without a suffix.
216
217 API: Overview
218
219 The basic unit of work is a "job". It is represented by a
220 JOB_AES_HMAC structure. It contains all of the information needed to
221 perform encryption/decryption and SHA1/HMAC authentication on one
222 buffer for IPSec processing.
223
224 The basic paradigm is that the application needs to be able to
225 provide new jobs before old jobs have completed processing. One
226 might call this an "asynchronous" interface.
227
228 The basic interface is that the application "submits" a job to the
229 multi-buffer manager (MB_MGR), and it may receive a completed job
230 back, or it may receive NULL. The returned job, if there is one,
231 will not be the same as the submitted job, but the jobs will be
232 returned in the same order in which they are submitted.
233
234 Since there can be a semi-arbitrary number of outstanding jobs,
235 management of the job object is handled by the MB_MGR. The
236 application gets a pointer to a new job object by calling
237 get_next_job(). It then fills in the data fields and submits it by
238 calling submit_job(). If a job is returned, then that job has been
239 completed, and the application should do whatever it needs to do in
240 order to further process that buffer.
241
242 The job object is not explicitly returned to the MB_MGR. Rather it
243 is implicitly returned by the next call to get_next_job(). Another
244 way to put this is that the data within the job object is
245 guaranteed to be valid until the next call to get_next_job().
246
247 In order to reduce latency, there is an optional function that may
248 be called, get_completed_job(). This returns the next job if that
249 job has previously been completed. But if that job has not been
250 completed, no processing is done, and the function returns
251 NULL. This may be used to reduce the number of outstanding jobs
252 within the MB_MGR.
253
254 At times, it may be necessary to process the jobs currently within
255 the MB_MGR without providing new jobs as input. This process is
256 called "flushing", and it is invoked by calling flush_job(). If
257 there are any jobs within the MB_MGR, this will complete processing
258 on the earliest job and return it. It will only return NULL if there
259 are no jobs within the MB_MGR.
260
261 Flushing will be described in more detail below.
262
263 The presumption is that the same AES key will apply to a number of
264 buffers. For increased efficiency, it requires that the AES key
265 expansion happens as a distinct step apart from buffer
266 encryption/decryption. The expanded keys are stored in a data
267 structure (array), and this expanded key structure is used by the
268 job object.
269
270 There are two variants provided, MB_MGR and MB_MGR2. They are
271 functionally equivalent. The reason that two are provided is that
272 they differ slightly in their implementation, and so they may have
273 slightly different characteristics in terms of latency and overhead.
274
275 API: Usage Skeleton
276 The basic usage is illustrated in the following pseudocode:
277
278 init_mb_mgr(&mb_mgr);
279 ...
280 aes_keyexp_128(key, enc_exp_keys, dec_exp_keys);
281 ...
282 while (work_to_be_done) {
283 job = get_next_job(&mb_mgr);
284 // TODO: Fill in job fields
285 job = submit_job(&mb_mgr);
286 while (job) {
287 // TODO: Complete processing on job
288 job = get_completed_job(&mb_mgr);
289 }
290 }
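
The skeleton above stops once no more work arrives, which can leave
jobs inside the manager. The following toy model is a sketch of the
semantics described in this document and is not the library's
implementation. All toy_* names, the slot count and the lane count are
assumptions made for the illustration. It models a manager that only
processes jobs once a full batch is pending, and shows why a final
flush_job() loop is needed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8 /* ring capacity, an arbitrary choice for this model */
#define TOY_LANES 4 /* pending jobs needed before a batch is processed */

struct toy_job {
    int user_data; /* application tag, ignored by the manager */
    int completed; /* nonzero once the job has been processed */
};

/* The counters only grow; a slot index is counter % TOY_SLOTS. */
struct toy_mgr {
    struct toy_job jobs[TOY_SLOTS];
    unsigned next;     /* slot that toy_get_next_job() hands out */
    unsigned done;     /* jobs before this counter are processed */
    unsigned earliest; /* oldest job not yet returned to the caller */
};

void toy_init(struct toy_mgr *m)
{
    m->next = m->done = m->earliest = 0;
}

struct toy_job *toy_get_next_job(struct toy_mgr *m)
{
    return &m->jobs[m->next % TOY_SLOTS];
}

/* Returns the next job in submission order if processed, else NULL. */
struct toy_job *toy_get_completed_job(struct toy_mgr *m)
{
    if (m->earliest == m->done)
        return NULL;
    return &m->jobs[m->earliest++ % TOY_SLOTS];
}

struct toy_job *toy_submit_job(struct toy_mgr *m)
{
    assert(m->next - m->earliest < TOY_SLOTS); /* ring must not overflow */
    m->jobs[m->next % TOY_SLOTS].completed = 0;
    m->next++;
    if (m->next - m->done >= TOY_LANES) { /* a full batch is pending */
        for (; m->done != m->next; m->done++)
            m->jobs[m->done % TOY_SLOTS].completed = 1;
    }
    return toy_get_completed_job(m);
}

/* Forces the earliest outstanding job to complete; NULL only if empty. */
struct toy_job *toy_flush_job(struct toy_mgr *m)
{
    if (m->earliest == m->next)
        return NULL;
    if (m->earliest == m->done)
        m->jobs[m->done++ % TOY_SLOTS].completed = 1;
    return toy_get_completed_job(m);
}

/* Runs the usage skeleton for n jobs and returns how many jobs came
 * back in submission order. The flush loop is optional so that its
 * effect can be observed. */
int toy_run(int n, int drain_with_flush)
{
    struct toy_mgr mgr;
    struct toy_job *job;
    int in_order = 0;

    toy_init(&mgr);
    for (int i = 0; i < n; i++) {
        job = toy_get_next_job(&mgr);
        job->user_data = i;
        for (job = toy_submit_job(&mgr); job != NULL;
             job = toy_get_completed_job(&mgr))
            in_order += (job->user_data == in_order);
    }
    if (drain_with_flush)
        for (job = toy_flush_job(&mgr); job != NULL;
             job = toy_flush_job(&mgr))
            in_order += (job->user_data == in_order);
    return in_order;
}
```

In this model, toy_run(6, 0) gets back only the first four jobs, while
toy_run(6, 1) gets back all six.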
291
292 API: Job Fields
293 The mode is determined by the fields "cipher_direction" and
294 "chain_order". The first specifies encrypt or decrypt, and the
295 second specifies whether the hash should be done before or
296 after the cipher operation.
297 In the current implementation, only two combinations of these are
298 supported. For encryption, these should be set to "ENCRYPT" and
299 "CIPHER_HASH", and for decryption, these should be set to "DECRYPT"
300 and "HASH_CIPHER".
301
302 The expanded keys are pointed to by "aes_enc_key_expanded" and
303 "aes_dec_key_expanded". These arrays must be aligned on a 16-byte
304 boundary. Only one of these is necessary (as determined by
305 "cipher_direction").
306
307 One selects AES128 vs AES256 by using the "aes_key_len_in_bytes"
308 field. The only valid values are 16 (AES128) and 32 (AES256).
309
310 One selects the AES mode (CBC versus counter-mode) using
311 "cipher_mode".
312
313 One selects the hash algorithm (SHA1-HMAC, AES-XCBC, or MD5-HMAC)
314 using "hash_alg".
315
316 The data to be encrypted/decrypted is defined by
317 "src + cipher_start_src_offset_in_bytes". The length of data is
318 given by "msg_len_to_cipher_in_bytes". It must be a multiple of
319 16 bytes.
320
321 The destination for the cipher operation is given by "dst" (NOT by
322 "dst + cipher_start_src_offset_in_bytes"). In many/most applications,
323 the destination pointer may overlap the source pointer. That is,
324 "dst" may be equal to "src + cipher_start_src_offset_in_bytes".
325
326 The IV for the cipher operation is given by "iv". The
327 "iv_len_in_bytes" should be 16. This pointer does not need to be
328 aligned.
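
The length constraints on the cipher fields described above can be
checked before a job is submitted. The helpers below are a sketch.
Their names are hypothetical, and the parameter types are assumptions
(the field names come from this document; the field types are not
stated here).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the three length fields satisfy the constraints stated
 * in this section, else 0. Hypothetical helper, not a library call. */
int cipher_lengths_ok(uint64_t aes_key_len_in_bytes,
                      uint64_t msg_len_to_cipher_in_bytes,
                      uint64_t iv_len_in_bytes)
{
    /* Only AES128 (16) and AES256 (32) key lengths are valid. */
    if (aes_key_len_in_bytes != 16 && aes_key_len_in_bytes != 32)
        return 0;
    /* The cipher length must be a multiple of 16 bytes. */
    if (msg_len_to_cipher_in_bytes % 16 != 0)
        return 0;
    /* The IV length should be 16. */
    if (iv_len_in_bytes != 16)
        return 0;
    return 1;
}

/* For in-place operation, "dst" points at the first ciphered byte,
 * i.e. src + cipher_start_src_offset_in_bytes. The offset is NOT
 * applied to "dst" a second time by the library. */
uint8_t *in_place_dst(uint8_t *src,
                      uint64_t cipher_start_src_offset_in_bytes)
{
    return src + cipher_start_src_offset_in_bytes;
}
```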
329
330 The data to be hashed is defined by
331 "src + hash_start_src_offset_in_bytes". The length of data is
332 given by "msg_len_to_hash_in_bytes".
333
334 The output of the hash operation is defined by
335 "auth_tag_output". The number of bytes written is given by
336 "auth_tag_output_len_in_bytes". Currently the only valid value for
337 this parameter is 12.
338
339 The ipad and opad are given as the result of hashing the HMAC key
340 xor'ed with the appropriate value. That is, rather than passing in
341 the HMAC key and rehashing the initial block for every buffer, the
342 hashing of the initial block is done separately, and the results of
343 this hash are used as input in the job structure.
344
345 Similar to the expanded AES keys, the premise here is that one HMAC
346 key will apply to many buffers, so we want to do that hashing once
347 and not for each buffer.
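
As a sketch of the first half of that precomputation, the code below
builds the two 64-byte blocks (key xor'ed with 0x36 for ipad and with
0x5c for opad, the standard HMAC constants). The function name is
hypothetical. Hashing each block with a one-block hash routine, which
yields the values stored in the job, is left to the application and is
not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HMAC_BLOCK_SIZE 64 /* block size of SHA1 and MD5, in bytes */

/* Fills ipad_block and opad_block from an HMAC key of at most 64 bytes.
 * Returns 0 on success, or -1 if the key is missing or longer than one
 * block (HMAC then requires hashing the key first, which this sketch
 * does not do). */
int hmac_build_pad_blocks(const uint8_t *key, size_t key_len,
                          uint8_t ipad_block[HMAC_BLOCK_SIZE],
                          uint8_t opad_block[HMAC_BLOCK_SIZE])
{
    uint8_t padded[HMAC_BLOCK_SIZE] = {0}; /* key, zero-padded to 64 */
    size_t i;

    if (key == NULL || key_len > HMAC_BLOCK_SIZE)
        return -1;
    memcpy(padded, key, key_len);
    for (i = 0; i < HMAC_BLOCK_SIZE; i++) {
        ipad_block[i] = padded[i] ^ 0x36;
        opad_block[i] = padded[i] ^ 0x5c;
    }
    return 0;
}
```

Each block would then be hashed once per key, and the two resulting
digests reused for every buffer that shares that key.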
348
349 The "status" reflects the status of the returned job. It should be
350 "STS_COMPLETED".
351
352 The "user_data" field is ignored. It can be used to attach
353 application data to the job object.
354
355 Flushing Concerns
356 As long as jobs are coming in at a reasonable rate, jobs should be
357 returned at a reasonable rate. However, if there is a lull in the
358 arrival of new jobs, the last few jobs that were submitted tend to
359 stay in the MB_MGR until new jobs arrive. This might result in there
360 being an unreasonable latency for these jobs.
361
362 In this case, flush_job() should be used to complete processing on
363 these outstanding jobs and prevent them from having excessive
364 latency.
365
366 Exactly when and how to use flush_job() is up to the application,
367 and is a balancing act. The processing of flush_job() is less
368 efficient than that of submit_job(), so calling flush_job() too
369 often will lower the system efficiency. Conversely, calling
370 flush_job() too rarely may result in some jobs seeing excessive
371 latency.
372
373 There are several strategies that the application may employ for
374 flushing. One usage model is that there is a (thread-safe) queue
375 containing work items. One or more threads put work onto this
376 queue, and one or more processing threads remove items from this
377 queue and process them through the MB_MGR. In this usage, a simple
378 flushing strategy is that when the processing thread wants to do
379 more work, but the queue is empty, it then proceeds to flush jobs
380 until either the queue contains more work, or the MB_MGR no longer
381 contains jobs (i.e. flush_job() returns NULL). A variation on
382 this is that when the work queue is empty, the processing thread
383 might pause for a short time to see if any new work appears, before
384 it starts flushing.
385
386 In other usage models, there may be no such queue. An alternate
387 flushing strategy is to have a separate "flush thread" hanging
388 around. It wakes up periodically and checks to see if any work has
389 been requested since the last time it woke up. If some period of
390 time has gone by with no new work appearing, it would proceed to
391 flush the MB_MGR.
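
The decision such a flush thread makes on each wake-up can be sketched
as a small predicate. Everything here is an assumption made for
illustration: the function name, the millisecond timestamps, and the
idea that the application tracks the time of its last submit and the
number of jobs still inside the MB_MGR.

```c
#include <stdint.h>

/* Returns 1 if the flush thread should start calling flush_job().
 * Assumes now_ms >= last_submit_ms (a monotonic clock). */
int should_flush(uint64_t now_ms, uint64_t last_submit_ms,
                 uint64_t idle_threshold_ms, unsigned jobs_outstanding)
{
    if (jobs_outstanding == 0)
        return 0; /* nothing inside the manager to flush */
    /* Flush only after a quiet period with no new work. */
    return now_ms - last_submit_ms >= idle_threshold_ms;
}
```

A larger idle threshold favors throughput, a smaller one favors
latency. Because the flush thread and the submitting thread would both
touch the same MB_MGR, the calls must be serialized by the application.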
392
393 AES Key Usage
394 If the AES mode is CBC, then the fields aes_enc_key_expanded or
395 aes_dec_key_expanded are used depending on whether the data is
396 being encrypted or decrypted. However, if the AES mode is CNTR
397 (counter mode), then only aes_enc_key_expanded is used, even for a
398 decrypt operation.
399
400 The application can handle this dichotomy, or it might choose to
401 simply set both fields in all cases.
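
The rule above can be written as a small selection function. The enum
and function names below are hypothetical and only mirror the wording
of this section; they are not the library's identifiers.

```c
/* Hypothetical names for illustration only. */
enum toy_cipher_mode { TOY_MODE_CBC, TOY_MODE_CNTR };
enum toy_direction { TOY_ENCRYPT, TOY_DECRYPT };
enum toy_key_used { USES_ENC_KEY, USES_DEC_KEY };

/* Tells which expanded key field is read for a given job setup. */
enum toy_key_used key_used_by(enum toy_cipher_mode mode,
                              enum toy_direction dir)
{
    /* Counter mode runs the AES encrypt transform in both directions. */
    if (mode == TOY_MODE_CNTR)
        return USES_ENC_KEY;
    /* CBC uses the key schedule matching the direction. */
    return dir == TOY_ENCRYPT ? USES_ENC_KEY : USES_DEC_KEY;
}
```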
402
403 Thread Safety
404 The MB_MGR and the associated functions ARE NOT thread safe. If
405 there are multiple threads that may be calling these functions
406 (e.g. a processing thread and a flushing thread), it is the
407 responsibility of the application to put in place sufficient locking
408 so that no two threads will make calls to the same MB_MGR object at
409 the same time.
410
411 XMM Register Usage
412 The current implementation is designed for integration in the Linux
413 Kernel. All of the functions satisfy the Linux ABI with respect to
414 general purpose registers. However, the submit_job() and flush_job()
415 functions use XMM registers without saving/restoring any of them. It
416 is up to the application to manage the saving/restoring of XMM
417 registers itself.
418
419 Auxiliary Functions
420 There are several auxiliary functions packaged with MB_MGR. These may
421 be used, or the application may choose to use its own versions. Two
422 of these, aes_keyexp_128() and aes_keyexp_256(), expand AES keys into
423 a form that is acceptable for reference in the job structure.
424
425 In the case of AES128, the expanded key structure should be an array
426 of 11 128-bit words, aligned on a 16-byte boundary. In the case of
427 AES256, it should be an array of 15 128-bit words, aligned on a
428 16-byte boundary.
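
One way to obtain storage with these sizes and this alignment is
sketched below using C11 alignas. The struct and macro names are
hypothetical; only the word counts (11 and 15) and the 16-byte
alignment come from this document.

```c
#include <stdalign.h>
#include <stdint.h>

#define AES_ROUND_KEY_BYTES 16 /* one 128-bit word */
#define AES128_ROUND_KEYS   11
#define AES256_ROUND_KEYS   15

/* Encrypt and decrypt schedules for one AES128 key. */
struct aes128_expanded_keys {
    alignas(16) uint8_t enc[AES128_ROUND_KEYS * AES_ROUND_KEY_BYTES];
    alignas(16) uint8_t dec[AES128_ROUND_KEYS * AES_ROUND_KEY_BYTES];
};

/* Encrypt and decrypt schedules for one AES256 key. */
struct aes256_expanded_keys {
    alignas(16) uint8_t enc[AES256_ROUND_KEYS * AES_ROUND_KEY_BYTES];
    alignas(16) uint8_t dec[AES256_ROUND_KEYS * AES_ROUND_KEY_BYTES];
};

/* Returns 1 if p sits on a 16-byte boundary, else 0. */
int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Buffers obtained some other way (for example from a general-purpose
allocator) can be checked with is_aligned_16() before their address is
stored in a job.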
429
430 There is also a function, sha1(), which will compute the SHA1 digest
431 of a single 64-byte block. It can be used to compute the ipad and
432 opad digests. There is a similar function, md5(), which can be used
433 when using MD5-HMAC.
434
435 For further details on the usage of these functions, see the sample
436 test application.