.. SPDX-License-Identifier: GPL-2.0+

======
XArray
======

:Author: Matthew Wilcox

Overview
========

The XArray is an abstract data type which behaves like a very large array
of pointers. It meets many of the same needs as a hash or a conventional
resizable array. Unlike a hash, it allows you to sensibly go to the
next or previous entry in a cache-efficient manner. In contrast to a
resizable array, there is no need to copy data or change MMU mappings in
order to grow the array. It is more memory-efficient, parallelisable
and cache friendly than a doubly-linked list. It takes advantage of
RCU to perform lookups without locking.

The XArray implementation is efficient when the indices used are densely
clustered; hashing the object and using the hash as the index will not
perform well. The XArray is optimised for small indices, but still has
good performance with large indices. If your index can be larger than
``ULONG_MAX`` then the XArray is not the data type for you. The most
important user of the XArray is the page cache.

Normal pointers may be stored in the XArray directly. They must be 4-byte
aligned, which is true for any pointer returned from kmalloc() and
alloc_page(). It isn't true for arbitrary user-space pointers,
nor for function pointers. You can store pointers to statically allocated
objects, as long as those objects have an alignment of at least 4.

You can also store integers between 0 and ``LONG_MAX`` in the XArray.
You must first convert the integer into an entry using xa_mk_value().
When you retrieve an entry from the XArray, you can check whether it is
a value entry by calling xa_is_value(), and convert it back to
an integer by calling xa_to_value().
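As a rough illustration of why value entries stop at ``LONG_MAX`` and why
stored pointers must be aligned, the userspace sketch below models one
possible encoding: the integer is shifted left by one bit and the low bit is
set, so an aligned pointer (low bit clear) can never be mistaken for a value.
The ``model_*`` names are hypothetical and exist only for this example; kernel
code should call xa_mk_value(), xa_is_value() and xa_to_value() and should not
depend on the bit layout.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a value entry; not the kernel header. */
static inline void *model_mk_value(unsigned long v)
{
	/* One bit is spent on the marker, so only 0..LONG_MAX fits. */
	assert(v <= (unsigned long)LONG_MAX);
	return (void *)(uintptr_t)(((uintptr_t)v << 1) | 1);
}

/* A value entry is recognised by its low bit being set. */
static inline bool model_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Undo the shift to recover the original integer. */
static inline unsigned long model_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}
```

In this model, a pointer to any object with an alignment of at least 2 is
reported as "not a value", which is the property the alignment rule above
protects.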

Some users want to tag the pointers they store in the XArray. You can
call xa_tag_pointer() to create an entry with a tag, xa_untag_pointer()
to turn a tagged entry back into an untagged pointer and xa_pointer_tag()
to retrieve the tag of an entry. Tagged pointers use the same bits that
are used to distinguish value entries from normal pointers, so you must
decide whether you want to store value entries or tagged pointers in
any particular XArray.
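The conflict between tagged pointers and value entries can be seen in a small
userspace sketch. It assumes the tag lives in the two low bits that a 4-byte
aligned pointer leaves free, and it accepts tags 0, 1 and 3, which is believed
to match the kernel-doc for xa_tag_pointer(); check the header before relying
on either assumption. The ``model_*`` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: fold a small tag into the unused low bits. */
static inline void *model_tag_pointer(void *p, unsigned int tag)
{
	assert(((uintptr_t)p & 3) == 0);		/* 4-byte alignment */
	assert(tag == 0 || tag == 1 || tag == 3);	/* assumed valid tags */
	return (void *)((uintptr_t)p | tag);
}

/* Mask the tag bits off again to get the original pointer back. */
static inline void *model_untag_pointer(void *entry)
{
	return (void *)((uintptr_t)entry & ~(uintptr_t)3);
}

/* Extract only the tag bits. */
static inline unsigned int model_pointer_tag(void *entry)
{
	return (unsigned int)((uintptr_t)entry & 3);
}
```

A pointer tagged with 1 or 3 has its low bit set, so in a scheme where the low
bit marks a value entry it would look like an integer. That is why a given
XArray should hold one kind or the other, not both.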

The XArray does not support storing IS_ERR() pointers as some
conflict with value entries or internal entries.

An unusual feature of the XArray is the ability to create entries which
occupy a range of indices. Once stored to, looking up any index in
the range will return the same entry as looking up any other index in
the range. Storing to any index will store to all of them. Multi-index
entries can be explicitly split into smaller entries, or storing ``NULL``
into any entry will cause the XArray to forget about the range.

Normal API
==========

Start by initialising an XArray, either with DEFINE_XARRAY()
for statically allocated XArrays or xa_init() for dynamically
allocated ones. A freshly-initialised XArray contains a ``NULL``
pointer at every index.

You can then set entries using xa_store() and get entries
using xa_load(). xa_store() will overwrite any entry with the
new entry and return the previous entry stored at that index. You can
use xa_erase() instead of calling xa_store() with a
``NULL`` entry. There is no difference between an entry that has never
been stored to, one that has been erased and one that has most recently
had ``NULL`` stored to it.

You can conditionally replace an entry at an index by using
xa_cmpxchg(). Like cmpxchg(), it will only succeed if
the entry at that index has the 'old' value. It also returns the entry
which was at that index; if it returns the same entry which was passed as
'old', then xa_cmpxchg() succeeded.

If you want to only store a new entry to an index if the current entry
at that index is ``NULL``, you can use xa_insert() which
returns ``-EBUSY`` if the entry is not empty.

You can copy entries out of the XArray into a plain array by calling
xa_extract(). Or you can iterate over the present entries in the XArray
by calling xa_for_each(), xa_for_each_start() or xa_for_each_range().
You may prefer to use xa_find() or xa_find_after() to move to the next
present entry in the XArray.

Calling xa_store_range() stores the same entry in a range
of indices. If you do this, some of the other operations will behave
in a slightly odd way. For example, marking the entry at one index
may result in the entry being marked at some, but not all of the other
indices. Storing into one index may result in the entry retrieved by
some, but not all of the other indices changing.

Sometimes you need to ensure that a subsequent call to xa_store()
will not need to allocate memory. The xa_reserve() function
will store a reserved entry at the indicated index. Users of the
normal API will see this entry as containing ``NULL``. If you do
not need to use the reserved entry, you can call xa_release()
to remove the unused entry. If another user has stored to the entry
in the meantime, xa_release() will do nothing; if instead you
want the entry to become ``NULL``, you should use xa_erase().
Using xa_insert() on a reserved entry will fail.

If all entries in the array are ``NULL``, the xa_empty() function
will return ``true``.

Finally, you can remove all entries from an XArray by calling
xa_destroy(). If the XArray entries are pointers, you may wish
to free the entries first. You can do this by iterating over all present
entries in the XArray using the xa_for_each() iterator.

Search Marks
------------

Each entry in the array has three bits associated with it called marks.
Each mark may be set or cleared independently of the others. You can
iterate over marked entries by using the xa_for_each_marked() iterator.

You can enquire whether a mark is set on an entry by using
xa_get_mark(). If the entry is not ``NULL``, you can set a mark on it
by using xa_set_mark() and remove the mark from an entry by calling
xa_clear_mark(). You can ask whether any entry in the XArray has a
particular mark set by calling xa_marked(). Erasing an entry from the
XArray causes all marks associated with that entry to be cleared.

Setting or clearing a mark on any index of a multi-index entry will
affect all indices covered by that entry. Querying the mark on any
index will return the same result.

There is no way to iterate over entries which are not marked; the data
structure does not allow this to be implemented efficiently. There are
not currently iterators to search for logical combinations of bits (eg
iterate over all entries which have both ``XA_MARK_1`` and ``XA_MARK_2``
set, or iterate over all entries which have ``XA_MARK_0`` or ``XA_MARK_2``
set). It would be possible to add these if a user arises.

Allocating XArrays
------------------

If you use DEFINE_XARRAY_ALLOC() to define the XArray, or
initialise it by passing ``XA_FLAGS_ALLOC`` to xa_init_flags(),
the XArray changes to track whether entries are in use or not.

You can call xa_alloc() to store the entry at an unused index
in the XArray. If you need to modify the array from softirq or interrupt
context, you can use xa_alloc_bh() or xa_alloc_irq() to disable
softirqs or interrupts, respectively, while allocating the ID.

Using xa_store(), xa_cmpxchg() or xa_insert() will
also mark the entry as being allocated. Unlike a normal XArray, storing
``NULL`` will mark the entry as being in use, like xa_reserve().
To free an entry, use xa_erase() (or xa_release() if
you only want to free the entry if it's ``NULL``).

By default, the lowest free entry is allocated starting from 0. If you
want to allocate entries starting at 1, it is more efficient to use
DEFINE_XARRAY_ALLOC1() or ``XA_FLAGS_ALLOC1``. If you want to
allocate IDs up to a maximum, then wrap back around to the lowest free
ID, you can use xa_alloc_cyclic().
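The difference between the two allocation policies can be sketched in
userspace with a small table of in-use flags. This is only a model of the
behaviour described above, under the assumption that "cyclic" means "continue
after the most recently allocated ID and wrap to the minimum when the maximum
is reached". The ``model_*`` names and the fixed table size are hypothetical;
the real xa_alloc() and xa_alloc_cyclic() operate on an XArray and take a
limit and GFP flags.

```c
#include <stdbool.h>

#define MODEL_NR_IDS 8

/* Hypothetical stand-in for an allocating XArray. */
struct model_ids {
	bool used[MODEL_NR_IDS];
	unsigned int next;	/* where the next cyclic search begins */
};

/* Hand out the lowest free ID in [min, max], or -1 if the range is full. */
static int model_alloc(struct model_ids *a, unsigned int min, unsigned int max)
{
	for (unsigned int id = min; id <= max && id < MODEL_NR_IDS; id++) {
		if (!a->used[id]) {
			a->used[id] = true;
			return (int)id;
		}
	}
	return -1;
}

/* Search upwards from a->next first; wrap to min only if that fails. */
static int model_alloc_cyclic(struct model_ids *a, unsigned int min,
			      unsigned int max)
{
	unsigned int start = a->next > min ? a->next : min;
	int id = model_alloc(a, start, max);

	if (id < 0 && start > min)
		id = model_alloc(a, min, max);
	if (id >= 0)
		a->next = (unsigned int)id + 1;
	return id;
}

/* Mark an ID as free again. */
static void model_free(struct model_ids *a, unsigned int id)
{
	a->used[id] = false;
}
```

With the plain policy a freed low ID is reused immediately; with the cyclic
policy it is reused only after the search has run past the maximum, which
keeps recently freed IDs out of circulation for longer.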

You cannot use ``XA_MARK_0`` with an allocating XArray as this mark
is used to track whether an entry is free or not. The other marks are
available for your use.

Memory allocation
-----------------

The xa_store(), xa_cmpxchg(), xa_alloc(),
xa_reserve() and xa_insert() functions take a gfp_t
parameter in case the XArray needs to allocate memory to store this entry.
If the entry is being deleted, no memory allocation needs to be performed,
and the GFP flags specified will be ignored.

It is possible for no memory to be allocatable, particularly if you pass
a restrictive set of GFP flags. In that case, the functions return a
special value which can be turned into an errno using xa_err().
If you don't need to know exactly which error occurred, using
xa_is_err() is slightly more efficient.
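One way to picture the "special value" is as a negative errno packed into a
pointer-sized entry that can never be a valid aligned pointer. The userspace
sketch below models such an encoding. It assumes that internal entries carry
the bit pattern ``10`` in their two low bits and that errors occupy the top
4095 internal values; both are assumptions about the current implementation
rather than a documented contract, and all ``model_*`` names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Assumed layout: payload shifted up by two, low bits set to binary 10. */
static inline void *model_mk_internal(uintptr_t v)
{
	return (void *)((v << 2) | 2);
}

static inline bool model_is_internal(const void *entry)
{
	return ((uintptr_t)entry & 3) == 2;
}

/* Cheap test: is this entry in the range reserved for errors? */
static inline bool model_is_err(const void *entry)
{
	return model_is_internal(entry) &&
	       (uintptr_t)entry >=
	       (uintptr_t)model_mk_internal((uintptr_t)-MODEL_MAX_ERRNO);
}

/* Full decode: a signed shift recovers the negative errno, else 0. */
static inline int model_err(const void *entry)
{
	if (model_is_err(entry))
		return (int)((intptr_t)entry >> 2);
	return 0;
}

/* What a failed store might hand back in this model. */
static inline void *model_enomem_entry(void)
{
	return model_mk_internal((uintptr_t)-ENOMEM);
}
```

The model also suggests why a plain "is it an error?" check can be cheaper
than decoding: it needs only a mask and a comparison. The decode relies on the
right shift of a negative value being arithmetic, which holds on the compilers
the kernel supports but is implementation-defined in ISO C.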

Locking
-------

When using the Normal API, you do not have to worry about locking.
The XArray uses RCU and an internal spinlock to synchronise access:

No lock needed:
 * xa_empty()
 * xa_marked()

Takes RCU read lock:
 * xa_load()
 * xa_for_each()
 * xa_for_each_start()
 * xa_for_each_range()
 * xa_find()
 * xa_find_after()
 * xa_extract()
 * xa_get_mark()

Takes xa_lock internally:
 * xa_store()
 * xa_store_bh()
 * xa_store_irq()
 * xa_insert()
 * xa_insert_bh()
 * xa_insert_irq()
 * xa_erase()
 * xa_erase_bh()
 * xa_erase_irq()
 * xa_cmpxchg()
 * xa_cmpxchg_bh()
 * xa_cmpxchg_irq()
 * xa_store_range()
 * xa_alloc()
 * xa_alloc_bh()
 * xa_alloc_irq()
 * xa_reserve()
 * xa_reserve_bh()
 * xa_reserve_irq()
 * xa_destroy()
 * xa_set_mark()
 * xa_clear_mark()

Assumes xa_lock held on entry:
 * __xa_store()
 * __xa_insert()
 * __xa_erase()
 * __xa_cmpxchg()
 * __xa_alloc()
 * __xa_set_mark()
 * __xa_clear_mark()

If you want to take advantage of the lock to protect the data structures
that you are storing in the XArray, you can call xa_lock()
before calling xa_load(), then take a reference count on the
object you have found before calling xa_unlock(). This will
prevent stores from removing the object from the array between looking
up the object and incrementing the refcount. You can also use RCU to
avoid dereferencing freed memory, but an explanation of that is beyond
the scope of this document.

The XArray does not disable interrupts or softirqs while modifying
the array. It is safe to read the XArray from interrupt or softirq
context as the RCU lock provides enough protection.

If, for example, you want to store entries in the XArray in process
context and then erase them in softirq context, you can do that this way::

    void foo_init(struct foo *foo)
    {
        xa_init_flags(&foo->array, XA_FLAGS_LOCK_BH);
    }

    int foo_store(struct foo *foo, unsigned long index, void *entry)
    {
        int err;

        xa_lock_bh(&foo->array);
        err = xa_err(__xa_store(&foo->array, index, entry, GFP_KERNEL));
        if (!err)
            foo->count++;
        xa_unlock_bh(&foo->array);
        return err;
    }

    /* foo_erase() is only called from softirq context */
    void foo_erase(struct foo *foo, unsigned long index)
    {
        xa_lock(&foo->array);
        __xa_erase(&foo->array, index);
        foo->count--;
        xa_unlock(&foo->array);
    }

If you are going to modify the XArray from interrupt or softirq context,
you need to initialise the array using xa_init_flags(), passing
``XA_FLAGS_LOCK_IRQ`` or ``XA_FLAGS_LOCK_BH``.

The above example also shows a common pattern of wanting to extend the
coverage of the xa_lock on the store side to protect some statistics
associated with the array.

Sharing the XArray with interrupt context is also possible, either
using xa_lock_irqsave() in both the interrupt handler and process
context, or xa_lock_irq() in process context and xa_lock()
in the interrupt handler. Some of the more common patterns have helper
functions such as xa_store_bh(), xa_store_irq(),
xa_erase_bh(), xa_erase_irq(), xa_cmpxchg_bh()
and xa_cmpxchg_irq().

Sometimes you need to protect access to the XArray with a mutex because
that lock sits above another mutex in the locking hierarchy. That does
not entitle you to use functions like __xa_erase() without taking
the xa_lock; the xa_lock is used for lockdep validation and will be used
for other purposes in the future.

The __xa_set_mark() and __xa_clear_mark() functions are also
available for situations where you look up an entry and want to atomically
set or clear a mark. It may be more efficient to use the advanced API
in this case, as it will save you from walking the tree twice.

Advanced API
============

The advanced API offers more flexibility and better performance at the
cost of an interface which can be harder to use and has fewer safeguards.
No locking is done for you by the advanced API, and you are required
to use the xa_lock while modifying the array. You can choose whether
to use the xa_lock or the RCU lock while doing read-only operations on
the array. You can mix advanced and normal operations on the same array;
indeed the normal API is implemented in terms of the advanced API. The
advanced API is only available to modules with a GPL-compatible license.

The advanced API is based around the xa_state. This is an opaque data
structure which you declare on the stack using the XA_STATE()
macro. This macro initialises the xa_state ready to start walking
around the XArray. It is used as a cursor to maintain the position
in the XArray and let you compose various operations together without
having to restart from the top every time.

The xa_state is also used to store errors. You can call
xas_error() to retrieve the error. All operations check whether
the xa_state is in an error state before proceeding, so there's no need
for you to check for an error after each call; you can make multiple
calls in succession and only check at a convenient point. The only
errors currently generated by the XArray code itself are ``ENOMEM`` and
``EINVAL``, but it supports arbitrary errors in case you want to call
xas_set_err() yourself.

If the xa_state is holding an ``ENOMEM`` error, calling xas_nomem()
will attempt to allocate more memory using the specified gfp flags and
cache it in the xa_state for the next attempt. The idea is that you take
the xa_lock, attempt the operation and drop the lock. The operation
attempts to allocate memory while holding the lock, but it is more
likely to fail. Once you have dropped the lock, xas_nomem()
can try harder to allocate more memory. It will return ``true`` if it
is worth retrying the operation (i.e. that there was a memory error *and*
more memory was allocated). If it has previously allocated memory, and
that memory wasn't used, and there is no error (or some error that isn't
``ENOMEM``), then it will free the memory previously allocated.

Internal Entries
----------------

The XArray reserves some entries for its own purposes. These are never
exposed through the normal API, but when using the advanced API, it's
possible to see them. Usually the best way to handle them is to pass them
to xas_retry(), and retry the operation if it returns ``true``.

.. flat-table::
   :widths: 1 1 6

   * - Name
     - Test
     - Usage

   * - Node
     - xa_is_node()
     - An XArray node. May be visible when using a multi-index xa_state.

   * - Sibling
     - xa_is_sibling()
     - A non-canonical entry for a multi-index entry. The value indicates
       which slot in this node has the canonical entry.

   * - Retry
     - xa_is_retry()
     - This entry is currently being modified by a thread which has the
       xa_lock. The node containing this entry may be freed at the end
       of this RCU period. You should restart the lookup from the head
       of the array.

   * - Zero
     - xa_is_zero()
     - Zero entries appear as ``NULL`` through the Normal API, but occupy
       an entry in the XArray which can be used to reserve the index for
       future use. This is used by allocating XArrays for allocated entries
       which are ``NULL``.

Other internal entries may be added in the future. As far as possible, they
will be handled by xas_retry().

Additional functionality
------------------------

The xas_create_range() function allocates all the necessary memory
to store every entry in a range. It will set ``ENOMEM`` in the xa_state if
it cannot allocate memory.

You can use xas_init_marks() to reset the marks on an entry
to their default state. This is usually all marks clear, unless the
XArray is marked with ``XA_FLAGS_TRACK_FREE``, in which case mark 0 is set
and all other marks are clear. Replacing one entry with another using
xas_store() will not reset the marks on that entry; if you want
the marks reset, you should do that explicitly.

The xas_load() function will walk the xa_state as close to the entry
as it can. If you know the xa_state has already been walked to the
entry and need to check that the entry hasn't changed, you can use
xas_reload() to save a function call.

If you need to move to a different index in the XArray, call
xas_set(). This resets the cursor to the top of the tree, which
will generally make the next operation walk the cursor to the desired
spot in the tree. If you want to move to the next or previous index,
call xas_next() or xas_prev(). Setting the index does
not walk the cursor around the array so does not require a lock to be
held, while moving to the next or previous index does.

You can search for the next present entry using xas_find(). This
is the equivalent of both xa_find() and xa_find_after();
if the cursor has been walked to an entry, then it will find the next
entry after the one currently referenced. If not, it will return the
entry at the index of the xa_state. Using xas_next_entry() to
move to the next present entry instead of xas_find() will save
a function call in the majority of cases at the expense of emitting more
inline code.

The xas_find_marked() function is similar. If the xa_state has
not been walked, it will return the entry at the index of the xa_state,
if it is marked. Otherwise, it will return the first marked entry after
the entry referenced by the xa_state. The xas_next_marked()
function is the equivalent of xas_next_entry().

When iterating over a range of the XArray using xas_for_each()
or xas_for_each_marked(), it may be necessary to temporarily stop
the iteration. The xas_pause() function exists for this purpose.
After you have done the necessary work and wish to resume, the xa_state
is in an appropriate state to continue the iteration after the entry
you last processed. If you have interrupts disabled while iterating,
then it is good manners to pause the iteration and reenable interrupts
every ``XA_CHECK_SCHED`` entries.

The xas_get_mark(), xas_set_mark() and xas_clear_mark() functions require
the xa_state cursor to have been moved to the appropriate location in the
XArray; they will do nothing if you have called xas_pause() or xas_set()
immediately before.

You can call xas_set_update() to have a callback function
called each time the XArray updates a node. This is used by the page
cache workingset code to maintain its list of nodes which contain only
shadow entries.

Multi-Index Entries
-------------------

The XArray has the ability to tie multiple indices together so that
operations on one index affect all indices. For example, storing into
any index will change the value of the entry retrieved from any index.
Setting or clearing a mark on any index will set or clear the mark
on every index that is tied together. The current implementation
only allows tying ranges which are aligned powers of two together;
eg indices 64-127 may be tied together, but 2-6 may not be. This may
save substantial quantities of memory; for example tying 512 entries
together will save over 4kB.
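The "aligned power of two" rule can be checked with a few lines of arithmetic.
The helper below is a hypothetical userspace sketch, not a kernel function: it
reports whether an inclusive range of indices could be tied together under
that rule and, if so, returns the order (the base-2 logarithm of the number of
indices in the range).

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true if [first, last] is an aligned power-of-two
 * range, in which case *order receives log2 of its length.
 */
static bool model_range_order(unsigned long first, unsigned long last,
			      unsigned int *order)
{
	unsigned long n;
	unsigned int o = 0;

	if (last < first)
		return false;
	n = last - first + 1;
	if (n == 0)		/* the whole index space; not modelled here */
		return false;
	if (n & (n - 1))	/* length is not a power of two */
		return false;
	if (first & (n - 1))	/* start is not aligned to the length */
		return false;
	while ((1UL << o) < n)
		o++;
	*order = o;
	return true;
}
```

For the examples in the text, indices 64-127 give a length of 64 starting on a
multiple of 64, so the range qualifies with order 6. Indices 2-6 have a length
of 5 and are rejected, and 2-5 would also be rejected because a length of 4
must start on a multiple of 4.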

You can create a multi-index entry by using XA_STATE_ORDER()
or xas_set_order() followed by a call to xas_store().
Calling xas_load() with a multi-index xa_state will walk the
xa_state to the right location in the tree, but the return value is not
meaningful, potentially being an internal entry or ``NULL`` even when there
is an entry stored within the range. Calling xas_find_conflict()
will return the first entry within the range or ``NULL`` if there are no
entries in the range. The xas_for_each_conflict() iterator will
iterate over every entry which overlaps the specified range.

If xas_load() encounters a multi-index entry, the xa_index
in the xa_state will not be changed. When iterating over an XArray
or calling xas_find(), if the initial index is in the middle
of a multi-index entry, it will not be altered. Subsequent calls
or iterations will move the index to the first index in the range.
Each entry will only be returned once, no matter how many indices it
occupies.

Using xas_next() or xas_prev() with a multi-index xa_state
is not supported. Using either of these functions on a multi-index entry
will reveal sibling entries; these should be skipped over by the caller.

Storing ``NULL`` into any index of a multi-index entry will set the entry
at every index to ``NULL`` and dissolve the tie. Splitting a multi-index
entry into entries occupying smaller ranges is not yet supported.

Functions and structures
========================

.. kernel-doc:: include/linux/xarray.h
.. kernel-doc:: lib/xarray.c