1 // This module provides a relatively simple thread-safe pool of reusable
2 // objects. For the most part, it's implemented by a stack represented by a
3 // Mutex<Vec<T>>. It has one small trick: because unlocking a mutex is somewhat
4 // costly, in the case where a pool is accessed by the first thread that tried
5 // to get a value, we bypass the mutex. Here are some benchmarks showing the
8 // 1) misc::anchored_literal_long_non_match 21 (18571 MB/s)
9 // 2) misc::anchored_literal_long_non_match 107 (3644 MB/s)
10 // 3) misc::anchored_literal_long_non_match 45 (8666 MB/s)
11 // 4) misc::anchored_literal_long_non_match 19 (20526 MB/s)
13 // (1) represents our baseline: the master branch at the time of writing when
14 // using the 'thread_local' crate to implement the pool below.
16 // (2) represents a naive pool implemented completely via Mutex<Vec<T>>. There
17 // is no special trick for bypassing the mutex.
19 // (3) is the same as (2), except it uses Mutex<Vec<Box<T>>>. It is twice as
20 // fast because a Box<T> is much smaller than the T we use with a Pool in this
21 // crate. So pushing and popping a Box<T> from a Vec is quite a bit faster
24 // (4) is the same as (3), but with the trick for bypassing the mutex in the
25 // case of the first-to-get thread.
27 // Why move off of thread_local? Even though (4) is a hair faster than (1)
28 // above, this was not the main goal. The main goal was to move off of
29 // thread_local and find a way to *simply* re-capture some of its speed for
30 // regex's specific case. So again, why move off of it? The *primary* reason is
31 // because of memory leaks. See https://github.com/rust-lang/regex/issues/362
32 // for example. (Why do I want it to be simple? Well, I suppose what I mean is,
33 // "use as much safe code as possible to minimize risk and be as sure as I can
34 // be that it is correct.")
36 // My guess is that the thread_local design is probably not appropriate for
37 // regex since its memory usage scales to the number of active threads that
38 // have used a regex, where as the pool below scales to the number of threads
39 // that simultaneously use a regex. While neither case permits contraction,
40 // since we own the pool data structure below, we can add contraction if a
41 // clear use case pops up in the wild. More pressingly though, it seems that
42 // there are at least some use case patterns where one might have many threads
43 // sitting around that might have used a regex at one point. While thread_local
44 // does try to reuse space previously used by a thread that has since stopped,
45 // its maximal memory usage still scales with the total number of active
46 // threads. In contrast, the pool below scales with the total number of threads
47 // *simultaneously* using the pool. The hope is that this uses less memory
48 // overall. And if it doesn't, we can hopefully tune it somehow.
50 // It seems that these sort of conditions happen frequently
51 // in FFI inside of other more "managed" languages. This was
52 // mentioned in the issue linked above, and also mentioned here:
53 // https://github.com/BurntSushi/rure-go/issues/3. And in particular, users
54 // confirm that disabling the use of thread_local resolves the leak.
56 // There were other weaker reasons for moving off of thread_local as well.
57 // Namely, at the time, I was looking to reduce dependencies. And for something
58 // like regex, maintenance can be simpler when we own the full dependency tree.
60 use std
::panic
::{RefUnwindSafe, UnwindSafe}
;
61 use std
::sync
::atomic
::{AtomicUsize, Ordering}
;
64 /// An atomic counter used to allocate thread IDs.
65 static COUNTER
: AtomicUsize
= AtomicUsize
::new(1);
68 /// A thread local used to assign an ID to a thread.
69 static THREAD_ID
: usize = {
70 let next
= COUNTER
.fetch_add(1, Ordering
::Relaxed
);
71 // SAFETY: We cannot permit the reuse of thread IDs since reusing a
72 // thread ID might result in more than one thread "owning" a pool,
73 // and thus, permit accessing a mutable value from multiple threads
74 // simultaneously without synchronization. The intent of this panic is
75 // to be a sanity check. It is not expected that the thread ID space
76 // will actually be exhausted in practice.
78 // This checks that the counter never wraps around, since atomic
79 // addition wraps around on overflow.
81 panic
!("regex: thread ID allocation space exhausted");
87 /// The type of the function used to create values in a pool when the pool is
88 /// empty and the caller requests one.
90 Box
<dyn Fn() -> T
+ Send
+ Sync
+ UnwindSafe
+ RefUnwindSafe
+ '
static>;
92 /// A simple thread safe pool for reusing values.
94 /// Getting a value out comes with a guard. When that guard is dropped, the
95 /// value is automatically put back in the pool.
97 /// A Pool<T> impls Sync when T is Send (even if it's not Sync). This means
98 /// that T can use interior mutability. This is possible because a pool is
99 /// guaranteed to provide a value to exactly one thread at any time.
101 /// Currently, a pool never contracts in size. Its size is proportional to the
102 /// number of simultaneous uses.
104 /// A stack of T values to hand out. These are used when a Pool is
105 /// accessed by a thread that didn't create it.
106 stack
: Mutex
<Vec
<Box
<T
>>>,
107 /// A function to create more T values when stack is empty and a caller
108 /// has requested a T.
110 /// The ID of the thread that owns this pool. The owner is the thread
111 /// that makes the first call to 'get'. When the owner calls 'get', it
112 /// gets 'owner_val' directly instead of returning a T from 'stack'.
113 /// See comments elsewhere for details, but this is intended to be an
114 /// optimization for the common case that makes getting a T faster.
116 /// It is initialized to a value of zero (an impossible thread ID) as a
117 /// sentinel to indicate that it is unowned.
119 /// A value to return when the caller is in the same thread that created
124 // SAFETY: Since we want to use a Pool from multiple threads simultaneously
125 // behind an Arc, we need for it to be Sync. In cases where T is sync, Pool<T>
126 // would be Sync. However, since we use a Pool to store mutable scratch space,
127 // we wind up using a T that has interior mutability and is thus itself not
128 // Sync. So what we *really* want is for our Pool<T> to by Sync even when T is
129 // not Sync (but is at least Send).
131 // The only non-sync aspect of a Pool is its 'owner_val' field, which is used
132 // to implement faster access to a pool value in the common case of a pool
133 // being accessed in the same thread in which it was created. The 'stack' field
134 // is also shared, but a Mutex<T> where T: Send is already Sync. So we only
135 // need to worry about 'owner_val'.
137 // The key is to guarantee that 'owner_val' can only ever be accessed from one
138 // thread. In our implementation below, we guarantee this by only returning the
139 // 'owner_val' when the ID of the current thread matches the ID of the thread
140 // that created the Pool. Since this can only ever be one thread, it follows
141 // that only one thread can access 'owner_val' at any point in time. Thus, it
142 // is safe to declare that Pool<T> is Sync when T is Send.
144 // NOTE: It would also be possible to make the owning thread be the *first*
145 // thread that tries to get a value out of a Pool. However, the current
146 // implementation is a little simpler and it's not clear if making the first
147 // thread (rather than the creating thread) is meaningfully better.
149 // If there is a way to achieve our performance goals using safe code, then
150 // I would very much welcome a patch. As it stands, the implementation below
151 // tries to balance safety with performance. The case where a Regex is used
152 // from multiple threads simultaneously will suffer a bit since getting a cache
153 // will require unlocking a mutex.
154 unsafe impl<T
: Send
> Sync
for Pool
<T
> {}
156 impl<T
: ::std
::fmt
::Debug
> ::std
::fmt
::Debug
for Pool
<T
> {
157 fn fmt(&self, f
: &mut ::std
::fmt
::Formatter
<'_
>) -> ::std
::fmt
::Result
{
158 f
.debug_struct("Pool")
159 .field("stack", &self.stack
)
160 .field("owner", &self.owner
)
161 .field("owner_val", &self.owner_val
)
166 /// A guard that is returned when a caller requests a value from the pool.
168 /// The purpose of the guard is to use RAII to automatically put the value back
169 /// in the pool once it's dropped.
171 pub struct PoolGuard
<'a
, T
: Send
> {
172 /// The pool that this guard is attached to.
174 /// This is None when the guard represents the special "owned" value. In
175 /// which case, the value is retrieved from 'pool.owner_val'.
176 value
: Option
<Box
<T
>>,
179 impl<T
: Send
> Pool
<T
> {
180 /// Create a new pool. The given closure is used to create values in the
181 /// pool when necessary.
182 pub fn new(create
: CreateFn
<T
>) -> Pool
<T
> {
183 let owner
= AtomicUsize
::new(0);
184 let owner_val
= create();
185 Pool { stack: Mutex::new(vec![]), create, owner, owner_val }
188 /// Get a value from the pool. The caller is guaranteed to have exclusive
189 /// access to the given value.
191 /// Note that there is no guarantee provided about which value in the
192 /// pool is returned. That is, calling get, dropping the guard (causing
193 /// the value to go back into the pool) and then calling get again is NOT
194 /// guaranteed to return the same value received in the first get call.
195 #[cfg_attr(feature = "perf-inline", inline(always))]
196 pub fn get(&self) -> PoolGuard
<'_
, T
> {
197 // Our fast path checks if the caller is the thread that "owns" this
198 // pool. Or stated differently, whether it is the first thread that
199 // tried to extract a value from the pool. If it is, then we can return
200 // a T to the caller without going through a mutex.
202 // SAFETY: We must guarantee that only one thread gets access to this
203 // value. Since a thread is uniquely identified by the THREAD_ID thread
204 // local, it follows that is the caller's thread ID is equal to the
205 // owner, then only one thread may receive this value.
206 let caller
= THREAD_ID
.with(|id
| *id
);
207 let owner
= self.owner
.load(Ordering
::Relaxed
);
209 return self.guard_owned();
211 self.get_slow(caller
, owner
)
214 /// This is the "slow" version that goes through a mutex to pop an
215 /// allocated value off a stack to return to the caller. (Or, if the stack
216 /// is empty, a new value is created.)
218 /// If the pool has no owner, then this will set the owner.
220 fn get_slow(&self, caller
: usize, owner
: usize) -> PoolGuard
<'_
, T
> {
221 use std
::sync
::atomic
::Ordering
::Relaxed
;
224 // The sentinel 0 value means this pool is not yet owned. We
225 // try to atomically set the owner. If we do, then this thread
226 // becomes the owner and we can return a guard that represents
227 // the special T for the owner.
228 let res
= self.owner
.compare_exchange(0, caller
, Relaxed
, Relaxed
);
230 return self.guard_owned();
233 let mut stack
= self.stack
.lock().unwrap();
234 let value
= match stack
.pop() {
235 None
=> Box
::new((self.create
)()),
236 Some(value
) => value
,
238 self.guard_stack(value
)
241 /// Puts a value back into the pool. Callers don't need to call this. Once
242 /// the guard that's returned by 'get' is dropped, it is put back into the
243 /// pool automatically.
244 fn put(&self, value
: Box
<T
>) {
245 let mut stack
= self.stack
.lock().unwrap();
249 /// Create a guard that represents the special owned T.
250 fn guard_owned(&self) -> PoolGuard
<'_
, T
> {
251 PoolGuard { pool: self, value: None }
254 /// Create a guard that contains a value from the pool's stack.
255 fn guard_stack(&self, value
: Box
<T
>) -> PoolGuard
<'_
, T
> {
256 PoolGuard { pool: self, value: Some(value) }
260 impl<'a
, T
: Send
> PoolGuard
<'a
, T
> {
261 /// Return the underlying value.
262 pub fn value(&self) -> &T
{
264 None
=> &self.pool
.owner_val
,
270 impl<'a
, T
: Send
> Drop
for PoolGuard
<'a
, T
> {
271 #[cfg_attr(feature = "perf-inline", inline(always))]
273 if let Some(value
) = self.value
.take() {
274 self.pool
.put(value
);
281 use std
::panic
::{RefUnwindSafe, UnwindSafe}
;
287 use crate::exec
::ProgramCache
;
289 fn has_oibits
<T
: Send
+ Sync
+ UnwindSafe
+ RefUnwindSafe
>() {}
290 has_oibits
::<Pool
<ProgramCache
>>();
293 // Tests that Pool implements the "single owner" optimization. That is, the
294 // thread that first accesses the pool gets its own copy, while all other
295 // threads get distinct copies.
297 fn thread_owner_optimization() {
298 use std
::cell
::RefCell
;
301 let pool
: Arc
<Pool
<RefCell
<Vec
<char>>>> =
302 Arc
::new(Pool
::new(Box
::new(|| RefCell
::new(vec
!['a'
]))));
303 pool
.get().value().borrow_mut().push('x'
);
305 let pool1
= pool
.clone();
306 let t1
= std
::thread
::spawn(move || {
307 let guard
= pool1
.get();
308 let v
= guard
.value();
309 v
.borrow_mut().push('y'
);
312 let pool2
= pool
.clone();
313 let t2
= std
::thread
::spawn(move || {
314 let guard
= pool2
.get();
315 let v
= guard
.value();
316 v
.borrow_mut().push('z'
);
322 // If we didn't implement the single owner optimization, then one of
323 // the threads above is likely to have mutated the [a, x] vec that
324 // we stuffed in the pool before spawning the threads. But since
325 // neither thread was first to access the pool, and because of the
326 // optimization, we should be guaranteed that neither thread mutates
327 // the special owned pool value.
329 // (Technically this is an implementation detail and not a contract of
331 assert_eq
!(vec
!['a'
, 'x'
], *pool
.get().value().borrow());