This patch tries to reduce the number of cmpxchg calls in the writer
failed path by checking the counter value first, before issuing the
instruction. If ->count is not set to RWSEM_WAITING_BIAS then there is
no point in wasting a cmpxchg call.
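For reference, the shape of the change in the writer slow path is
roughly the following sketch (naming follows the rwsem code's
conventions such as ->count, RWSEM_WAITING_BIAS and
RWSEM_ACTIVE_WRITE_BIAS; this is illustrative, not the literal diff):

	/*
	 * Only pay for the atomic cmpxchg when it has a chance of
	 * succeeding, i.e. when the only contribution to ->count is
	 * the waiting bias (no active readers or writers).
	 */
	if (count == RWSEM_WAITING_BIAS &&
	    cmpxchg(&sem->count, RWSEM_WAITING_BIAS,
		    RWSEM_ACTIVE_WRITE_BIAS) == RWSEM_WAITING_BIAS) {
		/* got the lock */
		...
	}

Without the leading comparison, a contended counter value would still
trigger a cmpxchg that is guaranteed to fail, adding needless atomic
traffic on the cacheline.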
Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem->wait_lock."
Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8-socket box with 80 cores.
Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:
For the aim7 workloads, results overall improved on top of Michel's
patchset. For full graphs of how the rwsem series plus this patch
behaves on a large 8-socket machine against a vanilla kernel: