Call slowpath __slab_alloc() from within the bulk loop, as the side-effect
of this call likely repopulates c->freelist.
Choose to reenable local IRQs while calling slowpath.
Saving some optimizations for later. E.g. it is possible to extract
parts of __slab_alloc() and avoid the unnecessary and expensive (37
cycles) local_irq_{save,restore}. For now, be happy calling
__slab_alloc() this lower icache impact of this func and I don't have to
worry about correctness.
Measurements on CPU CPU i7-4790K @ 4.00GHz
Baseline normal fastpath (alloc+free cost): 42 cycles(tsc) 10.601 ns