The only sane way to clean up the current 3 lock_kernel() variants seems to
be to remove the spinlock-based BKL implementations altogether, and to keep
the semaphore-based one only. If we dont want to do that for whatever
reason then i'm afraid we have to live with the current complexity. (but
i'm open for other cleanup suggestions as well.)
To explore this possibility we'll (at a minimum) have to know whether the
semaphore-based BKL works fine on plain SMP too. The patch below enables
this.
The patch may make sense in isolation as well, as it might bring
performance benefits: code that would formerly spin on the BKL spinlock
will now schedule away and give up the CPU. It might introduce performance
regressions as well, if any performance-critical code uses the BKL heavily
and gets overscheduled due to the semaphore. I very much hope there is no
such performance-critical codepath left though.