UBUNTU: SAUCE: fscache: Fix race in decrementing refcount of op->npages
BugLink: https://bugs.launchpad.net/bugs/1797314
[Trace]
seen this in 4.4.x kernels and the same bug affects fscache in latest upstreams kernels.
Jun 25 11:32:08 kernel: [
4740718.880898] FS-Cache:
Jun 25 11:32:08 kernel: [
4740718.880920] FS-Cache: Assertion failed
Jun 25 11:32:08 kernel: [
4740718.880934] FS-Cache: 0 > 0 is false
Jun 25 11:32:08 kernel: [
4740718.881001] ------------[ cut here ]------------
Jun 25 11:32:08 kernel: [
4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
Jun 25 11:32:08 kernel: [
4740718.881040] invalid opcode: 0000 [#1] SMP
...
Jun 25 11:32:08 kernel: [
4740718.892659] Call Trace:
Jun 25 11:32:08 kernel: [
4740718.893506] [<
ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
Jun 25 11:32:08 kernel: [
4740718.894374] [<
ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
Jun 25 11:32:08 kernel: [
4740718.895180] [<
ffffffff81096da0>] process_one_work+0x150/0x3f0
Jun 25 11:32:08 kernel: [
4740718.895966] [<
ffffffff8109751a>] worker_thread+0x11a/0x470
Jun 25 11:32:08 kernel: [
4740718.896753] [<
ffffffff81808e59>] ? __schedule+0x359/0x980
Jun 25 11:32:08 kernel: [
4740718.897783] [<
ffffffff81097400>] ? rescuer_thread+0x310/0x310
Jun 25 11:32:08 kernel: [
4740718.898581] [<
ffffffff8109cdd6>] kthread+0xd6/0xf0
Jun 25 11:32:08 kernel: [
4740718.899469] [<
ffffffff8109cd00>] ? kthread_park+0x60/0x60
Jun 25 11:32:08 kernel: [
4740718.900477] [<
ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
Jun 25 11:32:08 kernel: [
4740718.901514] [<
ffffffff8109cd00>] ? kthread_park+0x60/0x60
[Problem]
atomic_sub(n_pages, &op->n_pages);
if (atomic_read(&op->n_pages) <= 0)
fscache_op_complete(&op->op, true);
The code in fscache_retrieval_complete is using atomic_sub followed by an atomic_read.
This causes two threads doing a decrement of pages to race with each other seeing the op->refcount 0 at same time,
and end up calling fscache_op_complete in both the threads leading to the OOPs.
[Fix]
The fix is trivial to use atomic_sub_return instead of two calls.
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
(backported from
https://www.redhat.com/archives/linux-cachefs/2018-September/msg00001.html
The message has been on-list since 21 September 2018 and has
received no feedback whatsoever.
I have cleaned up the commit message a little bit and dropped a
whitespace change.)
Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>
Acked-by: Khalid Elmously <khalid.elmously@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>