blk-mq: Fix failed allocation path when mapping queues
author Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Wed, 14 Dec 2016 20:48:36 +0000 (18:48 -0200)
committer Jens Axboe <axboe@fb.com>
Wed, 14 Dec 2016 20:57:47 +0000 (13:57 -0700)
commit d1b1cea1e58477dad88ff769f54c0d2dfa56d923
tree 716f78a926f7fc60886338ae97ff4d074bba52ee
parent 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09

In blk_mq_map_swqueue, there is a memory optimization that frees the
tags of a queue that has gone unmapped.  Later, if that hctx is remapped
after another topology change, the tags need to be reallocated.

If this allocation fails, a simple WARN_ON triggers, but the block layer
ends up with an active hctx without any corresponding set of tags.
Then, any incoming IO to that hctx can trigger an Oops.

I can reproduce it consistently by running IO, flipping CPUs on and off
and eventually injecting a memory allocation failure in that path.

In the fix below, if the system experiences a failed allocation of any
hctx's tags, we remap all the ctxs of that queue to hctx_0, which
should always keep its tags.  There is a minor performance hit, since
our mapping just got worse after the error path, but this is
the simplest solution to handle this error path.  The performance hit
will disappear after another successful remap.
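
As a rough illustration of the fallback described above, here is a
userspace sketch, not the kernel patch itself.  Every name in it
(toy_queue, toy_tags, toy_map_swqueue, toy_fail_alloc, toy_demo) is
hypothetical, and the structures are simplified stand-ins for the real
blk-mq ones.  It assumes hctx 0 always holds tags and that a failed
allocation can be injected through a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_CPUS      4
#define TOY_NR_HW_QUEUES 2

/* Toy stand-in for a tag set; the real structure is far richer. */
struct toy_tags { int depth; };

struct toy_queue {
	/* NULL means the tags were freed while the hctx was unmapped. */
	struct toy_tags *tags[TOY_NR_HW_QUEUES];
	/* Which hardware queue each CPU's software queue maps to. */
	int cpu_to_hctx[TOY_NR_CPUS];
};

/* Hypothetical fault-injection switch for the allocation below. */
static bool toy_fail_alloc;

static struct toy_tags *toy_alloc_tags(void)
{
	struct toy_tags *t;

	if (toy_fail_alloc)
		return NULL;
	t = malloc(sizeof(*t));
	if (t)
		t->depth = 32;
	return t;
}

/*
 * Walk the CPU map; reallocate missing tags, and if that fails,
 * point the CPU at hctx 0 instead of leaving it on a tag-less hctx.
 * Returns how many CPUs had to fall back.
 */
static int toy_map_swqueue(struct toy_queue *q)
{
	int cpu, fallbacks = 0;

	assert(q->tags[0] != NULL); /* assumption: hctx 0 keeps its tags */

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		int h = q->cpu_to_hctx[cpu];

		if (!q->tags[h])
			q->tags[h] = toy_alloc_tags();
		if (!q->tags[h]) {
			q->cpu_to_hctx[cpu] = 0; /* degraded but safe mapping */
			fallbacks++;
		}
	}
	return fallbacks;
}

/* Build a queue where hctx 1 lost its tags, then remap it. */
static int toy_demo(bool fail)
{
	struct toy_queue q = { { NULL, NULL }, { 0, 0, 1, 1 } };
	int n;

	toy_fail_alloc = false;
	q.tags[0] = toy_alloc_tags();
	if (!q.tags[0])
		return -1;

	toy_fail_alloc = fail;
	n = toy_map_swqueue(&q);

	free(q.tags[0]);
	free(q.tags[1]);
	return n;
}
```

In this model, toy_demo(true) leaves no CPU pointing at an hctx without
tags, at the cost of funnelling the affected CPUs through hctx 0.  A
later call with allocation working again would restore the tags for any
CPU still mapped to hctx 1, but the sketch does not recompute the map
itself.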

I considered dropping the memory optimization altogether, but it
seemed a bad trade-off to handle this very specific error case.

This should apply cleanly on top of Jens' for-next branch.

The Oops is the one below:

SP (3fff935ce4d0) is in userspace
1:mon> e
cpu 0x1: Vector: 300 (Data Access) at [c000000fe99eb110]
    pc: c0000000005e868c: __sbitmap_queue_get+0x2c/0x180
    lr: c000000000575328: __bt_get+0x48/0xd0
    sp: c000000fe99eb390
   msr: 900000010280b033
   dar: 28
 dsisr: 40000000
  current = 0xc000000fe9966800
  paca    = 0xc000000007e80300   softe: 0        irq_happened: 0x01
    pid   = 11035, comm = aio-stress
Linux version 4.8.0-rc6+ (root@bean) (gcc version 5.4.0 20160609
(Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2) ) #3 SMP Mon Oct 10 20:16:53 CDT 2016
1:mon> s
[c000000fe99eb3d0] c000000000575328 __bt_get+0x48/0xd0
[c000000fe99eb400] c000000000575838 bt_get.isra.1+0x78/0x2d0
[c000000fe99eb480] c000000000575cb4 blk_mq_get_tag+0x44/0x100
[c000000fe99eb4b0] c00000000056f6f4 __blk_mq_alloc_request+0x44/0x220
[c000000fe99eb500] c000000000570050 blk_mq_map_request+0x100/0x1f0
[c000000fe99eb580] c000000000574650 blk_mq_make_request+0xf0/0x540
[c000000fe99eb640] c000000000561c44 generic_make_request+0x144/0x230
[c000000fe99eb690] c000000000561e00 submit_bio+0xd0/0x200
[c000000fe99eb740] c0000000003ef740 ext4_io_submit+0x90/0xb0
[c000000fe99eb770] c0000000003e95d8 ext4_writepages+0x588/0xdd0
[c000000fe99eb910] c00000000025a9f0 do_writepages+0x60/0xc0
[c000000fe99eb940] c000000000246c88 __filemap_fdatawrite_range+0xf8/0x180
[c000000fe99eb9e0] c000000000246f90 filemap_write_and_wait_range+0x70/0xf0
[c000000fe99eba20] c0000000003dd844 ext4_sync_file+0x214/0x540
[c000000fe99eba80] c000000000364718 vfs_fsync_range+0x78/0x130
[c000000fe99ebad0] c0000000003dd46c ext4_file_write_iter+0x35c/0x430
[c000000fe99ebb90] c00000000038c280 aio_run_iocb+0x3b0/0x450
[c000000fe99ebce0] c00000000038dc28 do_io_submit+0x368/0x730
[c000000fe99ebe30] c000000000009404 system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
block/blk-mq.c