bnxt_en: reliably allocate IRQ table on reset to avoid crash
The following trace excerpt corresponds with a NULL pointer dereference
of 'bp->irq_tbl' in bnxt_setup_inta() on an Aarch64 system after many
device resets:
Unable to handle kernel NULL pointer dereference at ...
000000d
...
pc : string+0x3c/0x80
lr : vsnprintf+0x294/0x7e0
sp :
ffff00000f61ba70 pstate :
20000145
x29:
ffff00000f61ba70 x28:
000000000000000d
x27:
ffff0000009c8b5a x26:
ffff00000f61bb80
x25:
ffff0000009c8b5a x24:
0000000000000012
x23:
00000000ffffffe0 x22:
ffff000008990428
x21:
ffff00000f61bb80 x20:
000000000000000d
x19:
000000000000001f x18:
0000000000000000
x17:
0000000000000000 x16:
ffff800b6d0fb400
x15:
0000000000000000 x14:
ffff800b7fe31ae8
x13:
00001ed16472c920 x12:
ffff000008c6b1c9
x11:
ffff000008cf0580 x10:
ffff00000f61bb80
x9 :
00000000ffffffd8 x8 :
000000000000000c
x7 :
ffff800b684b8000 x6 :
0000000000000000
x5 :
0000000000000065 x4 :
0000000000000001
x3 :
ffff0a00ffffff04 x2 :
000000000000001f
x1 :
0000000000000000 x0 :
000000000000000d
Call trace:
string+0x3c/0x80
vsnprintf+0x294/0x7e0
snprintf+0x44/0x50
__bnxt_open_nic+0x34c/0x928 [bnxt_en]
bnxt_open+0xe8/0x238 [bnxt_en]
__dev_open+0xbc/0x130
__dev_change_flags+0x12c/0x168
dev_change_flags+0x20/0x60
...
Ordinarily, a call to bnxt_setup_inta() (not in trace due to inlining)
would not be expected on a system supporting MSIX at all. However, if
bnxt_init_int_mode() does not end up being called after the call to
bnxt_clear_int_mode() in bnxt_fw_reset_close(), then the driver will
think that only INTA is supported and bp->irq_tbl will be NULL,
causing the above crash.
In the error recovery scenario, we call bnxt_clear_int_mode() in
bnxt_fw_reset_close() early in the sequence. Ordinarily, we will
call bnxt_init_int_mode() in bnxt_hwrm_if_change() after we
reestablish communication with the firmware after reset. However,
if the sequence has to abort before we call bnxt_init_int_mode() and
if the user later attempts to re-open the device, then it will cause
the crash above.
We fix it in 2 ways:
1. Check for bp->irq_tbl in bnxt_setup_int_mode(). If it is NULL, call
bnxt_init_init_mode().
2. If we need to abort in bnxt_hwrm_if_change() and cannot complete
the error recovery sequence, set the BNXT_STATE_ABORT_ERR flag. This
will cause more drastic recovery at the next attempt to re-open the
device, including a call to bnxt_init_int_mode().
Fixes: 3bc7d4a352ef ("bnxt_en: Add BNXT_STATE_IN_FW_RESET state.")
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>