neofb_probe() calls neo_scan_monitor() that can successfully allocate a
memory for info->monspecs.modedb and proceed to case 0x03. There it does
not free the memory and returns -1. neofb_probe() goes to label
err_scan_monitor, thus, it does not free this memory through calling
fb_destroy_modedb() as well. We can not go to label err_init_hw since
neo_scan_monitor() can fail during memory allocation. So, the patch frees
the memory directly for case 0x03.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Evgeny Novikov <novikov@ispras.ru> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200630195451.18675-1-novikov@ispras.ru Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
savagefb_probe() calls savage_init_fb_info() that can successfully
allocate memory for info->pixmap.addr but then fail when
fb_alloc_cmap() fails. savagefb_probe() goes to label failed_init and
does not free allocated memory. It is not valid to go to label
failed_mmio since savage_init_fb_info() can fail during memory
allocation as well. So, the patch free allocated memory on the error
handling path in savage_init_fb_info() itself.
Found by Linux Driver Verification project (linuxtesting.org).
When building with LLVM_IAS=1 means using Clang's Integrated Assembly (IAS)
from LLVM/Clang >= v10.0.1-rc1+ instead of GNU/as from GNU/binutils
I see the following breakage in Debian/testing AMD64:
<instantiation>:15:74: error: too many positional arguments
PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
^
arch/x86/crypto/aesni-intel_asm.S:1598:2: note: while in macro instantiation
GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
^
<instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix
GHASH_4_ENCRYPT_4_PARALLEL_dec %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
^
arch/x86/crypto/aesni-intel_asm.S:1599:2: note: while in macro instantiation
GCM_ENC_DEC dec
^
<instantiation>:15:74: error: too many positional arguments
PRECOMPUTE 8*3+8(%rsp), %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7,
^
arch/x86/crypto/aesni-intel_asm.S:1686:2: note: while in macro instantiation
GCM_INIT %r9, 8*3 +8(%rsp), 8*3 +16(%rsp), 8*3 +24(%rsp)
^
<instantiation>:47:2: error: unknown use of instruction mnemonic without a size suffix
GHASH_4_ENCRYPT_4_PARALLEL_enc %xmm9, %xmm10, %xmm11, %xmm12, %xmm13, %xmm14, %xmm0, %xmm1, %xmm2, %xmm3, %xmm4, %xmm5, %xmm6, %xmm7, %xmm8, enc
^
arch/x86/crypto/aesni-intel_asm.S:1687:2: note: while in macro instantiation
GCM_ENC_DEC enc
Craig Topper suggested me in ClangBuiltLinux issue #1050:
> I think the "too many positional arguments" is because the parser isn't able
> to handle the trailing commas.
>
> The "unknown use of instruction mnemonic" is because the macro was named
> GHASH_4_ENCRYPT_4_PARALLEL_DEC but its being instantiated with
> GHASH_4_ENCRYPT_4_PARALLEL_dec I guess gas ignores case on the
> macro instantiation, but llvm doesn't.
First, I removed the trailing comma in the PRECOMPUTE line.
On calling pm_runtime_get_sync() the reference count of the device
is incremented. In case of failure, decrement the
reference count before returning the error.
Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
On a PREEMPT=n kernel, the try_release_extent_mapping() function's
"while" loop might run for a very long time on a large I/O. This commit
therefore adds a cond_resched() to this loop, providing RCU any needed
quiescent states.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
In the case we set or free the global value listen_chan in
different threads, we can encounter the UAF problems because
the method is not protected by any lock, add one to avoid
this bug.
BUG: KASAN: use-after-free in l2cap_chan_close+0x48/0x990
net/bluetooth/l2cap_core.c:730
Read of size 8 at addr ffff888096950000 by task kworker/1:102/2868
rpmh-rsc driver is fairly core to system and should not be removable
once its probed. However it allows to unbind driver from sysfs using
below command which results into a crash on sc7180.
When nvme_round_robin_path() finds a valid namespace we should be using it;
falling back to __nvme_find_path() for non-optimized paths will cause the
result from nvme_round_robin_path() to be ignored for non-optimized paths.
Fixes: 75c10e732724 ("nvme-multipath: round-robin I/O policy") Signed-off-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
commit fe35ec58f0d3 ("block: update hctx map when use multiple maps")
exposed an issue where we may hang trying to wait for queue freeze
during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple
queue maps (which we have now for default/read/poll) is attempting to
freeze the queue. However we never started queue freeze when starting the
reset, which means that we have inflight pending requests that entered the
queue that we will not complete once the queue is quiesced.
So start a freeze before we quiesce the queue, and unfreeze the queue
after we successfully connected the I/O queues (and make sure to call
blk_mq_update_nr_hw_queues only after we are sure that the queue was
already frozen).
This follows to how the pci driver handles resets.
Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
commit fe35ec58f0d3 ("block: update hctx map when use multiple maps")
exposed an issue where we may hang trying to wait for queue freeze
during I/O. We call blk_mq_update_nr_hw_queues which in case of multiple
queue maps (which we have now for default/read/poll) is attempting to
freeze the queue. However we never started queue freeze when starting the
reset, which means that we have inflight pending requests that entered the
queue that we will not complete once the queue is quiesced.
So start a freeze before we quiesce the queue, and unfreeze the queue
after we successfully connected the I/O queues (and make sure to call
blk_mq_update_nr_hw_queues only after we are sure that the queue was
already frozen).
This follows to how the pci driver handles resets.
Fixes: fe35ec58f0d3 ("block: update hctx map when use multiple maps") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Pointer mddev is being dereferenced with a test_bit call before mddev
is being null checked, this may cause a null pointer dereference. Fix
this by moving the null pointer checks to sanity check mddev before
it is dereferenced.
Addresses-Coverity: ("Dereference before null check") Fixes: 62f7b1989c02 ("md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
When SECCOMP_IOCTL_NOTIF_ID_VALID was first introduced it had the wrong
direction flag set. While this isn't a big deal as nothing currently
enforces these bits in the kernel, it should be defined correctly. Fix
the define and provide support for the old command until it is no longer
needed for backward compatibility.
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace") Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
if of_find_device_by_node() succeed, socfpga_setup_ocram_self_refresh
doesn't have a corresponding put_device(). Thus add a jump target to
fix the exception handling for this function implementation.
Fixes: 44fd8c7d4005 ("ARM: socfpga: support suspend to ram") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The RXFLR is possible larger than rx_left in Rockchip SPI, fix it.
Fixes: 01b59ce5dac8 ("spi: rockchip: use irq rather than polling") Signed-off-by: Jon Lin <jon.lin@rock-chips.com> Tested-by: Emil Renner Berthing <kernel@esmil.dk> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Emil Renner Berthing <kernel@esmil.dk> Link: https://lore.kernel.org/r/20200723004356.6390-3-jon.lin@rock-chips.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
rings_size() sets sq_offset to the total size of the rings (the returned
value which is used for memory allocation). This is wrong: sq array should
be located within the rings, not after them. Set sq_offset to where it
should be.
The change corrects registration and deregistration on error path
of a regulator, the problem was manifested by a reported memory
leak on deferred probe:
The memory leak problem was introduced as a side ef another fix in
regulator_register() error path, I believe that the proper fix is
to decouple device_register() function into its two compounds and
initialize a struct device before assigning any values to its fields
and then using it before actual registration of a device happens.
This lets to call put_device() safely after initialization, and, since
now a release callback is called, kfree(rdev->constraints) shall be
removed to exclude a double free condition.
Fixes: a3cde9534ebd ("regulator: core: fix regulator_register() error paths to properly release rdev") Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Cc: Wen Yang <wenyang@linux.alibaba.com> Link: https://lore.kernel.org/r/20200724005013.23278-1-vz@mleia.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Currently, if a section has a relocation to '_mcount' symbol, a new
__mcount_loc entry will be added whatever the relocation type is.
This is problematic when a relocation to '_mcount' is in the middle of a
section and is not a call for ftrace use.
Such relocation could be generated with below code for example:
bool is_mcount(unsigned long addr)
{
return (target == (unsigned long) &_mcount);
}
With this snippet of code, ftrace will try to patch the mcount location
generated by this code on module load and fail with:
Require that the TCG_PCR_EVENT2.digests.count value strictly matches the
value of TCG_EfiSpecIdEvent.numberOfAlgorithms in the event field of the
TCG_PCClientPCREvent event log header. Also require that
TCG_EfiSpecIdEvent.numberOfAlgorithms is non-zero.
The TCG PC Client Platform Firmware Profile Specification section 9.1
(Family "2.0", Level 00 Revision 1.04) states:
For each Hash algorithm enumerated in the TCG_PCClientPCREvent entry,
there SHALL be a corresponding digest in all TCG_PCR_EVENT2 structures.
Note: This includes EV_NO_ACTION events which do not extend the PCR.
Section 9.4.5.1 provides this description of
TCG_EfiSpecIdEvent.numberOfAlgorithms:
The number of Hash algorithms in the digestSizes field. This field MUST
be set to a value of 0x01 or greater.
Enforce these restrictions, as required by the above specification, in
order to better identify and ignore invalid sequences of bytes at the
end of an otherwise valid TPM2 event log. Firmware doesn't always have
the means necessary to inform the kernel of the actual event log size so
the kernel's event log parsing code should be stringent when parsing the
event log for resiliency against firmware bugs. This is true, for
example, when firmware passes the event log to the kernel via a reserved
memory region described in device tree.
POWER and some ARM systems use the "linux,sml-base" and "linux,sml-size"
device tree properties to describe the memory region used to pass the
event log from firmware to the kernel. Unfortunately, the
"linux,sml-size" property describes the size of the entire reserved
memory region rather than the size of the event long within the memory
region and the event log format does not include information describing
the size of the event log.
tpm_read_log_of(), in drivers/char/tpm/eventlog/of.c, is where the
"linux,sml-size" property is used. At the end of that function,
log->bios_event_log_end is pointing at the end of the reserved memory
region. That's typically 0x10000 bytes offset from "linux,sml-base",
depending on what's defined in the device tree source.
The firmware event log only fills a portion of those 0x10000 bytes and
the rest of the memory region should be zeroed out by firmware. Even in
the case of a properly zeroed bytes in the remainder of the memory
region, the only thing allowing the kernel's event log parser to detect
the end of the event log is the following conditional in
__calc_tpm2_event_size():
If that wasn't there, __calc_tpm2_event_size() would think that a 16
byte sequence of zeroes, following an otherwise valid event log, was
a valid event.
However, problems can occur if a single bit is set in the offset
corresponding to either the TCG_PCR_EVENT2.eventType or
TCG_PCR_EVENT2.eventSize fields, after the last valid event log entry.
This could confuse the parser into thinking that an additional entry is
present in the event log and exposing this invalid entry to userspace in
the /sys/kernel/security/tpm0/binary_bios_measurements file. Such
problems have been seen if firmware does not fully zero the memory
region upon a warm reboot.
This patch significantly raises the bar on how difficult it is for
stale/invalid memory to confuse the kernel's event log parser but
there's still, ultimately, a reliance on firmware to properly initialize
the remainder of the memory region reserved for the event log as the
parser cannot be expected to detect a stale but otherwise properly
formatted firmware event log entry.
Fixes: fd5c78694f3f ("tpm: fix handling of the TPM 2.0 event logs") Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The Bananapi M2+ uses a GPIO line to change the effective resistance of
the CPU supply regulator's feedback resistor network. The voltages
described in the device tree were given directly by the vendor. This
turns out to be slightly off compared to the real values.
The updated voltages are based on calculations of the feedback resistor
network, and verified down to three decimal places with a multi-meter.
The device tree currently only assigns the a supply for the first CPU
core, when in reality the regulator supply is shared by all four cores.
This might cause an issue if the implementation does not realize the
sharing of the supply.
Assign the same regulator supply to the remaining CPU cores to address
this.
if of_find_device_by_node() succeed, at91_pm_sram_init() doesn't have
a corresponding put_device(). Thus add a jump target to fix the exception
handling for this function implementation.
Fixes: d2e467905596 ("ARM: at91: pm: use the mmio-sram pool to access SRAM") Signed-off-by: yu kuai <yukuai3@huawei.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20200604123301.3905837-1-yukuai3@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
In the function check_acpi_dev(), if it fails to create
platform device, the return value is ERR_PTR() or NULL.
Thus it must use IS_ERR_OR_NULL() to check return value.
Fixes: 332e081225fc ("intel-vbtn: new driver for Intel Virtual Button") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
In the function check_acpi_dev(), if it fails to create
platform device, the return value is ERR_PTR() or NULL.
Thus it must use IS_ERR_OR_NULL() to check return value.
Fixes: ecc83e52b28c ("intel-hid: new hid event driver for hotkeys") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
When writing values to the IOP status/control register make sure those
values do not have any extraneous bits that will clear interrupt flags.
To place the SCC IOP into bypass mode would be desirable but this is not
achieved by writing IOP_DMAINACTIVE | IOP_RUN | IOP_AUTOINC | IOP_BYPASS
to the control register. Drop this ineffective register write.
Remove the flawed and unused iop_bypass() function. Make use of the
unused iop_stop() function.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Cc: Joshua Thompson <funaho@jurai.org> Link: https://lore.kernel.org/r/09bcb7359a1719a18b551ee515da3c4c3cf709e6.1590880333.git.fthain@telegraphics.com.au Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Avoid this by testing the channel state before calling iop_do_send().
When sending, and iop_send_queue is empty, call iop_do_send() because
the channel is idle. If iop_send_queue is not empty, iop_do_send() will
get called later by iop_handle_send().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Stan Johnson <userm57@yahoo.com> Cc: Joshua Thompson <funaho@jurai.org> Link: https://lore.kernel.org/r/6d667c39e53865661fa5a48f16829d18ed8abe54.1590880333.git.fthain@telegraphics.com.au Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Currently we are not initializing the scmi clock with discrete rates
correctly. We fetch the min_rate and max_rate value only for clocks with
ranges and ignore the ones with discrete rates. This will lead to wrong
initialization of rate range when clock supports discrete rate.
Fix this by using the first and the last rate in the sorted list of the
discrete clock rates while registering the clock.
Link: https://lore.kernel.org/r/20200709081705.46084-2-sudeep.holla@arm.com Fixes: 6d6a1d82eaef7 ("clk: add support for clocks provided by SCMI") Reviewed-by: Stephen Boyd <sboyd@kernel.org> Reported-and-tested-by: Dien Pham <dien.pham.ry@renesas.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
struct uclamp_rq was zeroed out entirely in assumption that in the first
call to uclamp_rq_inc() they'd be initialized correctly in accordance to
default settings.
But when next patch introduces a static key to skip
uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
will fail to perform any frequency changes because the
rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such. Which
means all rqs are capped to 0 by default.
Fix it by making sure we do proper initialization at init without
relying on uclamp_rq_inc() doing it later.
Once regulators are disabled after kernel boot, on Espresso board silent
hang observed because of LDO7 being disabled. LDO7 actually provide
power to CPU cores and non-cpu blocks circuitries. Keep this regulator
always-on to fix this hang.
Fixes: 9589f7721e16 ("arm64: dts: Add S2MPS15 PMIC node on exynos7-espresso") Signed-off-by: Alim Akhtar <alim.akhtar@samsung.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Call exynos_cpu_power_up(cpunr) unconditionally. This is needed by the
big.LITTLE cpuidle driver and has no side-effects on other code paths.
The additional soft-reset call during little core power up has been added
to properly boot all cores on the Exynos5422-based boards with secure
firmware (like Odroid XU3/XU4 family). This however broke big.LITTLE
CPUidle driver, which worked only on boards without secure firmware (like
Peach-Pit/Pi Chromebooks). Apply the workaround only when board is
running under secure firmware.
Fixes: 833b5794e330 ("ARM: EXYNOS: reset Little cores when cpu is up") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
On commit 6ac93117ab00 ("blktrace: use existing disk debugfs directory")
merged on v4.12 Omar fixed the original blktrace code for request-based
drivers (multiqueue). This however left in place a possible crash, if you
happen to abuse blktrace while racing to remove / add a device.
We used to use asynchronous removal of the request_queue, and with that
the issue was easier to reproduce. Now that we have reverted to
synchronous removal of the request_queue, the issue is still possible to
reproduce, its however just a bit more difficult.
We essentially run two instances of break-blktrace which add/remove
a loop device, and setup a blktrace and just never tear the blktrace
down. We do this twice in parallel. This is easily reproduced with the
script run_0004.sh from break-blktrace [0].
We can end up with two types of panics each reflecting where we
race, one a failed blktrace setup:
debugfs: Directory 'loop0' with parent 'block' already present
This crash happens because of how blktrace uses the debugfs directory
where it places its files. Upon init we always create the same directory
which would be needed by blktrace but we only do this for make_request
drivers (multiqueue) block drivers. When you race a removal of these
devices with a blktrace setup you end up in a situation where the
make_request recursive debugfs removal will sweep away the blktrace
files and then later blktrace will also try to remove individual
dentries which are already NULL. The inverse is also possible and hence
the two types of use after frees.
We don't create the block debugfs directory on init for these types of
block devices:
* request-based block driver block devices
* every possible partition
* scsi-generic
And so, this race should in theory only be possible with make_request
drivers.
We can fix the UAF by simply re-using the debugfs directory for
make_request drivers (multiqueue) and only creating the ephemeral
directory for the other type of block devices. The new clarifications
on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace
*prior* to processing a blktrace ensures the debugfs directories are
only created if no possible directory name clashes are possible.
This goes tested with:
o nvme partitions
o ISCSI with tgt, and blktracing against scsi-generic with:
o block
o tape
o cdrom
o media changer
o blktests
This patch is part of the work which disputes the severity of
CVE-2019-19770 which shows this issue is not a core debugfs issue, but
a misuse of debugfs within blktace.
Fixes: 6ac93117ab00 ("blktrace: use existing disk debugfs directory") Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Nicolai Stange <nstange@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: yu kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
msm8916-pins.dtsi specifies "bias-pull-none" for most of the audio
pin configurations. This was likely copied from the qcom kernel fork
where the same property was used for these audio pins.
However, "bias-pull-none" actually does not exist at all - not in
mainline and not in downstream. I can only guess that the original
intention was to configure "no pull", i.e. bias-disable.
The crypto notify call occurs with a read mutex held so you must
not do any substantial work directly. In particular, you cannot
call crypto_alloc_* as they may trigger further notifications
which may dead-lock in the presence of another writer.
This patch fixes this by postponing the work into a work queue and
taking the same lock in the module init function.
While we're at it this patch also ensures that all RCU accesses are
marked appropriately (tested with sparse).
Finally this also reveals a race condition in module param show
function as it may be called prior to the module init function.
It's fixed by testing whether crct10dif_tfm is NULL (this is true
iff the init function has not completed assuming fallback is false).
Fixes: 11dcb1037f40 ("crc-t10dif: Allow current transform to be...") Fixes: b76377543b73 ("crc-t10dif: Pick better transform if one...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace calling kfree() and call kobject_put() and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.
The puma gmac node currently uses opposite active-values for the
gmac phy reset pin. The gpio-declaration uses active-high while the
separate snps,reset-active-low property marks the pin as active low.
While on the kernel side this works ok, other DT users may get
confused - as seen with uboot right now.
So bring this in line and make both properties match, similar to the
other Rockchip board.
The puma vcc5v0_host regulator node currently uses opposite active-values
for the enable pin. The gpio-declaration uses active-high while the
separate enable-active-low property marks the pin as active low.
While on the kernel side this works ok, other DT users may get
confused - as seen with uboot right now.
So bring this in line and make both properties match, similar to the
gmac fix.
The lion gmac node currently uses opposite active-values for the
gmac phy reset pin. The gpio-declaration uses active-high while the
separate snps,reset-active-low property marks the pin as active low.
While on the kernel side this works ok, other DT users may get
confused - as seen with uboot right now.
So bring this in line and make both properties match, similar to the
other Rockchip board.
During sched domain init, we check whether non-topological SD_flags are
returned by tl->sd_flags(), if found, fire a waning and correct the
violation, but the code failed to correct the violation. Correct this.
Fixes: 143e1e28cb40 ("sched: Rework sched_domain topology definition") Signed-off-by: Peng Liu <iwtbavbm@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20200609150936.GA13060@iZj6chx1xj0e0buvshuecpZ Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
With commit:
'b7031a02ec75 ("sched/fair: Add NOHZ_STATS_KICK")'
rebalance_domains of the local cfs_rq happens before others idle cpus have
updated nohz.next_balance and its value is overwritten.
Move the update of nohz.next_balance for other idles cpus before balancing
and updating the next_balance of local cfs_rq.
Also, the nohz.next_balance is now updated only if all idle cpus got a
chance to rebalance their domains and the idle balance has not been aborted
because of new activities on the CPU. In case of need_resched, the idle
load balance will be kick the next jiffie in order to address remaining
ilb.
Fixes: b7031a02ec75 ("sched/fair: Add NOHZ_STATS_KICK") Reported-by: Peng Liu <iwtbavbm@gmail.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20200609123748.18636-1-vincent.guittot@linaro.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The current implementation always uses rpmh_write_async, which doesn't
wait for completion. That's fine for disable requests since there's no
immediate need for the clocks and they can be disabled in the
background. However, for enable requests we need to ensure the clocks
are actually enabled before returning to the client. Otherwise, clients
can end up accessing their HW before the necessary clocks are enabled,
which can lead to bus errors.
Use the synchronous version of this API (rpmh_write) for enable requests
in the active set to ensure completion.
Completion isn't required for sleep/wake sets, since they don't take
effect until after we enter sleep. All rpmh requests are automatically
flushed prior to entering sleep.
Fixes: 9c7e47025a6b ("clk: qcom: clk-rpmh: Add QCOM RPMh clock driver") Signed-off-by: Mike Tipton <mdtipton@codeaurora.org> Link: https://lkml.kernel.org/r/20200215021232.1149-1-mdtipton@codeaurora.org Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[sboyd@kernel.org: Reorg code a bit for readability, rename to 'wait' to
make local variable not conflict with completion.h mechanism] Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Liu Yong [Thu, 13 Aug 2020 06:56:44 +0000 (23:56 -0700)]
fs/io_uring.c: Fix uninitialized variable is referenced in io_submit_sqe
BugLink: https://bugs.launchpad.net/bugs/1892417
the commit <a4d61e66ee4a> ("<io_uring: prevent re-read of sqe->opcode>")
caused another vulnerability. After io_get_req(), the sqe_submit struct
in req is not initialized, but the following code defaults that
req->submit.opcode is available.
Signed-off-by: Liu Yong <pkfxxxing@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Some devices, particularly the 3DConnexion Spacemouse wireless 3D
controllers, return more than just the battery capacity in the battery
report. The Spacemouse devices return an additional byte with a device
specific field. However, hidinput_query_battery_capacity() only
requests a 2 byte transfer.
When a spacemouse is connected via USB (direct wire, no wireless dongle)
and it returns a 3 byte report instead of the assumed 2 byte battery
report the larger transfer confuses and frightens the USB subsystem
which chooses to ignore the transfer. Then after 2 seconds assume the
device has stopped responding and reset it. This can be reproduced
easily by using a wired connection with a wireless spacemouse. The
Spacemouse will enter a loop of resetting every 2 seconds which can be
observed in dmesg.
This patch solves the problem by increasing the transfer request to 4
bytes instead of 2. The fix isn't particularly elegant, but it is simple
and safe to backport to stable kernels. A further patch will follow to
more elegantly handle battery reports that contain additional data.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: Darren Hart <darren@dvhart.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: stable@vger.kernel.org Tested-by: Darren Hart <dvhart@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
__tracepoint_string's have their string data stored in .rodata, and an
address to that data stored in the "__tracepoint_str" section. Functions
that refer to those strings refer to the symbol of the address. Compiler
optimization can replace those address references with references
directly to the string data. If the address doesn't appear to have other
uses, then it appears dead to the compiler and is removed. This can
break the /tracing/printk_formats sysfs node which iterates the
addresses stored in the "__tracepoint_str" section.
Like other strings stored in custom sections in this header, mark these
__used to inform the compiler that there are other non-obvious users of
the address, so they should still be emitted.
Link: https://lkml.kernel.org/r/20200730224555.2142154-2-ndesaulniers@google.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Cc: stable@vger.kernel.org Fixes: 102c9323c35a8 ("tracing: Add __tracepoint_string() to export string pointers") Reported-by: Tim Murray <timmurray@google.com> Reported-by: Simon MacMullen <simonmacm@google.com> Suggested-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
New devices add a new hardware acceleration engine, which adds some
restrictions to the driver.
Metadata descriptor must be present for each packet and the maximum
burst size between two doorbells is now limited to a number
advertised by the device.
This patch adds:
1. A handshake protocol between the driver and the device, so the
device will enable the accelerated queues only when both sides
support it.
2. The driver support for the new acceleration engine:
2.1. Send metadata descriptor for each Tx packet.
2.2. Limit the number of packets sent between doorbells.(*)
(*) A previous driver implementation of this feature was comitted in
commit 05d62ca218f8 ("net: ena: add handling of llq max tx burst size")
however the design of the interface between the driver and device
changed since then. This change is reflected in this commit.
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
When the ENA device resets to recover from some error state, all LLQ
configuration values are reset to their defaults, because LLQ is
initialized only once during ena_probe().
Changes in this commit:
1. Move the LLQ configuration process into ena_init_device()
which is called from both ena_probe() and ena_restore_device(). This
way, LLQ setup configurations that are different from the default
values will survive resets.
2. Extract the LLQ bar mapping to ena_map_llq_bar(),
and call once in the lifetime of the driver from ena_probe(),
since there is no need to unmap and map the LLQ bar again every reset.
3. Map the LLQ bar if it exists, regardless if initialization of LLQ
placement policy (ENA_ADMIN_PLACEMENT_POLICY_DEV) succeeded
or not. Initialization might fail the first time, falling back to the
ENA_ADMIN_PLACEMENT_POLICY_HOST placement policy, but later succeed
after device reset, in which case the LLQ bar needs to be mapped
already.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Add the rss_configurable_function_key bit to driver_supported_feature.
This bit tells the device that the driver in question supports the
retrieving and updating of RSS function and hash key, and therefore
the device should allow RSS function and key manipulation.
This commit turns on device support for hash key and RSS function
management. Without this commit this feature is turned off at the
device and appears to the user as unsupported.
This commit concludes the following series of already merged commits:
commit 0af3c4e2eab8 ("net: ena: changes to RSS hash key allocation")
commit c1bd17e51c71 ("net: ena: change default RSS hash function to Toeplitz")
commit f66c2ea3b18a ("net: ena: allow setting the hash function without changing the key")
commit e9a1de378dd4 ("net: ena: fix error returning in ena_com_get_hash_function()")
commit 80f8443fcdaa ("net: ena: avoid unnecessary admin command when RSS function set fails")
commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported")
commit 0d1c3de7b8c7 ("net: ena: fix incorrect default RSS key")
The above commits represent the last part of the implementation of
this feature, and with them merged the feature can be enabled
in the device.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Add support for traffic mirroring, where the hardware reads the
buffer from the instance memory directly.
Traffic Mirroring needs access to the rx buffers in the instance.
To have this access, this patch:
1. Changes the code to map and unmap the rx buffers bidirectionally.
2. Enables the relevant bit in driver_supported_features to indicate
to the FW that this driver supports traffic mirroring.
Rx completion is not generated until mirroring is done to avoid
the situation where the driver changes the buffer before it is
mirrored.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The size of the admin statistics in ena_com_stats_admin is changed
from 32bit to 64bit so to align with the sizes of the other statistics
in the driver (i.e. rx_stats, tx_stats and ena_stats_dev).
This is done as part of an effort to create a unified API to read
statistics.
Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
gcc 4.8 reports a warning when initializing with = {0}.
Dropping the "0" from the braces fixes the issue.
This fix is not ANSI compatible but is allowed by gcc.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Add a reserved PCI device ID to the driver's table
Used for internal testing purposes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
For an overview of the race created by this patch goto synchronization
label.
In napi busy-poll mode, the kernel invokes the napi handler of the
device repeatedly to poll the NIC's receive queues. This process
repeats until a timeout, specific for each connection, is up.
By polling packets in busy-poll mode the user may gain lower latency
and higher throughput (since the kernel no longer waits for interrupts
to poll the queues) in expense of CPU usage.
Upon completing a napi routine, the driver checks whether
the routine was called by an interrupt handler. If so, the driver
re-enables interrupts for the device. This is needed since an
interrupt routine invocation disables future invocations until
explicitly re-enabled.
The driver avoids re-enabling the interrupts if they were not disabled
in the first place (e.g. if driver in busy mode).
Originally, the driver checked whether interrupt re-enabling is needed
by reading the 'ena_napi->unmask_interrupt' variable. This atomic
variable was set upon interrupt and cleared after re-enabling it.
In the 4.10 Linux version, the 'napi_complete_done' call was changed
so that it returns 'false' when device should not re-enable
interrupts, and 'true' otherwise. The change includes reading the
"NAPIF_STATE_IN_BUSY_POLL" flag to check if the napi call is in
busy-poll mode, and if so, return 'false'.
The driver was changed to re-enable interrupts according to this
routine's return value.
The Linux community rejected the use of the
'ena_napi->unmaunmask_interrupt' variable to determine whether
unmasking is needed, and urged to use napi_napi_complete_done()
return value solely.
See https://lore.kernel.org/patchwork/patch/741149/ for more details
As explained, a busy-poll session exists for a specified timeout
value, after which it exits the busy-poll mode and re-enters it later.
This leads to many invocations of the napi handler where
napi_complete_done() false indicates that interrupts should be
re-enabled.
This creates a bug in which the interrupts are re-enabled
unnecessarily.
To reproduce this bug:
1) echo 50 | sudo tee /proc/sys/net/core/busy_poll
2) echo 50 | sudo tee /proc/sys/net/core/busy_read
3) Add counters that check whether
'ena_unmask_interrupt(tx_ring, rx_ring);'
is called without disabling the interrupts in the first
place (i.e. with calling the interrupt routine
ena_intr_msix_io())
Steps 1+2 enable busy-poll as the default mode for new connections.
The busy poll routine rearms the interrupts after every session by
design, and so we need to add an extra check that the interrupts were
masked in the first place.
synchronization:
This patch introduces a race between the interrupt handler
ena_intr_msix_io() and the napi routine ena_io_poll().
Some macros and instruction were added to prevent this race from leaving
the interrupts masked. The following specifies the different race
scenarios in this patch:
1) interrupt handler and napi routine run sequentially
i) interrupt handler is called, sets 'interrupts_masked' flag and
successfully schedules the napi handler via softirq.
In this scenario the napi routine might not see the flag change
for several reasons:
a) The flag is stored in a register by the compiler. For this
case the WRITE_ONCE macro which prevents this.
b) The compiler might reorder the instruction. For this the
smp_wmb() instruction was used which implies a compiler memory
barrier.
c) On archs with weak consistency model (like ARM64) the napi
routine might be scheduled and start running before the flag
STORE instruction is committed to cache/memory. To ensure this
doesn't happen, the smp_wmb() instruction was added. It ensures
that the flag set instruction is committed before scheduling
napi.
ii) compiler reorders the flag's value check in the 'if' with
the flag set in the napi routine.
This scenario is prevented by smp_rmb() call after the flag check.
2) interrupt handler and napi routine run in parallel (can happen when
busy poll routine invokes the napi handler)
i) interrupt handler sets the flag in one core, while the napi
routine reads it in another core.
This scenario also is divided into two cases:
a) napi_complete_done() doesn't finish running, in which case
napi_sched() would just set NAPIF_STATE_MISSED and the napi
routine would reschedule itself without changing the flag's value.
b) napi_complete_done() finishes running. In this case the
napi routine might override the flag's value.
This doesn't present any rise since it later unmasks the
interrupt vector.
Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
drivers/net/ethernet/amazon/ena/ena_netdev.c:2193:34: warning:
Using plain integer as NULL pointer
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Suggested-by: Joe Perches <joe@perches.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Acked-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
With legacy PM, drivers themselves were responsible for managing the
device's power states and takes care of register states.
After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.
Compile-tested only.
Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. If the XDP verdict is XDP_ABORTED we break the loop, which results in
us handling one buffer per napi cycle instead of the total budget
(usually 64). To overcome this simply change the xdp_verdict check to
!= XDP_PASS. When the verdict is XDP_PASS, the skb is not expected to
be NULL.
2. Update the residual budget for XDP_DROP and XDP_ABORTED, since
packets are handled in these cases.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
When sending very high packet rate, the XDP tx queues can get full and
start dropping packets. In this case we don't free the pages which
results in ena driver draining the system memory.
Fix:
Simply free the pages when necessary.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
This commit reduces the driver load time by using usec resolution
instead of msec when polling for hardware state change.
Also add back-off mechanism to handle cases where minimal sleep
time is not enough.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Use BIT macro instead of shift operator for code clarity
2. Replace multiple flag assignments to a single assignment of multiple
flags in ena_com_add_single_rx_desc()
3. Move ENA_HASH_KEY_SIZE from ena_netdev.h to ena_com.h
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Add leading and trailing spaces to several comments for better
readability
2. Make tabs and spaces uniform in enum defines in ena_admin_defs.h
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Reorder sanity checks in get_comp_ctxt() to make more sense
2. Reorder variables in ena_com_fill_hash_function() and
ena_calc_io_queue_size() in reverse christmas tree.
3. Move around member initializations.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Remove unused definition of DRV_MODULE_VERSION
2. Remove {} from single line-of-code ifs
3. Remove unnecessary comments from ena_get/set_coalesce()
4. Remove unnecessary extra spaces and newlines
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Join unnecessarily broken short lines in ena_com.c ena_netdev.c
2. Fix Indentations of broken lines
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
fix spelling and grammar mistakes in comments in ena_com.h,
ena_com.c and ena_netdev.c
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Make all types of variables that convey the number and sizeof queues to
be u32, for consistency with the API between the driver and device via
ena_admin_defs.h:ena_admin_get_feat_resp.max_queue_ext fields. Current
code sometimes uses int and there are multiple assignments between these
variables with different types.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Initialize prev_intr_delay_resolution with ena_dev->intr_delay_resolution
unconditionally, since it is initialized with
ENA_DEFAULT_INTR_DELAY_RESOLUTION in ena_probe(). This approach makes much
more sense than handling errors of not initializing it.
Also added unlikely to if condition.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Default return value should be -EINVAL since the input
in this case was unexpected.
Also remove the now redundant check in the beginning
of the function.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Signed-off-by: Shai Brandes <shaibran@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Rename ena_com_free_desc to ena_com_free_q_entries to match
the LLQ mode.
In non-LLQ mode, an entry in an IO ring corresponds to a
a descriptor. In LLQ mode an entry may correspond to several
descriptors (per LLQ definition).
Signed-off-by: Igor Chauskin <igorch@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Newer ENA devices can write data to rx buffers with an offset
from the beginning of the buffer.
This commit adds support for this feature in the driver.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Extract code to ena_indirection_table_set() to make
the code cleaner.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The macros in ena_com.h have inconsistent spaces between
the macro name and it's value.
This commit sets all the macros to have a single space between
the name and value.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The 'ENA_REGS_RESET_SHUTDOWN' enum indicates a normal driver
shutdown / removal procedure.
Also, a comment is added to one of the reset reason assignments for
code clarity.
Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Before this commit there was a function prototype named
ena_com_get_ena_admin_polling_mode() that was never implemented.
This patch simply deletes it.
Signed-off-by: Igor Chauskin <igorch@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Add support for getting tx drops from the device and saving them
in the driver.
2. Report tx via netdev stats.
Signed-off-by: Igor Chauskin <igorch@amazon.com> Signed-off-by: Guy Tzalik <gtzalik@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Both key and func parameters are pointers on the stack.
Setting them to NULL does nothing.
The original intent was to leave the key and func unset in this case,
but for this to happen nothing needs to be done as the calling
function ethtool_get_rxfh() already clears key and func.
This commit removes the above described useless code.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
1. Use ena_com_check_supported_feature_id() in
ena_com_hash_key_fill_default_key() instead of rewriting
its implementation. This also saves us a superfluous admin
command by using the cached value.
2. Change if conditions in ena_com_rss_init() to be clearer.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Currently in the driver we are setting the hash function to be CRC32.
Starting with this commit we want to change the default behaviour so that
we set the hash function to be Toeplitz instead.
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Current code does not allow setting the hash function without
changing the key. This commit enables it.
To achieve this we separate ena_com_get_hash_function() to 2 functions:
ena_com_get_hash_function() - which gets only the hash function, and
ena_com_get_hash_key() - which gets only the hash key.
Also return 0 instead of rc at the end of ena_get_rxfh() since all
previous operations succeeded.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Currently when ena_set_hash_function() fails the hash function is
restored to the previous value by calling an admin command to get
the hash function from the device.
In this commit we avoid the admin command, by saving the previous
hash function before calling ena_set_hash_function() and using this
previous value to restore the hash function in case of failure of
ena_set_hash_function().
Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
drivers/net/ethernet/amazon/ena/ena_netdev.c:460:6: warning: symbol 'ena_xdp_exchange_program_rx_in_range' was not declared. Should it be static?
drivers/net/ethernet/amazon/ena/ena_netdev.c:481:6: warning: symbol 'ena_xdp_exchange_program' was not declared. Should it be static?
drivers/net/ethernet/amazon/ena/ena_netdev.c:1555:5: warning: symbol 'ena_xdp_handle_buff' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
last_keep_alive_jiffies is updated in probe and when a keep-alive
event is received. In case the driver times-out on a keep-alive event,
it has high chances of continuously timing-out on keep-alive events.
This is because when the driver recovers from the keep-alive-timeout reset
the value of last_keep_alive_jiffies is very old, and if a keep-alive
event is not received before the next timer expires, the value of
last_keep_alive_jiffies will cause another keep-alive-timeout reset
and so forth in a loop.
Solution:
Update last_keep_alive_jiffies whenever the device is restored after
reset.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Noam Dagan <ndagan@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Rx req_id is an index in struct ena_eth_io_rx_cdesc_base.
The driver should validate that the Rx req_id it received from
the device is in range [0, ring_size -1]. Failure to do so could
yield to potential memory access violoation.
The validation was mistakenly done when refilling
the Rx submission queue and not in Rx completion queue.
Fixes: ad974baef2a1 ("net: ena: add support for out of order rx buffers refill") Signed-off-by: Noam Dagan <ndagan@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Bug:
In short the main issue is caused by the fact that the number of queues
is changed using ethtool after ena_probe() has been called and before
ena_up() was executed. Here is the full scenario in detail:
* ena_probe() is called when the driver is loaded, the driver is not up
yet at the end of ena_probe().
* The number of queues is changed -> io_queue_count is changed as well -
ena_up() is not called since the "dev_was_up" boolean in
ena_update_queue_count() is false.
* ena_up() is called by the kernel (it's called asynchronously some
time after ena_probe()). ena_setup_io_intr() is called by ena_up() and
it uses io_queue_count to get the suitable irq lines for each msix
vector. The function ena_request_io_irq() is called right after that
and it uses msix_vecs - This value only changes during ena_probe() and
ena_restore() - to request the irq vectors. This results in "Failed to
request I/O IRQ" error for i > io_queue_count.
Numeric example:
* After ena_probe() io_queue_count = 8, msix_vecs = 9.
* The number of queues changes to 4 -> io_queue_count = 4, msix_vecs = 9.
* ena_up() is executed for the first time:
** ena_setup_io_intr() inits the vectors only up to io_queue_count.
** ena_request_io_irq() calls request_irq() and fails for i = 5.
How to reproduce:
simply run the following commands:
sudo rmmod ena && sudo insmod ena.ko;
sudo ethtool -L eth1 combined 3;
Fix:
Use ENA_MAX_MSIX_VEC(adapter->num_io_queues + adapter->xdp_num_queues)
instead of adapter->msix_vecs. We need to take XDP queues into
consideration as they need to have msix vectors assigned to them as well.
Note that the XDP cannot be attached before the driver is up and running
but in XDP mode the issue might occur when the number of queues changes
right after a reset trigger.
The ENA_MAX_MSIX_VEC simply adds one to the argument since the first msix
vector is reserved for management queue.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Overview:
We don't frequently change the msix vectors throughout the life cycle of
the driver. We do so in two functions: ena_probe() and ena_restore().
ena_probe() is only called when the driver is loaded. ena_restore() on the
other hand is called during device reset / resume operations.
We use num_io_queues for calculating and allocating the number of msix
vectors. At ena_probe() this value is equal to max_num_io_queues and thus
this is not an issue, however ena_restore() might be called after the
number of io queues has changed.
A possible bug scenario is as follows:
* Change number of queues from 8 to 4.
(num_io_queues = 4, max_num_io_queues = 8, msix_vecs = 9,)
* Trigger reset occurs -> ena_restore is called.
(num_io_queues = 4, max_num_io_queues =8 , msix_vecs = 5)
* Change number of queues from 4 to 6.
(num_io_queues = 6, max_num_io_queues = 8, msix_vecs = 5)
* The driver will reset due to failure of check_for_rx_interrupt_queue()
Fix:
This can be easily fixed by always using max_num_io_queues to init the
msix_vecs, since this number won't change as opposed to num_io_queues.
Fixes: 4d19266022ec ("net: ena: multiple queue creation related cleanups") Signed-off-by: Sameeh Jubran <sameehj@amazon.com> Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
There is a statement that is indented incorrectly, remove a space.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
In this commit we revert the part of
commit 1a63443afd70 ("net/amazon: Ensure that driver version is aligned to the linux kernel"),
which breaks the interface between the ENA driver and FW.
We also replace the use of DRIVER_VERSION with DRIVER_GENERATION
when we bring back the deleted constants that are used in interface with
ENA device FW.
This commit does not change the driver version reported to the user via
ethtool, which remains the kernel version.
Fixes: 1a63443afd70 ("net/amazon: Ensure that driver version is aligned to the linux kernel") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
Upstream drivers are managed inside global repository and released all
together, this ensure that driver version is the same as linux kernel,
so update amazon drivers to properly reflect it.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>
The non-zero check on rc is redundant as a previous non-zero
check on rc will always return and the second check is never
reached, hence it is redundant and can be removed. Also
remove a blank line.
Addresses-Coverity: ("Logically dead code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Ian May <ian.may@canonical.com> Signed-off-by: Kelsey Skunberg <kelsey.skunberg@canonical.com>