AceLan Kao [Wed, 8 Jan 2020 07:59:45 +0000 (15:59 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add retry for get scalar status
BugLink: https://bugs.launchpad.net/bugs/1858761
Found on new platforms that UART require more than 1 second to respond
commands in the first 10 seconds after booted.
dell_uart_get_scalar_status() is the first command we send to scalar and
this command should be more reliable than other commands, and make sure
we got correct response from scalar. So, add retry and increase the read
timeout to 2 seconds.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Jouni Malinen [Fri, 24 Jan 2020 19:14:23 +0000 (11:14 -0800)]
mac80211: Do not send Layer 2 Update frame before authorization
CVE-2019-5108
The Layer 2 Update frame is used to update bridges when a station roams
to another AP even if that STA does not transmit any frames after the
reassociation. This behavior was described in IEEE Std 802.11F-2003 as
something that would happen based on MLME-ASSOCIATE.indication, i.e.,
before completing 4-way handshake. However, this IEEE trial-use
recommended practice document was published before RSN (IEEE Std
802.11i-2004) and as such, did not consider RSN use cases. Furthermore,
IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been
maintained amd should not be used anymore.
Sending out the Layer 2 Update frame immediately after association is
fine for open networks (and also when using SAE, FT protocol, or FILS
authentication when the station is actually authenticated by the time
association completes). However, it is not appropriate for cases where
RSN is used with PSK or EAP authentication since the station is actually
fully authenticated only once the 4-way handshake completes after
authentication and attackers might be able to use the unauthenticated
triggering of Layer 2 Update frame transmission to disrupt bridge
behavior.
Fix this by postponing transmission of the Layer 2 Update frame from
station entry addition to the point when the station entry is marked
authorized. Similarly, send out the VLAN binding update only if the STA
entry has already been authorized.
Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3e493173b7841259a08c5c8e5cbe90adb349da7e) Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Aaron Ma [Tue, 14 Jan 2020 05:47:17 +0000 (13:47 +0800)]
UBUNTU: SAUCE: ACPI: video: Use native backlight on Lenovo E41-25/45
BugLink: https://bugs.launchpad.net/bugs/1859561
ACPI backlight control doesn't work on 2 Lenovo E41 laptops.
So force to use native backlight control on them.
(cherry picked from 53870cf03faefa1f0d77fbc366db87d5beda203c) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Symptom: Error message "Configuring the VNIC characteristics failed"
in dmesg whenever an OSA interface on z15 is set online.
The VNIC characteristics get re-programmed when setting a L2 device
online. This follows the selected 'wanted' characteristics - with the
exception that the INVISIBLE characteristic unconditionally gets
switched off.
For devices that don't support INVISIBLE (ie. OSA), the resulting
IO failure raises a noisy error message
("Configuring the VNIC characteristics failed").
For IQD, INVISIBLE is off by default anyways.
So don't unnecessarily special-case the INVISIBLE characteristic, and
thereby suppress the misleading error message on OSA devices.
Fixes: caa1f0b10d18 ("s390/qeth: add VNICC enable/disable support")
OriginalAuthor: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(backported from commit 68c57bfd52836e31bff33e5e1fc64029749d2c35)
[ frank-heimes: from 4.19-stable ] Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
it will consider mark<ENFORCED>note as the option name, as the regexp allows
any characters not in the whitespace class (\S). By using a class like [^\s<],
that is, not whitespace and not '<', it will be correctly parsed.
We could fix the entries that have this problem, but fixing the parser is more
robust.
Ignore: yes
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Kai-Heng Feng [Thu, 5 Dec 2019 17:05:00 +0000 (18:05 +0100)]
UBUNTU: SAUCE: USB: core: Attempt power cycle port when it's in eSS.Disabled state
BugLink: https://bugs.launchpad.net/bugs/1855312
On Dell TB16, Realtek USB ethernet (r8152) connects to an SMSC hub which
then connects to ASMedia xHCI's root hub:
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/7p, 5000M
|__ Port 2: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 004 Device 002: ID 0424:5537 Standard Microsystems Corp. USB5537B
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
The SMSC hub may disconnect after system resume from suspend. When this
happens, the reset resume attempt fails, and the last resort to disable
the port and see something comes up later, also fails.
When the issue occurs, the link state stays in eSS.Disabled state
despite the warm reset attempts. Accoding to spec this can be caused by
invalid VBus, after some expiremets, the SMSC hub can be brought back
after a powercycle.
So let's power cycle the port at the end of reset resume attempt, if
it's in eSS.Disabled state.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Kai-Heng Feng [Thu, 5 Dec 2019 17:05:00 +0000 (18:05 +0100)]
UBUNTU: SAUCE: USB: core: Make port power cycle a seperate helper function
BugLink: https://bugs.launchpad.net/bugs/1855312
Add a new function, hub_port_power_cycle() to power cycle port's power.
It'll be used by a following patch.
In addition to that, check the return value of usb_hub_set_port_power(),
so we don't need to wait if the set power operation fails.
Furthermore, remove parameter *hdev from usb_hub_set_port_power(), since
we can get *hdev from *hub directly.
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jens Axboe [Wed, 27 Nov 2019 20:18:20 +0000 (17:18 -0300)]
blk-mq: punt failed direct issue to dispatch list
BugLink: https://bugs.launchpad.net/bugs/1848739
After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.
Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.
This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.
Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue") Cc: stable@vger.kernel.org Suggested-by: Ming Lei <ming.lei@redhat.com> Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Ming Lei <ming.lei@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit c616cbee97aed4bc6178f148a7240206dcdb85a6) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:19 +0000 (17:18 -0300)]
blk-mq: fail the request in case issue failure
BugLink: https://bugs.launchpad.net/bugs/1848739
Inside blk_mq_try_issue_list_directly(), if the request is issued as
failed, we shouldn't try to do it again, otherwise the warning in
blk_mq_start_request() will be triggered. This change is aligned to
behaviour of other ways of request issue & dispatch.
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: kernel test robot <rong.a.chen@intel.com> Cc: LKP <lkp@01.org> Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 8824f62246bef288173a6624a363352f0d4d3b09) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jens Axboe [Wed, 27 Nov 2019 20:18:18 +0000 (17:18 -0300)]
blk-mq: fix corruption with direct issue
BugLink: https://bugs.launchpad.net/bugs/1848739
If we attempt a direct issue to a SCSI device, and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already setup SG tables etc for this particular command. If we later
merge with this request, then the old tables are no longer valid. Once
we issue the IO, we only read/write the original part of the request,
not the new state of it.
This causes data corruption, and is most often noticed with the file
system complaining about the just read data being invalid:
[ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)
because most of it is garbage...
This doesn't happen from the normal issue path, as we will simply defer
the request to the hardware queue dispatch list if we fail. Once it's on
the dispatch list, we never merge with it.
Fix this from the direct issue path by flagging the request as
REQ_NOMERGE so we don't change the size of it before issue.
See also:
https://bugzilla.kernel.org/show_bug.cgi?id=201685
Tested-by: Guenter Roeck <linux@roeck-us.net> Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'") Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit ffe81d45322cc3cb140f0db080a4727ea284661e) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:17 +0000 (17:18 -0300)]
blk-mq: issue directly if hw queue isn't busy in case of 'none'
BugLink: https://bugs.launchpad.net/bugs/1848739
In case of 'none' io scheduler, when hw queue isn't busy, it isn't
necessary to enqueue request to sw queue and dequeue it from
sw queue because request may be submitted to hw queue asap without
extra cost, meantime there shouldn't be much request in sw queue,
and we don't need to worry about effect on IO merge.
There are still some single hw queue SCSI HBAs(HPSA, megaraid_sas, ...)
which may connect high performance devices, so 'none' is often required
for obtaining good performance.
This patch improves IOPS and decreases CPU unilization on megaraid_sas,
per Kashyap's test.
Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6ce3dd6eec114930cf2035a8bcb1e80477ed79a8) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:16 +0000 (17:18 -0300)]
blk-mq: dequeue request one by one from sw queue if hctx is busy
BugLink: https://bugs.launchpad.net/bugs/1848739
It won't be efficient to dequeue request one by one from sw queue,
but we have to do that when queue is busy for better merge performance.
This patch takes the Exponential Weighted Moving Average(EWMA) to figure
out if queue is busy, then only dequeue request one by one from sw queue
when queue is busy.
Fixes: b347689ffbca ("blk-mq-sched: improve dispatching from sw queue") Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Laurence Oberman <loberman@redhat.com> Cc: Omar Sandoval <osandov@fb.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.de> Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 6e768717304bdbe8d2897ca8298f6b58863fdc41) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jens Axboe [Wed, 27 Nov 2019 20:18:15 +0000 (17:18 -0300)]
blk-mq: don't queue more if we get a busy return
BugLink: https://bugs.launchpad.net/bugs/1848739
Some devices have different queue limits depending on the type of IO. A
classic case is SATA NCQ, where some commands can queue, but others
cannot. If we have NCQ commands inflight and encounter a non-queueable
command, the driver returns busy. Currently we attempt to dispatch more
from the scheduler, if we were able to queue some commands. But for the
case where we ended up stopping due to BUSY, we should not attempt to
retrieve more from the scheduler. If we do, we can get into a situation
where we attempt to queue a non-queueable command, get BUSY, then
successfully retrieve more commands from that scheduler and queue those.
This can repeat forever, starving the non-queuable command indefinitely.
Fix this by NOT attempting to pull more commands from the scheduler, if
we get a BUSY return. This should also be more optimal in terms of
letting requests stay in the scheduler for as long as possible, if we
get a BUSY due to the regular out-of-tags condition.
Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 1f57f8d442f8017587eeebd8617913bfc3661d3d) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Bart Van Assche [Wed, 27 Nov 2019 20:18:14 +0000 (17:18 -0300)]
blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
BugLink: https://bugs.launchpad.net/bugs/1848739
Most blk-mq functions have a name that follows the pattern blk_mq_${action}.
However, the function name blk_mq_request_direct_issue is an exception.
Hence rename this function. This patch does not change any functionality.
Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit c77ff7fd03ddca8face268c4cf093c0edf4bcf1f) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:13 +0000 (17:18 -0300)]
blk-mq: introduce BLK_STS_DEV_RESOURCE
BugLink: https://bugs.launchpad.net/bugs/1848739
This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.
Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.
If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:
1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();
2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
completed via blk_mq_sched_restart()
3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.
One invariant is that queue will be rerun if SCHED_RESTART is set.
Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 86ff7c2a80cd357f6156a53b354f6a0b357dc0c9)
[marcelo.cerri@canonical.com: Fixed context in
include/linux/blk_types.h, the missing context is from commit 9111e5686c8c ("block: Provide blk_status_t decoding for path errors")
which is not necessary] Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:12 +0000 (17:18 -0300)]
blk-mq: don't dispatch request in blk_mq_request_direct_issue if queue is busy
BugLink: https://bugs.launchpad.net/bugs/1848739
If we run into blk_mq_request_direct_issue(), when queue is busy, we
don't want to dispatch this request into hctx->dispatch_list, and
what we need to do is to return the queue busy info to caller, so
that caller can deal with it well.
Fixes: 396eaf21ee ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Reported-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 23d4ee19e789ae3dce3e04bd24e3d1537965475f) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
923218f6166a ("blk-mq: don't allocate driver tag upfront for flush rq")
we no longer use the 'can_block' argument in
blk_mq_sched_insert_request(). Kill it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Added actual commit message as to why it's being removed.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 9e97d2951a7e6ee6e204f87f6bda4ff754a8cede)
[marcelo.cerri@canonical.com: fixed conflict in blk_mq_requeue_work()
because the commit aef1897cd36d ("blk-mq: insert rq with DONTPREP to
hctx dispatch list when requeue") was already applied] Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Yufen Yu [Wed, 27 Nov 2019 20:18:10 +0000 (17:18 -0300)]
dm mpath: fix missing call of path selector type->end_io
BugLink: https://bugs.launchpad.net/bugs/1848739
After commit 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via
blk_insert_cloned_request feedback"), map_request() will requeue the tio
when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.
Thus, if device driver status is error, a tio may be requeued multiple
times until the return value is not DM_MAPIO_REQUEUE. That means
type->start_io may be called multiple times, while type->end_io is only
called when IO complete.
In fact, even without commit 396eaf21ee17, setup_clone() failure can
also cause tio requeue and associated missed call to type->end_io.
The service-time path selector selects path based on in_flight_size,
which is increased by st_start_io() and decreased by st_end_io().
Missed calls to st_end_io() can lead to in_flight_size count error and
will cause the selector to make the wrong choice. In addition,
queue-length path selector will also be affected.
To fix the problem, call type->end_io in ->release_clone_rq before tio
requeue. map_info is passed to ->release_clone_rq() for map_request()
error path that result in requeue.
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback") Cc: stable@vger.kernl.org Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b)
[marcelo.cerri@canonical.com: This patch was already partially applied
before via upstream stable updates. This patch adds the remaining
changes after the inclusion of 396eaf21ee17 ("blk-mq: improve DM's
blk-mq IO merging via blk_insert_cloned_request feedback").] Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:09 +0000 (17:18 -0300)]
blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback
BugLink: https://bugs.launchpad.net/bugs/1848739
blk_insert_cloned_request() is called in the fast path of a dm-rq driver
(e.g. blk-mq request-based DM mpath). blk_insert_cloned_request() uses
blk_mq_request_bypass_insert() to directly append the request to the
blk-mq hctx->dispatch_list of the underlying queue.
1) This way isn't efficient enough because the hctx spinlock is always
used.
2) With blk_insert_cloned_request(), we completely bypass underlying
queue's elevator and depend on the upper-level dm-rq driver's elevator
to schedule IO. But dm-rq currently can't get the underlying queue's
dispatch feedback at all. Without knowing whether a request was issued
or not (e.g. due to underlying queue being busy) the dm-rq elevator will
not be able to provide effective IO merging (as a side-effect of dm-rq
currently blindly destaging a request from its elevator only to requeue
it after a delay, which kills any opportunity for merging). This
obviously causes very bad sequential IO performance.
Fix this by updating blk_insert_cloned_request() to use
blk_mq_request_direct_issue(). blk_mq_request_direct_issue() allows a
request to be issued directly to the underlying queue and returns the
dispatch feedback (blk_status_t). If blk_mq_request_direct_issue()
returns BLK_SYS_RESOURCE the dm-rq driver will now use DM_MAPIO_REQUEUE
to _not_ destage the request. Whereby preserving the opportunity to
merge IO.
With this, request-based DM's blk-mq sequential IO performance is vastly
improved (as much as 3X in mpath/virtio-scsi testing).
Signed-off-by: Ming Lei <ming.lei@redhat.com>
[blk-mq.c changes heavily influenced by Ming Lei's initial solution, but
they were refactored to make them less fragile and easier to read/review] Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 396eaf21ee17c476e8f66249fb1f4a39003d0ab4) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In following commit, __blk_mq_try_issue_directly() will be used to
return the dispatch result (blk_status_t) to DM. DM needs this
information to improve IO merging.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 0f95549c0ea1e8075ae049202088b2c6a0cb40ad) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jens Axboe [Wed, 27 Nov 2019 20:18:07 +0000 (17:18 -0300)]
blk-mq: move hctx lock/unlock into a helper
BugLink: https://bugs.launchpad.net/bugs/1848739
Move the RCU vs SRCU logic into lock/unlock helpers, which makes
the actual functional bits within the locked region much easier
to read.
tj: Reordered in front of timeout revamp patches and added the missing
blk_mq_run_hw_queue() conversion.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 04ced159cec863f9bc27015d6b970bb13cfa6176) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Ming Lei [Wed, 27 Nov 2019 20:18:06 +0000 (17:18 -0300)]
blk-mq: quiesce queue during switching io sched and updating nr_requests
BugLink: https://bugs.launchpad.net/bugs/1848739
Dispatch may still be in-progress after queue is frozen, so we have to
quiesce queue before switching IO scheduler and updating nr_requests.
Also when switching io schedulers, blk_mq_run_hw_queue() may still be
called somewhere(such as from nvme_reset_work()), and io scheduler's
per-hctx data may not be setup yet, so cause oops even inside
blk_mq_hctx_has_pending(), such as it can be run just between:
ret = e->ops.mq.init_sched(q, e);
AND
ret = e->ops.mq.init_hctx(hctx, i)
inside blk_mq_init_sched().
This reverts commit 7a148c2fcff8330(block: don't call blk_mq_quiesce_queue()
after queue is frozen) basically, and makes sure blk_mq_hctx_has_pending
won't be called if queue is quiesced.
Reviewed-by: Christoph Hellwig <hch@lst.de> Fixes: 7a148c2fcff83309(block: don't call blk_mq_quiesce_queue() after queue is frozen) Reported-by: Yi Zhang <yi.zhang@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 24f5a90f0d13a97b51aa79f468143fafea4246bb) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 445ee2de112a18419aeae72fdae4221cd90f2948) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
scsi: hisi_sas: Assign NCQ tag for all NCQ commands
BugLink: https://launchpad.net/bugs/1853995
Currently the NCQ tag is only assigned for FPDMA READ and FPDMA WRITE
commands, and for other NCQ commands (such as FPDMA SEND), their NCQ tags
are set in the delivery command to 0.
So for all the NCQ commands, we also need to assign normal NCQ tag for
them, so drop the command type check in hisi_sas_get_ncq_tag() [drop
hisi_sas_get_ncq_tag() altogether actually], and always use the ATA command
NCQ tag when appropriate.
Link: https://lore.kernel.org/r/1567774537-20003-8-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 435a05cf8c003221fd945229e2b61b5ecca9070d) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
scsi: hisi_sas: Fix the conflict between device gone and host reset
BugLink: https://launchpad.net/bugs/1853997
When device gone, it will check whether it is during reset, if not, it will
send internal task abort. Before internal task abort returned, reset
begins, and it will check whether SAS_PHY_UNUSED is set, if not, it will
call hisi_sas_init_device(), but at that time domain_device may already be
freed or part of it is freed, so it may referenece null pointer in
hisi_sas_init_device(). It may occur as follows:
thread0 thread1
hisi_sas_dev_gone()
check whether in RESET(no)
internal task abort
reset prep
soft_reset
... (part of reset_done)
internal task abort failed
release resource anyway
clear_itct
device->lldd_dev=NULL
hisi_sas_reset_init_all_device
check sas_dev->dev_type is SAS_PHY_UNUSED and
!device
set dev_type SAS_PHY_UNUSED
sas_free_device
hisi_sas_init_device
...
Semaphore hisi_hba.sema is used to sync the processes of device gone and
host reset.
To solve the issue, expand the scope that semaphore protects and let them
never occur together.
And also some places will check whether domain_device is NULL to judge
whether the device is gone. So when device gone, need to clear
sas_dev->sas_device.
Link: https://lore.kernel.org/r/1567774537-20003-14-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e74006edd0d42b45ff37ae4ae13c614cfa30056b) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Anand Jain [Tue, 10 Dec 2019 00:48:00 +0000 (01:48 +0100)]
btrfs: merge btrfs_find_device and find_device
CVE-2019-18885
Both btrfs_find_device() and find_device() does the same thing except
that the latter does not take the seed device onto account in the device
scanning context. We can merge them.
Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
(backported from commit 09ba3bc9dd150457c506e4661380a6183af651c1)
[ Connor Kuehl: 'metadata_uuid' does not exist yet, so use the
previously used 'fsid' data member in 'struct btrfs_fs_devices' ] Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Wed, 25 Dec 2019 03:40:00 +0000 (04:40 +0100)]
ALSA: usb-audio: set the interface format after resume on Dell WD19
BugLink: https://bugs.launchpad.net/bugs/1857496
Recently we found the headset-mic on the Dell Dock WD19 doesn't work
anymore after s3 (s2i or deep), this problem could be workarounded by
closing (pcm_close) the app and then reopening (pcm_open) the app, so
this bug is not easy to be detected by users.
When problem happens, retire_capture_urb() could still be called
periodically, but the size of captured data is always 0, it could be
a firmware bug on the dock. Anyway I found after resuming, the
snd_usb_pcm_prepare() will be called, and if we forcibly run
set_format() to set the interface and its endpoint, the capture
size will be normal again. This problem and workaound also apply to
playback.
To fix it in the kernel, add a quirk to let set_format() run
forcibly once after resume.
Signed-off-by: Hui Wang <hui.wang@canonical.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191218132650.6303-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
(backported from commit 92adc96f8eecd9522a907c197cc3d62e405539fe) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
qede: Handle infinite driver spinning for Tx timestamp.
BugLink: https://bugs.launchpad.net/bugs/1855409
In PTP Tx implementation, driver kept scheduling a poll thread until the
timestamp is available. In the error scenarios (e.g. app requesting the
timestamp for non-ptp packet), this thread kept waiting for the timestamp
forever. This patch add changes to report such scenario as an error and
terminate the thread. Added a timeout of 2 seconds i.e., max time to wait
for Tx timestamp. Added a stat value ptp_skip_txts for reporting the number
of packets for which Tx timestamping is skipped.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Michal Kalderon <mkalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 9adebac37e7d26c5cd73776a0279574afe3f410b)
[gpiccoli: context adjustments due to the following missing commits:
- 32d26a685c18: Added a link change statistics;
- 149d3775f108: Changed ancestor to access qede-flags;
- ccc67ef50b90: Added error recovery process.
Due to these, small conflicts arose demanding adjustments.] Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xi Wang [Fri, 20 Dec 2019 08:41:00 +0000 (09:41 +0100)]
RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
BugLink: https://launchpad.net/bugs/1853989
kasan will report a BUG when run command 'insmod hns_roce_hw_v2.ko', the
calltrace is as follows:
==================================================================
BUG: KASAN: slab-out-of-bounds in hns_roce_v2_init_eq_table+0x1324/0x1948
[hns_roce_hw_v2]
Read of size 8 at addr ffff8020e7a10608 by task insmod/256
Memory state around the buggy address: ffff8020e7a10500: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff8020e7a10580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff8020e7a10600: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff8020e7a10680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8020e7a10700: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Disabling lock debugging due to kernel taint
Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Signed-off-by: Xi Wang <wangxi11@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-7-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit bf8c02f961c89e5ccae5987b7ab28f5592a35101) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xi Wang [Fri, 20 Dec 2019 08:41:00 +0000 (09:41 +0100)]
RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
BugLink: https://launchpad.net/bugs/1853989
kasan will report a BUG when run command 'rmmod hns_roce_hw_v2', the calltrace
is as follows:
==================================================================
BUG: KASAN: slab-out-of-bounds in hns_roce_table_mhop_put+0x584/0x828
[hns_roce]
Read of size 8 at addr ffff802185e08300 by task rmmod/270
Memory state around the buggy address: ffff802185e08200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff802185e08280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >ffff802185e08300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff802185e08380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff802185e08400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
Fixes: a25d13cbe816 ("RDMA/hns: Add the interfaces to support multi hop addressing for the contexts in hip08") Signed-off-by: Xi Wang <wangxi11@huawei.com> Link: https://lore.kernel.org/r/1565343666-73193-6-git-send-email-oulijun@huawei.com Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 9bba3f0cbfc8abf2e1549ea03c0128186081d7a8) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Xi Wang [Fri, 20 Dec 2019 08:41:00 +0000 (09:41 +0100)]
RDMA/hns: Fixs hw access invalid dma memory error
BugLink: https://launchpad.net/bugs/1853990
When smmu is enable, if execute the perftest command and then use 'kill
-9' to exit, follow this operation repeatedly, the kernel will have a high
probability to print the following smmu event:
This is because the hw will periodically refresh the qpc cache until the
next reset.
This patch fixed it by removing the action that release qpc memory in the
'hns_roce_qp_free' function.
Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
(cherry picked from commit ec5bc2cc69b4fc494e04d10fc5226f6f9cf67c56) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Yonglong Liu [Wed, 1 Jan 2020 16:09:00 +0000 (17:09 +0100)]
net: hns: add support for vlan TSO
BugLink: https://launchpad.net/bugs/1853937
The hip07 chip support vlan TSO, this patch adds NETIF_F_TSO
and NETIF_F_TSO6 flags to vlan_features to improve the
performance after adding vlan to the net ports.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0d581ba311a27762fe1a14e5db5f65d225b3d844) Signed-off-by: Ike Panhc <ike.pan@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Saeed Mahameed [Tue, 10 Dec 2019 23:24:00 +0000 (00:24 +0100)]
net/mlx5e: Rx, Fix checksum calculation for new hardware
BugLink: https://bugs.launchpad.net/bugs/1854842
CQE checksum full mode in new HW, provides a full checksum of rx frame.
Covering bytes starting from eth protocol up to last byte in the received
frame (frame_size - ETH_HLEN), as expected by the stack.
Fixing up skb->csum by the driver is not required in such case. This fix
is to avoid wrong checksum calculation in drivers which already support
the new hardware with the new checksum mode.
Fixes: 85327a9c4150 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(backported from commit db849faa9bef993a1379dc510623f750a72fa7ce)
[mruffell: minor context adjustments, offset recalculations] Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Saeed Mahameed [Tue, 10 Dec 2019 23:24:00 +0000 (00:24 +0100)]
net/mlx5e: Rx, Fixup skb checksum for packets with tail padding
BugLink: https://bugs.launchpad.net/bugs/1854842
When an ethernet frame with ip payload is padded, the padding octets are
not covered by the hardware checksum.
Prior to the cited commit, skb checksum was forced to be CHECKSUM_NONE
when padding is detected. After it, the kernel will try to trim the
padding bytes and subtract their checksum from skb->csum.
In this patch we fixup skb->csum for any ip packet with tail padding of
any size, if any padding found.
FCS case is just one special case of this general purpose patch, hence,
it is removed.
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"), Cc: Eric Dumazet <edumazet@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(backported from commit 0aa1d18615c163f92935b806dcaff9157645233a)
[mruffell: file change mlx5e_update_sw_counters, context adjustments] Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Hui Wang [Wed, 11 Dec 2019 07:52:00 +0000 (08:52 +0100)]
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
BugLink: https://bugs.launchpad.net/bugs/1855999
After applying the fixup ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, the
Line-out jack works well. And instead of adding a new set of pin
definition in the pin_fixup_tbl, we put a more generic matching entry
in the fallback_pin_fixup_tbl.
Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20191211051321.5883-1-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
(cherry picked from commit 5815bdfd7f54739be9abed1301d55f5e74d7ad1f) Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Po-Hsu Lin [Fri, 3 Jan 2020 10:52:00 +0000 (11:52 +0100)]
selftests/efivarfs: clean up test files from test_create*()
BugLink: https://bugs.launchpad.net/bugs/1809704
Test files created by test_create() and test_create_empty() tests will
stay in the $efivarfs_mount directory until the system was rebooted.
When the tester tries to run this efivarfs test again on the same
system, the immutable characteristics in that directory will cause some
"Operation not permitted" noises, and a false-positve test result as the
file was created in previous run.
--------------------
running test_create
--------------------
./efivarfs.sh: line 59: /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted
[PASS]
--------------------
running test_create_empty
--------------------
./efivarfs.sh: line 78: /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted
[PASS]
--------------------
Create a file_cleanup() to remove those test files in the end of each
test to solve this issue.
For the test_create_read, we can move the clean up task to the end of
the test to ensure the system is clean.
Also, use this function to replace the existing file removal code.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
(backported from commit dff6d2ae56d0056261e2f7b19c4b96b86b893cd8)
[PHLin: fuzzy adjustment] Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
selftests: efivarfs: return Kselftest Skip code for skipped tests
BugLink: https://bugs.launchpad.net/bugs/1809704
When efivarfs test is skipped because of unmet dependencies and/or
unsupported configuration, it returns 0 which is treated as a pass
by the Kselftest framework. This leads to false positive result even
when the test could not be run.
Change it to return kselftest skip code when a test gets skipped to
clearly report that the test could not be run.
Kselftest framework SKIP code is 4 and the framework prints appropriate
messages to indicate that the test is skipped.
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
(cherry picked from commit 1f0ea95853bd55a1812f2903b2f776853069f21f) Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When a port sends PLOGI, discovery state should be changed to login
pending, otherwise RELOGIN_NEEDED bit is set in
qla24xx_handle_plogi_done_event(). RELOGIN_NEEDED triggers another PLOGI,
and it never goes out of the loop until login timer expires.
Fixes: 8777e4314d397 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine") Fixes: 8b5292bcfcacf ("scsi: qla2xxx: Fix Relogin to prevent modifying scan_state flag") Cc: Quinn Tran <qutran@marvell.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20191125165702.1013-6-r.bolshakov@yadro.com Acked-by: Himanshu Madhani <hmadhani@marvell.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Tested-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
No changeset entries are created for #address-cells and #size-cells
properties, but the duplicated properties are never freed. This
results in a memory leak which is detected by kmemleak:
strictly enforces the MACCTLR inizialization value - 39.3.1 - "Initial
Setting of PCI Express":
"Be sure to write the initial value (= H'80FF 0000) to MACCTLR before
enabling PCIETCTLR.CFINIT".
To avoid unexpected behavior and to match the SW initialization sequence
guidelines, this patch programs the MACCTLR with the correct value.
Note that the MACCTLR.SPCHG bit in the MACCTLR register description
reports that "Only writing 1 is valid and writing 0 is invalid" but this
"invalid" has to be interpreted as a write-ignore aka "ignored", not
"prohibited".
Reported-by: Eugeniu Rosca <erosca@de.adit-jv.com> Fixes: c25da4778803 ("PCI: rcar: Add Renesas R-Car PCIe driver") Fixes: be20bbcb0a8c ("PCI: rcar: Add the initialization of PCIe link in resume_noirq()") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When a secondary CPU is brought up it must initialize its control
registers. CPU A which triggers that a secondary CPU B is brought up
stores its control register contents into the lowcore of new CPU B,
which then loads these values on startup.
This is problematic in various ways: the control register which
contains the home space ASCE will correctly contain the kernel ASCE;
however control registers for primary and secondary ASCEs are
initialized with whatever values were present in CPU A.
Typically:
- the primary ASCE will contain the user process ASCE of the process
that triggered onlining of CPU B.
- the secondary ASCE will contain the percpu VDSO ASCE of CPU A.
Due to lazy ASCE handling we may also end up with other combinations.
When then CPU B switches to a different process (!= idle) it will
fixup the primary ASCE. However the problem is that the (wrong) ASCE
from CPU A was loaded into control register 1: as soon as an ASCE is
attached (aka loaded) a CPU is free to generate TLB entries using that
address space.
Even though it is very unlikey that CPU B will actually generate such
entries, this could result in TLB entries of the address space of the
process that ran on CPU A. These entries shouldn't exist at all and
could cause problems later on.
Furthermore the secondary ASCE of CPU B will not be updated correctly.
This means that processes may see wrong results or even crash if they
access VDSO data on CPU B. The correct VDSO ASCE will eventually be
loaded on return to user space as soon as the kernel executed a call
to strnlen_user or an atomic futex operation on CPU B.
Fix both issues by intializing the to be loaded control register
contents with the correct ASCEs and also enforce (re-)loading of the
ASCEs upon first context switch and return to user space.
Fixes: 0aaba41b58bc ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Andreas Grünbacher reports that on the two filesystems that support
iomap directio, it's possible for splice() to return -EAGAIN (instead of
a short splice) if the pipe being written to has less space available in
its pipe buffers than the length supplied by the calling process.
Months ago we fixed splice_direct_to_actor to clamp the length of the
read request to the size of the splice pipe. Do the same to do_splice.
Fixes: 17614445576b6 ("splice: don't read more than available pipe space") Reported-by: syzbot+3c01db6025f26530cf8d@syzkaller.appspotmail.com Reported-by: Andreas Grünbacher <andreas.gruenbacher@gmail.com> Reviewed-by: Andreas Grünbacher <andreas.gruenbacher@gmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When setting the time in the future with the uie timer enabled,
rtc_timer_do_work will loop for a while because the expiration of the uie
timer was way before the current RTC time and a new timer will be enqueued
until the current rtc time is reached.
If the uie timer is enabled, disable it before setting the time and enable
it after expiring current timers (which may actually be an alarm).
This is the safest thing to do to ensure the uie timer is still
synchronized with the RTC, especially in the UIE emulation case.
Reported-by: syzbot+08116743f8ad6f9a6de7@syzkaller.appspotmail.com Fixes: 6610e0893b8b ("RTC: Rework RTC code to use timerqueue for events") Link: https://lore.kernel.org/r/20191020231320.8191-1-alexandre.belloni@bootlin.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When building with CONFIG_MFD_88PM800 and CONFIG_REGULATOR_88PM800
enabled as loadable modules, we see the following warning:
warning: same module names found:
drivers/regulator/88pm800.ko
drivers/mfd/88pm800.ko
Rework so that the file is named 88pm800-regulator.
Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The SAS controller cannot support a programmed minimum linkrate of > 1.5G
(it will always negotiate to 1.5G at least), so just reject it.
This solves a strange situation where the PHY negotiated linkrate may be
less than the programmed minimum linkrate.
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Send primitive NOTIFY to SSP situation only, or it causes underflow issue
when sending IO. Also rename hisi_sas_hw.sl_notify() to hisi_sas_hw.
sl_notify_ssp().
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In hnae3_register_ae_dev(), ae_algo->ops is assigned to ae_dev->ops
before check that ae_algo->ops is valid.
And in hnae3_register_ae_algo(), missing check for ae_algo->ops.
This patch fixes them.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
hnae3_register_ae_dev() may fail, and it should return a error code
to its caller, so change hnae3_register_ae_dev() return type to int.
Also, when hnae3_register_ae_dev() return error, hns3_probe() should
do some error handling and return the error code.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When unload hns3 driver, we should clear the pci private data.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Driver missed classifying the chip type for G7 when reporting supported
topologies. This resulted in loop being shown as supported on FC links that
are not supported per the standard.
Add the chip classifications to the topology checks in the driver.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In exynos_eint_wkup_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
Cc: <stable@vger.kernel.org> Fixes: 43b169db1841 ("pinctrl: add exynos4210 specific extensions for samsung pinctrl driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As explained in the following commit a9a1a4833613 ("pinctrl:
armada-37xx: Fix gpio interrupt setup") the armada_37xx_irq_set_type()
function can be called before the initialization of the mask field.
That means that we can't use this field in this function and need to
workaround it using hwirq.
Fixes: 30ac0d3b0702 ("pinctrl: armada-37xx: Add edge both type gpio irq support") Cc: stable@vger.kernel.org Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/r/20191115155752.2562-1-gregory.clement@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If pers->make_request fails in md_flush_request(), the bio is lost. To
fix this, pass back a bool to indicate if the original make_request call
should continue to handle the I/O and instead of assuming the flush logic
will push it to completion.
Convert md_flush_request to return a bool and no longer calls the raid
driver's make_request function. If the return is true, then the md flush
logic has or will complete the bio and the md make_request call is done.
If false, then the md make_request function needs to keep processing like
it is a normal bio. Let the original call to md_handle_request handle any
need to retry sending the bio to the raid driver's make_request function
should it be needed.
Also mark md_flush_request and the make_request function pointer as
__must_check to issue warnings should these critical return values be
ignored.
Fixes: 2bc13b83e629 ("md: batch flush requests.") Cc: stable@vger.kernel.org # # v4.19+ Cc: NeilBrown <neilb@suse.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
During a cyclic writeback, extent_write_cache_pages() uses done_index
to update the writeback_index after the current run is over. However,
instead of current index + 1, it gets to to the current index itself.
Unfortunately, this, combined with returning on EOF instead of looping
back, can lead to the following pathlogical behavior.
1. There is a single file which has accumulated enough dirty pages to
trigger balance_dirty_pages() and the writer appending to the file
with a series of short writes.
2. balance_dirty_pages kicks in, wakes up background writeback and sleeps.
3. Writeback kicks in and the cursor is on the last page of the dirty
file. Writeback is started or skipped if already in progress. As
it's EOF, extent_write_cache_pages() returns and the cursor is set
to done_index which is pointing to the last page.
4. Writeback is done. Nothing happens till balance_dirty_pages
finishes, at which point we go back to #1.
This can almost completely stall out writing back of the file and keep
the system over dirty threshold for a long time which can mess up the
whole system. We encountered this issue in production with a package
handling application which can reliably reproduce the issue when
running under tight memory limits.
Reading the comment in the error handling section, this seems to be to
avoid accidentally skipping a page in case the write attempt on the
page doesn't succeed. However, this concern seems bogus.
On each page, the code either:
* Skips and moves onto the next page.
* Fails issue and sets done_index to index + 1.
* Successfully issues and continue to the next page if budget allows
and not EOF.
IOW, as long as it's not EOF and there's budget, the code never
retries writing back the same page. Only when a page happens to be
the last page of a particular run, we end up retrying the page, which
can't possibly guarantee anything data integrity related. Besides,
cyclic writes are only used for non-syncing writebacks meaning that
there's no data integrity implication to begin with.
Fix it by always setting done_index past the current page being
processed.
Note that this problem exists in other writepages too.
CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In the fixup worker, if we fail to mark the range as delalloc in the io
tree, we must release the previously reserved metadata, as well as update
the outstanding extents counter for the inode, otherwise we leak metadata
space.
In pratice we can't return an error from btrfs_set_extent_delalloc(),
which is just a wrapper around __set_extent_bit(), as for most errors
__set_extent_bit() does a BUG_ON() (or panics which hits a BUG_ON() as
well) and returning an -EEXIST error doesn't happen in this case since
the exclusive bits parameter always has a value of 0 through this code
path. Nevertheless, just fix the error handling in the fixup worker,
in case one day __set_extent_bit() can return an error to this code
path.
Fixes: f3038ee3a3f101 ("btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
xfs_prepare_shift() fails to check the error return from
xfs_flush_unmap_range(). If the latter fails, that could lead to an
insert/collapse range operation over a delalloc range, which is not
supported.
Add an error check and return appropriately. This is reproduced
rarely by generic/475.
Fixes: 7f9f71be84bc ("xfs: extent shifting doesn't fully invalidate page cache") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: Nobuhiro Iwamatsu <nobuhiro1.iwamatsu@toshiba.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read. This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.
In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads. This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permit total success or total failure of a directio
operation. The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid avoid reading more data
than there is space in the pipe.
Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.
Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read. This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.
In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.
The brokenness is compounded by splice_direct_to_actor immediately
bailing on do_splice_to returning <= 0 without ever calling ->actor
(which empties out the pipe), so if userspace calls back we'll EFAULT
again on the full pipe, and nothing ever gets copied.
Therefore, teach splice_direct_to_actor to clamp its requests to the
amount of free space in the pipe and remove the simulated short read
from the iomap directio code.
Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill") Reported-by: Murphy Zhou <jencce.kernel@gmail.com> Ranted-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The 'len' returned by grab_bb() includes an extra MAXINSN bytes to allow
for the last instruction, so the the final 'offs' will not be 'len'.
Fix the error condition logic accordingly.
Before:
$ perf record -e '{intel_pt//,cpu/mem_inst_retired.all_loads,aux-sample-size=8192/pp}:u' grep -rqs jhgjhg /boot
[ perf record: Woken up 19 times to write data ]
[ perf record: Captured and wrote 2.274 MB perf.data ]
$ perf script -F +brstackinsn --xed --itrace=i1usl100 | head
grep 13759 [002] 8091.310257: 1862 instructions:uH: 5641d58069eb bmexec+0x86b (/bin/grep)
bmexec+2485: 00005641d5806b35 jnz 0x5641d5806bd0 # MISPRED 00005641d5806bd0 movzxb (%r13,%rdx,1), %eax 00005641d5806bd6 add %rdi, %rax 00005641d5806bd9 movzxb -0x1(%rax), %edx 00005641d5806bdd cmp %rax, %r14 00005641d5806be0 jnb 0x5641d58069c0 # MISPRED
mismatch of LBR data and executable 00005641d58069c0 movzxb (%r13,%rdx,1), %edi
binder_alloc_print_pages() iterates over
alloc->pages[0..alloc->buffer_size-1] under alloc->mutex.
binder_alloc_mmap_handler() writes alloc->pages and alloc->buffer_size
without holding that lock, and even writes them before the last bailout
point.
Unfortunately we can't take the alloc->mutex in the ->mmap() handler
because mmap_sem can be taken while alloc->mutex is held.
So instead, we have to locklessly check whether the binder_alloc has been
fully initialized with binder_alloc_get_vma(), like in
binder_alloc_new_buf_locked().
Fixes: 8ef4665aa129 ("android: binder: Add page usage in binder stats") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20191018205631.248274-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
commit 394a9e044702 ("crypto: cfb - add missing 'chunksize' property")
adds a test vector where the input length is smaller than the IV length
(the second test vector). This revealed a NULL pointer dereference in
the atmel-aes driver, that is caused by passing an incorrect offset in
scatterwalk_map_and_copy() when atmel_aes_complete() is called.
Do not save the IV in req->info of ablkcipher_request (or equivalently
req->iv of skcipher_request) when req->nbytes < ivsize, because the IV
will not be further used.
While touching the code, modify the type of ivsize from int to
unsigned int, to comply with the return type of
crypto_ablkcipher_ivsize().
Fixes: 91308019ecb4 ("crypto: atmel-aes - properly set IV after {en,de}crypt") Cc: <stable@vger.kernel.org> Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
ranges: before such vmap ranges are reused we make sure that they are
unmapped from every task's page tables.
This is really easy on pagetable setups where the kernel page tables
are shared between all tasks - this is the case on 32-bit kernels
with SHARED_KERNEL_PMD = 1.
But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
over the pgd_list and clearing all pmd entries in the pgds that
are cleared in the init_mm.pgd, which is the reference pagetable
that the vmalloc() code uses.
In that context the current practice of vmalloc_sync_all() iterating
until FIX_ADDR_TOP is buggy:
This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges
that don't clear their pmds, but it's lethal for the LDT range,
which relies on having different mappings in different processes,
and 'synchronizing' them in the vmalloc sense corrupts those
pagetable entries (clearing them).
This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
off and makes this the dominant mapping mode on 32-bit.
To make LDT working again vmalloc_sync_all() must only iterate over
the volatile parts of the kernel address range that are identical
between all processes.
So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
to make sure the VMALLOC areas are synchronized and the LDT
mapping is not falsely overwritten.
The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
but this is not really a proplem since their PMDs get established
during bootup and never change.
This change fixes the ldt_gdt selftest in my setup.
[ mingo: Fixed up the changelog to explain the logic and modified the
copying to only happen up until VMALLOC_END. ]
The driver tries to figure out which state a SD clock is in when the
clock is registered, instead of setting a known state. This can be
problematic for two reasons.
1. If the clock driver can't figure out the state of the clock,
registration of the clock fails, and setting of a known state by a
clock user is not possible.
2. The state of the clock depends on if and how the bootloader
configured it. The driver only checks that the rate is known, not if
the clock is stopped or not for example.
Fix this by setting a known state and making sure the clock is stopped.
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The streaming object is a key part of handling the UVC device. Although
not critical, we are currently missing a call to destroy the mutex on
clean up paths, and we are due to extend the objects complexity in the
near future.
Facilitate easy management of a stream object by creating a pair of
functions to handle creating and destroying the allocation. The new
uvc_stream_delete() function also performs the missing mutex_destroy()
operation.
Previously a failed streaming object allocation would cause
uvc_parse_streaming() to return -EINVAL, which is inappropriate. If the
constructor failes, we will instead return -ENOMEM.
While we're here, fix the trivial spelling error in the function banner
of uvc_delete().
Unlike the other PLLs on Meson8b the N value "vid_pll_dco" (a better
name would be hdmi_pll_dco or - as the datasheet calls it - HPLL) is
located at HHI_VID_PLL_CNTL[14:10] instead of [13:9].
This results in an incorrect calculation of the rate of this PLL because
the value seen by the kernel is double the actual N (divider) value.
Update the offset of the N value to fix the calculation of the PLL rate.
Fixes: 28b9fcd016126e ("clk: meson8b: Add support for Meson8b clocks") Reported-by: Jianxin Pan <jianxin.pan@amlogic.com> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Link: https://lkml.kernel.org/r/20181202214220.7715-2-martin.blumenstingl@googlemail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Jean-Louis Dupond reported poor iscsi TCP receive performance
that we tracked to backlog drops.
Apparently we fail to send window updates reflecting the
fact that we are under stress.
Note that we might lack a proper window increase when
backlog is fully processed, since __release_sock() clears
sk->sk_backlog.len _after_ all skbs have been processed.
This should not matter in practice. If we had a significant
load through socket backlog, we are in a dangerous
situation.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Tested-by: Jean-Louis Dupond<jean-louis@dupond.be> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This function is called from driver probe, which isn't the same as
__init code because driver probe can happen later. Drop the __init
marking here to fix this potential problem.
Cc: Sean Wang <sean.wang@mediatek.com> Cc: Ryder Lee <ryder.lee@mediatek.com> Cc: Rob Herring <robh@kernel.org> Cc: Wenzhen Yu <wenzhen.yu@mediatek.com> Cc: Weiyi Lu <weiyi.lu@mediatek.com> Fixes: 2fc0a509e4ee ("clk: mediatek: add clock support for MT7622 SoC") Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This function is used from more places than just __init code. Removing
__init silences a section mismatch warning here.
Cc: Sean Wang <sean.wang@mediatek.com> Cc: Ryder Lee <ryder.lee@mediatek.com> Cc: Rob Herring <robh@kernel.org> Cc: Wenzhen Yu <wenzhen.yu@mediatek.com> Cc: Weiyi Lu <weiyi.lu@mediatek.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, so make sure the implementation in
this driver has returns 'netdev_tx_t' value, and change the function
return type to netdev_tx_t.
Found by coccinelle.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
DTC will emit a warning on our OPPs nodes for the common DTSI between the
A23 and A33 since those nodes use the frequency as unit addresses, but
don't have a matching reg property.
Fix this by moving the frequency to the node name instead.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Our HDMI output endpoint on the A10s DTSI has a warning under DTC: "graph
node has single child node 'endpoint', #address-cells/#size-cells are not
necessary". Fix this by removing those properties.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Our HDMI output endpoint on the A10 DTSI has a warning under DTC: "graph
node has single child node 'endpoint', #address-cells/#size-cells are not
necessary". Fix this by removing those properties.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The number of syncpoints on Tegra186 is 576 and therefore no longer fits
into 8 bits. Increase the size of the syncpoint ID field to 10 in order
to accomodate all syncpoints.
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Previously, we only account preflush command for flush_merge mode,
so for noflush_merge mode, we can not know in-flight preflush
command count, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Some ptrace selftests are passing input operands using a constraint that
can allocate any register for the operand, and using these registers on
load/store operations.
If the register allocated by the compiler happens to be zero (r0), it might
cause an invalid memory address access, since load and store operations
consider the content of 0x0 address if the base register is r0, instead of
the content of the r0 register. For example:
r1 := 0xdeadbeef
r0 := 0xdeadbeef
ld r2, 0(1) /* will load into r2 the content of r1 address */
ld r2, 0(0) /* will load into r2 the content of 0x0 */
In order to avoid this possible problem, the inline assembly constraint
should be aware that these registers will be used as a base register, thus,
r0 should not be allocated.
Other than that, this patch removes inline assembly operands that are not
used by the tests.
Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When we add a new IPv6 address, we should also join corresponding solicited-node
multicast address, unless the interface has IFF_NOARP flag, as function
addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
not do dad and add the mcast address. So we will drop corresponding neighbour
discovery message that came from other nodes.
A typical example is after creating a ipvlan with mode l3, setting up an ipv6
address and changing the mode to l2. Then we will not be able to ping this
address as the interface doesn't join related solicited-node mcast address.
Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
corresponding mcast group and check if there is a duplicate address on the
network.
Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Since only full-duplex operation is supported by the
hardware, remove duplex handling code and keep the
register setting of ECMR.DM fixed at 1.
This updates the driver implementation to follow the
data sheet text "This bit should always be set to 1."
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When doing direct IO to a pipe for do_splice_direct(), then pipe is
trivial to fill up and overflow as it can only hold 16 pages. At
this point bio_iov_iter_get_pages() then returns -EFAULT, and we
abort the IO submission process. Unfortunately, iomap_dio_rw()
propagates the error back up the stack.
The error is converted from the EFAULT to EAGAIN in
generic_file_splice_read() to tell the splice layers that the pipe
is full. do_splice_direct() completely fails to handle EAGAIN errors
(it aborts on error) and returns EAGAIN to the caller.
copy_file_write() then completely fails to handle EAGAIN as well,
and so returns EAGAIN to userspace, having failed to copy the data
it was asked to.
Avoid this whole steaming pile of fail by having iomap_dio_rw()
silently swallow EFAULT errors and so do short reads.
To make matters worse, iomap_dio_actor() has a stale data exposure
bug bio_iov_iter_get_pages() fails - it does not zero the tail block
that it may have been left uncovered by partial IO. Fix the error
handling case to drop to the sub-block zeroing rather than
immmediately returning the -EFAULT error.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The extent shifting code uses a flush and invalidate mechainsm prior
to shifting extents around. This is similar to what
xfs_free_file_space() does, but it doesn't take into account things
like page cache vs block size differences, and it will fail if there
is a page that it currently busy.
xfs_flush_unmap_range() handles all of these cases, so just convert
xfs_prepare_shift() to us that mechanism rather than having it's own
special sauce.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Now sctp increases sk_wmem_alloc by 1 when doing set_owner_w for the
skb allocked in sctp_packet_transmit and decreases by 1 when freeing
this skb.
But when this skb goes through networking stack, some subcomponents
might change skb->truesize and add the same amount on sk_wmem_alloc.
However sctp doesn't know the amount to decrease by, it would cause
a leak on sk->sk_wmem_alloc and the sock can never be freed.
Xiumei found this issue when it hit esp_output_head() by using sctp
over ipsec, where skb->truesize is added and so is sk->sk_wmem_alloc.
Since sctp has used sk_wmem_queued to count for writable space since
Commit cd305c74b0f8 ("sctp: use sk_wmem_queued to check for writable
space"), it's ok to fix it by counting sk_wmem_alloc by skb truesize
in sctp_packet_transmit.
Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
From Odroid XU3/XU4/HC1 schematics the LDO13 regulator for SD2, can be
set on 1.8V or 2.8V so the minimal value should be fixed to 1.8V. This
is necessary to support UHS-I tuning (otherwise card won't be detected
during boot).
Signed-off-by: Anand Moon <linux.amoon@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If IOC was already enabled (due to bootloader) it technically needs to
be reconfigured with aperture base,size corresponding to Linux memory map
which will certainly be different than uboot's. But disabling and
reenabling IOC when DMA might be potentially active is tricky business.
To avoid random memory issues later, just panic here and ask user to
upgrade bootloader to one which doesn't enable IOC
This was actually seen as issue on some of the HSDK board with a version
of uboot which enabled IOC. There were random issues later with starting
of X or peripherals etc.
Also while I'm at it, replace hardcoded bits in ARC_REG_IO_COH_PARTIAL
and ARC_REG_IO_COH_ENABLE registers by definitions.
Its possible to set both HANDLE and POSITION when replacing a rule.
In this case, the rule at POSITION gets replaced using the
userspace-provided handle. Rule handles are supposed to be generated
by the kernel only.
Duplicate handles should be harmless, however better disable this "feature"
by only checking for the POSITION attribute on insert operations.
Fixes: 5e94846686d0 ("netfilter: nf_tables: add insert operation") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Currently chunk hash key (which is in fact pointer to the inode) is
derived as chunk->mark.conn->obj. It is tricky to make this dereference
reliable for hash table lookups only under RCU as mark can get detached
from the connector and connector gets freed independently of the
running lookup. Thus there is a possible use after free / NULL ptr
dereference issue:
This race is probably impossible to hit in practice as the race window
on CPU1 is very narrow and CPU2 has a lot of code to execute. Still it's
better to have this fixed. Since the inode the chunk is attached to is
constant during chunk's lifetime it is easy to cache the key in the
chunk itself and thus avoid these issues.
Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>