Netanel Belgazal [Fri, 23 Jun 2017 08:21:54 +0000 (11:21 +0300)]
net: ena: add support for out of order rx buffers refill
BugLink: http://bugs.launchpad.net/bugs/1701575
ENA driver post Rx buffers through the Rx submission queue
for the ENA device to fill them with receive packets.
Each Rx buffer is marked with req_id in the Rx descriptor.
Newer ENA devices could consume the posted Rx buffer in out of order,
and as result the corresponding Rx completion queue will have Rx
completion descriptors with non contiguous req_id(s)
In this change the driver holds two rings.
The first ring (called free_rx_ids) is a mapping ring.
It holds all the unused request ids.
The values in this ring are from 0 to ring_size -1.
When the driver wants to allocate a new Rx buffer it uses the head of
free_rx_ids and uses it's value as the index for rx_buffer_info ring.
The req_id is also written to the Rx descriptor
Upon Rx completion,
The driver took the req_id from the completion descriptor and uses it
as index in rx_buffer_info.
The req_id is then return to the free_rx_ids ring.
This patch also adds statistics to inform when the driver receive out
of range or unused req_id.
Note:
free_rx_ids is only accessible from the napi handler, so no locking is
required
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ad974baef2a17a170fe837ad19f10dcab63e9470 net-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
Netanel Belgazal [Fri, 23 Jun 2017 08:21:51 +0000 (11:21 +0300)]
net: ena: add hardware hints capability to the driver
BugLink: http://bugs.launchpad.net/bugs/1701575
With this patch, ENA device can update the ena driver about
the desired timeout values:
These values are part of the "hardware hints" which are transmitted
to the driver as Asynchronous event through ENA async
event notification queue.
In case the ENA device does not support this capability,
the driver will use its own default values.
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 82ef30f13be0b4fea70a9c6215f7cff40bb3be63 net-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:50 +0000 (15:42 +0300)]
net: ena: bug fix in lost tx packets detection mechanism
BugLink: http://bugs.launchpad.net/bugs/1701575
check_for_missing_tx_completions() is called from a timer
task and looking for lost tx packets.
The old implementation accumulate all the lost tx packets
and did not check if those packets were retrieved on a later stage.
This cause to a situation where the driver reset
the device for no reason.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 800c55cb76be6617232ef50a2be29830f3aa8e5c) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:48 +0000 (15:42 +0300)]
net: ena: fix theoretical Rx hang on low memory systems
BugLink: http://bugs.launchpad.net/bugs/1701575
For the rare case where the device runs out of free rx buffer
descriptors (in case of pressure on kernel memory),
and the napi handler continuously fail to refill new Rx descriptors
until device rx queue totally runs out of all free rx buffers
to post incoming packet, leading to a deadlock:
* The device won't send interrupts since all the new
Rx packets will be dropped.
* The napi handler won't try to allocate new Rx descriptors
since allocation is part of NAPI that's not being invoked any more
The fix involves detecting this scenario and rescheduling NAPI
(to refill buffers) by the keepalive/watchdog task.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a3af7c18cfe545a711e5df7491b7d6df71eba2ff) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
Netanel Belgazal [Sun, 11 Jun 2017 12:42:46 +0000 (15:42 +0300)]
net: ena: fix race condition between submit and completion admin command
BugLink: http://bugs.launchpad.net/bugs/1701575
Bug:
"Completion context is occupied" error printout will be noticed in
dmesg.
This error will cause the admin command to fail, which will lead to
an ena_probe() failure or a watchdog reset (depends on which admin
command failed).
Root cause:
__ena_com_submit_admin_cmd() is the function that submits new entries to
the admin queue.
The function have a check that makes sure the queue is not full and the
function does not override any outstanding command.
It uses head and tail indexes for this check.
The head is increased by ena_com_handle_admin_completion() which runs
from interrupt context, and the tail index is increased by the submit
function (the function is running under ->q_lock, so there is no risk
of multithread increment).
Each command is associated with a completion context. This context
allocated before call to __ena_com_submit_admin_cmd() and freed by
ena_com_wait_and_process_admin_cq_interrupts(), right after the command
was completed.
This can lead to a state where the head was increased, the check passed,
but the completion context is still in use.
Solution:
Use the atomic variable ->outstanding_cmds instead of using the head and
the tail indexes.
This variable is safe for use since it is bumped in get_comp_ctx() in
__ena_com_submit_admin_cmd() and is freed by comp_ctxt_release()
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 661d2b0ccef6a63f48b61105cf7be17403d1db01) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
check_for_completion()
sleep()
}
So in case the sleep took more than the timeout
(in case the thread/workqueue was not scheduled due to higher priority
task or prolonged VMexit), the driver can detect a stall even if
the completion is present.
The fix changes the order of this function to first check for
completion and only after that check if the timeout expired.
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a77c1aafcc906f657d1a0890c1d898be9ee1d5c9) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
Lino Sanfilippo [Sat, 18 Feb 2017 11:19:41 +0000 (12:19 +0100)]
net: ena: remove superfluous check in ena_remove()
BugLink: http://bugs.launchpad.net/bugs/1701575
The check in ena_remove() for the pci driver data not being NULL is not
needed, since it is always set in the probe() function. Remove the
superfluous check.
Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 90a6c997bd8bb836809eda5efb6406d8c58c0924) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com>
mm/mmap.c: expand_downwards: don't require the gap if !vm_prev
expand_stack(vma) fails if address < stack_guard_gap even if there is no
vma->vm_prev. I don't think this makes sense, and we didn't do this
before the recent commit 1be7107fbe18 ("mm: larger stack guard gap,
between vmas").
We do not need a gap in this case, any address is fine as long as
security_mmap_addr() doesn't object.
This also simplifies the code, we know that address >= prev->vm_end and
thus underflow is not possible.
Link: http://lkml.kernel.org/r/20170628175258.GA24881@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CVE-2017-1000364
(cherry picked from commit 32e4e6d5cbb0c0e427391635991fe65e17797af8) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Michal Hocko [Mon, 17 Jul 2017 12:53:28 +0000 (14:53 +0200)]
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
Commit 1be7107fbe18 ("mm: larger stack guard gap, between vmas") has
introduced a regression in some rust and Java environments which are
trying to implement their own stack guard page. They are punching a new
MAP_FIXED mapping inside the existing stack Vma.
This will confuse expand_{downwards,upwards} into thinking that the
stack expansion would in fact get us too close to an existing non-stack
vma which is a correct behavior wrt safety. It is a real regression on
the other hand.
Let's work around the problem by considering PROT_NONE mapping as a part
of the stack. This is a gros hack but overflowing to such a mapping
would trap anyway an we only can hope that usespace knows what it is
doing and handle it propely.
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas") Link: http://lkml.kernel.org/r/20170705182849.GA18027@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Debugged-by: Vlastimil Babka <vbabka@suse.cz> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CVE-2017-1000364
(cherry picked from commit 561b5e0709e4a248c67d024d4d94b6e31e3edf2f) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Pavel Shilovsky [Mon, 17 Jul 2017 19:29:12 +0000 (16:29 -0300)]
CIFS: Fix null pointer deref during read resp processing
BugLink: https://bugs.launchpad.net/bugs/1704857
Currently during receiving a read response mid->resp_buf can be
NULL when it is being passed to cifs_discard_remaining_data() from
cifs_readv_discard(). Fix it by always passing server->smallbuf
instead and initializing mid->resp_buf at the end of read response
processing.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Acked-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
(cherry picked from commit 350be257ea83029daee974c72b1fe2e6f1f8e615) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
hv_utils: fix TimeSync work on pre-TimeSync-v4 hosts
BugLink: http://bugs.launchpad.net/bugs/1676635
It was found that ICTIMESYNCFLAG_SYNC packets are handled incorrectly
on WS2012R2, e.g. after the guest is paused and resumed its time is set
to something different from host's time. The problem is that we call
adj_guesttime() with reftime=0 for these old hosts and we don't account
for that in 'if (adj_flags & ICTIMESYNCFLAG_SYNC)' branch and
hv_set_host_time().
While we could've solved this by adding a check like
'if (ts_srv_version > TS_VERSION_3)' to hv_set_host_time() I prefer
to do some refactoring. We don't need to have two separate containers
for host samples, struct host_ts which we use for PTP is enough.
Throw away 'struct adj_time_work' and create hv_get_adj_host_time()
accessor to host_ts to avoid code duplication.
Fixes: 3716a49a81ba ("hv_utils: implement Hyper-V PTP source") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from linux-next commit 1d10602d306cb7f70545b5e1166efc9409e7d384) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
hv_utils: drop .getcrosststamp() support from PTP driver
BugLink: http://bugs.launchpad.net/bugs/1676635
Turns out that our implementation of .getcrosststamp() never actually
worked. Hyper-V is sending time samples every 5 seconds and this is
too much for get_device_system_crosststamp() as it's interpolation
algorithm (which nobody is currently using in kernel, btw) accounts
for a 'slow' device but we're not slow in Hyper-V, our time reference
is too far away.
.getcrosststamp() is not currently used, get_device_system_crosststamp()
almost always returns -EINVAL and client falls back to using PTP_SYS_OFFSET
so this patch doesn't change much. I also tried doing interpolation
manually (e.g. the same way hv_ptp_gettime() works and it turns out that
we're getting even lower quality:
This fact wasn't noticed during the initial testing of the PTP device
somehow but got revealed now. Let's just drop .getcrosststamp()
implementation for now as it doesn't seem to be suitable for us.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from linux-next commit 4f9bac039a64f6306b613a0d90e6b7e75d7ab0c4) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1676635
With TimeSync version 4 protocol support we started updating system time
continuously through the whole lifetime of Hyper-V guests. Every 5 seconds
there is a time sample from the host which triggers do_settimeofday[64]().
While the time from the host is very accurate such adjustments may cause
issues:
- Time is jumping forward and backward, some applications may misbehave.
- In case an NTP server runs in parallel and uses something else for time
sync (network, PTP,...) system time will never converge.
- Systemd starts annoying you by printing "Time has been changed" every 5
seconds to the system log.
Instead of doing in-kernel time adjustments offload the work to an
NTP client by exposing TimeSync messages as a PTP device. Users may now
decide what they want to use as a source.
I tested the solution with chrony, the config was:
refclock PHC /dev/ptp0 poll 3 dpoll -2 offset 0
The result I'm seeing is accurate enough, the time delta between the guest
and the host is almost always within [-10us, +10us], the in-kernel solution
was giving us comparable results.
I also tried implementing PPS device instead of PTP by using not currently
used Hyper-V synthetic timers (we use only one of four for clockevent) but
with PPS source only chrony wasn't able to give me the required accuracy,
the delta often more that 100us.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3716a49a81ba19dda7202633a68b28564ba95eb5) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Drivers: hv: util: Use hv_get_current_tick() to get current tick
BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to interact with Hyper-V in an instruction set
architecture independent way, use the new API to get the current
tick.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 305f7549c9298247723c255baddb7a54b4e63050) Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
UBUNTU: SAUCE: hv: make clocksource available for PTP device supporting
BugLink: http://bugs.launchpad.net/bugs/1676635
Make the MSR and TSC clock sources available during the hv_init() and
export them via hyperv_cs in a similar way as the following upstream
commit does:
commit dee863b571b0 ("hv: export current Hyper-V clocksource")
Signed-off-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Acked-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
For AMD Promontory xHCI host, although you can disable USB 2.0 ports in BIOS
settings, those ports will be enabled anyway after you remove a device on
that port and re-plug it in again. It's a known limitation of the chip.
As a workaround we can clear the PORT_WAKE_BITS.
Signed-off-by: Jiahau Chang <Lars_Chang@asmedia.com.tw> Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Shih-Yuan Lee (FourDollars) <sylee@canonical.com> Suggested-by: Owen Lin <olin@rivetnetworks.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
(backported from commit 06e41d8a36689f465006f017bbcd8a73edb98109 linux-next) Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Sachin Prabhu [Fri, 3 Mar 2017 23:41:38 +0000 (15:41 -0800)]
Handle mismatched open calls
BugLink: http://bugs.launchpad.net/bugs/1670508
A signal can interrupt a SendReceive call which result in incoming
responses to the call being ignored. This is a problem for calls such as
open which results in the successful response being ignored. This
results in an open file resource on the server.
The patch looks into responses which were cancelled after being sent and
in case of successful open closes the open fids.
For this patch, the check is only done in SendReceive2()
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Cc: Stable <stable@vger.kernel.org>
(cherry-picked from commit 38bd49064a1ecb67baad33598e3d824448ab11ec) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Sachin Prabhu [Thu, 20 Oct 2016 23:52:24 +0000 (19:52 -0400)]
Call echo service immediately after socket reconnect
BugLink: http://bugs.launchpad.net/bugs/1670508
Commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") changes the behaviour of the SMB2 echo
service and causes it to renegotiate after a socket reconnect. However
under default settings, the echo service could take up to 120 seconds to
be scheduled.
The patch forces the echo service to be called immediately resulting a
negotiate call being made immediately on reconnect.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
(cherry picked from commit b8c600120fc87d53642476f48c8055b38d6e14c7) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Pavel Shilovsky [Wed, 1 Mar 2017 00:05:19 +0000 (16:05 -0800)]
CIFS: Fix possible use after free in demultiplex thread
BugLink: http://bugs.launchpad.net/bugs/1670508
The recent changes that added SMB3 encryption support introduced
a possible use after free in the demultiplex thread. When we
process an encrypted packed we obtain a pointer to SMB session
but do not obtain a reference. This can possibly lead to a situation
when this session was freed before we copy a decryption key from
there. Fix this by obtaining a copy of the key rather than a pointer
to the session under a spinlock.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
(cherry picked from commit 61cfac6f267dabcf2740a7ec8a0295833b28b5f5) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(backport from commit ae6f8dd4d0c87bfb72da9d9b56342adf53e69c31) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
[cascardo: fixup of conflict with 06e7bc1427cfb2fe4d2ac84305afb79dbb0b42f3] Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Pavel Shilovsky [Fri, 18 Nov 2016 00:20:23 +0000 (16:20 -0800)]
CIFS: Add capability to decrypt big read responses
BugLink: http://bugs.launchpad.net/bugs/1670508
Allow to decrypt transformed packets that are bigger than the big
buffer size. In particular it is used for read responses that can
only exceed the big buffer size.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit c42a6abe3012832a68a371dabe17c2ced97e62ad) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 4326ed2f6a16ae9d33e4209b540dc9a371aba840) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Fri, 18 Nov 2016 00:20:18 +0000 (16:20 -0800)]
CIFS: Add copy into pages callback for a read operation
BugLink: http://bugs.launchpad.net/bugs/1670508
Since we have two different types of reads (pagecache and direct)
we need to process such responses differently after decryption of
a packet. The change allows to specify a callback that copies a read
payload data into preallocated pages.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit d70b9104b1ca586f73aaf59426756cec3325a40e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Wed, 16 Nov 2016 22:06:17 +0000 (14:06 -0800)]
CIFS: Add mid handle callback
BugLink: http://bugs.launchpad.net/bugs/1670508
We need to process read responses differently because the data
should go directly into preallocated pages. This can be done
by specifying a mid handle callback.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 9b7c18a2d4b798963ea80f6769701dcc4c24b55e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Thu, 17 Nov 2016 23:24:34 +0000 (15:24 -0800)]
CIFS: Add transform header handling callbacks
BugLink: http://bugs.launchpad.net/bugs/1670508
We need to recognize and parse transformed packets in demultiplex
thread to find a corresponsing mid and process it further.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 9bb17e0916a03ab901fb684e874d77a1e96b3d1e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Thu, 3 Nov 2016 23:47:37 +0000 (16:47 -0700)]
CIFS: Encrypt SMB3 requests before sending
BugLink: http://bugs.launchpad.net/bugs/1670508
This change allows to encrypt packets if it is required by a server
for SMB sessions or tree connections.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(backported from commit 026e93dc0a3eefb0be060bcb9ecd8d7a7fd5c398) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Tue, 8 Nov 2016 02:20:50 +0000 (18:20 -0800)]
CIFS: Enable encryption during session setup phase
BugLink: http://bugs.launchpad.net/bugs/1670508
In order to allow encryption on SMB connection we need to exchange
a session key and generate encryption and decryption keys.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit cabfb3680f78981d26c078a26e5c748531257ebb) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Mon, 31 Oct 2016 20:49:30 +0000 (13:49 -0700)]
CIFS: Add capability to transform requests before sending
BugLink: http://bugs.launchpad.net/bugs/1670508
This will allow us to do protocol specific tranformations of packets
before sending to the server. For SMB3 it can be used to support
encryption.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 7fb8986e7449d0a5cebd84d059927afa423fbf85) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Wed, 23 Nov 2016 23:31:54 +0000 (15:31 -0800)]
CIFS: Separate RFC1001 length processing for SMB2 read
BugLink: http://bugs.launchpad.net/bugs/1670508
Allocate and initialize SMB2 read request without RFC1001 length
field to directly call cifs_send_recv() rather than SendReceive2()
in a read codepath.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit b8f57ee8aad414a3122bff72d7968a94baacb9b6) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Mon, 24 Oct 2016 23:59:57 +0000 (16:59 -0700)]
CIFS: Separate SMB2 sync header processing
BugLink: http://bugs.launchpad.net/bugs/1670508
Do not process RFC1001 length in smb2_hdr_assemble() because
it is not a part of SMB2 header. This allows to cleanup the code
and adds a possibility combine several SMB2 packets into one
for compounding.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit cb200bd6264a80c04e09e8635fa4f3901cabdaef) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Wed, 23 Nov 2016 23:14:57 +0000 (15:14 -0800)]
CIFS: Send RFC1001 length in a separate iov
BugLink: http://bugs.launchpad.net/bugs/1670508
In order to simplify further encryption support we need to separate
RFC1001 length and SMB2 header when sending a request. Put the length
field in iov[0] and the rest of the packet into following iovs.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 738f9de5cdb9175c19d24cfdf90b4543fc3b47bf) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Tue, 25 Oct 2016 18:38:47 +0000 (11:38 -0700)]
CIFS: Make SendReceive2() takes resp iov
BugLink: http://bugs.launchpad.net/bugs/1670508
Now SendReceive2 frees the first iov and returns a response buffer
in it that increases a code complexity. Simplify this by making
a caller responsible for freeing request buffer itself and returning
a response buffer in a separate iov.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit da502f7df03d2d0b416775f92ae022f3f82bedd5) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Pavel Shilovsky [Mon, 24 Oct 2016 22:33:04 +0000 (15:33 -0700)]
CIFS: Separate SMB2 header structure
BugLink: http://bugs.launchpad.net/bugs/1670508
In order to support compounding and encryption we need to separate
RFC1001 length field and SMB2 header structure because the protocol
treats them differently. This change will allow to simplify parsing
of such complex SMB2 packets further.
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 31473fc4f9653b73750d3792ffce6a6e1bdf0da7) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve French <sfrench@samba.org>
(cherry picked from commit b9be76d585d48cb25af8db0d35e1ef9030fbe13a) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Jean Delvare [Wed, 25 Jan 2017 15:08:17 +0000 (16:08 +0100)]
cifs: Only select the required crypto modules
BugLink: http://bugs.launchpad.net/bugs/1670508
The sha256 and cmac crypto modules are only needed for SMB2+, so move
the select statements to config CIFS_SMB2. Also select CRYPTO_AES
there as SMB2+ needs it.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve French <sfrench@samba.org>
(cherry picked from commit 3692304bba6164be3810afd41b84ecb0e1e41db1) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Jean Delvare [Wed, 25 Jan 2017 15:07:29 +0000 (16:07 +0100)]
cifs: Simplify SMB2 and SMB311 dependencies
BugLink: http://bugs.launchpad.net/bugs/1670508
* CIFS_SMB2 depends on CIFS, which depends on INET and selects NLS. So
these dependencies do not need to be repeated for CIFS_SMB2.
* CIFS_SMB311 depends on CIFS_SMB2, which depends on INET. So this
dependency doesn't need to be repeated for CIFS_SMB311.
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Steve French <sfrench@samba.org>
(cherry picked from commit c1ecea87471bbb614f8121e00e5787f363140365) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Steve French [Sat, 12 Nov 2016 04:36:20 +0000 (22:36 -0600)]
SMB3: parsing for new snapshot timestamp mount parm
BugLink: http://bugs.launchpad.net/bugs/1670508
New mount option "snapshot=<time>" to allow mounting an earlier
version of the remote volume (if such a snapshot exists on
the server).
Note that eventually specifying a snapshot time of 1 will allow
the user to mount the oldest snapshot. A subsequent patch
add the processing for that and another for actually specifying
the "time warp" create context on SMB2/SMB3 open.
Check to make sure SMB2 negotiated, and ensure that
we use a different tcon if mount same share twice
but with different snaphshot times
Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 8b217fe7fcadd162944a88b14990b9723c27419f) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Sachin Prabhu [Fri, 7 Oct 2016 18:11:22 +0000 (19:11 +0100)]
SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup
BugLink: http://bugs.launchpad.net/bugs/1670508
We split the rawntlmssp authentication into negotiate and
authencate parts. We also clean up the code and add helpers.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 166cea4dc3a4f66f020cfb9286225ecd228ab61d) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 3baf1a7b921500596b77487d5a34a27d656fc032) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Steve French [Fri, 23 Sep 2016 05:44:16 +0000 (00:44 -0500)]
SMB3: Add mount parameter to allow user to override max credits
BugLink: http://bugs.launchpad.net/bugs/1670508
Add mount option "max_credits" to allow setting maximum SMB3
credits to any value from 10 to 64000 (default is 32000).
This can be useful to workaround servers with problems allocating
credits, or to throttle the client to use smaller amount of
simultaneous i/o or to workaround server performance issues.
Also adds a cap, so that even if the server granted us more than
65000 credits due to a server bug, we would not use that many.
Signed-off-by: Steve French <steve.french@primarydata.com>
(cherry picked from commit 141891f4727c08829755be6c785e125d2e96c899) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(backported from commit 71335664c38f03de10d7cf1d82705fe55a130b33) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 16c568efff82e4a6a75d2bd86576e648fad8a7fe) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
(cherry picked from commit 2da62906b1e298695e1bb725927041cd59942c98) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Steve French [Fri, 18 Dec 2015 18:31:36 +0000 (12:31 -0600)]
cifs: Make echo interval tunable
BugLink: http://bugs.launchpad.net/bugs/1670508
Currently the echo interval is set to 60 seconds using a macro. This
setting determines the interval at which echo requests are sent to the
server on an idling connection. This setting also affects the time
required for a connection to an unresponsive server to timeout.
Making this setting a tunable allows users to control the echo interval
times as well as control the time after which the connecting to an
unresponsive server times out.
To set echo interval, pass the echo_interval=n mount option.
Version four of the patch.
v2: Change MIN and MAX timeout values
v3: Remove incorrect comment in cifs_get_tcp_session
v4: Fix bug in setting echo_intervalw
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
(backported from commit adfeb3e00e8e1b9fb4ad19eb7367e7c272d16003) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Daniel Borkmann [Wed, 21 Jun 2017 14:13:32 +0000 (22:13 +0800)]
bpf: don't let ldimm64 leak map addresses on unprivileged
CVE-2017-9150
The patch fixes two things at once:
1) It checks the env->allow_ptr_leaks and only prints the map address to
the log if we have the privileges to do so, otherwise it just dumps 0
as we would when kptr_restrict is enabled on %pK. Given the latter is
off by default and not every distro sets it, I don't want to rely on
this, hence the 0 by default for unprivileged.
2) Printing of ldimm64 in the verifier log is currently broken in that
we don't print the full immediate, but only the 32 bit part of the
first insn part for ldimm64. Thus, fix this up as well; it's okay to
access, since we verified all ldimm64 earlier already (including just
constants) through replace_map_fd_with_map_ptr().
Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs") Fixes: cbd357008604 ("bpf: verifier (add ability to receive verification log)") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 0d0e57697f162da4aa218b5feafe614fb666db07) Signed-off-by: Wen-chien Jesse Sung <jesse.sung@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Shrirang Bagul <shrirang.bagul@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Linus Torvalds [Fri, 9 Jun 2017 12:56:00 +0000 (14:56 +0200)]
/proc/iomem: only expose physical resource addresses to privileged users
CVE-2015-8944
In commit c4004b02f8e5b ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the phyiscal kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.
This limits all the detailed resource information to properly
credentialed users instead.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 51d7b120418e99d6b3bf8df9eb3cc31e8171dee4) Signed-off-by: Brad Figg <brad.figg@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Acked-by: Colin King <colin.king@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Make file credentials available to the seqfile interfaces
A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.
The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.
So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones. Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.
It turns out that seq_file _does_ save away the user_ns information of
the file, though. Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.
(cherry-picked from commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin King <colin.king@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1698817 Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
My static checker complains that if "lvl" is ULONG_MAX (this is 64 bit)
then some of the strings will overflow. I don't know if that's possible
but it seems simple enough to make the buffers slightly larger.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Waldemar Brodkorb <wbx@openadk.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
For most cases a protection exception in the host (e.g. copy
on write or dirty tracking) on the sie instruction will indicate
an instruction length of 4. Turns out that there are some corner
cases (e.g. runtime instrumentation) where this is not necessarily
true and the ILC is unpredictable.
Let's replace our 4 byte rewind_pad with 3 byte nops to prepare for
all possible ILCs.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Linux IRQ #0 is reserved for error reporting and may not be used.
Increase NR_IRQS for one additional slot and increase
irq_domain_add_legacy parameter first_irq value to 1, so that linux
IRQ #0 is not associated with hardware IRQ #0 in legacy IRQ domains.
Introduce macro XTENSA_PIC_LINUX_IRQ for static translation of xtensa
PIC hardware IRQ # to linux IRQ #. Use this macro in XTFPGA platform
data definitions.
This fixes inability to use hardware IRQ #0 in configurations that don't
use device tree and allows for non-identity mapping between linux IRQ #
and hardware IRQ #.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
In tipc_conn_sendmsg(), we first queue the request to the outqueue
followed by the connection state check. If the connection is not
connected, we should not queue this message.
In this commit, we reject the messages if the connection state is
not CF_CONNECTED.
Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: John Thompson <thompa.atl@gmail.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Commit 8a59f5d25265 ("fs/romfs: return f_fsid for statfs(2)") generates
a 64bit id from sb->s_bdev->bd_dev. This is only correct when romfs is
defined with CONFIG_ROMFS_ON_BLOCK. If romfs is only defined with
CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, referencing sb->s_bdev->bd_dev
will triger an oops.
Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y,
both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined.
Therefore when calling huge_encode_dev() to generate a 64bit id, I use
the follow order to choose parameter,
- CONFIG_ROMFS_ON_BLOCK defined
use sb->s_bdev->bd_dev
- CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined
use sb->s_dev when,
- both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined
leave id as 0
When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev
is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index,
otherwise sb->s_dev is 0.
This is a try-best effort to generate a uniq file system ID, if all the
above conditions are not meet, f_fsid of this romfs instance will be 0.
Generally only one romfs can be built on single MTD block device, this
method is enough to identify multiple romfs instances in a computer.
Link: http://lkml.kernel.org/r/1482928596-115155-1-git-send-email-colyli@suse.de Signed-off-by: Coly Li <colyli@suse.de> Reported-by: Nong Li <nongli1031@gmail.com> Tested-by: Nong Li <nongli1031@gmail.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
sctp_addr_id2transport is a function for sockopt to look up assoc by
address. As the address is from userspace, it can be a v4-mapped v6
address. But in sctp protocol stack, it always handles a v4-mapped
v6 address as a v4 address. So it's necessary to convert it to a v4
address before looking up assoc by address.
This patch is to fix it by calling sctp_verify_addr in which it can do
this conversion before calling sctp_endpoint_lookup_assoc, just like
what sctp_sendmsg and __sctp_connect do for the address from users.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Stop the tx when the napi is disabled to prevent napi_schedule() is
called.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
The rtl8152_post_reset() should sumbit rx urb and interrupt transfer,
otherwise the rx wouldn't work and the linking change couldn't be
detected.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Re-schedule napi after napi_complete() for tx, if it is necessay.
In r8152_poll(), if the tx is completed after tx_bottom() and before
napi_complete(), the scheduling of napi would be lost. Then, no
one handles the next tx until the next napi_schedule() is called.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Lock sequence IDs are bumped in decode_lock by calling
nfs_increment_seqid(). nfs_increment_sequid() does not use the
seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't
increment lock sequence ID after NFS4ERR_MOVED").
Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
"swiotlb buffer is full" errors occur after repeated initialisation of a
device - f.e. suspend/resume or ip link set up/down. This is because memory
mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
is not released. Resolve this problem by unmapping descriptors when
freeing rings.
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[simon: reworked] Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
The original ast driver will access some BMC configuration through P2A bridge
that can be disabled since AST2300 and after.
It will cause system hanged if P2A bridge is disabled.
Here is the update to fix it.
Signed-off-by: Y.C. Chen <yc_chen@aspeedtech.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
As it turns out, on cards that actually have CRTCs on them we're already
calling drm_kms_helper_poll_enable(drm_dev) from
nouveau_display_resume() before we call it in
nouveau_pmops_runtime_resume(). This leads us to accidentally trying to
enable polling twice, which results in a potential deadlock between the
RPM locks and drm_dev->mode_config.mutex if we end up trying to enable
polling the second time while output_poll_execute is running and holding
the mode_config lock. As such, make sure we only enable polling in
nouveau_pmops_runtime_resume() if we need to.
This fixes hangs observed on the ThinkPad W541
Signed-off-by: Lyude <lyude@redhat.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Kilian Singer <kilian.singer@quantumtechnology.info> Cc: Lukas Wunner <lukas@wunner.de> Cc: David Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
init_ring(), refill_rx_ring() and start_tx() don't check
if mapping dma memory succeed.
The patch adds the checks and failure handling.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
In spite of switching to paged allocation of Rx buffers, the driver still
called dma_unmap_single() in the Rx queues tear-down path.
The DMA region unmapping code in free_skb_rx_queue() basically predates
the introduction of paged allocation to the driver. While being refactored,
it apparently hasn't reflected the change in the DMA API usage by its
counterpart gfar_new_page().
As a result, setting an interface to the DOWN state now yields the following:
# ip link set eth2 down
fsl-gianfar ffe24000.ethernet: DMA-API: device driver frees DMA memory with wrong function [device address=0x000000001ecd0000] [size=40]
------------[ cut here ]------------
WARNING: CPU: 1 PID: 189 at lib/dma-debug.c:1123 check_unmap+0x8e0/0xa28
CPU: 1 PID: 189 Comm: ip Tainted: G O 4.9.5 #1
task: dee73400 task.stack: dede2000
NIP: c02101e8 LR: c02101e8 CTR: c0260d74
REGS: dede3bb0 TRAP: 0700 Tainted: G O (4.9.5)
MSR: 00021000 <CE,ME> CR: 28002222 XER: 00000000
ip6_make_flowlabel() determines the flow label for IPv6 packets. It's
supposed to be passed a flow label, which it returns as is if non-0 and
in some other cases, otherwise it calculates a new value.
The problem is callers often pass a flowi6.flowlabel, which may also
contain traffic class bits. If the traffic class is non-0
ip6_make_flowlabel() mistakes the non-0 it gets as a flow label and
returns the whole thing. Thus it can return a 'flow label' longer than
20b and the low 20b of that is typically 0 resulting in packets with 0
label. Moreover, different packets of a flow may be labeled differently.
For a TCP flow with ECN non-payload and payload packets get different
labels as exemplified by this pair of consecutive packets:
This patch allows ip6_make_flowlabel() to be passed more than just a
flow label and has it extract the part it really wants. This was simpler
than modifying the callers. With this patch packets like the above become
Signed-off-by: Dimitris Michailidis <dmichail@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Initialise the stores_lock in fscache netfs cookies. Technically, it
shouldn't be necessary, since the netfs cookie is an index and stores no
data, but initialising it anyway adds insignificant overhead.
Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
fscache_disable_cookie() needs to clear the outstanding writes on the
cookie it's disabling because they cannot be completed after.
Without this, fscache_nfs_open_file() gets stuck because it disables the
cookie when the file is opened for writing but can't uncache the pages till
afterwards - otherwise there's a race between the open routine and anyone
who already has it open R/O and is still reading from it.
Looking in /proc/pid/stack of the offending process shows:
Under some circumstances, an fscache object can become queued such that it
fscache_object_work_func() can be called once the object is in the
OBJECT_DEAD state. This results in the kernel oopsing when it tries to
invoke the handler for the state (which is hard coded to 0x2).
The way this comes about is something like the following:
(1) The object dispatcher is processing a work state for an object. This
is done in workqueue context.
(2) An out-of-band event comes in that isn't masked, causing the object to
be queued, say EV_KILL.
(3) The object dispatcher finishes processing the current work state on
that object and then sees there's another event to process, so,
without returning to the workqueue core, it processes that event too.
It then follows the chain of events that initiates until we reach
OBJECT_DEAD without going through a wait state (such as
WAIT_FOR_CLEARANCE).
At this point, object->events may be 0, object->event_mask will be 0
and oob_event_mask will be 0.
(4) The object dispatcher returns to the workqueue processor, and in due
course, this sees that the object's work item is still queued and
invokes it again.
(5) The current state is a work state (OBJECT_DEAD), so the dispatcher
jumps to it - resulting in an OOPS.
When I'm seeing this, the work state in (1) appears to have been either
LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
fscache_osm_lookup_oob).
The window for (2) is very small:
(A) object->event_mask is cleared whilst the event dispatch process is
underway - though there's no memory barrier to force this to the top
of the function.
The window, therefore is from the time the object was selected by the
workqueue processor and made requeueable to the time the mask was
cleared.
(B) fscache_raise_event() will only queue the object if it manages to set
the event bit and the corresponding event_mask bit was set.
The enqueuement is then deferred slightly whilst we get a ref on the
object and get the per-CPU variable for workqueue congestion. This
slight deferral slightly increases the probability by allowing extra
time for the workqueue to make the item requeueable.
Handle this by giving the dead state a processor function and checking the
for the dead state address rather than seeing if the processor function is
address 0x2. The dead state processor function can then set a flag to
indicate that it's occurred and give a warning if it occurs more than once
per object.
If this race occurs, an oops similar to the following is seen (note the RIP
value):
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeremy McNicoll <jeremymc@redhat.com> Tested-by: Frank Sorenson <sorenson@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>