mm, thp: Do not make pmd/pud dirty without a reason
Currently we make page table entries dirty all the time regardless of
access type and don't even consider if the mapping is write-protected.
The reasoning is that we don't really need dirty tracking on THP and
making the entry dirty upfront may save some time on first write to the
page.
Unfortunately, such approach may result in false-positive
can_follow_write_pmd() for huge zero page or read-only shmem file.
Let's only make page dirty only if we about to write to the page anyway
(as we do for small pages).
I've restructured the code to make entry dirty inside
maybe_p[mu]d_mkwrite(). It also takes into account if the vma is
write-protected.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jon Maloy [Mon, 27 Nov 2017 19:13:39 +0000 (20:13 +0100)]
tipc: eliminate access after delete in group_filter_msg()
KASAN revealed another access after delete in group.c. This time
it found that we read the header of a received message after the
buffer has been released.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Otubo [Thu, 23 Nov 2017 14:18:35 +0000 (15:18 +0100)]
xen-netfront: remove warning when unloading module
v2:
* Replace busy wait with wait_event()/wake_up_all()
* Cannot garantee that at the time xennet_remove is called, the
xen_netback state will not be XenbusStateClosed, so added a
condition for that
* There's a small chance for the xen_netback state is
XenbusStateUnknown by the time the xen_netfront switches to Closed,
so added a condition for that.
When unloading module xen_netfront from guest, dmesg would output
warning messages like below:
[ 105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
[ 105.236839] deferring g.e. 0x903 (pfn 0x35805)
This problem relies on netfront and netback being out of sync. By the time
netfront revokes the g.e.'s netback didn't have enough time to free all of
them, hence displaying the warnings on dmesg.
The trick here is to make netfront to wait until netback frees all the g.e.'s
and only then continue to cleanup for the module removal, and this is done by
manipulating both device states.
Signed-off-by: Eduardo Otubo <otubo@redhat.com> Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Liu Bo [Tue, 21 Nov 2017 21:35:40 +0000 (14:35 -0700)]
Btrfs: fix list_add corruption and soft lockups in fsync
Xfstests btrfs/146 revealed this corruption,
[ 58.138831] Buffer I/O error on dev dm-0, logical block 2621424, async page read
[ 58.151233] BTRFS error (device sdf): bdev /dev/mapper/error-test errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
[ 58.152403] list_add corruption. prev->next should be next (ffff88005e6775d8), but was ffffc9000189be88. (prev=ffffc9000189be88).
[ 58.153518] ------------[ cut here ]------------
[ 58.153892] WARNING: CPU: 1 PID: 1287 at lib/list_debug.c:31 __list_add_valid+0x169/0x1f0
...
[ 58.157379] RIP: 0010:__list_add_valid+0x169/0x1f0
...
[ 58.161956] Call Trace:
[ 58.162264] btrfs_log_inode_parent+0x5bd/0xfb0 [btrfs]
[ 58.163583] btrfs_log_dentry_safe+0x60/0x80 [btrfs]
[ 58.164003] btrfs_sync_file+0x4c2/0x6f0 [btrfs]
[ 58.164393] vfs_fsync_range+0x5f/0xd0
[ 58.164898] do_fsync+0x5a/0x90
[ 58.165170] SyS_fsync+0x10/0x20
[ 58.165395] entry_SYSCALL_64_fastpath+0x1f/0xbe
...
It turns out that we could record btrfs_log_ctx:io_err in
log_one_extents when IO fails, but make log_one_extents() return '0'
instead of -EIO, so the IO error is not acknowledged by the callers,
i.e. btrfs_log_inode_parent(), which would remove btrfs_log_ctx:list
from list head 'root->log_ctxs'. Since btrfs_log_ctx is allocated
from stack memory, it'd get freed with a object alive on the
list. then a future list_add will throw the above warning.
This returns the correct error in the above case.
Jeff also reported this while testing against his fsync error
patch set[1].
[1]: https://www.spinics.net/lists/linux-btrfs/msg65308.html
"btrfs list corruption and soft lockups while testing writeback error handling"
Fixes: 8407f553268a4611f254 ("Btrfs: fix data corruption after fast fsync and writeback error") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David S. Miller [Mon, 27 Nov 2017 16:09:42 +0000 (01:09 +0900)]
Merge tag 'mac80211-for-davem-2017-11-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Four fixes:
* CRYPTO_SHA256 is needed for regdb validation
* mac80211: mesh path metric was wrong in some frames
* mac80211: use QoS null-data packets on QoS connections
* mac80211: tear down RX aggregation sessions first to
drop fewer packets in HW restart scenarios
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Qu Wenruo [Mon, 6 Nov 2017 02:43:18 +0000 (10:43 +0800)]
btrfs: Fix wild memory access in compression level parser
[BUG]
Kernel panic when mounting with "-o compress" mount option.
KASAN will report like:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
dump_stack+0xe3/0x175
kasan_report+0x163/0x370
__asan_load1+0x47/0x50
strncmp+0x31/0xc0
btrfs_compress_str2level+0x20/0x70 [btrfs]
btrfs_parse_options+0xff4/0x1870 [btrfs]
open_ctree+0x2679/0x49f0 [btrfs]
btrfs_mount+0x1b7f/0x1d30 [btrfs]
mount_fs+0x49/0x190
vfs_kern_mount.part.29+0xba/0x280
vfs_kern_mount+0x13/0x20
btrfs_mount+0x31e/0x1d30 [btrfs]
mount_fs+0x49/0x190
vfs_kern_mount.part.29+0xba/0x280
do_mount+0xaad/0x1a00
SyS_mount+0x98/0xe0
entry_SYSCALL_64_fastpath+0x1f/0xbe
------
[Cause]
For 'compress' and 'compress_force' options, its token doesn't expect
any parameter so its args[0] contains uninitialized data.
Accessing args[0] will cause above wild memory access.
[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ set the default in advance ] Signed-off-by: David Sterba <dsterba@suse.com>
Xin Long [Sat, 25 Nov 2017 13:05:36 +0000 (21:05 +0800)]
sctp: set sender next_tsn for the old result with ctsn_ack_point plus 1
When doing asoc reset, if the sender of the response has already sent some
chunk and increased asoc->next_tsn before the duplicate request comes, the
response will use the old result with an incorrect sender next_tsn.
Better than asoc->next_tsn, asoc->ctsn_ack_point can't be changed after
the sender of the response has performed the asoc reset and before the
peer has confirmed it, and it's value is still asoc->next_tsn original
value minus 1.
This patch sets sender next_tsn for the old result with ctsn_ack_point
plus 1 when processing the duplicate request, to make sure the sender
next_tsn value peer gets will be always right.
Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 25 Nov 2017 13:05:35 +0000 (21:05 +0800)]
sctp: avoid flushing unsent queue when doing asoc reset
Now when doing asoc reset, it cleans up sacked and abandoned queues
by calling sctp_outq_free where it also cleans up unsent, retransmit
and transmitted queues.
It's safe for the sender of response, as these 3 queues are empty at
that time. But when the receiver of response is doing the reset, the
users may already enqueue some chunks into unsent during the time
waiting the response, and these chunks should not be flushed.
To void the chunks in it would be removed, it moves the queue into a
temp list, then gets it back after sctp_outq_free is done.
The patch also fixes some incorrect comments in
sctp_process_strreset_tsnreq.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 25 Nov 2017 13:05:34 +0000 (21:05 +0800)]
sctp: only allow the asoc reset when the asoc outq is empty
As it says in rfc6525#section5.1.4, before sending the request,
C2: The sender has either no outstanding TSNs or considers all
outstanding TSNs abandoned.
Prior to this patch, it tried to consider all outstanding TSNs abandoned
by dropping all chunks in all outqs with sctp_outq_free (even including
sacked, retransmit and transmitted queues) when doing this reset, which
is too aggressive.
To make it work gently, this patch will only allow the asoc reset when
the sender has no outstanding TSNs by checking if unsent, transmitted
and retransmit are all empty with sctp_outq_is_empty before sending
and processing the request.
Fixes: 692787cef651 ("sctp: implement receiver-side procedures for the SSN/TSN Reset Request Parameter") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 25 Nov 2017 13:05:33 +0000 (21:05 +0800)]
sctp: only allow the out stream reset when the stream outq is empty
Now the out stream reset in sctp stream reconf could be done even if
the stream outq is not empty. It means that users can not be sure
since which msg the new ssn will be used.
To make this more synchronous, it shouldn't allow to do out stream
reset until these chunks in unsent outq all are sent out.
This patch checks the corresponding stream outqs when sending and
processing the request . If any of them has unsent chunks in outq,
it will return -EAGAIN instead or send SCTP_STRRESET_IN_PROGRESS
back to the sender.
Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sat, 25 Nov 2017 13:05:32 +0000 (21:05 +0800)]
sctp: use sizeof(__u16) for each stream number length instead of magic number
Now in stream reconf part there are still some places using magic
number 2 for each stream number length. To make it more readable,
this patch is to replace them with sizeof(__u16).
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Josef Bacik [Wed, 15 Nov 2017 21:20:52 +0000 (16:20 -0500)]
btrfs: fix deadlock when writing out space cache
If we fail to prepare our pages for whatever reason (out of memory in
our case) we need to make sure to drop the block_group->data_rwsem,
otherwise hilarity ensues.
Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ add label and use existing unlocking code ] Signed-off-by: David Sterba <dsterba@suse.com>
Sara Sharon [Sun, 29 Oct 2017 09:51:09 +0000 (11:51 +0200)]
mac80211: tear down RX aggregations first
When doing HW restart we tear down aggregations.
Since at this point we are not TX'ing any aggregation, while
the peer is still sending RX aggregation over the air, it will
make sense to tear down the RX aggregations first.
Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
zhangliping [Sat, 25 Nov 2017 14:02:12 +0000 (22:02 +0800)]
openvswitch: fix the incorrect flow action alloc size
If we want to add a datapath flow, which has more than 500 vxlan outputs'
action, we will get the following error reports:
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Flow action size 32832 bytes exceeds max
openvswitch: netlink: Actions may not be safe on all matching packets
... ...
It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
this is not the root cause. For example, for a vxlan output action, we need
about 60 bytes for the nlattr, but after it is converted to the flow
action, it only occupies 24 bytes. This means that we can still support
more than 1000 vxlan output actions for a single datapath flow under the
the current 32k max limitation.
So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
shouldn't report EINVAL and keep it move on, as the judgement can be
done by the reserve_sfa_size.
Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 26 Nov 2017 23:03:49 +0000 (15:03 -0800)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- LPAE fixes for kernel-readonly regions
- Fix for get_user_pages_fast on LPAE systems
- avoid tying decompressor to a particular platform if DEBUG_LL is
enabled
- BUG if we attempt to return to userspace but the to-be-restored PSR
value keeps us in privileged mode (defeating an issue that ftracetest
found)
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: BUG if jumping to usermode address in kernel mode
ARM: 8722/1: mm: make STRICT_KERNEL_RWX effective for LPAE
ARM: 8721/1: mm: dump: check hardware RO bit for LPAE
ARM: make decompressor debug output user selectable
ARM: fix get_user_pages_fast
Linus Torvalds [Sun, 26 Nov 2017 22:39:20 +0000 (14:39 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Glexiner:
- unbreak the irq trigger type check for legacy platforms
- a handful fixes for ARM GIC v3/4 interrupt controllers
- a few trivial fixes all over the place
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Make - vs ?: Precedence explicit
irqchip/imgpdc: Use resource_size function on resource object
irqchip/qcom: Fix u32 comparison with value less than zero
irqchip/exiu: Fix return value check in exiu_init()
irqchip/gic-v3-its: Remove artificial dependency on PCI
irqchip/gic-v4: Add forward definition of struct irq_domain_ops
irqchip/gic-v3: pr_err() strings should end with newlines
irqchip/s3c24xx: pr_err() strings should end with newlines
irqchip/gic-v3: Fix ppi-partitions lookup
irqchip/gic-v4: Clear IRQ_DISABLE_UNLAZY again if mapping fails
genirq: Track whether the trigger type has been set
Linus Torvalds [Sun, 26 Nov 2017 22:11:54 +0000 (14:11 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
- topology enumeration fixes
- KASAN fix
- two entry fixes (not yet the big series related to KASLR)
- remove obsolete code
- instruction decoder fix
- better /dev/mem sanity checks, hopefully working better this time
- pkeys fixes
- two ACPI fixes
- 5-level paging related fixes
- UMIP fixes that should make application visible faults more debuggable
- boot fix for weird virtualization environment
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/decoder: Add new TEST instruction pattern
x86/PCI: Remove unused HyperTransport interrupt support
x86/umip: Fix insn_get_code_seg_params()'s return value
x86/boot/KASLR: Remove unused variable
x86/entry/64: Add missing irqflags tracing to native_load_gs_index()
x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow
x86/entry/64: Fix entry_SYSCALL_64_after_hwframe() IRQ tracing
x86/pkeys/selftests: Fix protection keys write() warning
x86/pkeys/selftests: Rename 'si_pkey' to 'siginfo_pkey'
x86/mpx/selftests: Fix up weird arrays
x86/pkeys: Update documentation about availability
x86/umip: Print a warning into the syslog if UMIP-protected instructions are used
x86/smpboot: Fix __max_logical_packages estimate
x86/topology: Avoid wasting 128k for package id array
perf/x86/intel/uncore: Cache logical pkg id in uncore driver
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
x86/acpi: Handle SCI interrupts above legacy space gracefully
x86/boot: Fix boot failure when SMP MP-table is based at 0
x86/mm: Limit mmap() of /dev/mem to valid physical addresses
x86/selftests: Add test for mapping placement for 5-level paging
...
Linus Torvalds [Sun, 26 Nov 2017 21:43:25 +0000 (13:43 -0800)]
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes: a documentation fix, a Sparse warning fix and a debugging
fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Fix task state recording/printout
sched/deadline: Don't use dubious signed bitfields
sched/deadline: Fix the description of runtime accounting in the documentation
Linus Torvalds [Sun, 26 Nov 2017 21:41:48 +0000 (13:41 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Misc fixes: two PMU driver fixes and a memory leak fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix memory leak triggered by perf --namespace
perf/x86/intel/uncore: Add event constraint for BDX PCU
perf/x86/intel: Hide TSX events when RTM is not supported
Linus Torvalds [Sun, 26 Nov 2017 21:11:18 +0000 (13:11 -0800)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"A handful of objtool fixes, most of them related to making the UAPI
header-syncing warnings easier to read and easier to act upon"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/headers: Sync objtool UAPI header
objtool: Fix cross-build
objtool: Move kernel headers/code sync check to a script
objtool: Move synced files to their original relative locations
objtool: Make unreachable annotation inline asms explicitly volatile
objtool: Add a comment for the unreachable annotation macros
net: openvswitch: datapath: fix data type in queue_gso_packets
gso_type is being used in binary AND operations together with SKB_GSO_UDP.
The issue is that variable gso_type is of type unsigned short and
SKB_GSO_UDP expands to more than 16 bits:
SKB_GSO_UDP = 1 << 16
this makes any binary AND operation between gso_type and SKB_GSO_UDP to
be always zero, hence making some code unreachable and likely causing
undesired behavior.
Fix this by changing the data type of variable gso_type to unsigned int.
Addresses-Coverity-ID: 1462223 Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 24 Nov 2017 23:49:34 +0000 (23:49 +0000)]
ARM: BUG if jumping to usermode address in kernel mode
Detect if we are returning to usermode via the normal kernel exit paths
but the saved PSR value indicates that we are in kernel mode. This
could occur due to corrupted stack state, which has been observed with
"ftracetest".
This ensures that we catch the problem case before we get to user code.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
New file seems to have missed the SPDX license scan and update.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 24 Nov 2017 16:36:06 +0000 (11:36 -0500)]
net: dsa: fix 'increment on 0' warning
Setting the refcount to 0 when allocating a tree to match the number of
switch devices it holds may cause an 'increment on 0; use-after-free',
if CONFIG_REFCOUNT_FULL is enabled.
To fix this, do not decrement the refcount of a newly allocated tree,
increment it when an already allocated tree is found, and decrement it
after the probing of a switch, as done with the previous behavior.
At the same time, make dsa_tree_get and dsa_tree_put accept a NULL
argument to simplify callers, and return the tree after incrementation,
as most kref users like of_node_get and of_node_put do.
Fixes: 8e5bf9759a06 ("net: dsa: simplify tree reference counting") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jorgen Hansen [Fri, 24 Nov 2017 14:25:28 +0000 (06:25 -0800)]
VSOCK: Don't call vsock_stream_has_data in atomic context
When using the host personality, VMCI will grab a mutex for any
queue pair access. In the detach callback for the vmci vsock
transport, we call vsock_stream_has_data while holding a spinlock,
and vsock_stream_has_data will access a queue pair.
To avoid this, we can simply omit calling vsock_stream_has_data
for host side queue pairs, since the QPs are empty per default
when the guest has detached.
This bug affects users of VMware Workstation using kernel version
4.4 and later.
Testing: Ran vsock tests between guest and host, and verified that
with this change, the host isn't calling vsock_stream_has_data
during detach. Ran mixedTest between guest and host using both
guest and host as server.
v2: Rebased on top of recent change to sk_state values Reviewed-by: Adit Ranadive <aditr@vmware.com> Reviewed-by: Aditya Sarwade <asarwade@vmware.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 25 Nov 2017 18:21:54 +0000 (08:21 -1000)]
Merge tag 'arc-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- more changes for HS48 cores: supporting MMUv5, detecting new
micro-arch gizmos
- axs10x platform wiring up reset driver merged in this cycle
- ARC perf driver optimizations
* tag 'arc-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: perf: avoid vmalloc backed mmap
ARCv2: perf: optimize given that num counters <= 32
ARCv2: perf: tweak overflow interrupt
ARC: [plat-axs10x] DTS: Add reset controller node to manage ethernet reset
ARCv2: boot log: updates for HS48: dual-issue, ECC, Loop Buffer
ARCv2: Accomodate HS48 MMUv5 by relaxing MMU ver checking
ARC: [plat-axs10x] auto-select AXS101 or AXS103 given the ISA config
Linus Torvalds [Sat, 25 Nov 2017 18:06:30 +0000 (08:06 -1000)]
Merge tag 'kbuild-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- use 'pwd' instead of '/bin/pwd' for portability
- clean up Makefiles
- fix ld-option for clang
- fix malloc'ed data size in Kconfig
- fix parallel building along with coccicheck
- fix a minor issue of package building
- prompt to use "rpm-pkg" instead of "rpm"
- clean up *.i and *.lst patterns by "make clean"
* tag 'kbuild-v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: drop $(extra-y) from real-objs-y
kbuild: clean up *.i and *.lst patterns by make clean
kbuild: rpm: prompt to use "rpm-pkg" if "rpm" target is used
kbuild: pkg: use --transform option to prefix paths in tar
coccinelle: fix parallel build with CHECK=scripts/coccicheck
kconfig/symbol.c: use correct pointer type argument for sizeof
kbuild: Set KBUILD_CFLAGS before incl. arch Makefile
kbuild: remove all dummy assignments to obj-
kbuild: create built-in.o automatically if parent directory wants it
kbuild: /bin/pwd -> pwd
Linus Torvalds [Sat, 25 Nov 2017 17:58:25 +0000 (07:58 -1000)]
Merge tag 'afs-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
- Make AFS file locking work again.
- Don't write to a page that's being written out, but wait for it to
complete.
- Do d_drop() and d_add() in the right places.
- Put keys on error paths.
- Remove some redundant code.
* tag 'afs-fixes-20171124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: remove redundant assignment of dvnode to itself
afs: cell: Remove unnecessary code in afs_lookup_cell
afs: Fix signal handling in some file ops
afs: Fix some dentry handling in dir ops and missing key_puts
afs: Make afs_write_begin() avoid writing to a page that's being stored
afs: Fix file locking
Roman Kapl [Fri, 24 Nov 2017 11:27:58 +0000 (12:27 +0100)]
net: sched: crash on blocks with goto chain action
tcf_block_put_ext has assumed that all filters (and thus their goto
actions) are destroyed in RCU callback and thus can not race with our
list iteration. However, that is not true during netns cleanup (see
tcf_exts_get_net comment).
Prevent the user after free by holding all chains (except 0, that one is
already held). foreach_safe is not enough in this case.
To reproduce, run the following in a netns and then delete the ns:
ip link add dtest type dummy
tc qdisc add dev dtest ingress
tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
Fixes: 822e86d997 ("net_sched: remove tcf_block_put_deferred()") Signed-off-by: Roman Kapl <code@rkapl.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mika Westerberg [Fri, 24 Nov 2017 11:05:36 +0000 (14:05 +0300)]
net: thunderbolt: Stop using zero to mean no valid DMA mapping
Commit 86dabda426ac ("net: thunderbolt: Clear finished Tx frame bus
address in tbnet_tx_callback()") fixed a DMA-API violation where the
driver called dma_unmap_page() in tbnet_free_buffers() for a bus address
that might already be unmapped. The fix was to zero out the bus address
of a frame in tbnet_tx_callback().
However, as pointed out by David Miller, zero might well be valid
mapping (at least in theory) so it is not good idea to use it here.
It turns out that we don't need the whole map/unmap dance for Tx buffers
at all. Instead we can map the buffers when they are initially allocated
and unmap them when the interface is brought down. In between we just
DMA sync the buffers for the CPU or device as needed.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 23 Nov 2017 19:34:31 +0000 (22:34 +0300)]
net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts
Don't offload IP header checksum to NIC.
This fixes a previous patch which enabled checksum offloading
for both IPv4 and IPv6 packets. So L3 checksum offload was
getting enabled for IPv6 pkts. And HW is dropping these pkts
as it assumes the pkt is IPv4 when IP csum offload is set
in the SQ descriptor.
Fixes: 3a9024f52c2e ("net: thunderx: Enable TSO and checksum offloads for ipv6") Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@auriga.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 25 Nov 2017 05:44:25 +0000 (19:44 -1000)]
Merge tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"Trimmed second batch of KVM changes for Linux 4.15:
- GICv4 Support for KVM/ARM
- re-introduce support for CPUs without virtual NMI (cc stable) and
allow testing of KVM without virtual NMI on available CPUs
- fix long-standing performance issues with assigned devices on AMD
(cc stable)"
* tag 'kvm-4.15-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (30 commits)
kvm: vmx: Allow disabling virtual NMI support
kvm: vmx: Reinstate support for CPUs without virtual NMI
KVM: SVM: obey guest PAT
KVM: arm/arm64: Don't queue VLPIs on INV/INVALL
KVM: arm/arm64: Fix GICv4 ITS initialization issues
KVM: arm/arm64: GICv4: Theory of operations
KVM: arm/arm64: GICv4: Enable VLPI support
KVM: arm/arm64: GICv4: Prevent userspace from changing doorbell affinity
KVM: arm/arm64: GICv4: Prevent a VM using GICv4 from being saved
KVM: arm/arm64: GICv4: Enable virtual cpuif if VLPIs can be delivered
KVM: arm/arm64: GICv4: Hook vPE scheduling into vgic flush/sync
KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source
KVM: arm/arm64: GICv4: Add doorbell interrupt handling
KVM: arm/arm64: GICv4: Use pending_last as a scheduling hint
KVM: arm/arm64: GICv4: Handle INVALL applied to a vPE
KVM: arm/arm64: GICv4: Propagate property updates to VLPIs
KVM: arm/arm64: GICv4: Handle MOVALL applied to a vPE
KVM: arm/arm64: GICv4: Handle CLEAR applied to a VLPI
KVM: arm/arm64: GICv4: Propagate affinity changes to the physical ITS
KVM: arm/arm64: GICv4: Unmap VLPI when freeing an LPI
...
Linus Torvalds [Sat, 25 Nov 2017 05:40:12 +0000 (19:40 -1000)]
Merge tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A small batch of fixes, about 50% tagged for stable and the rest for
recently merged code.
There's one more fix for the >128T handling on hash. Once a process
had requested a single mmap above 128T we would then always search
above 128T. The correct behaviour is to consider the hint address in
isolation for each mmap request.
Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in
VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly
identify P9 DD2.1 but in code that is currently not used by default.
* tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features
powerpc/perf: Fix IMC_MAX_PMU macro
powerpc/perf: Fix pmu_count to count only nest imc pmus
powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX
powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id()
powerpc/vas: Export chip_to_vas_id()
powerpc/64s/slice: Use addr limit when computing slice mask
Linus Torvalds [Sat, 25 Nov 2017 05:19:20 +0000 (19:19 -1000)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"This series is predominantly bug-fixes, with a few small improvements
that have been outstanding over the last release cycle.
As usual, the associated bug-fixes have CC' tags for stable.
Also, things have been particularly quiet wrt new developments the
last months, with most folks continuing to focus on stability atop 4.x
stable kernels for their respective production configurations.
Also at this point, the stable trees have been synced up with
mainline. This will continue to be a priority, as production users
tend to run exclusively atop stable kernels, a few releases behind
mainline.
- Fix quiese during transport_write_pending_qf endless loop (nab)
- Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK in 3.14+
(Don White + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (35 commits)
tcmu: Add a missing unlock on an error path
tcmu: Fix some memory corruption
iscsi-target: Fix non-immediate TMR reference leak
iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
target: Fix quiese during transport_write_pending_qf endless loop
target: Fix caw_sem leak in transport_generic_request_failure
target: Fix QUEUE_FULL + SCSI task attribute handling
iSCSI-target: Use common error handling code in iscsi_decode_text_input()
target/iscsi: Detect conn_cmd_list corruption early
target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype
target/iscsi: Fix endianness in an error message
target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it
target/iscsi: Define OFFLOAD_BUF_SIZE once
target: Inline transport_put_cmd()
target: Suppress gcc 7 fallthrough warnings
target: Move a declaration of a global variable into a header file
tcmu: fix double se_cmd completion
target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES
...
Ondrej Mosnáček [Thu, 23 Nov 2017 12:49:06 +0000 (13:49 +0100)]
crypto: skcipher - Fix skcipher_walk_aead_common
The skcipher_walk_aead_common function calls scatterwalk_copychunks on
the input and output walks to skip the associated data. If the AD end
at an SG list entry boundary, then after these calls the walks will
still be pointing to the end of the skipped region.
These offsets are later checked for alignment in skcipher_walk_next,
so the skcipher_walk may detect the alignment incorrectly.
This patch fixes it by calling scatterwalk_done after the copychunks
calls to ensure that the offsets refer to the right SG list entry.
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Cc: <stable@vger.kernel.org> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Zhu Yanjun [Mon, 20 Nov 2017 03:21:08 +0000 (22:21 -0500)]
forcedeth: replace pci_unmap_page with dma_unmap_page
The function pci_unmap_page is obsolete. So it is replaced with
the function dma_unmap_page.
CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 20 Nov 2017 13:58:20 +0000 (13:58 +0000)]
afs: remove redundant assignment of dvnode to itself
The assignment of dvnode to itself is redundant and can be removed.
Cleans up warning detected by cppcheck:
fs/afs/dir.c:975: (warning) Redundant assignment of 'dvnode' to itself.
Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 20 Nov 2017 22:41:00 +0000 (22:41 +0000)]
afs: Fix signal handling in some file ops
afs_mkdir(), afs_create(), afs_link() and afs_symlink() all need to drop
the target dentry if a signal causes the operation to be killed immediately
before we try to contact the server.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Mon, 20 Nov 2017 23:04:08 +0000 (23:04 +0000)]
afs: Fix some dentry handling in dir ops and missing key_puts
Fix some of dentry handling in AFS directory ops:
(1) Do d_drop() on the new_dentry before assigning a new inode to it in
afs_vnode_new_inode(). It's fine to do this before calling afs_iget()
because the operation has taken place on the server.
(2) Replace d_instantiate()/d_rehash() with d_add().
(3) Don't d_drop() the new_dentry in afs_rename() on error.
Also fix afs_link() and afs_rename() to call key_put() on all error paths
where the key is taken.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Sat, 18 Nov 2017 00:13:30 +0000 (00:13 +0000)]
afs: Make afs_write_begin() avoid writing to a page that's being stored
Make afs_write_begin() wait for a page that's marked PG_writeback because:
(1) We need to avoid interference with the data being stored so that the
data on the server ends up in a defined state.
(2) page->private is used to track the window of dirty data within a page,
but it's also used by the storage code to track what's being written,
being cleared by the completion notification. Ownership can't be
relinquished by the storage code until completion because it a store
fails, the data must be remarked dirty.
Tracing shows something like the following (edited):
The clear (completion) corresponding to the store+ (store continuation from
a previous page) happens between the second begin (afs_write_begin) and the
store corresponding to that. This results in the second store not seeing
any data to write back, leading to the following warning:
David Howells [Fri, 24 Nov 2017 10:18:42 +0000 (10:18 +0000)]
rxrpc: Fix conn expiry timers
Fix the rxrpc connection expiry timers so that connections for closed
AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
transport UDP port much more quickly.
(1) Replace the delayed work items with work items plus timers so that
timer_reduce() can be used to shorten them and so that the timer
doesn't requeue the work item if the net namespace is dead.
(2) Don't use queue_delayed_work() as that won't alter the timeout if the
timer is already running.
(3) Don't rearm the timers if the network namespace is dead.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:42 +0000 (10:18 +0000)]
rxrpc: Fix service endpoint expiry
RxRPC service endpoints expire like they're supposed to by the following
means:
(1) Mark dead rxrpc_net structs (with ->live) rather than twiddling the
global service conn timeout, otherwise the first rxrpc_net struct to
die will cause connections on all others to expire immediately from
then on.
(2) Mark local service endpoints for which the socket has been closed
(->service_closed) so that the expiration timeout can be much
shortened for service and client connections going through that
endpoint.
(3) rxrpc_put_service_conn() needs to schedule the reaper when the usage
count reaches 1, not 0, as idle conns have a 1 count.
(4) The accumulator for the earliest time we might want to schedule for
should be initialised to jiffies + MAX_JIFFY_OFFSET, not ULONG_MAX as
the comparison functions use signed arithmetic.
(5) Simplify the expiration handling, adding the expiration value to the
idle timestamp each time rather than keeping track of the time in the
past before which the idle timestamp must go to be expired. This is
much easier to read.
(6) Ignore the timeouts if the net namespace is dead.
(7) Restart the service reaper work item rather the client reaper.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:42 +0000 (10:18 +0000)]
rxrpc: Add keepalive for a call
We need to transmit a packet every so often to act as a keepalive for the
peer (which has a timeout from the last time it received a packet) and also
to prevent any intervening firewalls from closing the route.
Do this by resetting a timer every time we transmit a packet. If the timer
ever expires, we transmit a PING ACK packet and thereby also elicit a PING
RESPONSE ACK from the other side - which prevents our last-rx timeout from
expiring.
The timer is set to 1/6 of the last-rx timeout so that we can detect the
other side going away if it misses 6 replies in a row.
This is particularly necessary for servers where the processing of the
service function may take a significant amount of time.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:42 +0000 (10:18 +0000)]
rxrpc: Add a timeout for detecting lost ACKs/lost DATA
Add an extra timeout that is set/updated when we send a DATA packet that
has the request-ack flag set. This allows us to detect if we don't get an
ACK in response to the latest flagged packet.
The ACK packet is adjudged to have been lost if it doesn't turn up within
2*RTT of the transmission.
If the timeout occurs, we schedule the sending of a PING ACK to find out
the state of the other side. If a new DATA packet is ready to go sooner,
we cancel the sending of the ping and set the request-ack flag on that
instead.
If we get back a PING-RESPONSE ACK that indicates a lower tx_top than what
we had at the time of the ping transmission, we adjudge all the DATA
packets sent between the response tx_top and the ping-time tx_top to have
been lost and retransmit immediately.
Rather than sending a PING ACK, we could just pick a DATA packet and
speculatively retransmit that with request-ack set. It should result in
either a REQUESTED ACK or a DUPLICATE ACK which we can then use in lieu the
a PING-RESPONSE ACK mentioned above.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:41 +0000 (10:18 +0000)]
rxrpc: Express protocol timeouts in terms of RTT
Express protocol timeouts for data retransmission and deferred ack
generation in terms on RTT rather than specified timeouts once we have
sufficient RTT samples.
For the moment, this requires just one RTT sample to be able to use this
for ack deferral and two for data retransmission.
The data retransmission timeout is set at RTT*1.5 and the ACK deferral
timeout is set at RTT.
Note that the calculated timeout is limited to a minimum of 4ns to make
sure it doesn't happen too quickly.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:41 +0000 (10:18 +0000)]
rxrpc: Don't transmit DELAY ACKs immediately on proposal
Don't transmit a DELAY ACK immediately on proposal when the Rx window is
rotated, but rather defer it to the work function. This means that we have
a chance to queue/consume more received packets before we actually send the
DELAY ACK, or even cancel it entirely, thereby reducing the number of
packets transmitted.
We do, however, want to continue sending other types of packet immediately,
particularly REQUESTED ACKs, as they may be used for RTT calculation by the
other side.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:41 +0000 (10:18 +0000)]
rxrpc: Fix call timeouts
Fix the rxrpc call expiration timeouts and make them settable from
userspace. By analogy with other rx implementations, there should be three
timeouts:
(1) "Normal timeout"
This is set for all calls and is triggered if we haven't received any
packets from the peer in a while. It is measured from the last time
we received any packet on that call. This is not reset by any
connection packets (such as CHALLENGE/RESPONSE packets).
If a service operation takes a long time, the server should generate
PING ACKs at a duration that's substantially less than the normal
timeout so is to keep both sides alive. This is set at 1/6 of normal
timeout.
(2) "Idle timeout"
This is set only for a service call and is triggered if we stop
receiving the DATA packets that comprise the request data. It is
measured from the last time we received a DATA packet.
(3) "Hard timeout"
This can be set for a call and specified the maximum lifetime of that
call. It should not be specified by default. Some operations (such
as volume transfer) take a long time.
Allow userspace to set/change the timeouts on a call with sendmsg, using a
control message:
RXRPC_SET_CALL_TIMEOUTS
The data to the message is a number of 32-bit words, not all of which need
be given:
u32 hard_timeout; /* sec from first packet */
u32 idle_timeout; /* msec from packet Rx */
u32 normal_timeout; /* msec from data Rx */
This can be set in combination with any other sendmsg() that affects a
call.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:41 +0000 (10:18 +0000)]
rxrpc: Split the call params from the operation params
When rxrpc_sendmsg() parses the control message buffer, it places the
parameters extracted into a structure, but lumps together call parameters
(such as user call ID) with operation parameters (such as whether to send
data, send an abort or accept a call).
Split the call parameters out into their own structure, a copy of which is
then embedded in the operation parameters struct.
The call parameters struct is then passed down into the places that need it
instead of passing the individual parameters. This allows for extra call
parameters to be added.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:41 +0000 (10:18 +0000)]
rxrpc: Delay terminal ACK transmission on a client call
Delay terminal ACK transmission on a client call by deferring it to the
connection processor. This allows it to be skipped if we can send the next
call instead, the first DATA packet of which will implicitly ack this call.
Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Fri, 24 Nov 2017 10:18:40 +0000 (10:18 +0000)]
rxrpc: Provide a different lockdep key for call->user_mutex for kernel calls
Provide a different lockdep key for rxrpc_call::user_mutex when the call is
made on a kernel socket, such as by the AFS filesystem.
The problem is that lockdep registers a false positive between userspace
calling the sendmsg syscall on a user socket where call->user_mutex is held
whilst userspace memory is accessed whereas the AFS filesystem may perform
operations with mmap_sem held by the caller.
In such a case, the following warning is produced.
======================================================
WARNING: possible circular locking dependency detected
4.14.0-fscache+ #243 Tainted: G E
------------------------------------------------------
modpost/16701 is trying to acquire lock:
(&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs]
but task is already holding lock:
(&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Thomas Gleixner [Wed, 22 Nov 2017 12:05:48 +0000 (13:05 +0100)]
sched/debug: Fix task state recording/printout
The recent conversion of the task state recording to use task_state_index()
broke the sched_switch tracepoint task state output.
task_state_index() returns surprisingly an index (0-7) which is then
printed with __print_flags() applying bitmasks. Not really working and
resulting in weird states like 'prev_state=t' instead of 'prev_state=I'.
Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a
bitmask from the return value of task_state_index() and store it in
entry->prev_state, which makes __print_flags() work as expected.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@vger.kernel.org Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org>
Masami Hiramatsu [Fri, 24 Nov 2017 04:56:30 +0000 (13:56 +0900)]
x86/decoder: Add new TEST instruction pattern
The kbuild test robot reported this build warning:
Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c
Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx)
Warning: objdump says 3 bytes, but insn_get_length() says 2
Warning: decoded and checked 1569014 instructions with 1 warnings
This sequence seems to be a new instruction not in the opcode map in the Intel SDM.
The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8.
Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of
the ModR/M Byte (bits 2,1,0 in parenthesis)"
In that table, opcodes listed by the index REG bits as:
1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho.
2) bpf offload bug fixes from Jakub Kicinski.
3) Fix bpf verifier to NOP out code which is dead at run time because
due to branch pruning the verifier will not explore such
instructions. From Alexei Starovoitov.
4) Fix crash when deleting secondary chains in packet scheduler
classifier. From Roman Kapl.
5) Fix buffer management bugs in smc, from Ursula Braun.
6) Fix regression in anycast route handling, from David Ahern.
7) Fix link settings regression in r8169, from Tobias Jakobi.
8) Add back enough UFO support so that live migration still works, from
Willem de Bruijn.
9) Linearize enough packet data for the full extent to which the ipvlan
code will inspect the packet headers, from Gao Feng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
ipvlan: Fix insufficient skb linear check for ipv6 icmp
ipvlan: Fix insufficient skb linear check for arp
geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
net: accept UFO datagrams from tuntap and packet
net: realtek: r8169: implement set_link_ksettings()
net: ipv6: Fixup device for anycast routes during copy
net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
net/smc: use sk_rcvbuf as start for rmb creation
ipv6: Do not consider linkdown nexthops during multipath
net: sched: fix crash when deleting secondary chains
net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
bpf: fix branch pruning logic
bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO
bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO
bpf: remove explicit handling of 0 for arg2 in bpf_probe_read
bpf: introduce ARG_PTR_TO_MEM_OR_NULL
i40evf: Use smp_rmb rather than read_barrier_depends
fm10k: Use smp_rmb rather than read_barrier_depends
igb: Use smp_rmb rather than read_barrier_depends
...
Linus Torvalds [Fri, 24 Nov 2017 07:14:30 +0000 (21:14 -1000)]
Merge tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Fix two issues resulting from the dell-smbios refactoring and
introduction of the dell-smbios-wmi dispatcher.
The first ensures a proper error code is returned when kzalloc fails.
The second avoids an issue in older Dell BIOS implementations which
would fail if the more complex calls were made by limiting those
platforms to the simple calls such as those used by the existing
dell-laptop and dell-wmi drivers, preserving their functionality prior
to the addition of the dell-smbios-wmi dispatcher"
* tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-laptop: fix error return code in dell_init()
platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix
Linus Torvalds [Fri, 24 Nov 2017 07:12:58 +0000 (21:12 -1000)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two basic fixes: one for the sparse problem with the blacklist flags
and another for a hang forever in bnx2i"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Use 'blist_flags_t' for scsi_devinfo flags
scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
- Missing help text for the recent Intel SST kconfig change"
* tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda: Add Raven PCI ID
ALSA: hda/realtek - Fix ALC700 family no sound issue
ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization
ALSA: usb-audio: Add sanity checks in v2 clock parsers
ALSA: usb-audio: Fix potential zero-division at parsing FU
ALSA: usb-audio: Fix potential out-of-bound access at parsing SU
ALSA: usb-audio: Add sanity checks to FE parser
ALSA: timer: Remove kernel warning at compat ioctl error paths
ALSA: pcm: update tstamp only if audio_tstamp changed
ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon
ALSA: hda: Fix too short HDMI/DP chmap reporting
ALSA: usb-audio: uac1: Invalidate ctl on interrupt
ALSA: hda/realtek - Fix ALC275 no sound issue
ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
Linus Torvalds [Fri, 24 Nov 2017 07:04:56 +0000 (21:04 -1000)]
Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux
Pull more drm updates from Dave Airlie:
"Fixes/cleanups for rc1, non-desktop flags for VR
- remove the MSM dt-bindings file Rob managed to push in the previous
pull.
- add a property/edid quirk to denote HMD devices, I had these
hanging around for a few weeks and Keith had done some work on
them, they are fairly self contained and small, and only affect
people using HTC Vive VR headsets so far.
- amdgpu, tegra, tilcdc, fsl fixes
- some imx-drm cleanups I missed, these seemed pretty small, and no
reason to hold off.
I have one TTM regression fix (fixes bochs-vga in qemu) sitting
locally awaiting review I'll probably send that in a separate pull
request tomorrow"
* tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits)
dt-bindings: remove file that was added accidentally
drm/edid: quirk HTC vive headset as non-desktop. [v2]
drm/fb: add support for not enabling fbcon on non-desktop displays [v2]
drm: add connector info/property for non-desktop displays [v2]
drm/amdgpu: fix rmmod KCQ disable failed error
drm/amdgpu: fix kernel hang when starting VNC server
drm/amdgpu: don't skip attributes when powerplay is enabled
drm/amd/pp: fix typecast error in powerplay.
drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support
drm/tegra: sor: Reimplement pad clock
Revert "drm/radeon: dont switch vt on suspend"
drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence
drm/amd/powerplay: fix unfreeze level smc message for smu7
drm/amdgpu:fix memleak
drm/amdgpu:fix memleak in takedown
drm/amd/pp: fix dpm randomly failed on Vega10
drm/amdgpu: set f_mapping on exported DMA-bufs
drm/amdgpu: Properly allocate VM invalidate eng v2
drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume()
drm/fsl-dcu: avoid disabling pixel clock twice on suspend
...
Linus Torvalds [Fri, 24 Nov 2017 07:01:32 +0000 (21:01 -1000)]
Merge tag 'docs-4.15-2' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"A few late-arriving docs updates that have no real reason to wait.
There's a new "Co-Developed-by" tag described by Greg, and a build
enhancement from Willy to generate docs warnings during a kernel build
(but only when additional warnings have been requested in general)"
* tag 'docs-4.15-2' of git://git.lwn.net/linux:
Add optional check for bad kernel-doc comments
Documentation: fix profile= options in kernel-parameters.txt
documentation/svga.txt: update outdated file
kokr/memory-barriers.txt: Fix typo in paring example
kokr/memory-barriers/txt: Replace uses of "transitive"
Documentation/process: add Co-Developed-by: tag for patches with multiple authors
Linus Torvalds [Fri, 24 Nov 2017 06:51:27 +0000 (20:51 -1000)]
Merge branch 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull keys update from James Morris:
"There's nothing too controversial here:
- Doc fix for keyctl_read().
- time_t -> time64_t replacement.
- Set the module licence on things to prevent tainting"
* 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
pkcs7: Set the module licence to prevent tainting
security: keys: Replace time_t with time64_t for struct key_preparsed_payload
security: keys: Replace time_t/timespec with time64_t
KEYS: fix in-kernel documentation for keyctl_read()
Bug Fixes:
- initialized returned struct aa_perms
- fix leak of null profile name if profile allocation fails
- ensure that undecidable profile attachments fail
- fix profile attachment for special unconfined profiles
- fix locking when creating a new complain profile.
- fix possible recursive lock warning in __aa_create_ns"
* tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
apparmor: fix possible recursive lock warning in __aa_create_ns
apparmor: fix locking when creating a new complain profile.
apparmor: fix profile attachment for special unconfined profiles
apparmor: ensure that undecidable profile attachments fail
apparmor: fix leak of null profile name if profile allocation fails
apparmor: remove unused redundant variable stop
apparmor: Fix bool initialization/comparison
apparmor: initialized returned struct aa_perms
apparmor: fix spelling mistake: "resoure" -> "resource"
Stephan Mueller [Fri, 10 Nov 2017 12:20:55 +0000 (13:20 +0100)]
crypto: af_alg - remove locking in async callback
The code paths protected by the socket-lock do not use or modify the
socket in a non-atomic fashion. The actions pertaining the socket do not
even need to be handled as an atomic operation. Thus, the socket-lock
can be safely ignored.
This fixes a bug regarding scheduling in atomic as the callback function
may be invoked in interrupt context.
In addition, the sock_hold is moved before the AIO encrypt/decrypt
operation to ensure that the socket is always present. This avoids a
tiny race window where the socket is unprotected and yet used by the AIO
operation.
Finally, the release of resources for a crypto operation is moved into a
common function of af_alg_free_resources.
Stephan Mueller [Fri, 10 Nov 2017 10:04:52 +0000 (11:04 +0100)]
crypto: algif_aead - skip SGL entries with NULL page
The TX SGL may contain SGL entries that are assigned a NULL page. This
may happen if a multi-stage AIO operation is performed where the data
for each stage is pointed to by one SGL entry. Upon completion of that
stage, af_alg_pull_tsgl will assign NULL to the SGL entry.
The NULL cipher used to copy the AAD from TX SGL to the destination
buffer, however, cannot handle the case where the SGL starts with an SGL
entry having a NULL page. Thus, the code needs to advance the start
pointer into the SGL to the first non-NULL entry.
This fixes a crash visible on Intel x86 32 bit using the libkcapi test
suite.
Cc: <stable@vger.kernel.org> Fixes: 72548b093ee38 ("crypto: algif_aead - copy AAD from src to dst") Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Dave Airlie [Fri, 24 Nov 2017 01:33:29 +0000 (11:33 +1000)]
Merge tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
4.15 merge window fixes 1
* tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc:
drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks
drm/vc4: Account for interrupts in flight
Dave Airlie [Fri, 24 Nov 2017 01:33:12 +0000 (11:33 +1000)]
Merge tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
drm/i915 fixes for v4.15
* tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel:
drm/i915: Fix init_clock_gating for resume
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915: Clear breadcrumb node when cancelling signaling
drm/i915/gvt: ensure -ve return value is handled correctly
drm/i915: Re-register PMIC bus access notifier on runtime resume
drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
Dave Airlie [Thu, 23 Nov 2017 02:12:17 +0000 (12:12 +1000)]
drm/ttm: don't attempt to use hugepages if dma32 requested (v2)
The commit below introduced thp support for ttm allocations, however it didn't
take into account the case where dma32 was requested. Some drivers always request
dma32, and the bochs driver is one of those.
Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Adam Williamson <awilliam@redhat.com> Fixes: 0284f1ead87463bc17cf5e81a24fc65c052486f3 (drm/ttm: add transparent huge page support for cached allocations v2) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Bjorn Helgaas [Wed, 22 Nov 2017 22:13:37 +0000 (16:13 -0600)]
x86/PCI: Remove unused HyperTransport interrupt support
There are no in-tree callers of ht_create_irq(), the driver interface for
HyperTransport interrupts, left. Remove the unused entry point and all the
supporting code.
See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt
support").
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-pci@vger.kernel.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
Borislav Petkov [Thu, 23 Nov 2017 09:19:51 +0000 (10:19 +0100)]
x86/umip: Fix insn_get_code_seg_params()'s return value
In order to save on redundant structs definitions
insn_get_code_seg_params() was made to return two 4-bit values in a char
but clang complains:
arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char'
changes value from 132 to -124 [-Wconstant-conversion]
return INSN_CODE_SEG_PARAMS(4, 8);
~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS'
#define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4))
Those two values do get picked apart afterwards the opposite way of how
they were ORed so wrt to the LSByte, the return value is the same.
But this function returns -EINVAL in the error case, which is an int. So
make it return an int which is the native word size anyway and thus fix
the clang warning.
Reported-by: Kees Cook <keescook@google.com> Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ricardo.neri-calderon@linux.intel.com Link: https://lkml.kernel.org/r/20171123091951.1462-1-bp@alien8.de
Chao Fan [Thu, 23 Nov 2017 09:08:47 +0000 (17:08 +0800)]
x86/boot/KASLR: Remove unused variable
There are two variables "rc" in mem_avoid_memmap. One at the top of the
function and another one inside the while() loop. Drop the outer one as it
is unused. Cleanup some whitespace damage while at it.
Kees Cook [Wed, 22 Nov 2017 20:56:45 +0000 (12:56 -0800)]
genirq/matrix: Make - vs ?: Precedence explicit
Noticed with a Clang build. This improves the readability of the ?:
expression, as it has lower precedence than the - expression. Show
explicitly that - is evaluated first.
Colin Ian King [Fri, 17 Nov 2017 18:35:53 +0000 (18:35 +0000)]
irqchip/qcom: Fix u32 comparison with value less than zero
The comparison of u32 nregs being less than zero is never true since
nregs is unsigned. Fix this by making nregs a signed integer.
Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kernel-janitors@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Link: https://lkml.kernel.org/r/20171117183553.2739-1-colin.king@canonical.com
====================
ipvlan: Fix insufficient skb linear check
The current ipvlan codes use pskb_may_pull to get the skb linear header in
func ipvlan_get_L3_hdr, but the size isn't enough for arp and ipv6 icmp.
So it may access the unexpected momory in ipvlan_addr_lookup.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 23 Nov 2017 03:47:12 +0000 (11:47 +0800)]
ipvlan: Fix insufficient skb linear check for ipv6 icmp
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to
make sure the skb header has enough linear room for ipv6 header. But it
would use the latter memory directly without linear check when it is icmp.
So it still may access the unepxected memory in ipvlan_addr_lookup.
Now invoke the pskb_may_pull again if it is ipv6 icmp.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Thu, 23 Nov 2017 03:47:11 +0000 (11:47 +0800)]
ipvlan: Fix insufficient skb linear check for arp
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to
make sure the skb header has enough linear room for arp header. But it
would access the arp payload in func ipvlan_addr_lookup. So it still may
access the unepxected memory.
Now use arp_hdr_len(port->dev) instead of the arp header as the param.
Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Thu, 23 Nov 2017 03:27:24 +0000 (11:27 +0800)]
geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6
Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't
make sense if we haven't enabled CONFIG_IPV6. Fix it by adding
if IS_ENABLED(CONFIG_IPV6) check.
Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink") Fixes: fd7eafd02121 ("geneve: fix fill_info when link down") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 23 Nov 2017 17:53:38 +0000 (02:53 +0900)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2017-11-21
This series contains fixes for igb/vf, ixgbe/vf, i40e/vf and fm10k.
Jake fixes a regression issue with older firmware, where we were using
the NVM lock to synchronize NVM reads for all devices and firmware
versions, yet this caused issues with older firmware prior to version
1.5. Fixed this by only grabbing the lock for newer devices and firmware
version 1.5 or newer.
Zijie Pan fixes the calculation of the i40e VF MAC addresses, where it was
possible to increment to the next MAC entry without calling
i40e_add_mac_filter().
Amritha removes the upper limit of 64 queues on a channel VSI since the
upper bound is determined by the VSI's num_queue_pairs.
Filip fixes an issue during FLR resets, where should have been checking
for upcoming core reset and if so, just return with I40E_ERR_NOT_READY.
Alan fixes the notifying clients of l2 parameters by copying the
parameters to the client instance struct and re-organizes the priority
in which the client tasks fire so that if the flag for notifying l2
params is set, it will trigger before the client open task. Also fixed
the promiscuous settings after reset for all the VSI's.
Brian King from IBM fixes an issue seen on Power systems which would
result in skb list corruption and eventual kernel oops. Brian
provides the same fix for nearly all our drivers, to replace the
read_barrier_depends with smp_rmb() to ensure loads are ordered with
respect to the load of tx_buffer->next_to_watch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 22 Nov 2017 01:37:46 +0000 (17:37 -0800)]
net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY
The PHY on BCM7278 has an additional bit that needs to be cleared:
IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out
of suspend/resume cycles.
Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Several BPF offloading fixes, from Jakub. Among others:
- Limit offload to cls_bpf and XDP program types only.
- Move device validation into the driver and don't make
any assumptions about the device in the classifier due
to shared blocks semantics.
- Don't pass offloaded XDP program into the driver when
it should be run in native XDP instead. Offloaded ones
are not JITed for the host in such cases.
- Don't destroy device offload state when moved to
another namespace.
- Revert dumping offload info into user space for now,
since ifindex alone is not sufficient. This will be
redone properly for bpf-next tree.
2) Fix test_verifier to avoid using bpf_probe_write_user()
helper in test cases, since it's dumping a warning into
kernel log which may confuse users when only running tests.
Switch to use bpf_trace_printk() instead, from Yonghong.
3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics
before it becomes uabi, from Gianluca. More specifically:
- Add a type ARG_PTR_TO_MEM_OR_NULL that is used only
by bpf_csum_diff(), where the argument is either a
valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO
then enforces a valid pointer in case of non-0 size
or a valid pointer or NULL in case of size 0. Given
that, the semantics for ARG_PTR_TO_MEM in combination
with ARG_CONST_SIZE_OR_ZERO are now such that in case
of size 0, the pointer must always be valid and cannot
be NULL. This fix in semantics allows for bpf_probe_read()
to drop the recently added size == 0 check in the helper
that would become part of uabi otherwise once released.
At the same time we can then fix bpf_probe_read_str() and
bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO
instead of ARG_CONST_SIZE in order to fix recently
reported issues by Arnaldo et al, where LLVM optimizes
two boundary checks into a single one for unknown
variables where the verifier looses track of the variable
bounds and thus rejects valid programs otherwise.
4) A fix for the verifier for the case when it detects
comparison of two constants where the branch is guaranteed
to not be taken at runtime. Verifier will rightfully prune
the exploration of such paths, but we still pass the program
to JITs, where they would complain about using reserved
fields, etc. Track such dead instructions and sanitize
them with mov r0,r0. Rejection is not possible since LLVM
may generate them for valid C code and doesn't do as much
data flow analysis as verifier. For bpf-next we might
implement removal of such dead code and adjust branches
instead. Fix from Alexei.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Tue, 21 Nov 2017 15:22:25 +0000 (10:22 -0500)]
net: accept UFO datagrams from tuntap and packet
Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.
Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.
It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 939912216fa8 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1
("net: avoid skb_warn_bad_offload false positives on UFO").
(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.
Tested
Booted a v4.13 guest kernel with QEMU. On a host kernel before this
patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
enabled, same as on a v4.13 host kernel.
A UFO packet sent from the guest appears on the tap device:
host:
nc -l -p -u 8000 &
tcpdump -n -i tap0
Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially
implemented the new ethtool API, by replacing get_settings()
with get_link_ksettings(). This breaks ethtool, since the
userspace tool (according to the new API specs) never tries
the legacy set() call, when the new get() call succeeds.
All attempts to chance some setting from userspace result in:
> Cannot set new settings: Operation not supported
Implement the missing set() call.
Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 21 Nov 2017 15:08:57 +0000 (07:08 -0800)]
net: ipv6: Fixup device for anycast routes during copy
Florian reported a breakage with anycast routes due to commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with
address"). Prior to this commit anycast routes were added against the
loopback device causing repetitive route entries with no insight into
why they existed. e.g.:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
anycast fe80:: dev lo proto kernel metric 0 pref medium
The point of commit 4832c30d5458 is to add the routes using the device
with the address which is causing the route to be added. e.g.,:
$ ip -6 ro ls table local type anycast
anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth2 proto kernel metric 0 pref medium
anycast fe80:: dev eth1 proto kernel metric 0 pref medium
For traffic to work as it did before, the dst device needs to be switched
to the loopback when the copy is created similar to local routes.
Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/smc: Fix preinitialization of buf_desc in __smc_buf_create()
With gcc-4.1.2:
net/smc/smc_core.c: In function ‘__smc_buf_create’:
net/smc/smc_core.c:567: warning: ‘bufsize’ may be used uninitialized in this function
Indeed, if the for-loop is never executed, bufsize is used
uninitialized. In addition, buf_desc is stored for later use, while it
is still a NULL pointer.
Before, error handling was done by checking if buf_desc is non-NULL.
The cleanup changed this to an error check, but forgot to update the
preinitialization of buf_desc to an error pointer.
Update the preinitializatin of buf_desc to fix this.
Fixes: b33982c3a6838d13 ("net/smc: cleanup function __smc_buf_create()") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Tue, 21 Nov 2017 12:23:53 +0000 (13:23 +0100)]
net/smc: use sk_rcvbuf as start for rmb creation
Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers")
merged handling of SMC receive and send buffers. It introduced sk_buf_size
as merged start value for size determination. But since sk_buf_size is not
used at all, sk_sndbuf is erroneously used as start for rmb creation.
This patch makes sure, sk_buf_size is really used as intended, and
sk_rcvbuf is used as start value for rmb creation.
Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>