Andi Kleen [Tue, 22 Aug 2017 18:52:00 +0000 (11:52 -0700)]
perf/x86: Only show format attributes when supported
Only show the Intel format attributes in sysfs when the feature is actually
supported with the current model numbers. This allows programs to probe
what format attributes are available, and give a sensible error message
to users if they are not.
This handles near all cases for intel attributes since Nehalem,
except the (obscure) case when the model number if known, but PEBS
is disabled in PERF_CAPABILITIES.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170822185201.9261-2-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
tracing, perf: Adjust code layout in get_recursion_context()
In an XDP redirect applications using tracepoint xdp:xdp_redirect to
diagnose TX overrun, I noticed perf_swevent_get_recursion_context()
was consuming 2% CPU. This was reduced to 1.85% with this simple
change.
Looking at the annotated asm code, it was clear that the unlikely case
in_nmi() test was chosen (by the compiler) as the most likely
event/branch. This small adjustment makes the compiler (GCC version
7.1.1 20170622 (Red Hat 7.1.1-3)) put in_nmi() as an unlikely branch.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/150342256382.16595.986861478681783732.stgit@firesoul Signed-off-by: Ingo Molnar <mingo@kernel.org>
Oleg Nesterov [Tue, 22 Aug 2017 15:59:28 +0000 (17:59 +0200)]
perf/core: Don't report zero PIDs for exiting tasks
The exiting/dead task has no PIDs and in this case perf_event_pid/tid()
return zero, change them to return -1 to distinguish this case from
idle threads.
Andi Kleen [Wed, 16 Aug 2017 22:21:54 +0000 (15:21 -0700)]
perf/x86: Fix data source decoding for Skylake
Skylake changed the encoding of the PEBS data source field.
Some combinations are not available anymore, but some new cases
e.g. for L4 cache hit are added.
Fix up the conversion table for Skylake, similar as had been done
for Nehalem.
On Skylake server the encoding for L4 actually means persistent
memory. Handle this case too.
To properly describe it in the abstracted perf format I had to add
some new fields. Since a hit can have only one level add a new
field that is an enumeration, not a bit field to describe
the level. It can describe any level. Some numbers are also
used to describe PMEM and LFB.
Also add a new generic remote flag that can be combined with
the generic level to signify a remote cache.
And there is an extension field for the snoop indication to handle
the Forward state.
I didn't add a generic flag for hops because it's not needed
for Skylake.
I changed the existing encodings for older CPUs to also fill in the
new level and remote fields.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Wed, 16 Aug 2017 22:21:53 +0000 (15:21 -0700)]
perf/x86: Move Nehalem PEBS code to flag
Minor cleanup: use an explicit x86_pmu flag to handle the
missing Lock / TLB information on Nehalem, instead of always
checking the model number for each PEBS sample.
Will Deacon [Wed, 16 Aug 2017 16:18:17 +0000 (17:18 +0100)]
perf/aux: Ensure aux_wakeup represents most recent wakeup index
The aux_watermark member of struct ring_buffer represents the period (in
terms of bytes) at which wakeup events should be generated when data is
written to the aux buffer in non-snapshot mode. On hardware that cannot
generate an interrupt when the aux_head reaches an arbitrary wakeup index
(such as ARM SPE), the aux_head sampled from handle->head in
perf_aux_output_{skip,end} may in fact be past the wakeup index. This
can lead to wakeup slowly falling behind the head. For example, consider
the case where hardware can only generate an interrupt on a page-boundary
and the aux buffer is initialised as follows:
and the hardware will be programmed to generate an interrupt at
PAGE_SIZE.
When the interrupt is raised, the hardware head will be at PAGE_SIZE,
so calling perf_aux_output_end(handle, PAGE_SIZE) puts the ring buffer
into the following state:
and then the next call to perf_aux_output_begin will result in:
handle->head = handle->wakeup = PAGE_SIZE
for which the semantics are unclear and, for a smaller aux_watermark
(e.g. PAGE_SIZE / 4), then the wakeup would in fact be behind head at
this point.
This patch fixes the problem by rounding down the aux_head (as sampled
from the handle) to the nearest aux_watermark boundary when updating
rb->aux_wakeup, therefore taking into account any overruns by the
hardware.
Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1502900297-21839-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Will Deacon [Wed, 16 Aug 2017 16:18:16 +0000 (17:18 +0100)]
perf/aux: Make aux_{head,wakeup} ring_buffer members long
The aux_head and aux_wakeup members of struct ring_buffer are defined
using the local_t type, despite the fact that they are only accessed via
the perf_aux_output_*() functions, which cannot race with each other for a
given ring buffer.
This patch changes the type of the members to long, so we can avoid
using the local_*() API where it isn't needed.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1502900297-21839-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Mark Rutland [Thu, 22 Jun 2017 14:41:38 +0000 (15:41 +0100)]
perf/core: Fix group {cpu,task} validation
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
int main(int argc, char *argv[])
{
int leader, ret;
cpu_set_t cpus;
/*
* Force use of CPU0 to ensure our CPU0-bound events get scheduled.
*/
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}
/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}
/*
* Open a follower event that is bound to the same task, but a
* different CPU. This means that the group should never be possible to
* schedule.
*/
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}
/*
* Open as many independent events as we can, all bound to the same
* task, CPU0 only.
*/
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);
/*
* Force enable/disble all events to trigger the erronoeous
* installation of the follower event.
*/
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}
return 0;
}
Fix this by validating this requirement regardless of whether we're
moving events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Zhou Chengming <zhouchengming1@huawei.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Thu, 24 Aug 2017 22:48:38 +0000 (15:48 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma fixes from Doug Ledford:
"Well, I thought we were going to be done for this -rc cycle. I should
have known better than to say so though.
We have four additional items that trickled in.
One was a simple mistake on my part. I took a patch into my for-next
thinking that the issue was less severe than it was. I was then
notified that it needed to be in my -rc area instead.
The other three were just found late in testing.
Summary:
- One core fix accidentally applied first to for-next and then cherry
picked back because it needed to be in the -rc cycles instead
- Another core fix
- Two mlx5 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/mlx5: Always return success for RoCE modify port
IB/mlx5: Fix Raw Packet QP event handler assignment
IB/core: Avoid accessing non-allocated memory when inferring port type
RDMA/uverbs: Initialize cq_context appropriately
Linus Torvalds [Thu, 24 Aug 2017 21:56:20 +0000 (14:56 -0700)]
Merge tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two recent regressions (in ACPICA and in the ACPI EC driver)
and one bug in code introduced during the 4.12 cycle (ACPI device
properties library routine).
Specifics:
- Fix a regression in the ACPI EC driver causing a kernel to crash
during initialization on some systems due to a code ordering issue
exposed by a recent change (Lv Zheng).
- Fix a recent regression in ACPICA due to a change of the behavior
of a library function in a way that is not backwards compatible
with some existing callers of it (Rafael Wysocki).
- Fix a coding mistake in a library function related to the handling
of ACPI device properties introduced during the 4.12 cycle (Sakari
Ailus)"
* tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
ACPICA: Fix acpi_evaluate_object_typed()
ACPI: EC: Fix regression related to wrong ECDT initialization order
Linus Torvalds [Thu, 24 Aug 2017 21:22:27 +0000 (14:22 -0700)]
Merge tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- fix linker script regression caused by dead code elimination support
- fix typos and outdated comments
- specify kselftest-clean as a PHONY target
- fix "make dtbs_install" when $(srctree) includes shell special
characters like '~'
- Move -fshort-wchar to the global option list because defining it
partially emits warnings
* tag 'kbuild-fixes-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: update comments of Makefile.asm-generic
kbuild: Do not use hyphen in exported variable name
Makefile: add kselftest-clean to PHONY target list
Kbuild: use -fshort-wchar globally
fixdep: trivial: typo fix and correction
kbuild: trivial cleanups on the comments
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured
Linus Torvalds [Thu, 24 Aug 2017 21:08:22 +0000 (14:08 -0700)]
Merge tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various bug fixes:
- Two small memory leaks in error paths.
- A missed return error code on an error path.
- A fix to check the tracing ring buffer CPU when it doesn't exist
(caused by setting maxcpus on the command line that is less than
the actual number of CPUs, and then onlining them manually).
- A fix to have the reset of boot tracers called by lateinit_sync()
instead of just lateinit(). As some of the tracers register via
lateinit(), and if the clear happens before the tracer is
registered, it will never start even though it was told to via the
kernel command line"
* tag 'trace-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix freeing of filter in create_filter() when set_str is false
tracing: Fix kmemleak in tracing_map_array_free()
ftrace: Check for null ret_stack on profile function graph entry function
ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU
tracing: Missing error code in tracer_alloc_buffers()
tracing: Call clear_boot_tracer() at lateinit_sync
Linus Torvalds [Thu, 24 Aug 2017 21:01:18 +0000 (14:01 -0700)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A small number of bugfixes, again nothing serious.
- Alexander Dahl found multiple bugs in the Atmel memory interface
driver
- A randconfig build fix for at91 was incomplete, the second attempt
fixes the remaining corner case
- One fix for the TI Keystone queue handler
- The Odroid XU4 HDMI port (added in 4.13) needs a small DT fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: exynos: add needs-hpd for Odroid-XU3/4
ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms
soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create
memory: atmel-ebi: Fix smc cycle xlate converter
memory: atmel-ebi: Allow t_DF timings of zero ns
memory: atmel-ebi: Fix smc timing return value evaluation
When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong
vfsmount is passed to dentry_open. Which results in the kernel displaying
the wrong pathname for the peer.
The second is simply by caching the vfsmount and dentry of the peer it leaves
them open, in a way they were not previously Which because of the inreased
reference counts can cause unnecessary behaviour differences resulting in
regressions.
To fix these move the ioctl into tty_io.c at a generic level allowing
the ioctl to have access to the struct file on which the ioctl is
being called. This allows the path of the slave to be derived when
opening the slave through TIOCGPTPEER instead of requiring the path to
the slave be cached. Thus removing the need for caching the path.
A new function devpts_ptmx_path is factored out of devpts_acquire and
used to implement a function devpts_mntget. The new function devpts_mntget
takes a filp to perform the lookup on and fsi so that it can confirm
that the superblock that is found by devpts_ptmx_path is the proper superblock.
v2: Lots of fixes to make the code actually work
v3: Suggestions by Linus
- Removed the unnecessary initialization of filp in ptm_open_peer
- Simplified devpts_ptmx_path as gotos are no longer required
[ This is the fix for the issue that was reverted in commit 143c97cc6529, but this time without breaking 'pbuilder' due to
increased reference counts - Linus ]
Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl") Reported-by: Christian Brauner <christian.brauner@canonical.com> Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Majd Dibbiny [Wed, 23 Aug 2017 05:35:42 +0000 (08:35 +0300)]
IB/mlx5: Always return success for RoCE modify port
CM layer calls ib_modify_port() regardless of the link layer.
For the Ethernet ports, qkey violation and Port capabilities
are meaningless. Therefore, always return success for ib_modify_port
calls on the Ethernet ports.
Cc: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Noa Osherovich [Wed, 23 Aug 2017 05:35:40 +0000 (08:35 +0300)]
IB/core: Avoid accessing non-allocated memory when inferring port type
Commit 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types")
introduced the concept of type in ah_attr:
* During ib_register_device, each port is checked for its type which
is stored in ib_device's port_immutable array.
* During uverbs' modify_qp, the type is inferred using the port number
in ib_uverbs_qp_dest struct (address vector) by accessing the
relevant port_immutable array and the type is passed on to
providers.
IB spec (version 1.3) enforces a valid port value only in Reset to
Init. During Init to RTR, the address vector must be valid but port
number is not mentioned as a field in the address vector, so its
value is not validated, which leads to accesses to a non-allocated
memory when inferring the port type.
Save the real port number in ib_qp during modify to Init (when the
comp_mask indicates that the port number is valid) and use this value
to infer the port type.
Avoid copying the address vector fields if the matching bit is not set
in the attr_mask. Address vector can't be modified before the port, so
no valid flow is affected.
Omar Sandoval [Wed, 23 Aug 2017 06:45:59 +0000 (23:45 -0700)]
Btrfs: fix blk_status_t/errno confusion
This fixes several instances of blk_status_t and bare errno ints being
mixed up, some of which are real bugs.
In the normal case, 0 matches BLK_STS_OK, so we don't observe any
effects of the missing conversion, but in case of errors or passes
through the repair/retry paths, the errors get mixed up.
The changes were identified using 'sparse', we don't have reports of the
buggy behaviour.
Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.
Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com Cc: stable@vger.kernel.org Fixes: 38b78eb85 ("tracing: Factorize filter creation") Reported-by: Chunyu Hu <chuhu@redhat.com> Tested-by: Chunyu Hu <chuhu@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
ftrace: Check for null ret_stack on profile function graph entry function
There's a small race when function graph shutsdown and the calling of the
registered function graph entry callback. The callback must not reference
the task's ret_stack without first checking that it is not NULL. Note, when
a ret_stack is allocated for a task, it stays allocated until the task exits.
The problem here, is that function_graph is shutdown, and a new task was
created, which doesn't have its ret_stack allocated. But since some of the
functions are still being traced, the callbacks can still be called.
The normal function_graph code handles this, but starting with commit 8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611 Cc: stable@vger.kernel.org Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler") Reported-by: lilydjwg@gmail.com Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ingo Molnar [Thu, 24 Aug 2017 08:12:59 +0000 (10:12 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170823' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Expression parser enhancements for metrics (Andi Kleen)
- Fix buffer overflow while freeing events in 'perf stat' (Andi Kleen)
- Fix static linking with elfutils's libdf and with libunwind
in Debian/Ubuntu (Konstantin Khlebnikov)
- Tighten detection of BPF events, avoiding matching some other PMU
events such as 'cpu/uops_executed.core,cmask=1/' as a .c source
file that ended up being considered a BPF event (Andi Kleen)
- Add Skylake server uncore JSON vendor events (Andi Kleen)
- Add support for printing new mem_info encodings, including
'perf test' checks (Andi Kleen)
- Really install manpages via 'make install-man' (Konstantin Khlebnikov)
- Fix documentation for perf_event_paranoid and perf_event_mlock_kb
sysctls (Konstantin Khlebnikov)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
It turns out that while fixing the ptmx file descriptor to have the
correct 'struct path' to the associated slave pty is a really good
thing, it breaks some user space tools for a very annoying reason.
The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X)
are on different mounts. That was what caused us to have the wrong path
in the first place (we would mix up the vfsmount of the 'ptmx' node,
with the dentry of the pty slave node), but it also means that now while
we use the right vfsmount, having the pty master open also keeps the pts
mount busy.
And it turn sout that that makes 'pbuilder' very unhappy, as noted by
Stefan Lippers-Hollmann:
"This patch introduces a regression for me when using pbuilder
0.228.7[2] (a helper to build Debian packages in a chroot and to
create and update its chroots) when trying to umount /dev/ptmx (inside
the chroot) on Debian/ unstable (full log and pbuilder configuration
file[3] attached).
[...]
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-15) ...
I: unmounting dev/ptmx filesystem
W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)"
apparently pbuilder tries to unmount the /dev/pts filesystem while still
holding at least one master node open, which is arguably not very nice,
but we don't break user space even when fixing other bugs.
So this commit has to be reverted.
I'll try to figure out a way to avoid caching the path to the slave pty
in the master pty. The only thing that actually wants that slave pty
path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the
path at that time.
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Eric W Biederman <ebiederm@xmission.com> Cc: Christian Brauner <christian.brauner@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hans Verkuil [Wed, 23 Aug 2017 16:24:50 +0000 (18:24 +0200)]
ARM: dts: exynos: add needs-hpd for Odroid-XU3/4
CEC support was added for Exynos5 in 4.13, but for the Odroids we need to set
'needs-hpd' as well since CEC is disabled when there is no HDMI hotplug signal,
just as for the exynos4 Odroid-U3.
This is due to the level-shifter that is disabled when there is no HPD, thus
blocking the CEC signal as well. Same close-but-no-cigar board design as the
Odroid-U3.
Tested with my Odroid XU4.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Wed, 23 Aug 2017 19:05:46 +0000 (12:05 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Late arm64 fixes.
They fix very early boot failures with KASLR where the early mapping
of the kernel is incorrect, so the failure mode looks like a hang with
no output. There's also a signal-handling fix when a uaccess routine
faults with a fatal signal pending, which could be used to create
unkillable user tasks using userfaultfd and finally a state leak fix
for the floating pointer registers across a call to exec().
We're still seeing some random issues crop up (inode memory corruption
and spinlock recursion) but we've not managed to reproduce things
reliably enough to debug or bisect them yet.
Summary:
- Fix very early boot failures with KASLR enabled
- Fix fatal signal handling on userspace access from kernel
- Fix leakage of floating point register state across exec()"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kaslr: Adjust the offset to avoid Image across alignment boundary
arm64: kaslr: ignore modulo offset when validating virtual displacement
arm64: mm: abort uaccess retries upon fatal signal
arm64: fpsimd: Prevent registers leaking across exec
Linus Torvalds [Wed, 23 Aug 2017 18:43:38 +0000 (11:43 -0700)]
Merge tag 'gpio-v4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here are the (hopefully) last GPIO fixes for v4.13:
- an important core fix to reject invalid GPIOs *before* trying to
obtain a GPIO descriptor for it.
- a driver fix for the mvebu driver IRQ handling"
* tag 'gpio-v4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mvebu: Fix cause computation in irq handler
gpio: reject invalid gpio before getting gpio_desc
Linus Torvalds [Wed, 23 Aug 2017 18:34:40 +0000 (11:34 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six minor and error leg fixes, plus one major change: the reversion of
scsi-mq as the default.
We're doing the latter temporarily (with a backport to stable) to give
us time to fix all the issues that turned up with this default before
trying again"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: call neigh_event_send() to update MAC address
Revert "scsi: default to scsi-mq"
scsi: sd_zbc: Write unlock zone from sd_uninit_cmnd()
scsi: aacraid: Fix out of bounds in aac_get_name_resp
scsi: csiostor: fail probe if fw does not support FCoE
scsi: megaraid_sas: fix error handle in megasas_probe_one
Arnd Bergmann [Wed, 23 Aug 2017 14:46:15 +0000 (16:46 +0200)]
ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms
My previous patch fixed a link error for all at91 platforms when
CONFIG_ARM_CPU_SUSPEND was not set, however this caused another
problem on a configuration that enabled CONFIG_ARCH_AT91 but none
of the individual SoCs, and that also enabled CPU_ARM720 as
the only CPU:
warning: (ARCH_AT91 && SOC_IMX23 && SOC_IMX28 && ARCH_PXA && MACH_MVEBU_V7 && SOC_IMX6 && ARCH_OMAP3 && ARCH_OMAP4 && SOC_OMAP5 && SOC_AM33XX && SOC_DRA7XX && ARCH_EXYNOS3 && ARCH_EXYNOS4 && EXYNOS5420_MCPM && EXYNOS_CPU_SUSPEND && ARCH_VEXPRESS_TC2_PM && ARM_BIG_LITTLE_CPUIDLE && ARM_HIGHBANK_CPUIDLE && QCOM_PM) selects ARM_CPU_SUSPEND which has unmet direct dependencies (ARCH_SUSPEND_POSSIBLE)
arch/arm/kernel/sleep.o: In function `cpu_resume':
(.text+0xf0): undefined reference to `cpu_arm720_suspend_size'
arch/arm/kernel/suspend.o: In function `__cpu_suspend_save':
suspend.c:(.text+0x134): undefined reference to `cpu_arm720_do_suspend'
This improves the hack some more by only selecting ARM_CPU_SUSPEND
for the part that requires it, and changing pm.c to drop the
contents of unused init functions so we no longer refer to
cpu_resume on at91 platforms that don't need it.
Sakari Ailus [Tue, 22 Aug 2017 20:39:58 +0000 (23:39 +0300)]
ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()
acpi_graph_get_child_prop_value() is intended to find a child node with a
certain property value pair. The check
if (!fwnode_property_read_u32(fwnode, prop_name, &nr))
continue;
is faulty: fwnode_property_read_u32() returns zero on success, not on
failure, leading to comparing values only if the searched property was not
found.
Moreover, the check is made against the parent device node instead of
the child one as it should be.
Fixes: 79389a83bc38 (ACPI / property: Add support for remote endpoints) Reported-by: Hyungwoo Yang <hyungwoo.yang@intel.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: 4.12+ <stable@vger.kernel.org> # 4.12+
[ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 2d2a954375a0 (ACPICA: Update two error messages to emit
control method name) causes acpi_evaluate_object_typed() to fail
if its pathname argument is NULL, but some callers of that function
in the kernel, particularly acpi_nondev_subnode_data_ok(), pass
NULL as pathname to it and expect it to work.
For this reason, make acpi_evaluate_object_typed() check if its
pathname argument is NULL and fall back to using the pathname of
its handle argument if that is the case.
Reported-by: Sakari Ailus <sakari.ailus@intel.com> Tested-by: Yang, Hyungwoo <hyungwoo.yang@intel.com> Fixes: 2d2a954375a0 (ACPICA: Update two error messages to emit control method name) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Bharat Potnuri [Tue, 1 Aug 2017 05:28:35 +0000 (10:58 +0530)]
RDMA/uverbs: Initialize cq_context appropriately
Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer
dereference in ib_uverbs_comp_handler(), if application doesnot use completion
channel. This patch fixes the cq_context initialization.
Fixes: 1e7710f3f65 ("IB/core: Change completion channel to use the reworked") Cc: stable@vger.kernel.org # 4.12 Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
(cherry picked from commit 699a2d5b1b880b4e4e1c7d55fa25659322cf5b51)
Linus Torvalds [Tue, 22 Aug 2017 17:21:05 +0000 (10:21 -0700)]
Merge tag 'mfd-fixes-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fix from Lee Jones:
"Revert duplicate commit in da9062-core"
* tag 'mfd-fixes-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
Revert "mfd: da9061: Fix to remove BBAT_CONT register from chip model"
Catalin Marinas [Tue, 22 Aug 2017 14:39:00 +0000 (15:39 +0100)]
arm64: kaslr: Adjust the offset to avoid Image across alignment boundary
With 16KB pages and a kernel Image larger than 16MB, the current
kaslr_early_init() logic for avoiding mappings across swapper table
boundaries fails since increasing the offset by kimg_sz just moves the
problem to the next boundary.
This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the
Image crosses a PMD_SIZE boundary.
Fixes: afd0e5a87670 ("arm64: kaslr: Fix up the kernel image alignment") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Ard Biesheuvel [Fri, 18 Aug 2017 17:42:30 +0000 (18:42 +0100)]
arm64: kaslr: ignore modulo offset when validating virtual displacement
In the KASLR setup routine, we ensure that the early virtual mapping
of the kernel image does not cover more than a single table entry at
the level above the swapper block level, so that the assembler routines
involved in setting up this mapping can remain simple.
In this calculation we add the proposed KASLR offset to the values of
the _text and _end markers, and reject it if they would end up falling
in different swapper table sized windows.
However, when taking the addresses of _text and _end, the modulo offset
(the physical displacement modulo 2 MB) is already accounted for, and
so adding it again results in incorrect results. So disregard the modulo
offset from the calculation.
Mark Rutland [Tue, 11 Jul 2017 14:19:22 +0000 (15:19 +0100)]
arm64: mm: abort uaccess retries upon fatal signal
When there's a fatal signal pending, arm64's do_page_fault()
implementation returns 0. The intent is that we'll return to the
faulting userspace instruction, delivering the signal on the way.
However, if we take a fatal signal during fixing up a uaccess, this
results in a return to the faulting kernel instruction, which will be
instantly retried, resulting in the same fault being taken forever. As
the task never reaches userspace, the signal is not delivered, and the
task is left unkillable. While the task is stuck in this state, it can
inhibit the forward progress of the system.
To avoid this, we must ensure that when a fatal signal is pending, we
apply any necessary fixup for a faulting kernel instruction. Thus we
will return to an error path, and it is up to that code to make forward
progress towards delivering the fatal signal.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Dave Martin [Fri, 18 Aug 2017 15:57:01 +0000 (16:57 +0100)]
arm64: fpsimd: Prevent registers leaking across exec
There are some tricky dependencies between the different stages of
flushing the FPSIMD register state during exec, and these can race
with context switch in ways that can cause the old task's regs to
leak across. In particular, a context switch during the memset() can
cause some of the task's old FPSIMD registers to reappear.
Disabling preemption for this small window would be no big deal for
performance: preemption is already disabled for similar scenarios
like updating the FPSIMD registers in sigreturn.
So, instead of rearranging things in ways that might swap existing
subtle bugs for new ones, this patch just disables preemption
around the FPSIMD state flushing so that races of this type can't
occur here. This brings fpsimd_flush_thread() into line with other
code paths.
Cc: stable@vger.kernel.org Fixes: 674c242c9323 ("arm64: flush FP/SIMD state correctly after execve()") Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* libunwind-x86_64 must be linked before libunwind
* libunwind requires liblzma
* static libunwind conflicts with static libgcc_eh
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322917247.129799.14247751517961953155.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Fix static linking with libdw from elfutils
Fix feature test for static libdw: link required dependencies. Backends
of libebl are not statically linked thus libdl is required.
In Debian/Ubuntu libdw-dev includes libebl.a starting from 0.166-1.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916720.129772.7959925864494283854.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf: Fix documentation for sysctls perf_event_paranoid and perf_event_mlock_kb
Fix misprint CAP_IOC_LOCK -> CAP_IPC_LOCK. This capability have nothing
to do with raw tracepoints. This part is about bypassing mlock limits.
Sysctl kernel.perf_event_paranoid = -1 allows raw and ftrace function
tracepoints without CAP_SYS_ADMIN.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/150322916080.129746.11285255474738558340.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Wed, 16 Aug 2017 22:21:55 +0000 (15:21 -0700)]
perf tools: Add support for printing new mem_info encodings
Add decoding for the new "lvlx" and "snoopx" meminfo fields added
earlier to the kernel so that "perf mem report" and other tools can
print it properly.
v2: Merge with persistent memory patch.
Switch to new bit encoding for each combination.
Andi Kleen [Fri, 10 Mar 2017 20:51:38 +0000 (12:51 -0800)]
perf vendor events: Add Skylake server uncore event list
Add JSON uncore events for Skylake Server to perf.
Based on JSON list V1.01
This is a much fuller list than with earlier uncores, including
more low level (but also harder to understand) events. It does not
include the "experimential" events. The previous
high level metric (LLC_* etc.) are still available when applicable.
C state power events are not included at this point.
This is useful to fix up problems caused by multiplexing.
- Support | & ^ operators
- Minor cleanups and fixes
- Support an \ escape for operators. This allows to specify event names
like c2-residency
- Support @ as an alternative for / to be able to specify pmus without
conflicts with operators (like msr/tsc/ as msr@tsc@)
Andi Kleen [Fri, 11 Aug 2017 23:26:19 +0000 (16:26 -0700)]
perf bpf: Tighten detection of BPF events
perf stat -e cpu/uops_executed.core,cmask=1/
would be detected as a BPF source event because the .c matches the .c
source BPF pattern.
v2:
Originally I tried to use lex lookahead, but it doesn't seem to work.
This now extends the BPF pattern to match longer events, but then does
an extra check in the C code to reject BPF matches that do not end with
.c/.o/.obj
This uses REJECT, which makes the flex scanner slower, but that
shouldn't be a big problem for the perf events.
It continues doing what was expected, i.e. identifying
/home/acme/bpf/tracepoint.c as a BPF event and activates the clang
machinery to build an eBPF object and then uses sys_bpf() to hook it up
to the raw_syscalls:sys_enter tracepoint, etc.
Andi forgot to add Wang to the CC list, fix it.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Fri, 11 Aug 2017 23:26:17 +0000 (16:26 -0700)]
perf evsel: Fix buffer overflow while freeing events
Fix buffer overflow for:
% perf stat -e msr/tsc/,cstate_core/c7-residency/ true
that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in AS:
=================================================================
==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
READ of size 4 at 0x603000003f5c thread T0
#0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
#1 0x56c57a in perf_evsel__close util/evsel.c:1717
#2 0x55ed5f in perf_evlist__close util/evlist.c:1631
#3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
#4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
#11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
allocated by thread T0 here:
#0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
#1 0x648a2d in zalloc util/util.h:23
#2 0x648a88 in xyarray__new util/xyarray.c:9
#3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
#4 0x56b427 in perf_evsel__open util/evsel.c:1529
#5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
#6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
#7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
#8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
The event is allocated with cpus == 1, but freed with cpus == real number
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.
v2:
Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.
The actual perf event reading still uses the original values from the
evsel list.
Ingo Molnar [Tue, 22 Aug 2017 10:16:39 +0000 (12:16 +0200)]
Merge tag 'perf-core-for-mingo-4.14-20170821' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
- Support --show-nr-samples in annotate's --stdio and --tui, using
the existing 't' toggle to circulate 'percent', 'total-period' and
'nr-samples' as the first column (Taeung Song)
- Support FCMask and PortMask in JSON vendor events (Andi Kleen)
- Fix off by one string allocation problem in 'perf trace' (Arnaldo Carvalho de Melo)
- Use just one parse events state struct in yyparse(), fixing one
reported segfault when a routine received a different data struct,
smaller than the one it expected to use (Arnaldo Carvalho de Melo)
- Remove unused cpu_relax() macros, they stopped being used when
tools/perf lived in Documentation/ (Arnaldo Carvalho de Melo)
- Fix double file test in libbpf's Makefile (Daniel DÃaz):
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes: b77eb79acca3 ("mfd: da9061: Fix to remove BBAT_CONT register from chip model") Reported-by: Steve Twiss <stwiss.opensource@diasemi.com> Reviewed-by: Steve Twiss <stwiss.opensource@diasemi.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Thomas Petazzoni [Sun, 13 Aug 2017 21:14:58 +0000 (23:14 +0200)]
sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()
When building the kernel for Sparc using gcc 7.x, the build fails
with:
arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
cmd |= PCI_COMMAND_IO;
^~
I.e, the code assumes that pcic_read_config() will always initialize
cmd. But it's not the case. Looking at pcic_read_config(), if
bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
not be initialized.
As a simple fix, we initialize cmd to zero at the beginning of
pcibios_fixup_bus.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 21 Aug 2017 20:30:36 +0000 (13:30 -0700)]
Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
* tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: Mask individual IRQ lines during core INTC init
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
ARC: dma: implement dma_unmap_page and sg variant
ARCv2: SLC: Make sure busy bit is set properly for region ops
ARC: [plat-sim] Include this platform unconditionally
ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
ARC: defconfig: Cleanup from old Kconfig options
2) Fix timer access to freed object in dccp, from Eric Dumazet.
3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
triggerable by userspace. Also from Eric Dumazet.
4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
King.
5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.
6) Fix use after free in TIPC, from Eric Dumazet.
7) When replacing a route in ipv6 we need to reset the round robin
pointer, from Wei Wang.
8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
relaxed ordering changes, from Thierry Redding. I made sure to get
an explicit ACK from Bjorn this time around :-)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: repair fib6 tree in failure case
net_sched: fix order of queue length updates in qdisc_replace()
tools lib bpf: improve warning
switchdev: documentation: minor typo fixes
bpf, doc: also add s390x as arch to sysctl description
net: sched: fix NULL pointer dereference when action calls some targets
rxrpc: Fix oops when discarding a preallocated service call
irda: do not leak initialized list.dev to userspace
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
PCI: Allow PCI express root ports to find themselves
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
ipv6: reset fn->rr_ptr when replacing route
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
tun: handle register_netdevice() failures properly
datagram: When peeking datagrams with offset < 0 don't skip empty skbs
bpf, doc: improve sysctl knob description
netxen: fix incorrect loop counter decrement
nfp: fix infinite loop on umapping cleanup
...
Oleg Nesterov [Mon, 21 Aug 2017 15:35:02 +0000 (17:35 +0200)]
pids: make task_tgid_nr_ns() safe
This was reported many times, and this was even mentioned in commit 52ee2dfdd4f5 ("pids: refactor vnr/nr_ns helpers to make them safe") but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
not safe because task->group_leader points to nowhere after the exiting
task passes exit_notify(), rcu_read_lock() can not help.
We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups. Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.
Current max_register setting breaks reading nvram on certain chips and
also reading the standard registers on RX8130 where register map starts
at 0x10.
soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create
knav_pool_create is an exported function. In the event of a call
before knav_queue_probe, we encounter a NULL pointer dereference
in the following line. Hence return -EPROBE_DEFER to the caller till
the kdev pointer is non-NULL.
Wei Wang [Sat, 19 Aug 2017 00:14:49 +0000 (17:14 -0700)]
ipv6: repair fib6 tree in failure case
In fib6_add(), it is possible that fib6_add_1() picks an intermediate
node and sets the node's fn->leaf to NULL in order to add this new
route. However, if fib6_add_rt2node() fails to add the new
route for some reason, fn->leaf will be left as NULL and could
potentially cause crash when fn->leaf is accessed in fib6_locate().
This patch makes sure fib6_repair_tree() is called to properly repair
fn->leaf in the above failure case.
net_sched: fix order of queue length updates in qdisc_replace()
This important to call qdisc_tree_reduce_backlog() after changing queue
length. Parent qdisc should deactivate class in ->qlen_notify() called from
qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
Missed class deactivations leads to crashes/warnings at picking packets
from empty qdisc and corrupting state at reactivating this class in future.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 86a7996cc8a0 ("net_sched: introduce qdisc_replace() helper") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Leblond [Sun, 20 Aug 2017 19:48:14 +0000 (21:48 +0200)]
tools lib bpf: improve warning
Signed-off-by: Eric Leblond <eric@regit.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sun, 20 Aug 2017 22:26:03 +0000 (00:26 +0200)]
bpf, doc: also add s390x as arch to sysctl description
Looks like this was accidentally missed, so still add s390x
as supported eBPF JIT arch to bpf_jit_enable.
Fixes: 014cd0a368dc ("bpf: Update sysctl documentation to list all supported architectures") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sat, 19 Aug 2017 21:30:02 +0000 (22:30 +0100)]
kbuild: Do not use hyphen in exported variable name
This definition in Makefile.dtbinst:
export dtbinst-root ?= $(obj)
should define and export dtbinst-root when handling the root dts
directory, and do nothing in the subdirectories. However some shells,
including dash, will not pass through environment variables whose name
includes a hyphen. Usually GNU make does not use a shell to recurse,
but if e.g. $(srctree) contains '~' it will use a shell here.
Rename the variable to dtbinst_root.
References: https://bugs.debian.org/833561 Fixes: 323a028d39cdi ("dts, kbuild: Implement support for dtb vendor subdirs") Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Commit 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi")
added the --no-wchar-size-warning to the Makefile to avoid this
harmless warning:
arm-linux-gnueabi-ld: warning: drivers/xen/efi.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
Changing kbuild to use thin archives instead of recursive linking
unfortunately brings the same warning back during the final link.
The kernel does not use wchar_t string literals at this point, and
xen does not use wchar_t at all (only efi_char16_t), so the flag
has no effect, but as pointed out by Jan Beulich, adding a wchar_t
string literal would be bad here.
Since wchar_t is always defined as u16, independent of the toolchain
default, always passing -fshort-wchar is correct and lets us
remove the Xen specific hack along with fixing the warning.
Link: https://patchwork.kernel.org/patch/9275217/ Fixes: 971a69db7dc0 ("Xen: don't warn about 2-byte wchar_t in efi") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Linus Torvalds [Sun, 20 Aug 2017 20:26:27 +0000 (13:26 -0700)]
Sanitize 'move_pages()' permission checks
The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).
That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.
So change the access checks to the more common 'ptrace_may_access()'
model instead.
This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.
Famous last words.
Reported-by: Otto Ebeling <otto.ebeling@iki.fi> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Willy Tarreau <w@1wt.eu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 20 Aug 2017 16:36:52 +0000 (09:36 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Another pile of small fixes and updates for x86:
- Plug a hole in the SMAP implementation which misses to clear AC on
NMI entry
- Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
parameter works correctly again
- Use the proper accessor in the startup64 code for next_early_pgt to
prevent accessing of invalid addresses and faulting in the early
boot code.
- Prevent CPU hotplug lock recursion in the MTRR code
- Unbreak CPU0 hotplugging
- Rename overly long CPUID bits which got introduced in this cycle
- Two commits which mark data 'const' and restrict the scope of data
and functions to file scope by making them 'static'"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Constify attribute_group structures
x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
x86: Fix norandmaps/ADDR_NO_RANDOMIZE
x86/mtrr: Prevent CPU hotplug lock recursion
x86: Mark various structures and functions as 'static'
x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
x86/smpboot: Unbreak CPU0 hotplug
x86/asm/64: Clear AC on NMI entries
Linus Torvalds [Sun, 20 Aug 2017 16:20:57 +0000 (09:20 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for the perf subsystem:
- Fix an inconsistency of RDPMC mm struct tagging across exec() which
causes RDPMC to fault.
- Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
causes incorrect timestamps and total time calculations"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix time on IOC_ENABLE
perf/x86: Fix RDPMC vs. mm_struct tracking
Linus Torvalds [Sun, 20 Aug 2017 16:07:56 +0000 (09:07 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A pile of smallish changes all over the place:
- Add a missing ISB in the GIC V1 driver
- Remove an ACPI version check in the GIC V3 ITS driver
- Add the missing irq_pm_shutdown function for BRCMSTB-L2 to avoid
spurious wakeups
- Remove the artifical limitation of ITS instances to the number of
NUMA nodes which prevents utilizing the ITS hardware correctly
- Prevent a infinite parsing loop in the GIC-V3 ITS/MSI code
- Honour the force affinity argument in the GIC-V3 driver which is
required to make perf work correctly
- Correctly report allocation failures in GIC-V2/V3 to avoid using
half allocated and initialized interrupts.
- Fixup checks against nr_cpu_ids in the generic IPI code"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/ipi: Fixup checks against nr_cpu_ids
genirq: Restore trigger settings in irq_modify_status()
MAINTAINERS: Remove Jason Cooper's irqchip git tree
irqchip/gic-v3-its-platform-msi: Fix msi-parent parsing loop
irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES
irqchip: brcmstb-l2: Define an irq_pm_shutdown function
irqchip/gic: Ensure we have an ISB between ack and ->handle_irq
irqchip/gic-v3-its: Remove ACPICA version check for ACPI NUMA
irqchip/gic-v3: Honor forced affinity setting
irqchip/gic-v3: Report failures in gic_irq_domain_alloc
irqchip/gic-v2: Report failures in gic_irq_domain_alloc
irqchip/atmel-aic: Remove root argument from ->fixup() prototype
irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
Linus Torvalds [Sun, 20 Aug 2017 15:54:30 +0000 (08:54 -0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
"A fix for the hardlockup watchdog to prevent false positives with
extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
the hrtimer which is used to verify.
Slightly larger than the minimal fix, which just would increase the
hrtimer frequency, but comes with extra overhead of more watchdog
timer interrupts and thread wakeups for all users.
With this change we restrict the overhead to the extreme Turbo-Mode
systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/watchdog: Prevent false positives with turbo modes
Alexey Dobriyan [Sat, 19 Aug 2017 09:57:51 +0000 (12:57 +0300)]
genirq/ipi: Fixup checks against nr_cpu_ids
Valid CPU ids are [0, nr_cpu_ids-1] inclusive.
Fixes: 3b8e29a82dd1 ("genirq: Implement ipi_send_mask/single()") Fixes: f9bce791ae2a ("genirq: Add a new function to get IPI reverse mapping") Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20170819095751.GB27864@avx2
Xin Long [Fri, 18 Aug 2017 03:01:36 +0000 (11:01 +0800)]
net: sched: fix NULL pointer dereference when action calls some targets
As we know in some target's checkentry it may dereference par.entryinfo
to check entry stuff inside. But when sched action calls xt_check_target,
par.entryinfo is set with NULL. It would cause kernel panic when calling
some targets.
It can be reproduce with:
# tc qd add dev eth1 ingress handle ffff:
# tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
-j ECN --ecn-tcp-remove
It could also crash kernel when using target CLUSTERIP or TPROXY.
By now there's no proper value for par.entryinfo in ipt_init_target,
but it can not be set with NULL. This patch is to void all these
panics by setting it with an ipt_entry obj with all members = 0.
Note that this issue has been there since the very beginning.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Thu, 17 Aug 2017 23:19:42 +0000 (00:19 +0100)]
rxrpc: Fix oops when discarding a preallocated service call
rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call
it preallocates, but does add it to the rxrpc net namespace call list.
This, however, causes rxrpc_put_call() to oops when the call is discarded
when the socket is closed. rxrpc_put_call() needs the socket to be able to
reach the namespace so that it can use a lock held therein.
Fix this by setting a call's socket pointer immediately before discarding
it.
This can be triggered by unloading the kafs module, resulting in an oops
like the following:
Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 17 Aug 2017 22:14:58 +0000 (23:14 +0100)]
irda: do not leak initialized list.dev to userspace
list.dev has not been initialized and so the copy_to_user is copying
data from the stack back to user space which is a potential
information leak. Fix this ensuring all of list is initialized to
zero.
Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huy Nguyen [Thu, 17 Aug 2017 15:29:52 +0000 (18:29 +0300)]
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
enable_4k_uar module parameter was added in patch cited below to
address the backward compatibility issue in SRIOV when the VM has
system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
implementation.
The above compatibility issue does not exist in the non SRIOV case.
In this patch, we always enable 4k uar implementation if SRIOV
is not enabled on mlx4's supported cards.
Fixes: 76e39ccf9c36 ("net/mlx4_core: Fix backward compatibility on VFs") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Thu, 17 Aug 2017 11:06:14 +0000 (13:06 +0200)]
PCI: Allow PCI express root ports to find themselves
If the pci_find_pcie_root_port() function is called on a root port
itself, return the root port rather than NULL.
This effectively reverts commit 0e405232871d6 ("PCI: fix oops when
try to find Root Port for a PCI device") which added an extra check
that would now be redundant.
Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported") Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum") Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Shawn Lin <shawn.lin@rock-chips.com> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Wed, 16 Aug 2017 21:53:36 +0000 (17:53 -0400)]
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.
Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)") Suggested-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 18 Aug 2017 23:06:33 +0000 (16:06 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"14 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
mm/mempolicy: fix use after free when calling get_mempolicy
mm/cma_debug.c: fix stack corruption due to sprintf usage
signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
mm, oom: fix potential data corruption when oom_reaper races with writer
mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
slub: fix per memcg cache leak on css offline
mm: discard memblock data later
test_kmod: fix description for -s -and -c parameters
kmod: fix wait on recursive loop
wait: add wait_event_killable_timeout()
kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
Wei Wang [Wed, 16 Aug 2017 18:18:09 +0000 (11:18 -0700)]
ipv6: reset fn->rr_ptr when replacing route
syzcaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 0000000000000000ffff88018fe435b0ffffffff81ca384dffff8801d3588c00 ffff8800bc699380ffff8800bc699500dffffc0000000000ffff8801d40a47c0 ffff88018fe435d8ffffffff81735751ffff88018fe43660ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
Make sure all fields of an IPv6 address are initialized, which
guarantees that the IPv4 fields are also initialized.
Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 16 Aug 2017 16:41:54 +0000 (09:41 -0700)]
tipc: fix use-after-free
syszkaller reported use-after-free in tipc [1]
When msg->rep skb is freed, set the pointer to NULL,
so that caller does not free it again.
[1]
==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
Memory state around the buggy address: ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Memory state around the buggy address: ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
>ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
^ ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kees Cook [Fri, 18 Aug 2017 22:16:31 +0000 (15:16 -0700)]
mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
broke AddressSanitizer. This is a partial revert of:
eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
The AddressSanitizer tool has hard-coded expectations about where
executable mappings are loaded.
The motivation for changing the PIE base in the above commits was to
avoid the Stack-Clash CVEs that allowed executable mappings to get too
close to heap and stack. This was mainly a problem on 32-bit, but the
64-bit bases were moved too, in an effort to proactively protect those
systems (proofs of concept do exist that show 64-bit collisions, but
other recent changes to fix stack accounting and setuid behaviors will
minimize the impact).
The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
base), so only the 64-bit PIE base needs to be reverted to let x86 and
arm64 ASan binaries run again. Future changes to the 64-bit PIE base on
these architectures can be made optional once a more dynamic method for
dealing with AddressSanitizer is found. (e.g. always loading PIE into
the mmap region for marked binaries.)
Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Kostya Serebryany <kcc@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laura Abbott [Fri, 18 Aug 2017 22:16:27 +0000 (15:16 -0700)]
mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
Commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
use of __GFP_HIGHMEM for allocations. vmalloc_32 may use
GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
trigger a BUG in gfp_zone.
Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249 Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com Fixes: 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkk.
Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb ........
Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Memory state around the buggy address: ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem. do_get_mempolicy,
however, drops the lock midway while we can still access it later.
Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 8bccd85ffbaf ("[PATCH] Implement sys_* do_*
layering in the memory policy layer."). but when we have the the
current mempolicy ref count model. The issue was introduced
accordingly.
Fix the issue by removing the premature release.
Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [2.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prakash Gupta [Fri, 18 Aug 2017 22:16:21 +0000 (15:16 -0700)]
mm/cma_debug.c: fix stack corruption due to sprintf usage
name[] in cma_debugfs_add_one() can only accommodate 16 chars including
NULL to store sprintf output. It's common for cma device name to be
larger than 15 chars. This can cause stack corrpution. If the gcc
stack protector is turned on, this can cause a panic due to stack
corruption.
Fix the short sprintf buffer in cma_debugfs_add_one() by using
scnprintf() instead of sprintf().
Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org Fixes: f318dd083c81 ("cma: Store a name in the cma structure") Signed-off-by: Prakash Gupta <guptap@codeaurora.org> Acked-by: Laura Abbott <labbott@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>