netfilter: nft_rbtree: ignore inactive matching element with no descendants
If we find a matching element that is inactive with no descendants, we
jump to the found label, then crash because of nul-dereference on the
left branch.
Fix this by checking that the element is active and not an interval end
and skipping the logic that only applies to the tree iteration.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Anders K. Pedersen <akp@akp.dk>
netfilter: nf_ct_h323: do not re-activate already expired timer
Commit 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of
set_expect_timeout") just simplify the source codes
if (!del_timer(&exp->timeout))
return 0;
add_timer(&exp->timeout);
to mod_timer(&exp->timeout, jiffies + info->timeout * HZ);
This is not correct, and introduce a race codition:
CPU0 CPU1
- timer expire
process_rcf expectation_timed_out
lock(exp_lock) -
find_exp waiting exp_lock...
re-activate timer!! waiting exp_lock...
unlock(exp_lock) lock(exp_lock)
- unlink expect
- free(expect)
- unlock(exp_lock)
So when the timer expires again, we will access the memory that
was already freed.
Replace mod_timer with mod_timer_pending here to fix this problem.
Fixes: 96d1327ac2e3 ("netfilter: h323: Use mod_timer instead of set_expect_timeout") Cc: Gao Feng <fgao@ikuai8.com> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Sun, 7 Aug 2016 00:52:00 +0000 (20:52 -0400)]
Merge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
First set of fixes for the current cycle:
* fix 80+80 bandwidth warning
* fix powersave with mac80211 TXQ implementation
* use correct way to free SKBs from multicast buffering
* mesh: fix operation ordering to work with all drivers
* mesh: end service period even when peer goes away
* mesh: correct HT opmode validity checks
* pass hw pointer from mac80211 to driver in TPT method,
fixing a bug (in a bit the wrong way, but that's what
we have right now)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The introduction of pre-allocated hash elements inadvertently broke
the behavior of bpf hash maps where users expected to call
bpf_map_update_elem() without considering that the map can be full.
Some programs do:
old_value = bpf_map_lookup_elem(map, key);
if (old_value) {
... prepare new_value on stack ...
bpf_map_update_elem(map, key, new_value);
}
Before pre-alloc the update() for existing element would work even
in 'map full' condition. Restore this behavior.
The above program could have updated old_value in place instead of
update() which would be faster and most programs use that approach,
but sometimes the values are large and the programs use update()
helper to do atomic replacement of the element.
Note we cannot simply update element's value in-place like percpu
hash map does and have to allocate extra num_possible_cpu elements
and use this extra reserve when the map is full.
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Add missing ULL suffix for 64-bit constant
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
drivers/net/dsa/b53/b53_common.c: In function ‘b53_arl_read’:
drivers/net/dsa/b53/b53_common.c:1072: warning: integer constant is too large for ‘long’ type
Fixes: 1da6df85c6fbed8f ("net: dsa: b53: Implement ARL add/del/dump operations") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Forster <dforster@brocade.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Wed, 3 Aug 2016 13:11:40 +0000 (14:11 +0100)]
rxrpc: Fix races between skb free, ACK generation and replying
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.
Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.
To this end, the following changes are made:
(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().
(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.
The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.
(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.
(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.
(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.
(6) There are many bits of unmarshalling code where:
ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}
is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:
ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;
(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:
if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;
becomes:
ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;
(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.
Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:
This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.
Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.
Thanks to Ben for some pointers from the crash dumps!
Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Altshul [Thu, 4 Aug 2016 12:43:04 +0000 (15:43 +0300)]
mac80211: Add ieee80211_hw pointer to get_expected_throughput
The variable is added to allow the driver an easy access to
it's own hw->priv when the op is invoked.
This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.
Masashi Honma [Tue, 2 Aug 2016 08:16:57 +0000 (17:16 +0900)]
mac80211: End the MPSP even if EOSP frame was not acked
If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Tue, 2 Aug 2016 09:13:41 +0000 (11:13 +0200)]
mac80211: fix purging multicast PS buffer queue
The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.
Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Manish Chopra [Wed, 3 Aug 2016 08:02:02 +0000 (04:02 -0400)]
qlcnic: fix data structure corruption in async mbx command handling
This patch fixes a data structure corruption bug in the SRIOV VF mailbox
handler code. While handling mailbox commands from the atomic context,
driver is accessing and updating qlcnic_async_work_list_struct entry fields
in the async work list. These fields could be concurrently accessed by the
work function resulting in data corruption.
This patch restructures async mbx command handling by using a separate
async command list instead of using a list of work_struct structures.
A single work_struct is used to schedule and handle the async commands
with proper locking mechanism.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: Sony Chacko <sony.chacko@qlogic.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tg3: Report the correct number of RSS queues through tg3_get_rxnfc
This patch remove the wrong substraction from info->data in
tg3_get_rxnfc function. Without this patch, the number of RSS
queues reported is less by one.
Reported-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the rx coalescing time is 0, interrupts
are not generated from the controller and rx path hangs.
To avoid this rx hang, updating the driver to not allow
rx coalescing time to be 0.
Signed-off-by: Satish Baddipadige <satish.baddipadige@broadcom.com> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We need to get a UNKNOWN_VALUE with imm to force id
generation so lines 0-5 make r7 a valid packet pointer.
We then read two different bytes from the packet and
add them to copies of the constructed packet pointer.
r8 (line 9) and r9 (line 11) will get the same id of 1,
independently. When either of them is validated (line
13) - find_good_pkt_pointers() will also mark the other
as safe. This leads to access on line 14 being mistakenly
considered safe.
Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 2 Aug 2016 10:23:29 +0000 (12:23 +0200)]
net: xgene: fix maybe-uninitialized variable
Building with -Wmaybe-uninitialized shows a potential use of
an uninitialized variable:
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 'xgene_enet_phy_connect':
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:802:23: warning: 'phy_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
Although the compiler correctly identified this based on the function,
the current code is still safe as long dev->of_node is non-NULL
for the case of CONFIG_ACPI=n, which is currently the case.
The warning is now disabled by default, but still appears when
building with W=1, and other build test tools should be able to
detect it as well. Adding an #else clause here makes the code
more robust and makes it clear to the compiler that this cannot
happen.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility") Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Tue, 2 Aug 2016 02:36:07 +0000 (19:36 -0700)]
openvswitch: Remove incorrect WARN_ONCE().
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted into somewhat confusing kernel log messages. This patch
removes this check and warning.
Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:
- move the suppressing of the __builtin_return_address >0 warning to
the tracing directory only.
- metag recordmcount fix for newer glibc's
- two tracing histogram fixes that were reported by KASAN"
* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix use-after-free in hist_register_trigger()
tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
ftrace/recordmcount: Work around for addition of metag magic but not relocations
mac80211: mesh: flush stations before beacons are stopped
Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.
Linus Torvalds [Tue, 2 Aug 2016 23:47:06 +0000 (19:47 -0400)]
Merge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux
Pull orangefs update from Martin Brandenburg:
"Kernel side caching and executable bugfix
This allows OrangeFS to utilize the dcache and adds an in kernel
attribute cache. We previously used the user side client for this
purpose.
We see a modest performance increase on small file operations. For
example, without the cache, compiling coreutils takes about 17
minutes. With the patch and a 50 millisecond timeout for
dcache_timeout_msecs and getattr_timeout_msecs (the default),
compiling coreutils takes about 6 minutes 20 seconds. On the same
hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
We see similar improvements with mdtest and a test involving writing,
reading, and deleting a large number of small files.
Interested parties can review more data at the following URL.
The eventual goal of this is to allow getdents to turn into a
readdirplus to the OrangeFS server. The cache will be filled then,
which should provide a performance benefit to the common case of
readdir followed by getattr on each entry (i.e. ls -l).
This also fixes a bug. When orangefs_inode_permission was added, it
did not collect i_size from the OrangeFS server, since this presses an
unnecessary load on the OrangeFS server. However, it left a case
where i_size is never initialized. Then running an executable could
fail.
With this patch, size is always collected to be inserted into the
cache. Thus the bug disappears. If this patch is not accepted during
this merge window, we will send a one-line band-aid for this bug
instead"
* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
Orangefs: update orangefs.txt
orangefs: Account for jiffies wraparound.
orangefs: Change default dcache and getattr timeout to 50 msec.
orangefs: Allow dcache and getattr cache time to be configured.
orangefs: Cache getattr results.
orangefs: Use d_time to avoid excessive lookups
Linus Torvalds [Tue, 2 Aug 2016 23:39:09 +0000 (19:39 -0400)]
Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
"The highlights are:
- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported
- A large rework of the MDS cap flushing code (Zheng Yan)
- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU
On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"
* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...
Vegard Nossum [Tue, 2 Aug 2016 21:07:30 +0000 (14:07 -0700)]
kcov: allow more fine-grained coverage instrumentation
For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis. This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.
To instrument a part of the kernel, you can use either
# for a single file in the current directory
KCOV_INSTRUMENT_filename.o := y
or
# for all the files in the current directory (excluding subdirectories)
KCOV_INSTRUMENT := y
or
# (same as above)
ccflags-y += $(CFLAGS_KCOV)
or
# for all the files in the current directory (including subdirectories)
subdir-ccflags-y += $(CFLAGS_KCOV)
init/Kconfig: add clarification for out-of-tree modules
It doesn't trim just symbols that are totally unused in-tree - it trims
the symbols unused by any in-tree modules actually built. If you've
done a 'make localmodconfig' and only build a hundred or so modules,
it's pretty likely that your out-of-tree module will come up lacking
something...
Hopefully this will save the next guy from a Homer Simpson "D'oh!"
moment.
Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Herring [Tue, 2 Aug 2016 21:07:24 +0000 (14:07 -0700)]
config: add android config fragments
Copy the config fragments from the AOSP common kernel android-4.4
branch. It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain. Adding the config fragments into the kernel tree,
makes configuring a mainline kernel as simple as:
make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config
The following non-upstream config options were removed:
Alexey Dobriyan [Tue, 2 Aug 2016 21:07:21 +0000 (14:07 -0700)]
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Doing patches with allmodconfig kernel compiled and committing stuff
into local tree have unfortunate consequence: kernel version changes (as
it should) leading to recompiling and relinking of several files even if
they weren't touched (or interesting at all). This and "git-whatever"
figuring out current version slow down compilation for no good reason.
But lets face it, "allmodconfig" kernels don't care about kernel
version, they are simply compile check guinea pigs.
Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
allmodconfig .config.
Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akash Goel [Tue, 2 Aug 2016 21:07:18 +0000 (14:07 -0700)]
relay: add global mode support for buffer-only channels
Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.
This is useful when the exact location of relay file is not known or the
the parent directory of relay file is not available, while creating the
channel and the logging has to start right from the boot.
But there was no provision to use global mode with buffer-only channels,
which is added by this patch, without modifying the interface where
initially there will be a dummy invocation of create_buf_file callback
through which kernel client can convey the need of a global buffer.
For the use case where drivers/kernel clients want a simple interface
for the userspace, which enables them to capture data/logs from relay
file inorder & without any post processing, support of Global buffer
mode is warranted.
Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files(). Hence relay_late_setup_files() symbol also
needs to be exported.
Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prarit Bhargava [Tue, 2 Aug 2016 21:07:15 +0000 (14:07 -0700)]
init: allow blacklisting of module_init functions
sprint_symbol_no_offset() returns the string "function_name
[module_name]" where [module_name] is not printed for built in kernel
functions. This means that the blacklisting code will fail when
comparing module function names with the extended string.
This patch adds the functionality to block a module's module_init()
function by finding the space in the string and truncating the
comparison to that length.
Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <yang.shi@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e93762bbf681 ("w1: masters: omap_hdq: add support for 1-wire
mode") added a statement to clear the hdq_irqstatus flags in
hdq_read_byte().
If the hdq reading process is scheduled slowly or interrupts are
disabled for a while the hardware read activity might already be
finished on entry of hdq_read_byte(). And hdq_isr() already has set the
hdq_irqstatus to 0x6 (can be seen in debug mode) denoting that both, the
TXCOMPLETE and RXCOMPLETE interrupts occurred in parallel.
This means there is no need to wait and the hdq_read_byte() can just
read the byte from the hdq controller.
By resetting hdq_irqstatus to 0 the read process is forced to be always
waiting again (because the if statement always succeeds) but the
hardware will not issue another RXCOMPLETE interrupt. This results in a
false timeout.
Andrew F. Davis [Tue, 2 Aug 2016 21:07:09 +0000 (14:07 -0700)]
w1: add helper macro module_w1_family
The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions. Add this
macro and use it in all applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew F. Davis [Tue, 2 Aug 2016 21:07:06 +0000 (14:07 -0700)]
w1: remove need for ida and use PLATFORM_DEVID_AUTO
PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA. Do this in all
applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:
- use per-port register offset calculations based on LP-Serial Extended
Features Block (EFB) Register Map type (I or II) with different
per-port offset step (0x20 vs 0x40 respectfully).
- remove deprecated Parallel Physical layer definitions and related
code.
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size. This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.
Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.
[alexandre.bounine@idt.com: remove compiler warning about size of constant] Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: advance queue processing from transfer submit call
Add advancing transfer queue immediately from transfer submit call. DMA
performance improvement: This will start transfer without waiting for
'issue_pending' command if there is no DMA transfer in progress.
Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX1 - MBOX4).
Having a messaging MBOX selector mask allows to define which MBOXes are
controlled by the mport device driver and reserve some of them for
direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.
Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes). Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.
Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: add channel mask and queue size parameters
Add module parameters to allow load time configuration of DMA channels.
Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.
Having HW DMA channel selector mask allows to define which channels
(from seven available) are controlled by the mport device driver and
reserve some of them for direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: fix return value description for dma_prep functions
Update return value description for rio_dma_prep_... functions to
include error-valued pointer that can be returned by HW mport device
drivers. Return values from these functions must be checked using
IS_ERR_OR_NULL macro.
This patch is applicable to kernel versions starting from v4.6-rc1.
Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:06:28 +0000 (14:06 -0700)]
rapidio: remove unnecessary 0x prefixes before %pa extension uses
Patch series "RapidIO subsystem updates".
This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6. The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).
This patch (of 13):
This is RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)
Since commit 3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.
These 0x prefixes in the formats are unnecessary.
Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.
This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT. Additional input was
received from other members of RapidIO.org.
The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.
This char mode device driver allows user-space applications to setup
messaging communication channels using single shared RapidIO messaging
mailbox.
By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).
zhong jiang [Tue, 2 Aug 2016 21:06:22 +0000 (14:06 -0700)]
kexec: add restriction on kexec_load() segment sizes
I hit the following issue when run trinity in my system. The kernel is
3.4 version, but mainline has the same issue.
The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page. Other cases will block until
the test case quits. Also, OOM conditions will occur.
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain. This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.
[akpm@linux-foundation.org: build fix: unconditionally include kexec.h] Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Tue, 2 Aug 2016 21:06:16 +0000 (14:06 -0700)]
kexec: add a kexec_crash_loaded() function
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.
I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.
Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidehiro Kawai [Tue, 2 Aug 2016 21:06:13 +0000 (14:06 -0700)]
kexec: use core_param for crash_kexec_post_notifiers boot option
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.
Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:10 +0000 (14:06 -0700)]
ARM: kexec: fix kexec for Keystone 2
Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch. We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot- view physical
address.
Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit adds definition for cpu_on, cpu_off and cpu_suspend
commands. These definitions must match the corresponding PSCI
definitions in boot monitor.
Having those command and corresponding PSCI support in boot monitor
allows run time CPU hot plugin.
Link: http://lkml.kernel.org/r/E1b8koV-0004Hf-2j@rmk-PC.armlinux.org.uk Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Pratyush Anand <panand@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:04 +0000 (14:06 -0700)]
kexec: allow architectures to override boot mapping
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.
To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current transation points into
linux/kexec.h, and allows an architecture to override the functions.
Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()] Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:00 +0000 (14:06 -0700)]
kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.
This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:57 +0000 (14:05 -0700)]
kexec: ensure user memory sizes do not wrap
Ensure that user memory sizes do not wrap around when validating the
user input, which can lead to the following input validation working
incorrectly.
[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch] Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:54 +0000 (14:05 -0700)]
kexec: don't invoke OOM-killer for control page allocation
If we are unable to find a suitable page when allocating the control
page, do not invoke the OOM-killer: killing processes probably isn't
going to help.
Link: http://lkml.kernel.org/r/E1b8ko9-0004HG-R5@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:51 +0000 (14:05 -0700)]
ARM: kexec: advertise location of bootable RAM
Advertise the location of bootable RAM to kexec-tools. kexec needs to
know where it can place the kernel in RAM, and so be executable when the
system needs to jump into it.
Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.
Link: http://lkml.kernel.org/r/E1b8ko4-0004HA-GF@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Advertise a resource which describes where the crash kernel is located
in the boot view of RAM. This allows kexec-tools to have this vital
information.
Link: http://lkml.kernel.org/r/E1b8knz-0004H4-Bd@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Tue, 2 Aug 2016 21:05:45 +0000 (14:05 -0700)]
kexec: return error number directly
This is a cleanup patch to make kexec more clear to return error number
directly. The variable result is useless, because there is no other
function's return value assignes to it. So remove it.
Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com Signed-off-by: Minfei Huang <mnghuan@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski [Tue, 2 Aug 2016 21:05:36 +0000 (14:05 -0700)]
signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Tue, 2 Aug 2016 21:05:33 +0000 (14:05 -0700)]
reiserfs: fix "new_insert_key may be used uninitialized ..."
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.
Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:30 +0000 (14:05 -0700)]
nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.
This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h". The following minor
changes are accompanied by this migration:
- nilfs_direct_node struct in nilfs2/direct.h is converged to
nilfs2_ondisk.h because it's an on-disk structure.
- inline functions nilfs_rec_len_from_disk() and
nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:25 +0000 (14:05 -0700)]
nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables. This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:22 +0000 (14:05 -0700)]
nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:19 +0000 (14:05 -0700)]
nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.
This removes the following checkpatch.pl warning:
"WARNING: Using yield() is generally wrong. See yield() kernel-doc
(sched/core.c)"
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:17 +0000 (14:05 -0700)]
nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not. This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:14 +0000 (14:05 -0700)]
nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function. This also removes function names from the
messages unless we embed them explicitly in format strings. Instead,
some messages are revised to clarify the context.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:10 +0000 (14:05 -0700)]
nilfs2: reduce bare use of printk() with nilfs_msg()
Replace most use of printk() in nilfs2 implementation with nilfs_msg(),
and reduce the following checkpatch.pl warning:
"WARNING: Prefer [subsystem eg: netdev]_crit([subsystem]dev, ...
then dev_crit(dev, ... then pr_crit(... to printk(KERN_CRIT ..."
This patch also fixes a minor checkpatch warning "WARNING: quoted string
split across lines" that often accompanies the prior warning, and amends
message format as needed.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:06 +0000 (14:05 -0700)]
nilfs2: embed a back pointer to super block instance in nilfs object
Insert a back pointer to super block instance in nilfs object so that
functions of nilfs2 easily refer to the super block instance. This
simplifies replacement of printk() in the successive change.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:02 +0000 (14:05 -0700)]
nilfs2: add nilfs_msg() message interface
Define an own output routine to replace bare use of printk() function.
The output routine is implemented with a macro and a helper function,
which are named nilfs_msg() and __nilfs_msg(), respectively.
__nilfs_msg() formats a message like "NILFS (<device-name>): <message>",
prefixing it with a given log level, and terminates the statement with a
newline. The "device-name" is optional to make it available in early
stages; it will be omitted if a NULL pointer is passed to super block
instance argument. nilfs_msg() wraps __nilfs_msg() and is removed if
CONFIG_PRINTK is not set.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:00 +0000 (14:05 -0700)]
nilfs2: hide function name argument from nilfs_error()
Simplify nilfs_error(), an output function used to report critical
issues in file system. This renames the original nilfs_error() function
to __nilfs_error() and redefines it as a macro to hide its function name
argument within the macro.
Every call site of nilfs_error() is changed to strip __func__ argument
except nilfs_bmap_convert_error(); nilfs_bmap_convert_error() directly
calls __nilfs_error() because it inherits caller's function name.
Kees Cook [Tue, 2 Aug 2016 21:04:54 +0000 (14:04 -0700)]
mm: refuse wrapped vm_brk requests
The vm_brk() alignment calculations should refuse to overflow. The ELF
loader depending on this, but it has been fixed now. No other unsafe
callers have been found.
Link: http://lkml.kernel.org/r/1468014494-25291-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Ismael Ripoll Ripoll <iripoll@upv.es> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 2 Aug 2016 21:04:51 +0000 (14:04 -0700)]
binfmt_elf: fix calculations for bss padding
A double-bug exists in the bss calculation code, where an overflow can
happen in the "last_bss - elf_bss" calculation, but vm_brk internally
aligns the argument, underflowing it, wrapping back around safe. We
shouldn't depend on these bugs staying in sync, so this cleans up the
bss padding handling to avoid the overflow.
This moves the bss padzero() before the last_bss > elf_bss case, since
the zero-filling of the ELF_PAGE should have nothing to do with the
relationship of last_bss and elf_bss: any trailing portion should be
zeroed, and a zero size is already handled by padzero().
Then it handles the math on elf_bss vs last_bss correctly. These need
to both be ELF_PAGE aligned to get the comparison correct, since that's
the expected granularity of the mappings. Since elf_bss already had
alignment-based padding happen in padzero(), the "start" of the new
vm_brk() should be moved forward as done in the original code. However,
since the "end" of the vm_brk() area will already become PAGE_ALIGNed in
vm_brk() then last_bss should get aligned here to avoid hiding it as a
side-effect.
Additionally makes a cosmetic change to the initial last_bss calculation
so it's easier to read in comparison to the load_addr calculation above
it (i.e. the only difference is p_filesz vs p_memsz).
Link: http://lkml.kernel.org/r/1468014494-25291-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Ismael Ripoll Ripoll <iripoll@upv.es> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allen Hubbe [Tue, 2 Aug 2016 21:04:45 +0000 (14:04 -0700)]
checkpatch: check signoff when reading stdin
Signoff was not checked if the filename is '-', indicating reading the
patch from stdin. Commands such as the below would not warn about a
missing signoff, because the patch filename is '-'. This change allows
checkpatch to warn about a missing signoff, even if the input filename
is '-', but only if the patch has a commit message.
git show --pretty=email | scripts/checkpatch.pl -
A more common use of checkpatch with stdin is for piping git diff
through checkpatch. The diff output would not contain a commit message,
and therefore it would not contain a signoff line. For this common use
case, a warning should not be printed about the missing signoff. With
this change we will only warn about a missing signoff if the input
contains a commit message.
git diff | scripts/checkpatch.pl -
Before this patch, a workaround for the first command was to refer to
stdin by a name other than '-'. The workaround is not an elegant
solution, because elsewhere checkpatch uses the fact that filename
equals '-', such as in setting '$vname' to 'Your patch' for stdin. The
command below would report "/dev/stdin has style problems" instead of
"Your patch has style problems."
git show --pretty=email | scripts/checkpatch.pl /dev/stdin
Stephen Boyd [Tue, 2 Aug 2016 21:04:28 +0000 (14:04 -0700)]
firmware: support loading into a pre-allocated buffer
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else. Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer. This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem. It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.
For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place. I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch. Plus
the vmalloc pressure is reduced.
This is based on a patch from Vikram Mulukutla on codeaurora.org:
https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5
Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Vikram Mulukutla <markivx@codeaurora.org> Cc: Mark Brown <broonie@kernel.org> Cc: Ming Lei <ming.lei@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
firmware: provide infrastructure to make fw caching optional
Some low memory systems with complex peripherals cannot afford to have
the relatively large firmware images taking up valuable memory during
suspend and resume. Change the internal implementation of
firmware_class to disallow caching based on a configurable option. In
the near future, variants of request_firmware will take advantage of
this feature.
Link: http://lkml.kernel.org/r/20160607164741.31849-3-stephen.boyd@linaro.org
[stephen.boyd@linaro.org: Drop firmware_desc design and use flags] Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Ming Lei <ming.lei@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 2 Aug 2016 21:04:22 +0000 (14:04 -0700)]
firmware: consolidate kmap/read/write logic
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This design creates needless memory pressure and delays loading because
we have to copy from kernel memory to somewhere else. This patch sets
adds support to the request firmware API to load the firmware directly
into a pre-allocated buffer, skipping the intermediate copying step and
alleviating memory pressure during firmware loading. The drawback is
that we can't use the firmware caching feature because the memory for
the firmware cache is not managed by the firmware layer.
This patch (of 3):
We use similar structured code to read and write the kmapped firmware
pages. The only difference is read copies from the kmap region and
write copies to it. Consolidate this into one function to reduce
duplication.
Link: http://lkml.kernel.org/r/20160607164741.31849-2-stephen.boyd@linaro.org Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org> Cc: Vikram Mulukutla <markivx@codeaurora.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Ming Lei <ming.lei@canonical.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ross Zwisler [Tue, 2 Aug 2016 21:04:19 +0000 (14:04 -0700)]
radix-tree: fix comment about "exceptional" bits
The bottom two bits of radix tree entries are reserved for special use
by the radix tree code itself. A comment detailing their usage was
added by commit 3bcadd6fa6c4 ("radix-tree: free up the bottom bit of
exceptional entries for reuse")
This comment states that if the bottom two bits are '11', this means
that this is a locked exceptional entry.
It turns out that this bit combination was never actually used. Radix
tree locking for DAX was indeed implemented, but it actually used the
third LSB:
/* We use lowest available exceptional entry bit for locking */
#define RADIX_DAX_ENTRY_LOCK (1 << RADIX_TREE_EXCEPTIONAL_SHIFT)
This locking code was also made specific to the DAX code instead of
being generally implemented in radix-tree.h.
So, fix the comment.
Link: http://lkml.kernel.org/r/1468997731-2155-1-git-send-email-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Tue, 2 Aug 2016 21:04:16 +0000 (14:04 -0700)]
crc32: use ktime_get_ns() for measurement
The crc32 test function measures the elapsed time in nanoseconds, but
uses 'struct timespec' for that. We want to remove timespec from the
kernel for y2038 compatibility, and ktime_get_ns() also helps make the
code simpler here.
It is also slightly better to use monontonic time, as we are only
interested in the time difference.
Link: http://lkml.kernel.org/r/20160617143932.3289626-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "David S . Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>