Linus Torvalds [Thu, 2 May 2013 00:51:54 +0000 (17:51 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).
7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...
Linus Torvalds [Wed, 1 May 2013 22:51:46 +0000 (15:51 -0700)]
Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/efi changes from Peter Anvin:
"The bulk of these changes are cleaning up the efivars handling and
breaking it up into a tree of files. There are a number of fixes as
well.
The entire changeset is pretty big, but most of it is code movement.
Several of these commits are quite new; the history got very messed up
due to a mismerge with the urgent changes for rc8 which completely
broke IA64, and so Ingo requested that we rebase it to straighten it
out."
* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: remove "kfree(NULL)"
efi: locking fix in efivar_entry_set_safe()
efi, pstore: Read data from variable store before memcpy()
efi, pstore: Remove entry from list when erasing
efi, pstore: Initialise 'entry' before iterating
efi: split efisubsystem from efivars
efivarfs: Move to fs/efivarfs
efivars: Move pstore code into the new EFI directory
efivars: efivar_entry API
efivars: Keep a private global pointer to efivars
efi: move utf16 string functions to efi.h
x86, efi: Make efi_memblock_x86_reserve_range more readable
efivarfs: convert to use simple_open()
The patches in Rusty's modules-next tree such as "CONFIG_SYMBOL_PREFIX:
cleanup." cleans up the whole mess around symbol prefixes, so this patch
just attempts to fix the build in the meantime.
The intermediate definition of SYMBOL_NAME above isn't used and is
incorrect when CONFIG_SYMBOL_PREFIX is defined as CONFIG_SYMBOL_PREFIX
is a quoted string literal, so define __SYMBOL_NAME directly depending
on CONFIG_SYMBOL_PREFIX.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Mea-culpa-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Frysinger <vapier@gentoo.org> Cc: uclinux-dist-devel@blackfin.uclinux.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 12 Apr 2013 17:03:36 +0000 (18:03 +0100)]
proc: Make the PROC_I() and PDE() macros internal to procfs
Make the PROC_I() and PDE() macros internal to procfs. This means making
PDE_DATA() out of line. This could be made more optimal by storing
PDE()->data into inode->i_private.
Also provide a __PDE_DATA() that is inline and internal to procfs.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 16:27:28 +0000 (17:27 +0100)]
proc: Supply a function to remove a proc entry by PDE
Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer. This allows us to eliminate all remaining pde->name
accesses outside of procfs.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 17:54:43 +0000 (18:54 +0100)]
ppc: Clean up scanlog
Clean up the pseries scanlog driver's use of procfs:
(1) Don't need to save the proc_dir_entry pointer as we have the filename to
remove with.
(2) Save the scan log buffer pointer in a static variable (there is only one
of it) and don't save it in the PDE (which doesn't have a destructor).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: Paul Mackerras <paulus@samba.org>
cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 23:48:49 +0000 (00:48 +0100)]
ppc: Clean up rtas_flash driver somewhat
Clean up some of the problems with the rtas_flash driver:
(1) It shouldn't fiddle with the internals of the procfs filesystem (altering
pde->count).
(2) If pid namespaces are in effect, then you can get multiple inodes
connected to a single pde, thereby rendering the pde->count > 2 test
useless.
(3) The pde->count fudging doesn't work for forked, dup'd or cloned file
descriptors, so add static mutexes and use them to wrap access to the
driver through read, write and release methods.
(4) The driver can only handle one device, so allocate most of the data
previously attached to the pde->data as static variables instead (though
allocate the validation data buffer with kmalloc).
(5) We don't need to save the pde pointers as long as we have the filenames
available for removal.
(6) Don't try to multiplex what the update file read method does based on the
filename. Instead provide separate file ops and split the function.
Whilst we're at it, tabulate the procfile information and loop through it when
creating or destroying them rather than manually coding each one.
[Folded fixes from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>]
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: Paul Mackerras <paulus@samba.org>
cc: Anton Blanchard <anton@samba.org>
cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 15:15:07 +0000 (16:15 +0100)]
drm: proc: Use remove_proc_subtree()
Use remove_proc_subtree() rather than remove_proc_entry() to remove a
minor-specific drm proc directory and all its children.
Things could theoretically be improved by storing the drm_minor pointer in the
minor-specific dir proc_dir_entry struct data and then scrapping the list of
proc files - but that's shared with the debugfs interface where you can't do
that, so I don't see an easy way of doing it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: dri-devel@lists.freedesktop.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 14:34:31 +0000 (15:34 +0100)]
drm: proc: Use minor->index to label things, not PDE->name
Use minor->index to label things, not the name field from the proc_dir_entry
of the /proc/dwm/<minor>/ directory.
Also, use "%u" not "%d" to render the value and use a 12-byte buffer in which
to render the integer, not a 16-byte buffer. The longest string an unsigned
int can give you is 10 chars (4294967295) plus a NUL, so round up to 12 as the
stack is likely to be 4- or 8-byte aligned.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: dri-devel@lists.freedesktop.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 10:17:06 +0000 (11:17 +0100)]
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
Don't access the proc_dir_entry in ReiserFS's r_open(), r_start() r_show()
procfs interface functions.
ReiserFS stores the ->show() method pointer in PDE->data and the super_block
pointer in PDE->parent->data. This isn't changing.
Currently, ReiserFS passes the PDE pointer into seq_file::private from
r_open() so that r_start() and r_show() can then access it. Instead, use
seq_open_private() to allocate a two-pointer struct that's passed through
seq_file::private and put the ->show() method and the sb pointers in there.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: reiserfs-devel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Fri, 12 Apr 2013 13:06:01 +0000 (14:06 +0100)]
proc: Supply an accessor for getting the data from a PDE's parent
Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.
ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.
This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required. The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).
David Howells [Fri, 12 Apr 2013 01:59:48 +0000 (02:59 +0100)]
rtl8187se: Use a dir under /proc/net/r8180/
Create a dir under /proc/net/r8180/ named for the device and create that
device's files under there. This means that there won't be a problem for
multiple devices in the system (if such is possible) and it means we don't
need to save the 'device directory' PDE any more as we can just do a proc
subtree removal.
David Howells [Fri, 12 Apr 2013 01:48:30 +0000 (02:48 +0100)]
proc: Add proc_mkdir_data()
Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation. This means no access is then required to the proc_dir_entry
struct to set this.
David Howells [Fri, 12 Apr 2013 00:50:06 +0000 (01:50 +0100)]
proc: Split the namespace stuff out into linux/proc_ns.h
Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull networking updates from David Miller:
"Highlights (1721 non-merge commits, this has to be a record of some
sort):
1) Add 'random' mode to team driver, from Jiri Pirko and Eric
Dumazet.
2) Make it so that any driver that supports configuration of multiple
MAC addresses can provide the forwarding database add and del
calls by providing a default implementation and hooking that up if
the driver doesn't have an explicit set of handlers. From Vlad
Yasevich.
3) Support GSO segmentation over tunnels and other encapsulating
devices such as VXLAN, from Pravin B Shelar.
4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.
5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
Dukkipati.
6) In the PHY layer, allow supporting wake-on-lan in situations where
the PHY registers have to be written for it to be configured.
Use it to support wake-on-lan in mv643xx_eth.
From Michael Stapelberg.
7) Significantly improve firewire IPV6 support, from YOSHIFUJI
Hideaki.
8) Allow multiple packets to be sent in a single transmission using
network coding in batman-adv, from Martin Hundebøll.
9) Add support for T5 cxgb4 chips, from Santosh Rastapur.
10) Generalize the VXLAN forwarding tables so that there is more
flexibility in configurating various aspects of the endpoints.
From David Stevens.
11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
from Dmitry Kravkov.
12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
Neira Ayuso.
13) Start adding networking selftests.
14) In situations of overload on the same AF_PACKET fanout socket, or
per-cpu packet receive queue, minimize drop by distributing the
load to other cpus/fanouts. From Willem de Bruijn and Eric
Dumazet.
15) Add support for new payload offset BPF instruction, from Daniel
Borkmann.
16) Convert several drivers over to mdoule_platform_driver(), from
Sachin Kamat.
17) Provide a minimal BPF JIT image disassembler userspace tool, from
Daniel Borkmann.
18) Rewrite F-RTO implementation in TCP to match the final
specification of it in RFC4138 and RFC5682. From Yuchung Cheng.
19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
you like netlink, so I implemented netlink dumping of netlink
sockets.") From Andrey Vagin.
20) Remove ugly passing of rtnetlink attributes into rtnl_doit
functions, from Thomas Graf.
21) Allow userspace to be able to see if a configuration change occurs
in the middle of an address or device list dump, from Nicolas
Dichtel.
22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
Frederic Sowa.
23) Increase accuracy of packet length used by packet scheduler, from
Jason Wang.
24) Beginning set of changes to make ipv4/ipv6 fragment handling more
scalable and less susceptible to overload and locking contention,
from Jesper Dangaard Brouer.
25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
instead. From Hong Zhiguo.
26) Optimize route usage in IPVS by avoiding reference counting where
possible, from Julian Anastasov.
27) Convert IPVS schedulers to RCU, also from Julian Anastasov.
28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
Eitzenberger.
29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
nfnetlink_log, and nfnetlink_queue. From Gao feng.
30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.
31) Support several new r8169 chips, from Hayes Wang.
32) Support tokenized interface identifiers in ipv6, from Daniel
Borkmann.
33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.
34) Add 802.1ad vlan offload support, from Patrick McHardy.
35) Support mmap() based netlink communication, also from Patrick
McHardy.
36) Support HW timestamping in mlx4 driver, from Amir Vadai.
37) Rationalize AF_PACKET packet timestamping when transmitting, from
Willem de Bruijn and Daniel Borkmann.
38) Bring parity to what's provided by /proc/net/packet socket dumping
and the info provided by netlink socket dumping of AF_PACKET
sockets. From Nicolas Dichtel.
39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
Poirier"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
filter: fix va_list build error
af_unix: fix a fatal race with bit fields
bnx2x: Prevent memory leak when cnic is absent
bnx2x: correct reading of speed capabilities
net: sctp: attribute printl with __printf for gcc fmt checks
netlink: kconfig: move mmap i/o into netlink kconfig
netpoll: convert mutex into a semaphore
netlink: Fix skb ref counting.
net_sched: act_ipt forward compat with xtables
mlx4_en: fix a build error on 32bit arches
Revert "bnx2x: allow nvram test to run when device is down"
bridge: avoid OOPS if root port not found
drivers: net: cpsw: fix kernel warn on cpsw irq enable
sh_eth: use random MAC address if no valid one supplied
3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
tg3: fix to append hardware time stamping flags
unix/stream: fix peeking with an offset larger than data in queue
unix/dgram: fix peeking with an offset larger than data in queue
unix/dgram: peek beyond 0-sized skbs
openvswitch: Remove unneeded ovs_netdev_get_ifindex()
...
Xi Wang [Wed, 1 May 2013 20:24:08 +0000 (16:24 -0400)]
filter: fix va_list build error
This patch fixes the following build error.
In file included from include/linux/filter.h:52:0,
from arch/arm/net/bpf_jit_32.c:14:
include/linux/printk.h:54:2: error: unknown type name ‘va_list’
include/linux/printk.h:105:21: error: unknown type name ‘va_list’
include/linux/printk.h:108:30: error: unknown type name ‘va_list’
Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 1 May 2013 20:20:04 +0000 (13:20 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
"Assorted fixes and cleanups to the existing drivers plus a new driver
for IMS Passenger Control Unit device they use for ther in-flight
entertainment system."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (44 commits)
Input: trackpoint - Optimize trackpoint init to use power-on reset
Input: apbps2 - convert to devm_ioremap_resource()
Input: ALPS - use %ph to print buffers
ARM - shmobile: Armadillo800EVA: Move st1232 reset pin handling
Input: st1232 - add reset pin handling
Input: st1232 - convert to devm_* infrastructure
Input: MT - handle semi-mt devices in core
Input: adxl34x - use spi_get_drvdata()
Input: ad7877 - use spi_get_drvdata() and spi_set_drvdata()
Input: ads7846 - use spi_get_drvdata() and spi_set_drvdata()
Input: ims-pcu - fix a memory leak on error
Input: sysrq - supplement reset sequence with timeout functionality
Input: tegra-kbc - support for defining row/columns based on SoC
Input: imx_keypad - switch to using managed resources
Input: arc_ps2 - add support for device tree
Input: mma8450 - fix signed 12bits to 32bits conversion
Input: eeti_ts - remove redundant null check
Input: edt-ft5x06 - remove redundant null check before kfree
Input: ad714x - add CONFIG_PM_SLEEP to suspend/resume functions
Input: adxl34x - add CONFIG_PM_SLEEP to suspend/resume functions
...
Eric Dumazet [Wed, 1 May 2013 05:24:03 +0000 (05:24 +0000)]
af_unix: fix a fatal race with bit fields
Using bit fields is dangerous on ppc64/sparc64, as the compiler [1]
uses 64bit instructions to manipulate them.
If the 64bit word includes any atomic_t or spinlock_t, we can lose
critical concurrent changes.
This is happening in af_unix, where unix_sk(sk)->gc_candidate/
gc_maybe_cycle/lock share the same 64bit word.
This leads to fatal deadlock, as one/several cpus spin forever
on a spinlock that will never be available again.
A safer way would be to use a long to store flags.
This way we are sure compiler/arch wont do bad things.
As we own unix_gc_lock spinlock when clearing or setting bits,
we can use the non atomic __set_bit()/__clear_bit().
recursion_level can share the same 64bit location with the spinlock,
as it is set only with this spinlock held.
[1] bug fixed in gcc-4.8.0 :
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
Reported-by: Ambrose Feinstein <ambrose@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 1 May 2013 19:07:50 +0000 (15:07 -0400)]
Merge branch 'bnx2x'
Yuval Mintz says:
====================
This fixes 2 small bugs - one which may cause an unnecessary link flap,
and the other is a small memory leak when unloading while cnic is not
loaded.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 1 May 2013 04:27:58 +0000 (04:27 +0000)]
bnx2x: Prevent memory leak when cnic is absent
bnx2x driver allocates searcher T2 tables, but it releases that memory
during unload only released if the cnic is loaded.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yaniv Rosner [Wed, 1 May 2013 04:27:57 +0000 (04:27 +0000)]
bnx2x: correct reading of speed capabilities
When the bnx2x driver reads the port configuration - mask irrelevant bits.
Without this change, the unintended bits may cause the driver to needlessly
toggle the link, as a comparison in the link flap avoidance flow will show
that the old link did not advertise the same capabilities and thus cannot
be retained.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 1 May 2013 01:37:20 +0000 (01:37 +0000)]
netlink: kconfig: move mmap i/o into netlink kconfig
Currently, in menuconfig, Netlink's new mmaped IO is the very first
entry under the ``Networking support'' item and comes even before
``Networking options'':
Lets move this into ``Networking options'' under netlink's Kconfig,
since this might be more appropriate. Introduced by commit ccdfcc398
(``netlink: mmaped netlink: ring setup'').
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This resulted from my commit ca99ca14c which introduced a mutex_trylock
operation in a path that could execute in interrupt context. When mutex
debugging is enabled, the above warns the user when we are in fact
exectuting in interrupt context
interrupt context.
After some discussion, It seems that a semaphore is the proper mechanism to use
here. While mutexes are defined to be unusable in interrupt context, no such
condition exists for semaphores (save for the fact that the non blocking api
calls, like up and down_trylock must be used when in irq context).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Bart Van Assche <bvanassche@acm.org> CC: Bart Van Assche <bvanassche@acm.org> CC: David Miller <davem@davemloft.net> CC: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f9c2288837ba072b21dba955f04a4c97eaa77b1e (netlink:
implement memory mapped recvmsg) increamented skb->users
ref count twice for a dump op which does not look right.
Following patch fixes that.
CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pull tile arch changes from Chris Metcalf:
"These are some minor new feature work and other changes that didn't
merit getting pushed up after the 3.9 merge window closed.
There should be a lot more activity in the 3.11 merge window"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: Fix syscall return value passed to tracepoint
tile: comment assumption about __insn_mtspr for <asm/irqflags.h>
tile: ns2cycles should use __raw_get_cpu_var
arch: remove KCORE_ELF again [tile]
tile: remove two outdated Kconfig entries
tile: support atomic64_dec_if_positive()
tile: support TIF_SYSCALL_TRACEPOINT; select HAVE_SYSCALL_TRACEPOINTS
tile: Add definition of NR_syscalls
tile: move declaration of sys_call_table to <asm/syscall.h>
arch/tile: Enable HAVE_ARCH_TRACEHOOK
arch/tile: Call tracehook_report_syscall_{entry,exit} in syscall trace
Steven Rostedt [Wed, 1 May 2013 17:35:51 +0000 (13:35 -0400)]
init: Do not warn on non-zero initcall return
Commit f91eb62f71b3 ("init: scream bloody murder if interrupts are
enabled too early") added three new warnings. The first two seemed
reasonable, but the third included a warning when an initcall returned
non-zero. Although, the third WARN() does include an imbalanced preempt
disabled, or irqs disable, it shouldn't warn if it only had an initcall
that just returns non-zero.
In fact, according to Linus, it shouldn't print at all. As it only
prints with initcall_debug set, and that already shows enough
information to fix things.
Linus Torvalds [Wed, 1 May 2013 16:57:04 +0000 (09:57 -0700)]
Merge branch 'topic/omap3isp' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull omap3isp clk support from Mauro Carvalho Chehab:
"This patch were sent in separate as it depends on a merge from clock
framework, that you merged in commit 362ed48dee50"
* 'topic/omap3isp' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] omap3isp: Use the common clock framework
Linus Torvalds [Wed, 1 May 2013 15:17:51 +0000 (08:17 -0700)]
Merge branch 'ipc-scalability'
Merge IPC cleanup and scalability patches from Andrew Morton.
This cleans up many of the oddities in the IPC code, uses the list
iterator helpers, splits out locking and adds per-semaphore locks for
greater scalability of the IPC semaphore code.
Most normal user-level locking by now uses futexes (ie pthreads, but
also a lot of specialized locks), but SysV IPC semaphores are apparently
still used in some big applications, either for portability reasons, or
because they offer tracking and undo (and you don't need to have a
special shared memory area for them).
Our IPC semaphore scalability was pitiful. We used to lock much too big
ranges, and we used to have a single ipc lock per ipc semaphore array.
Most loads never cared, but some do. There are some numbers in the
individual commits.
* ipc-scalability:
ipc: sysv shared memory limited to 8TiB
ipc/msg.c: use list_for_each_entry_[safe] for list traversing
ipc,sem: fine grained locking for semtimedop
ipc,sem: have only one list in struct sem_queue
ipc,sem: open code and rename sem_lock
ipc,sem: do not hold ipc lock more than necessary
ipc: introduce lockless pre_down ipcctl
ipc: introduce obtaining a lockless ipc object
ipc: remove bogus lock comment for ipc_checkid
ipc/msgutil.c: use linux/uaccess.h
ipc: refactor msg list search into separate function
ipc: simplify msg list search
ipc: implement MSG_COPY as a new receive mode
ipc: remove msg handling from queue scan
ipc: set EFAULT as default error in load_msg()
ipc: tighten msg copy loops
ipc: separate msg allocation from userspace copy
ipc: clamp with min()
Robin Holt [Wed, 1 May 2013 02:15:54 +0000 (19:15 -0700)]
ipc: sysv shared memory limited to 8TiB
Trying to run an application which was trying to put data into half of
memory using shmget(), we found that having a shmall value below 8EiB-8TiB
would prevent us from using anything more than 8TiB. By setting
kernel.shmall greater than 8EiB-8TiB would make the job work.
In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
[akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int] Signed-off-by: Robin Holt <holt@sgi.com> Reported-by: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ipc/msg.c: use list_for_each_entry_[safe] for list traversing
The ipc/msg.c code does its list operations by hand and it open-codes the
accesses, instead of using for_each_entry_[safe].
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik van Riel [Wed, 1 May 2013 02:15:44 +0000 (19:15 -0700)]
ipc,sem: fine grained locking for semtimedop
Introduce finer grained locking for semtimedop, to handle the common case
of a program wanting to manipulate one semaphore from an array with
multiple semaphores.
If the call is a semop manipulating just one semaphore in an array with
multiple semaphores, only take the lock for that semaphore itself.
If the call needs to manipulate multiple semaphores, or another caller is
in a transaction that manipulates multiple semaphores, the sem_array lock
is taken, as well as all the locks for the individual semaphores.
On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:
Rik van Riel [Wed, 1 May 2013 02:15:39 +0000 (19:15 -0700)]
ipc,sem: have only one list in struct sem_queue
Having only one list in struct sem_queue, and only queueing simple
semaphore operations on the list for the semaphore involved, allows us to
introduce finer grained locking for semtimedop.
Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2) While on an Oracle swingbench DSS (data mining) workload the
improvements are not as exciting as with Rik's benchmark, we can see
some positive numbers. For an 8 socket machine the following are the
percentages of %sys time incurred in the ipc lock:
Davidlohr Bueso [Wed, 1 May 2013 02:15:24 +0000 (19:15 -0700)]
ipc: introduce lockless pre_down ipcctl
Various forms of ipc use ipcctl_pre_down() to retrieve an ipc object and
check permissions, mostly for IPC_RMID and IPC_SET commands.
Introduce ipcctl_pre_down_nolock(), a lockless version of this function.
The locking version is retained, yet modified to call the nolock version
without affecting its semantics, thus transparent to all ipc callers.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Wed, 1 May 2013 02:15:19 +0000 (19:15 -0700)]
ipc: introduce obtaining a lockless ipc object
Through ipc_lock() and therefore ipc_lock_check() we currently return the
locked ipc object. This is not necessary for all situations and can,
therefore, cause unnecessary ipc lock contention.
Introduce analogous ipc_obtain_object() and ipc_obtain_object_check()
functions that only lookup and return the ipc object.
Both these functions must be called within the RCU read critical section.
[akpm@linux-foundation.org: propagate the ipc_obtain_object() errno from ipc_lock()] Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Signed-off-by: Rik van Riel <riel@redhat.com> Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Acked-by: Michel Lespinasse <walken@google.com> Cc: Emmanuel Benisty <benisty.e@gmail.com> Cc: Jason Low <jason.low2@hp.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Wed, 1 May 2013 02:15:14 +0000 (19:15 -0700)]
ipc: remove bogus lock comment for ipc_checkid
This series makes the sysv semaphore code more scalable, by reducing the
time the semaphore lock is held, and making the locking more scalable for
semaphore arrays with multiple semaphores.
The first four patches were written by Davidlohr Buesso, and reduce the
hold time of the semaphore lock.
The last three patches change the sysv semaphore code locking to be more
fine grained, providing a performance boost when multiple semaphores in a
semaphore array are being manipulated simultaneously.
On a 24 CPU system, performance numbers with the semop-multi
test with N threads and N semaphores, look like this:
Linus Torvalds [Wed, 1 May 2013 15:04:12 +0000 (08:04 -0700)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Mostly performance and bug fixes, plus some cleanups. The one new
feature this merge window is a new ioctl EXT4_IOC_SWAP_BOOT which
allows installation of a hidden inode designed for boot loaders."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (50 commits)
ext4: fix type-widening bug in inode table readahead code
ext4: add check for inodes_count overflow in new resize ioctl
ext4: fix Kconfig documentation for CONFIG_EXT4_DEBUG
ext4: fix online resizing for ext3-compat file systems
jbd2: trace when lock_buffer in do_get_write_access takes a long time
ext4: mark metadata blocks using bh flags
buffer: add BH_Prio and BH_Meta flags
ext4: mark all metadata I/O with REQ_META
ext4: fix readdir error in case inline_data+^dir_index.
ext4: fix readdir error in the case of inline_data+dir_index
jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset
ext4: mext_insert_extents should update extent block checksum
ext4: move quota initialization out of inode allocation transaction
ext4: reserve xattr index for Rich ACL support
jbd2: reduce journal_head size
ext4: clear buffer_uninit flag when submitting IO
ext4: use io_end for multiple bios
ext4: make ext4_bio_write_page() use BH_Async_Write flags
ext4: Use kstrtoul() instead of parse_strtoul()
ext4: defragmentation code cleanup
...
Linus Torvalds [Wed, 1 May 2013 14:44:37 +0000 (07:44 -0700)]
Merge tag 'tag-for-linus-3.10' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf
Pull dma-buf updates from Sumit Semwal:
"Added debugfs support to dma-buf"
* tag 'tag-for-linus-3.10' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
dma-buf: Add debugfs support
dma-buf: replace dma_buf_export() with dma_buf_export_named()
Linus Torvalds [Wed, 1 May 2013 14:43:05 +0000 (07:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel
Pull Hexagon fixes from Richard Kuo:
"Changes for the Hexagon architecture (and one touching OpenRISC).
They include various fixes to make use of additional arch features and
cleanups. The largest functional change is a cleanup of the signal
and event return paths"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel: (32 commits)
Hexagon: add v4 CS regs to core copyout macro
Hexagon: use correct translation for VMALLOC_START
Hexagon: use correct translations for DMA mappings
Hexagon: fix return value for notify_resume case in do_work_pending
Hexagon: fix signal number for user mem faults
Hexagon: remove two Kconfig entries
arch: remove CONFIG_GENERIC_FIND_NEXT_BIT again
Hexagon: update copyright dates
Hexagon: add translation types for __vmnewmap
Hexagon: fix signal.c compile error
Hexagon: break up user fn/arg register setting
Hexagon: use generic sys_fork, sys_vfork, and sys_clone
Hexagon: fix psp/sp macro
Hexagon: fix up int enable/disable at ret_from_fork
Hexagon: add IOMEM and _relaxed IO macros
Hexagon: switch to using the device type for IO mappings
Hexagon: don't print info for offline CPU's
Hexagon: add support for single-stepping (v4+)
Hexagon: use correct work mask when checking for more work
Hexagon: add support for additional exceptions
...
Linus Torvalds [Wed, 1 May 2013 14:32:21 +0000 (07:32 -0700)]
tty: fix up atime/mtime mess, take three
We first tried to avoid updating atime/mtime entirely (commit b0de59b5733d: "TTY: do not update atime/mtime on read/write"), and then
limited it to only update it occasionally (commit 37b7f3c76595: "TTY:
fix atime/mtime regression"), but it turns out that this was both
insufficient and overkill.
It was insufficient because we let people attach to the shared ptmx node
to see activity without even reading atime/mtime, and it was overkill
because the "only once a minute" means that you can't really tell an
idle person from an active one with 'w'.
So this tries to fix the problem properly. It marks the shared ptmx
node as un-notifiable, and it lowers the "only once a minute" to a few
seconds instead - still long enough that you can't time individual
keystrokes, but short enough that you can tell whether somebody is
active or not.
Linus Torvalds [Wed, 1 May 2013 14:21:43 +0000 (07:21 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull compat cleanup from Al Viro:
"Mostly about syscall wrappers this time; there will be another pile
with patches in the same general area from various people, but I'd
rather push those after both that and vfs.git pile are in."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
syscalls.h: slightly reduce the jungles of macros
get rid of union semop in sys_semctl(2) arguments
make do_mremap() static
sparc: no need to sign-extend in sync_file_range() wrapper
ppc compat wrappers for add_key(2) and request_key(2) are pointless
x86: trim sys_ia32.h
x86: sys32_kill and sys32_mprotect are pointless
get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
merge compat sys_ipc instances
consolidate compat lookup_dcookie()
convert vmsplice to COMPAT_SYSCALL_DEFINE
switch getrusage() to COMPAT_SYSCALL_DEFINE
switch epoll_pwait to COMPAT_SYSCALL_DEFINE
convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
make HAVE_SYSCALL_WRAPPERS unconditional
consolidate cond_syscall and SYSCALL_ALIAS declarations
teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
get rid of duplicate logics in __SC_....[1-6] definitions
Add debugfs support to make it easier to print debug information
about the dma-buf buffers.
Cc: Dave Airlie <airlied@redhat.com>
[minor fixes on init and warning fix] Cc: Dan Carpenter <dan.carpenter@oracle.com>
[remove double unlock in fail case] Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Sumit Semwal [Fri, 22 Mar 2013 12:52:16 +0000 (18:22 +0530)]
dma-buf: replace dma_buf_export() with dma_buf_export_named()
For debugging purposes, it is useful to have a name-string added
while exporting buffers. Hence, dma_buf_export() is replaced with
dma_buf_export_named(), which additionally takes 'exp_name' as a
parameter.
For backward compatibility, and for lazy exporters who don't wish to
name themselves, a #define dma_buf_export() is also made available,
which adds a __FILE__ instead of 'exp_name'.
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
[Thanks for the idea!] Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Paul Bolle [Thu, 21 Mar 2013 10:13:17 +0000 (11:13 +0100)]
Hexagon: remove two Kconfig entries
The Kconfig entries for HEXAGON_VM and HEXAGON_ANGEL_TRAPS were added,
together with the configuration and makefiles for the Hexagon
architecture, in v3.2. They have never been used. They can safely be
removed.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
[rkuo@codeaurora.org: adjust for line changes in Kconfig] Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Richard Kuo [Tue, 29 May 2012 22:23:14 +0000 (17:23 -0500)]
Hexagon: Signal and return path fixes
This fixes the return value of sigreturn and moves the work pending check
into a c routine for readability and fixes the loop for multiple pending
signals. Based on feedback from Al Viro.