Martin Topholm [Thu, 14 Nov 2013 14:35:30 +0000 (15:35 +0100)]
netfilter: synproxy: send mss option to backend
When the synproxy_parse_options is called on the client ack the mss
option will not be present. Consequently mss wont be included in the
backend syn packet, which falls back to 536 bytes mss.
Therefore XT_SYNPROXY_OPT_MSS is explicitly flagged when recovering mss
value from cookie.
Signed-off-by: Martin Topholm <mph@one.com> Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
1) The addition of nftables. No longer will we need protocol aware
firewall filtering modules, it can all live in userspace.
At the core of nftables is a, for lack of a better term, virtual
machine that executes byte codes to inspect packet or metadata
(arriving interface index, etc.) and make verdict decisions.
Besides support for loading packet contents and comparing them, the
interpreter supports lookups in various datastructures as
fundamental operations. For example sets are supports, and
therefore one could create a set of whitelist IP address entries
which have ACCEPT verdicts attached to them, and use the appropriate
byte codes to do such lookups.
Since the interpreted code is composed in userspace, userspace can
do things like optimize things before giving it to the kernel.
Another major improvement is the capability of atomically updating
portions of the ruleset. In the existing netfilter implementation,
one has to update the entire rule set in order to make a change and
this is very expensive.
Userspace tools exist to create nftables rules using existing
netfilter rule sets, but both kernel implementations will need to
co-exist for quite some time as we transition from the old to the
new stuff.
Kudos to Patrick McHardy, Pablo Neira Ayuso, and others who have
worked so hard on this.
2) Daniel Borkmann and Hannes Frederic Sowa made several improvements
to our pseudo-random number generator, mostly used for things like
UDP port randomization and netfitler, amongst other things.
In particular the taus88 generater is updated to taus113, and test
cases are added.
3) Support 64-bit rates in HTB and TBF schedulers, from Eric Dumazet
and Yang Yingliang.
4) Add support for new 577xx tigon3 chips to tg3 driver, from Nithin
Sujir.
5) Fix two fatal flaws in TCP dynamic right sizing, from Eric Dumazet,
Neal Cardwell, and Yuchung Cheng.
6) Allow IP_TOS and IP_TTL to be specified in sendmsg() ancillary
control message data, much like other socket option attributes.
From Francesco Fusco.
7) Allow applications to specify a cap on the rate computed
automatically by the kernel for pacing flows, via a new
SO_MAX_PACING_RATE socket option. From Eric Dumazet.
8) Make the initial autotuned send buffer sizing in TCP more closely
reflect actual needs, from Eric Dumazet.
9) Currently early socket demux only happens for TCP sockets, but we
can do it for connected UDP sockets too. Implementation from Shawn
Bohrer.
10) Refactor inet socket demux with the goal of improving hash demux
performance for listening sockets. With the main goals being able
to use RCU lookups on even request sockets, and eliminating the
listening lock contention. From Eric Dumazet.
11) The bonding layer has many demuxes in it's fast path, and an RCU
conversion was started back in 3.11, several changes here extend the
RCU usage to even more locations. From Ding Tianhong and Wang
Yufen, based upon suggestions by Nikolay Aleksandrov and Veaceslav
Falico.
12) Allow stackability of segmentation offloads to, in particular, allow
segmentation offloading over tunnels. From Eric Dumazet.
13) Significantly improve the handling of secret keys we input into the
various hash functions in the inet hashtables, TCP fast open, as
well as syncookies. From Hannes Frederic Sowa. The key fundamental
operation is "net_get_random_once()" which uses static keys.
Hannes even extended this to ipv4/ipv6 fragmentation handling and
our generic flow dissector.
14) The generic driver layer takes care now to set the driver data to
NULL on device removal, so it's no longer necessary for drivers to
explicitly set it to NULL any more. Many drivers have been cleaned
up in this way, from Jingoo Han.
15) Add a BPF based packet scheduler classifier, from Daniel Borkmann.
16) Improve CRC32 interfaces and generic SKB checksum iterators so that
SCTP's checksumming can more cleanly be handled. Also from Daniel
Borkmann.
17) Add a new PMTU discovery mode, IP_PMTUDISC_INTERFACE, which forces
using the interface MTU value. This helps avoid PMTU attacks,
particularly on DNS servers. From Hannes Frederic Sowa.
18) Use generic XPS for transmit queue steering rather than internal
(re-)implementation in virtio-net. From Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1622 commits)
random32: add test cases for taus113 implementation
random32: upgrade taus88 generator to taus113 from errata paper
random32: move rnd_state to linux/random.h
random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized
random32: add periodic reseeding
random32: fix off-by-one in seeding requirement
PHY: Add RTL8201CP phy_driver to realtek
xtsonic: add missing platform_set_drvdata() in xtsonic_probe()
macmace: add missing platform_set_drvdata() in mace_probe()
ethernet/arc/arc_emac: add missing platform_set_drvdata() in arc_emac_probe()
ipv6: protect for_each_sk_fl_rcu in mem_check with rcu_read_lock_bh
vlan: Implement vlan_dev_get_egress_qos_mask as an inline.
ixgbe: add warning when max_vfs is out of range.
igb: Update link modes display in ethtool
netfilter: push reasm skb through instead of original frag skbs
ip6_output: fragment outgoing reassembled skb properly
MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
net_sched: tbf: support of 64bit rates
ixgbe: deleting dfwd stations out of order can cause null ptr deref
ixgbe: fix build err, num_rx_queues is only available with CONFIG_RPS
...
Linus Torvalds [Wed, 13 Nov 2013 06:45:43 +0000 (15:45 +0900)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:
- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
...
Linus Torvalds [Wed, 13 Nov 2013 06:34:18 +0000 (15:34 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:
- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixes
plus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
...
Linus Torvalds [Wed, 13 Nov 2013 06:31:13 +0000 (15:31 +0900)]
Merge tag 'dlm-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm fix from David Teigland:
"This set includes a single fix to resolve to a race that could cause
lockspace shutdown to incorrectly return -EBUSY"
* tag 'dlm-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY
Linus Torvalds [Wed, 13 Nov 2013 06:29:38 +0000 (15:29 +0900)]
Merge tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubi
Pull UBI changes from Artem Bityutskiy:
"A bunch of fixes for the fastmap feature, which is still new and
rather experimental. It looks like it starts getting more users.
No significant changes for the "classical" non-fastmap UBI"
* tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubi:
UBI: Add some asserts to ubi_attach_fastmap()
UBI: Fix memory leak in ubi_attach_fastmap() error path
UBI: simplify image sequence test
UBI: fastmap: fix backward compatibility with image_seq
UBI: Call scan_all() with correct offset in error case
UBI: Fix error path in scan_pool()
UBI: fix refill_wl_user_pool()
Linus Torvalds [Wed, 13 Nov 2013 06:28:45 +0000 (15:28 +0900)]
Merge tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs
Pull ubifs changes from Artem Bityutskiy:
"Mostly fixes for the power cut emulation UBIFS mode, and only one
functional change which fixes a return error code"
* tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs:
UBIFS: correct data corruption range
UBIFS: fix return code
UBIFS: remove unnecessary code in ubifs_garbage_collect
Linus Torvalds [Wed, 13 Nov 2013 06:27:00 +0000 (15:27 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
"This adds a ->writepage() implementation to fuse, improving mmaped
writeout and paving the way for buffered writeback.
And there's a patch to add a fix minor number for /dev/cuse, similarly
to /dev/fuse"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: writepages: protect secondary requests from fuse file release
fuse: writepages: update bdi writeout when deleting secondary request
fuse: writepages: crop secondary requests
fuse: writepages: roll back changes if request not found
cuse: add fix minor number to /dev/cuse
fuse: writepage: skip already in flight
fuse: writepages: handle same page rewrites
fuse: writepages: fix aggregation
fuse: fix race in fuse_writepages()
fuse: Implement writepages callback
fuse: don't BUG on no write file
fuse: lock page in mkwrite
fuse: Prepare to handle multiple pages in writeback
fuse: Getting file for writeback helper
Linus Torvalds [Wed, 13 Nov 2013 06:25:47 +0000 (15:25 +0900)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext[23], udf and quota fixes from Jan Kara:
"Assorted fixes in quota, ext2, ext3 & udf.
Probably the most important is a fix of fs corruption issue in ext2
XIP support (OTOH xip is rarely used)"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: Fix fs corruption in ext2_get_xip_mem()
quota: info leak in quota_getquota()
jbd: Revert "jbd: remove dependency on __GFP_NOFAIL"
udf: fix for pathetic mount times in case of invalid file system
ext3: Count journal as bsddf overhead in ext3_statfs
Linus Torvalds [Wed, 13 Nov 2013 06:24:40 +0000 (15:24 +0900)]
Merge tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches.
- add a sysfs to control reclaiming free segments
- enhance the f2fs global lock procedures
- enhance the victim selection flow
- wait for selected node blocks during fsync
- add some tracepoints
- add a config to remove abundant BUG_ONs
The other bug fixes are as follows.
- fix deadlock on acl operations
- fix some bugs with respect to orphan inodes
And, there are a bunch of cleanups"
* tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
f2fs: issue more large discard command
f2fs: fix memory leak after kobject init failed in fill_super
f2fs: cleanup waiting routine for writeback pages in cp
f2fs: avoid to use a NULL point in destroy_segment_manager
f2fs: remove unnecessary TestClearPageError when wait pages writeback
f2fs: update f2fs document
f2fs: avoid to wait all the node blocks during fsync
f2fs: check all ones or zeros bitmap with bitops for better mount performance
f2fs: change the method of calculating the number summary blocks
f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr
f2fs: add an option to avoid unnecessary BUG_ONs
f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control
f2fs: fix a deadlock during init_acl procedure
f2fs: clean up acl flow for better readability
f2fs: remove unnecessary segment bitmap updates
f2fs: add tracepoint for vm_page_mkwrite
f2fs: add tracepoint for set_page_dirty
f2fs: remove redundant set_page_dirty from write_compacted_summaries
f2fs: add reclaiming control by sysfs
f2fs: introduce f2fs_balance_fs_bg for some background jobs
...
Linus Torvalds [Wed, 13 Nov 2013 06:21:53 +0000 (15:21 +0900)]
Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
"Not too much activity this time around. css_id is finally killed and
a minor update to device_cgroup"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: remove can_attach
cgroup: kill css_id
memcg: stop using css id
memcg: fail to create cgroup if the cgroup id is too big
memcg: convert to use cgroup id
memcg: convert to use cgroup_is_descendant()
Linus Torvalds [Wed, 13 Nov 2013 06:18:22 +0000 (15:18 +0900)]
Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata changes from Tejun Heo:
"Nothing too interesting. Only two minor fixes in libata core. Most
changes are specific to hardware which isn't too common"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: Add Device IDs for Intel Wildcat Point-LP
sata_rcar: Convert to clk_prepare/unprepare
drivers/libata: Set max sector to 65535 for Slimtype DVD A DS8A9SH drive
libata: Add some missing command descriptions
sata_highbank: clear whole array in highbank_initialize_phys()
ahci: disabled FBS prior to issuing software reset
libata: Fix display of sata speed
ahci: imx: setup power saving methods
ata_piix: minor typo and a printk fix
ahci: Changing two module params with static and __read_mostly
Linus Torvalds [Wed, 13 Nov 2013 06:17:16 +0000 (15:17 +0900)]
Merge branch 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu changes from Tejun Heo:
"Two smallish changes for percpu. Two patches to remove unused
this_cpu_xor() and one to fix a bug in percpu init failure path so
that it can reach the proper BUG() instead of oopsing earlier"
* 'for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
x86: remove this_cpu_xor() implementation
percpu: remove this_cpu_xor() implementation
percpu: fix bootmem error handling in pcpu_page_first_chunk()
Mathias Krause [Tue, 12 Nov 2013 23:11:47 +0000 (15:11 -0800)]
ipc, msg: fix message length check for negative values
On 64 bit systems the test for negative message sizes is bogus as the
size, which may be positive when evaluated as a long, will get truncated
to an int when passed to load_msg(). So a long might very well contain a
positive value but when truncated to an int it would become negative.
That in combination with a small negative value of msg_ctlmax (which will
be promoted to an unsigned type for the comparison against msgsz, making
it a big positive value and therefore make it pass the check) will lead to
two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
small buffer as the addition of alen is effectively a subtraction. 2/ The
copy_from_user() call in load_msg() will first overflow the buffer with
userland data and then, when the userland access generates an access
violation, the fixup handler copy_user_handle_tail() will try to fill the
remainder with zeros -- roughly 4GB. That almost instantly results in a
system crash or reset.
,-[ Reproducer (needs to be run as root) ]--
| #include <sys/stat.h>
| #include <sys/msg.h>
| #include <unistd.h>
| #include <fcntl.h>
|
| int main(void) {
| long msg = 1;
| int fd;
|
| fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
| write(fd, "-1", 2);
| close(fd);
|
| msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
|
| return 0;
| }
'---
Fix the issue by preventing msgsz from getting truncated by consistently
using size_t for the message length. This way the size checks in
do_msgsnd() could still be passed with a negative value for msg_ctlmax but
we would fail on the buffer allocation in that case and error out.
Also change the type of m_ts from int to size_t to avoid similar nastiness
in other code paths -- it is used in similar constructs, i.e. signed vs.
unsigned checks. It should never become negative under normal
circumstances, though.
Setting msg_ctlmax to a negative value is an odd configuration and should
be prevented. As that might break existing userland, it will be handled
in a separate commit so it could easily be reverted and reworked without
reintroducing the above described bug.
Hardening mechanisms for user copy operations would have catched that bug
early -- e.g. checking slab object sizes on user copy operations as the
usercopy feature of the PaX patch does. Or, for that matter, detect the
long vs. int sign change due to truncation, as the size overflow plugin
of the very same patch does.
Ilija Hadzic [Tue, 12 Nov 2013 23:11:45 +0000 (15:11 -0800)]
devpts: plug the memory leak in kill_sb
When devpts is unmounted, there may be a no-longer-used IDR tree hanging
off the superblock we are about to kill. This needs to be cleaned up
before destroying the SB.
The leak is usually not a big deal because unmounting devpts is typically
done when shutting down the whole machine. However, shutting down an LXC
container instead of a physical machine exposes the problem (the garbage
is detectable with kmemleak).
Make menuconfig allows one to choose compression format of an initial
ramdisk image. But this choice does not result in duly compressed ramdisk
image. Because - $ make install - does not pass on the selected
compression choice to the dracut(8) tool, which creates the initramfs
file. dracut(8) generates the image with the default compression, ie.
gzip(1).
This patch exports the selected compression option to a sub-shell
environment, so that it could be used by dracut(8) tool to generate
appropriately compressed initramfs images.
There isn't a straightforward way to pass on options to dracut(8) via
positional parameters. Because it is indirectly invoked at the end of a $
make install sequence.
Signed-off-by: P J P <ppandit@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
init/Kconfig: add option to disable kernel compression
Some ARC users say they can boot faster with without kernel compression.
This probably depends on things like the FLASH chip they use etc.
Until now, kernel compression can only be disabled by removing "select
HAVE_<compression>" lines from the architecture Kconfig. So add the
Kconfig logic to permit disabling of kernel compression.
Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers: w1: make w1_slave::flags long to avoid memory corruption
On architectures where long is more then 32 bits, modifying a 32-bit field
with set_bit (and other atomic bit operations) may cause bytes following
the field to by modified.
Because the endianness of the bits within a field is the native endianness
of the CPU[1], on big-endian machines, bit number zero is in the last byte
of the field.
Therefore, `set_bit(0, ptr)' on a 64-bit big-endian machine is roughly
equivalent to `((char *)ptr)[7] |= 1', and since w1 driver uses a 32-bit
field for holding the flags, this causes bytes beyond the field to be
modified.
[1] From Documentation/atomic_ops.txt:
Native atomic bit operations are defined to operate on objects
aligned to the size of an "unsigned long" C data type, and are
least of that size. The endianness of the bits within each
"unsigned long" are the native endianness of the cpu.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:11:41 +0000 (15:11 -0800)]
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roger Tseng [Tue, 12 Nov 2013 23:11:40 +0000 (15:11 -0800)]
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
In h_msb_read_page() in ms_block.c, flow never reaches case
MSB_RP_RECIVE_STATUS_REG. This causes error when MEMSTICK_INT_ERR is
encountered and status error bits are going to be examined, but the status
will never be copied back.
Fix it by transitioning to MSB_RP_RECIVE_STATUS_REG right after
MSB_RP_SEND_READ_STATUS_REG.
Signed-off-by: Roger Tseng <rogerable@realtek.com> Acked-by: Maxim Levitsky <maximlevitsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
attrs field of attribute_group structure is a pointer to a pointer (as in
an array of pointers) rather than pointer to attribute struct (as in an
array of structures), so when allocating size of the pointer sholud be
used instead of the structure it is pointing to.
While at it, also change the call to use kcalloc rather than kzalloc.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Dubov <oakad@yahoo.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Tue, 12 Nov 2013 23:11:31 +0000 (15:11 -0800)]
gcov: reuse kbasename helper
To get name of the file from a pathname let's use kbasename() helper.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jingoo Han <jg1.han@samsung.com> Cc: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frantisek Hrbata [Tue, 12 Nov 2013 23:11:26 +0000 (15:11 -0800)]
gcov: add support for gcc 4.7 gcov format
The gcov in-memory format changed in gcc 4.7. The biggest change, which
requires this special implementation, is that gcov_info no longer contains
array of counters for each counter type for all functions and gcov_fn_info
is not used for mapping of function's counters to these arrays(offset).
Now each gcov_fn_info contans it's counters, which makes things a little
bit easier.
This is heavily based on the previous gcc_3_4.c implementation and patches
provided by Peter Oberparleiter. Specially the buffer gcda implementation
for iterator.
[akpm@linux-foundation.org: use kmemdup() and kcalloc()]
[oberpar@linux.vnet.ibm.com: gcc_4_7.c needs vmalloc.h] Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Gospodarek <agospoda@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frantisek Hrbata [Tue, 12 Nov 2013 23:11:24 +0000 (15:11 -0800)]
gcov: move gcov structs definitions to a gcc version specific file
Since also the gcov structures(gcov_info, gcov_fn_info, gcov_ctr_info) can
change between gcc releases, as shown in gcc 4.7, they cannot be defined
in a common header and need to be moved to a specific gcc implemention
file. This also requires to make the gcov_info structure opaque for the
common code and to introduce simple helpers for accessing data inside
gcov_info.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Kees Cook <keescook@chromium.org> Acked-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Andy Gospodarek <agospoda@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Gang [Tue, 12 Nov 2013 23:11:23 +0000 (15:11 -0800)]
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
For registering in add_del_listener(), when kmalloc_node() fails, need
return -ENOMEM instead of success code, and cmd_attr_register_cpumask()
wants to know about it.
After modification, give a simple common test "build -> boot up ->
kernel/controllers/cgroup/getdelays by LTP tools".
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Gang [Tue, 12 Nov 2013 23:11:22 +0000 (15:11 -0800)]
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
snprintf() will return the 'ideal' length which may be larger than real
buffer length, if we only want to use real length, need use scnprintf()
instead of.
Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The iterator rbtree_postorder_for_each_entry_safe() relies on pointer
underflow behavior when testing for loop termination. In particular it
expects that
&rb_entry(NULL, type, field)->field
is NULL. But the result of this expression is not defined by a C standard
and some gcc versions (e.g. 4.3.4) assume the above expression can never
be equal to NULL. The net result is an oops because the iteration is not
properly terminated.
Fix the problem by modifying the iterator to avoid pointer underflows.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Michel Lespinasse <walken@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <stable@vger.kernel.org> [3.12.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kees Cook [Tue, 12 Nov 2013 23:11:17 +0000 (15:11 -0800)]
exec/ptrace: fix get_dumpable() incorrect tests
The get_dumpable() return value is not boolean. Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a
protected state. Almost all places did this correctly, excepting the two
places fixed in this patch.
Wrong logic:
if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
or
if (dumpable == 0) { /* be protective */ }
or
if (!dumpable) { /* be protective */ }
Correct logic:
if (dumpable != SUID_DUMP_USER) { /* be protective */ }
or
if (dumpable != 1) { /* be protective */ }
Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user. (This may have been partially mitigated if Yama was enabled.)
The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.
Josh Triplett [Tue, 12 Nov 2013 23:11:14 +0000 (15:11 -0800)]
Documentation/ABI: document the non-ABI status of Kconfig and symbols
Discussion at Kernel Summit made it clear that the presence or absence of
specific Kconfig symbols are not considered ABI, and that no userspace (or
bootloader, etc) should rely on them.
In addition, kernel-internal symbols are well established as non-ABI, per
Documentation/stable_api_nonsense.txt.
Document both of these in Documentation/ABI/README, in a new section for
notable bits of non-ABI.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Rob Landley <rob@landley.net> Cc: Tao Ma <boyu.mt@taobao.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Richard Weinberger <richard.weinberger@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Berg [Tue, 12 Nov 2013 23:11:12 +0000 (15:11 -0800)]
kernel-doc: improve "no structured comments found" error
When using '!Ffile function' in a docbook template, and the function no
longer exists, you get a "no structured comments found" error from the
kernel-doc processing script. It's useful to know which functions it was
looking for, so print them out in this case. Also do the same for '!Pfile
doc-section'
The same error also happens when using '!Efile' when some exported
functions aren't documented (in the same file.) There's a very large
number of such functions though, so don't print the message in this case
-- right now it would give ~850 messages.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Rob Landley <rob@landley.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefan Raspl [Tue, 12 Nov 2013 23:11:11 +0000 (15:11 -0800)]
Documentation/trace/tracepoints.txt: add links to TRACE_EVENT documentation
Existing tracepoint documentation doesn't mention the popular
TRACE_EVENT macro. Since an excellent series of articles on proper
usage already exists, respective links are added to the existing
documentation.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Cc: Rob Landley <rob@landley.net> Cc: Jiri Kosina <jkosina@suse.cz> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Zoltan Kiss <zoltan.kiss@citrix.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/filesystems/vfat.txt: fix directory entry example
Slots can store up to 13 characters for the file name but one of the
examples has one character too many.
Signed-off-by: Luis Ortega Perez de Villar <luiorpe1@upv.es> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement functionality of creation AttributesFile metadata file on HFS+
volume in the case of absence of it.
It makes trying to open AttributesFile's B-tree during mount of HFS+
volume. If HFS+ volume hasn't AttributesFile then a pointer on
AttributesFile's B-tree keeps as NULL. Thereby, when it is discovered
absence of AttributesFile on HFS+ volume in the begin of xattr creation
operation then AttributesFile will be created.
The creation of AttributesFile will have success in the case of
availability (2 * clump) free blocks on HFS+ volume. Otherwise,
creation operation is ended with error (-ENOSPC).
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are situation when HFS+ volume had been created without
AttributesFile. Such situation can take place because of using old
mkfs.hfs utility or creation HFS+ volume without taking in mind
necessity to use xattrs. For example, Mac OS X 10.4 (Tiger) doesn't
create AttributesFile during mkfs phase. Also it is a very frequent
situation for the case of users that created HFS+ volumes under Linux.
As a result, xattrs and POSIX ACLs on HFS+ volume are unavailable for
such users.
This patchset implements functionality of AttributesFile creation on
HFS+ volume in the case of this metadata file absence during operation
of xattr creation.
This patch:
Add functionality of metadata file's clump size calculation. Operation
of AttributesFile creation needs in clump size setting. This value will
be used when AttributesFile will be extended.
This code is adopted from code of newfs_hfs utility of diskdev_cmds packet
http://opensource.apple.com/tarballs/diskdev_cmds/.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sangbeom Kim [Tue, 12 Nov 2013 23:11:04 +0000 (15:11 -0800)]
rtc: s5m-rtc: add real-time clock driver for s5m8767
Add real-time clock driver for s5m8767.
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com> Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Todd Broch <tbroch@chromium.org> Cc: Mark Brown <broonie@kernel.org> Acked-by: Lee Jones <lee.jones@linaro.org> [mfd parts] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sangjung Woo [Tue, 12 Nov 2013 23:11:00 +0000 (15:11 -0800)]
drivers/rtc/rtc-puv3.c: use dev_dbg() instead of pr_debug()
Because dev_*() are used along with pr_debug() function in this code, the
debug message is not tidy. This patch converts from pr_debug() to
dev_dbg() since dev_*() are encouraged to use in device driver code.
Jingoo Han [Tue, 12 Nov 2013 23:10:53 +0000 (15:10 -0800)]
drivers/rtc/rtc-v3020.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:52 +0000 (15:10 -0800)]
drivers/rtc/rtc-sh.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:51 +0000 (15:10 -0800)]
drivers/rtc/rtc-rs5c348.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:50 +0000 (15:10 -0800)]
drivers/rtc/rtc-pcf2123.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:44 +0000 (15:10 -0800)]
drivers/rtc/rtc-m48t86.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:44 +0000 (15:10 -0800)]
drivers/rtc/rtc-m48t59.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:43 +0000 (15:10 -0800)]
drivers/rtc/rtc-ep93xx.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:42 +0000 (15:10 -0800)]
drivers/rtc/rtc-ds2404.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:41 +0000 (15:10 -0800)]
drivers/rtc/rtc-ds1307.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:40 +0000 (15:10 -0800)]
drivers/rtc/rtc-ds1305.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:39 +0000 (15:10 -0800)]
drivers/rtc/rtc-da9055.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:38 +0000 (15:10 -0800)]
drivers/rtc/rtc-cmos.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:37 +0000 (15:10 -0800)]
drivers/rtc/rtc-88pm860x.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Acked-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jingoo Han [Tue, 12 Nov 2013 23:10:36 +0000 (15:10 -0800)]
drivers/rtc/rtc-88pm80x.c: use dev_get_platdata()
Use the wrapper function for retrieving the platform data instead of
accessing dev->platform_data directly. This is a cosmetic change to make
the code simpler and enhance the readability.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sachin Kamat [Tue, 12 Nov 2013 23:10:34 +0000 (15:10 -0800)]
drivers/rtc/rtc-vr41xx.c: fix checkpatch warnings
Fixes the following warnings:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Cc: Yoichi Yuasa <yuasa@linux-mips.org> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
P J P [Tue, 12 Nov 2013 23:10:22 +0000 (15:10 -0800)]
initramfs: read CONFIG_RD_ variables for initramfs compression
When expert configuration option(CONFIG_EXPERT) is enabled, menuconfig
offers a choice of compression algorithm to compress initial ramfs image;
This choice is stored into CONFIG_RD_* variables. But usr/Makefile uses
earlier INITRAMFS_COMPRESSION_* macros to build initial ramfs file. Since
none of them is defined, resulting 'initramfs_data.cpio' file remains
un-compressed.
This patch updates the Makefile to use CONFIG_RD_* variables and adds
support for LZ4 compression algorithm. Also updates the
'gen_initramfs_list.sh' script to check whether a selected compression
command is accessible or not. And fall-back to default gzip(1)
compression when it is not.
Signed-off-by: P J P <prasad@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch proposes to make init failures more explicit.
Before this, the "No init found" message didn't help much. It could
sometimes be misleading and actually mean "No *working* init found".
This message could hide many different issues:
- no init program candidates found at all
- some init program candidates exist but can't be executed (missing
execute permissions, failed to load shared libraries, executable
compiled for an unknown architecture...)
This patch notifies the kernel user when a candidate init program is found
but can't be executed. In each failure situation, the error code is
displayed, to quickly find the root cause. "No init found" is also
replaced by "No working init found", which is more correct.
This will help embedded Linux developers (especially the newcomers),
regularly making and debugging new root filesystems.
Credits to Geert Uytterhoeven and Janne Karhunen for their improvement
suggestions.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Janne Karhunen <Janne.Karhunen@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
P J P [Tue, 12 Nov 2013 23:10:20 +0000 (15:10 -0800)]
init/do_mounts_rd.c: fix NULL pointer dereference while loading initramfs
Make menuconfig allows one to choose compression format of an initial
ramdisk image. But this choice does not result in duly compressed initial
ramdisk image. Because - $ make install - does not pass on the selected
compression choice to the dracut(8) tool, which creates the initramfs
file. dracut(8) generates the image with the default compression, ie.
gzip(1).
If a user chose any other compression instead of gzip(1), it leads to a
crash due to NULL pointer dereference in crd_load(), caused by a NULL
function pointer returned by the 'decompress_method()' routine. Because
the initramfs image is gzip(1) compressed, whereas the kernel knows only
to decompress the chosen format and not gzip(1).
This patch replaces the crash by an explicit panic() call with an
appropriate error message. This shall prevent the kernel from
eventually panicking in: init/do_mounts.c: mount_block_root() with
-> panic("VFS: Unable to mount root fs on %s", b);
[akpm@linux-foundation.org: mention that the problem is with the ramdisk, don't print known-to-be-NULL value] Signed-off-by: P J P <prasad@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Tue, 12 Nov 2013 23:10:18 +0000 (15:10 -0800)]
epoll: do not take global 'epmutex' for simple topologies
When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached
directly to a wakeup source, we do not need to take the global 'epmutex',
unless the epoll file descriptor is nested. The purpose of taking the
'epmutex' on add is to prevent complex topologies such as loops and deep
wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD
operations. However, for the simple case of an epoll file descriptor
attached directly to a wakeup source (with no nesting), we do not need to
hold the 'epmutex'.
This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves
scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb
performance:
"On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In
addition the benchmark when from scaling well on 10 sockets to scaling
well on just over 40 sockets.
...
Currently the benchmark stops scaling at around 40-44 sockets but it seems like
I found a second unrelated bottleneck."
[akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc] Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jason Baron [Tue, 12 Nov 2013 23:10:16 +0000 (15:10 -0800)]
epoll: optimize EPOLL_CTL_DEL using rcu
Nathan Zimmer found that once we get over 10+ cpus, the scalability of
SPECjbb falls over due to the contention on the global 'epmutex', which is
taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.
Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
by using rcu to guard against any concurrent traversals.
Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for
simple topologies. IE when adding a link from an epoll file descriptor to
a wakeup source, where the epoll file descriptor is not nested.
This patch (of 2):
Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
converting the file->f_ep_links list into an rcu one. In this way, we can
traverse the epoll network on the add path in parallel with deletes.
Since deletes can't create loops or worse wakeup paths, this is safe.
This patch in combination with the patch "epoll: Do not take global 'epmutex'
for simple topologies", shows a dramatic performance improvement in
scalability for SPECjbb.
Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> CC: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 12 Nov 2013 23:10:14 +0000 (15:10 -0800)]
checkpatch: don't require kernel style __attribute__ shortcuts in uapi paths
Avoid prescribing kernel styled shortcuts for gcc extensions of
__attribute__((foo)) in the uapi include paths.
Fix $realfile filename when using -f/--file to not remove first level
directory as if the filename was used in a -P1 patch. Only strip the
first level directory (typically a or b) for P1 patches.
Signed-off-by: Joe Perches <joe@perches.com> Cc: "Dixit, Ashutosh" <ashutosh.dixit@intel.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 12 Nov 2013 23:10:13 +0000 (15:10 -0800)]
checkpatch: improve "return is not a function" test
Find a few more cases where parentheses are used around the value of a
return statement.
This now uses the "$balanced_parens" test and also makes the test depend
on perl v5.10 and higher.
This now finds return with parenthesis uses the old code did not find
like:
ERROR: return is not a function, parentheses are not required
#211: FILE: arch/m68k/include/asm/sun3xflop.h:211:
+ return ((error == 0) ? 0 : -1);
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: David Cohen <david.a.cohen@linux.intel.com> Acked-by: David Cohen <david.a.cohen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Tue, 12 Nov 2013 23:10:12 +0000 (15:10 -0800)]
checkpatch.pl: check for the FSF mailing address
Kernel maintainers reject new instances of the GPL boilerplate paragraph
directing people to write to the FSF for a copy of the GPL, since the FSF
has moved in the past and may do so again.
Make this an error for new code, but just a --strict CHK in --file mode;
anyone interested in doing tree-wide cleanups of this form can enable this
test explicitly.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 12 Nov 2013 23:10:11 +0000 (15:10 -0800)]
checkpatch: make the memory barrier test noisier
Peter Zijlstra prefers that comments be required near uses of memory
barriers.
Change the message level for memory barrier uses from a --strict test only
to a normal WARN so it's always emitted.
This might produce false positives around insertions of memory barriers
when a comment is outside the patch context block.
And checkpatch is still stupid, it only looks for existence of any
comment, not at the comment content.
Signed-off-by: Joe Perches <joe@perches.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Oliver Neukum <oliver@neukum.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>