kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
Add support to toplevel Makefile for compiling with clang, both for
HOSTCC and CC. Use cc-option to prevent gcc options from breaking clang, and
clang options from breaking gcc.
Clang 3.4 semantics are the same as gcc semantics for unsupported flags. For
unsupported warnings clang 3.4 returns true but shows a warning and gcc shows
a warning and returns false.
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"Here is the pull request from the i2c subsystem. It got a little
delayed because I needed to wait for a dependency to be included
(commit b424080a9e08: "reset: Add optional resets and stubs"). Plus,
I had some email problems. All done now, the highlights are:
- drivers can now deprecate their use of i2c classes. That shouldn't
be used on embedded platforms anyhow and was often blindly
copy&pasted. This mechanism gives users time to switch away and
ultimately boot faster once the use of classes for those drivers is
gone for good.
- new drivers for QUP, Cadence, efm32
- tracepoint support for I2C and SMBus
- bigger cleanups for the mv64xxx, nomadik, and designware drivers
And the usual bugfixes, cleanups, feature additions. Most stuff has
been in linux-next for a while. Just some hot fixes and new drivers
were added a bit more recently."
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (63 commits)
i2c: cadence: fix Kconfig dependency
i2c: Add driver for Cadence I2C controller
i2c: cadence: Document device tree bindings
Documentation: i2c: improve section about flags mangling the protocol
i2c: qup: use proper type fro clk_freq
i2c: qup: off by ones in qup_i2c_probe()
i2c: efm32: fix binding doc
MAINTAINERS: update I2C web resources
i2c: qup: New bus driver for the Qualcomm QUP I2C controller
i2c: qup: Add device tree bindings information
i2c: i2c-xiic: deprecate class based instantiation
i2c: i2c-sirf: deprecate class based instantiation
i2c: i2c-mv64xxx: deprecate class based instantiation
i2c: i2c-designware-platdrv: deprecate class based instantiation
i2c: i2c-davinci: deprecate class based instantiation
i2c: i2c-bcm2835: deprecate class based instantiation
i2c: mv64xxx: Fix reset controller handling
i2c: omap: fix usage of IS_ERR_VALUE with pm_runtime_get_sync
i2c: efm32: new bus driver
i2c: exynos5: remove unnecessary cast of void pointer
...
Merge tag 'mmc-updates-for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Pull MMC updates from Chris Ball:
"MMC highlights for 3.15:
Core:
- CONFIG_MMC_UNSAFE_RESUME=y is now default behavior
- DT bindings for SDHCI UHS, eMMC HS200, high-speed DDR, at 1.8/1.2V
- Add GPIO descriptor based slot-gpio card detect API
Drivers:
- dw_mmc: Refactor SOCFPGA support as a variant inside dw_mmc-pltfm.c
- mmci: Support HW busy detection on ux500
- omap: Support MMC_ERASE
- omap_hsmmc: Support MMC_PM_KEEP_POWER, MMC_PM_WAKE_SDIO_IRQ, (a)cmd23
- rtsx: Support pre-req/post-req async
- sdhci: Add support for Realtek RTS5250 controllers
- sdhci-acpi: Add support for 80860F16, fix 80860F14/SDIO card detect
- sdhci-msm: Add new driver for Qualcomm SDHCI chipset support
- sdhci-pxav3: Add support for Marvell Armada 380 and 385 SoCs"
* tag 'mmc-updates-for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (102 commits)
mmc: sdhci-acpi: Intel SDIO has broken card detect
mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller
mmc: sdhci-msm: Add platform_execute_tuning implementation
mmc: sdhci-msm: Initial support for Qualcomm chipsets
mmc: sdhci-msm: Qualcomm SDHCI binding documentation
sdhci: only reprogram retuning timer when flag is set
mmc: rename ARCH_BCM to ARCH_BCM_MOBILE
mmc: sdhci: Allow for irq being shared
mmc: sdhci-acpi: Add device id 80860F16
mmc: sdhci-acpi: Fix broken card detect for ACPI HID 80860F14
mmc: slot-gpio: Add GPIO descriptor based CD GPIO API
mmc: slot-gpio: Split out CD IRQ request into a separate function
mmc: slot-gpio: Record GPIO descriptors instead of GPIO numbers
Revert "dts: socfpga: Add support for SD/MMC on the SOCFPGA platform"
mmc: sdhci-spear: use generic card detection gpio support
mmc: sdhci-spear: remove support for power gpio
mmc: sdhci-spear: simplify resource handling
mmc: sdhci-spear: fix platform_data usage
mmc: sdhci-spear: fix error handling paths for DT
mmc: sdhci-bcm-kona: fix build errors when built-in
...
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull more powerpc updates from Ben Herrenschmidt:
"Here are a few more powerpc things for you.
So you'll find here the conversion of the two new firmware sysfs
interfaces to the new API for self-removing files that Greg and Tejun
introduced, so they can finally remove the old one.
I'm also reverting the hwmon driver for powernv. I shouldn't have
merged it, I got a bit carried away here. I hadn't realized it was
never CCed to the relevant maintainer(s) and list(s), and happens to
have some issues so I'm taking it out and it will come back via the
proper channels.
The rest is a bunch of LE fixes (argh, some of the new stuff was
broken on LE, I really need to start testing LE myself !) and various
random fixes here and there.
Finally one bit that's not strictly a fix, which is the HVC OPAL
change to "kick" the HVC thread when the firmware tells us there is
new incoming data. I don't feel like waiting for this one, it's
simple enough, and it makes a big difference in console responsiveness
which is good for my nerves"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (26 commits)
powerpc/powernv Adapt opal-elog and opal-dump to new sysfs_remove_file_self
Revert "powerpc/powernv: hwmon driver for power values, fan rpm and temperature"
power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
powerpc/opal: Add missing include
powerpc: Convert last uses of __FUNCTION__ to __func__
powerpc: Add lq/stq emulation
powerpc/powernv: Add invalid OPAL call
powerpc/powernv: Add OPAL message log interface
powerpc/book3s: Fix mc_recoverable_range buffer overrun issue.
powerpc: Remove dead code in sycall entry
powerpc: Use of_node_init() for the fakenode in msi_bitmap.c
powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()
powerpc/powernv: Fix endian issues with sensor code
powerpc/powernv: Fix endian issues with OPAL async code
tty/hvc_opal: Kick the HVC thread on OPAL console events
powerpc/powernv: Add opal_notifier_unregister() and export to modules
powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early
powerpc/ppc64: Gracefully handle early interrupts
...
Jan Stancek reported:
"pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
occasionally fails, because some threads fail to wake up.
Testcase creates 5 threads, which are all waiting on same condition.
Main thread then calls pthread_cond_broadcast() without holding mutex,
which calls:
This immediately wakes up single thread A, which unlocks mutex and
tries to wake up another thread:
futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
If thread A manages to call futex_wake() before any waiters are
requeued for uaddr2, no other thread is woken up"
The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.
This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeueing a futex before taking the locks. It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.
Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
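The ordering rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: C11 atomics stand in for the kernel's hash-bucket waiter counter and spinlock, and the names struct bucket, bucket_has_waiters(), requeue_begin() and requeue_end() are hypothetical, not the kernel's identifiers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a futex hash bucket. */
struct bucket {
    atomic_int waiters;   /* tasks that are queued or about to be queued */
    atomic_flag lock;     /* stands in for the bucket spinlock */
};

#define BUCKET_INIT { 0, ATOMIC_FLAG_INIT }

static void bucket_lock(struct bucket *hb)
{
    while (atomic_flag_test_and_set_explicit(&hb->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void bucket_unlock(struct bucket *hb)
{
    atomic_flag_clear_explicit(&hb->lock, memory_order_release);
}

/* Waker fast path: skip the lock entirely when nobody can be queued. */
static bool bucket_has_waiters(struct bucket *hb)
{
    return atomic_load(&hb->waiters) > 0;
}

/*
 * Start a requeue from hb1 to hb2. The target count is raised BEFORE
 * any lock is taken, so a concurrent waker on hb2 cannot take the fast
 * path while waiters are in flight. Locks are taken in address order.
 */
static void requeue_begin(struct bucket *hb1, struct bucket *hb2)
{
    atomic_fetch_add(&hb2->waiters, 1);
    if (hb1 == hb2) {
        bucket_lock(hb1);
    } else if ((uintptr_t)hb1 < (uintptr_t)hb2) {
        bucket_lock(hb1);
        bucket_lock(hb2);
    } else {
        bucket_lock(hb2);
        bucket_lock(hb1);
    }
}

/* Release the locks first, then drop the temporary count on hb2. */
static void requeue_end(struct bucket *hb1, struct bucket *hb2)
{
    bucket_unlock(hb1);
    if (hb1 != hb2)
        bucket_unlock(hb2);
    int prev = atomic_fetch_sub(&hb2->waiters, 1);
    assert(prev > 0);
    (void)prev;
}
```

In this sketch a waker that checks bucket_has_waiters() on the target bucket between requeue_begin() and requeue_end() always sees a non-zero count, which is the property the patch restores.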
This driver wasn't merged via the proper maintainers (my fault ... ooops !)
and has serious issues so let's take it out for now and have a new better
one be merged the right way
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
This was caused by 'sd->groups == NULL' after building groups, which
in turn was caused by an empty 'sd->span'.
The cpu's domain contained nothing because the cpu was assigned to a wrong
node, due to the following unfortunate sequence of events:
1. The hypervisor sent a topology update to the guest OS, to notify changes
to the cpu-node mapping. However, the update was actually redundant - i.e.,
the "new" mapping was exactly the same as the old one.
2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
the 'for-loop' in arch_update_cpu_topology().
3. So we ended up calling stop-machine() with an empty cpumask list, which made
stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
the cpu to run the payload (the update_cpu_topology() function).
4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
finds update->cpu as well as update->new_nid to be 0. In other words, we
end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.
Together with the subsequent incorrect updates, this causes the sched-domain
rebuild code to break and crash the system.
Fix this by skipping the topology update when the topology has not actually
changed (i.e., on spurious updates).
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Robert Jennings <rcj@linux.vnet.ibm.com>
CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
CC: Alistair Popple <alistair@popple.id.au>
Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
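The fix described above, bailing out when the update mask is empty, can be sketched as a small userspace model. The function names, the 8-CPU limit and the bitmask representation are assumptions made for illustration; the real code works on cpumasks and runs the payload via stop-machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical payload: counts how often the expensive update ran. */
static int payload_runs;

static void run_update_on(uint32_t cpumask)
{
    (void)cpumask;
    payload_runs++;
}

/*
 * Compare the current and the reported cpu->node mappings. Only CPUs
 * whose node really changed are added to the update mask. Returns 1 if
 * an update was performed and 0 if the notification was spurious.
 */
static int update_topology(int cur_nid[NR_CPUS], const int new_nid[NR_CPUS])
{
    assert(cur_nid != NULL && new_nid != NULL);
    uint32_t updated_cpus = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cur_nid[cpu] != new_nid[cpu]) {
            updated_cpus |= UINT32_C(1) << cpu;
            cur_nid[cpu] = new_nid[cpu];
        }
    }

    /* Nothing changed: never run the payload with an empty mask. */
    if (updated_cpus == 0)
        return 0;

    run_update_on(updated_cpus);
    return 1;
}
```

With this early return, a redundant notification never reaches the payload, so no CPU can be reassigned based on zero-initialised update data.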
Tony Breeds [Wed, 12 Mar 2014 04:12:01 +0000 (15:12 +1100)]
powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
When building modules with a native le toolchain the linker will
generate R_PPC64_TOCSAVE relocations when it's safe to omit saving r2 on
a plt call. This isn't helpful in the context of a kernel module and the
kernel will fail to load those modules with an error like:
nf_conntrack: Unknown ADD relocation: 109
This patch tells the linker to avoid creating R_PPC64_TOCSAVE
relocations, allowing modules to load.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Monam Agarwal [Sat, 22 Mar 2014 06:50:56 +0000 (12:20 +0530)]
arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
rcu_assign_pointer() ensures that the initialization of a structure is
carried out before a pointer to that structure is stored. When the value
being stored is NULL there is no structure to initialize and nothing to
order, so rcu_assign_pointer(p, NULL) can always safely be converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
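The distinction can be illustrated with a userspace analogue using C11 atomics. This is a sketch, not kernel code: a release store plays the role of rcu_assign_pointer() and a relaxed store plays the role of RCU_INIT_POINTER(); struct config and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int value;
};

static _Atomic(struct config *) current_config;

/*
 * Publishing a structure: the release store orders the initialisation
 * of *cfg before the pointer becomes visible to readers.
 */
static void publish_config(struct config *cfg, int value)
{
    assert(cfg != NULL);
    cfg->value = value;
    atomic_store_explicit(&current_config, cfg, memory_order_release);
}

/*
 * Clearing the pointer: no structure is being published, so there is
 * no initialisation to order and a relaxed store is sufficient.
 */
static void clear_config(void)
{
    atomic_store_explicit(&current_config, (struct config *)NULL,
                          memory_order_relaxed);
}

/* Reader side: acquire pairs with the release in publish_config(). */
static struct config *read_config(void)
{
    return atomic_load_explicit(&current_config, memory_order_acquire);
}
```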
Michael Neuling [Tue, 25 Mar 2014 00:43:08 +0000 (11:43 +1100)]
powerpc/opal: Add missing include
next-20140324 currently fails compiling celleb_defconfig with:
arch/powerpc/include/asm/opal.h:894:42: error: 'struct notifier_block' declared inside parameter list [-Werror]
arch/powerpc/include/asm/opal.h:894:42: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
arch/powerpc/include/asm/opal.h:896:14: error: 'struct notifier_block' declared inside parameter list [-Werror]
This is due to a missing include which is added here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Tue, 1 Apr 2014 03:58:20 +0000 (14:28 +1030)]
powerpc/powernv: Add invalid OPAL call
This call will not be understood by OPAL, and cause it to add an error
to its log. Among other things, this is useful for testing the
behaviour of the log as it fills up.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we wrongly allocate the mc_recoverable_range buffer (to hold
recoverable ranges) based on the size of the property "mcheck-recoverable-ranges".
This results in allocating too little memory to hold the available recoverable
range entries from /proc/device-tree/ibm,opal/mcheck-recoverable-ranges.
This patch fixes this issue by allocating the mc_recoverable_range buffer based
on the number of recoverable range entries instead of the device property size.
Without this change we end up allocating too little memory and run into a memory
corruption issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
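The sizing mistake can be shown with a little arithmetic. The layouts below are assumptions for illustration only and are not stated in the commit message: each property entry is taken to be five 32-bit cells, and each in-memory entry three 64-bit fields. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: 2 cells start, 1 cell length, 2 cells recovery address. */
#define CELLS_PER_ENTRY 5

/* Assumed in-memory form of one recoverable range. */
struct recoverable_range {
    uint64_t start_addr;
    uint64_t end_addr;
    uint64_t recover_addr;
};

/* Number of entries encoded in a property of prop_bytes bytes. */
static size_t range_count(size_t prop_bytes)
{
    assert(prop_bytes % (CELLS_PER_ENTRY * sizeof(uint32_t)) == 0);
    return prop_bytes / (CELLS_PER_ENTRY * sizeof(uint32_t));
}

/* Bytes to allocate: one struct per entry, not the raw property size. */
static size_t range_buffer_bytes(size_t prop_bytes)
{
    return range_count(prop_bytes) * sizeof(struct recoverable_range);
}
```

Under these assumed layouts, a 60-byte property holds three entries but the buffer needs 72 bytes, so sizing the allocation by the property size would leave it 12 bytes short.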
Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Highlights:
- server-side nfs/rdma fixes from Jeff Layton and Tom Tucker
- xdr fixes (a larger xdr rewrite has been posted but I decided it
would be better to queue it up for 3.16).
- miscellaneous fixes and cleanup from all over (thanks especially to
Kinglong Mee)"
* 'for-3.15' of git://linux-nfs.org/~bfields/linux: (36 commits)
nfsd4: don't create unnecessary mask acl
nfsd: revert v2 half of "nfsd: don't return high mode bits"
nfsd4: fix memory leak in nfsd4_encode_fattr()
nfsd: check passed socket's net matches NFSd superblock's one
SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
SUNRPC: New helper for creating client with rpc_xprt
NFSD: Free backchannel xprt in bc_destroy
NFSD: Clear wcc data between compound ops
nfsd: Don't return NFS4ERR_STALE_STATEID for NFSv4.1+
nfsd4: fix nfs4err_resource in 4.1 case
nfsd4: fix setclientid encode size
nfsd4: remove redundant check from nfsd4_check_resp_size
nfsd4: use more generous NFS4_ACL_MAX
nfsd4: minor nfsd4_replay_cache_entry cleanup
nfsd4: nfsd4_replay_cache_entry should be static
nfsd4: update comments with obsolete function name
rpc: Allow xdr_buf_subsegment to operate in-place
NFSD: Using free_conn free connection
SUNRPC: fix memory leak of peer addresses in XPRT
...
Merge a few more patches from Andrew Morton:
"A few leftovers"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/ncpfs/dir.c: fix indenting in ncp_lookup()
ncpfs/inode.c: fix mismatch printk formats and arguments
ncpfs: remove now unused PRINTK macro
ncpfs: convert PPRINTK to ncp_vdbg
ncpfs: convert DPRINTK/DDPRINTK to ncp_dbg
ncpfs: Add pr_fmt and convert printks to pr_<level>
arch/x86/mm/kmemcheck/kmemcheck.c: use kstrtoint() instead of sscanf()
lib/percpu_counter.c: fix bad percpu counter state during suspend
autofs4: check dev ioctl size before allocating
mm: vmscan: do not swap anon pages just because free+file is low
Dan Carpenter [Tue, 8 Apr 2014 23:04:19 +0000 (16:04 -0700)]
fs/ncpfs/dir.c: fix indenting in ncp_lookup()
My static checker suggests adding curly braces here. Probably that was
the intent, but actually the code works the same either way. I've just
changed the indenting and left the code as-is.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Acked-by: Dave Chiluk <chiluk@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/percpu_counter.c: fix bad percpu counter state during suspend
I got a bug report yesterday from Laszlo Ersek in which he states that
his kvm instance fails to suspend. Laszlo bisected it down to this
commit 1cf7e9c68fe8 ("virtio_blk: blk-mq support") where virtio-blk is
converted to use the blk-mq infrastructure.
After digging a bit, it became clear that the issue was with the queue
drain. blk-mq tracks queue usage in a percpu counter, which is
incremented on request alloc and decremented when the request is freed.
The initial hunt was for an inconsistency in blk-mq, but everything
seemed fine. In fact, the counter only returned crazy values when
suspend was in progress.
When a CPU is unplugged, the percpu counter code merges that CPU's state with
the general state. blk-mq takes care to register a hotcpu notifier with
the appropriate priority, so we know it runs after the percpu counter
notifier. However, the percpu counter notifier only merges the state
when the CPU is fully gone. This leaves a state transition where the
CPU going away is no longer in the online mask, yet it still holds
private values. This means that in this state, percpu_counter_sum()
returns invalid results, and the suspend then hangs waiting for
abs(dead-cpu-value) requests to complete which of course will never
happen.
Fix this by clearing the state earlier, so we never have a case where
the CPU isn't in the online mask but still holds private state. This bug
has been there since forever; I guess we don't have a lot of users where
percpu counters need to be reliable during the suspend cycle.
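The window described above can be reproduced with a toy model. This is a hedged sketch: struct pcpu_counter, the cpu_online[] array and the function names are hypothetical, and the model is single-threaded, so it shows only the accounting problem, not the notifier ordering.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy percpu counter: a shared count plus one private delta per CPU. */
struct pcpu_counter {
    long count;
    long delta[NR_CPUS];
};

static bool cpu_online[NR_CPUS] = { true, true, true, true };

/* Sum the shared count and the deltas of CPUs in the online mask. */
static long counter_sum(const struct pcpu_counter *c)
{
    long sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_online[cpu])
            sum += c->delta[cpu];
    return sum;
}

/* Fold a CPU's private delta into the shared count. */
static void counter_fold(struct pcpu_counter *c, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    c->count += c->delta[cpu];
    c->delta[cpu] = 0;
}

/* Take a CPU offline, folding its state as it leaves the online mask. */
static void cpu_go_offline(struct pcpu_counter *c, int cpu)
{
    counter_fold(c, cpu);
    cpu_online[cpu] = false;
}
```

If a CPU is merely cleared from cpu_online[] without the fold, counter_sum() silently drops that CPU's delta, which corresponds to the invalid results seen during suspend.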
Johannes Weiner [Tue, 8 Apr 2014 23:04:10 +0000 (16:04 -0700)]
mm: vmscan: do not swap anon pages just because free+file is low
Page reclaim force-scans / swaps anonymous pages when file cache drops
below the high watermark of a zone in order to prevent what little cache
remains from thrashing.
However, on bigger machines the high watermark value can be quite large
and when the workload is dominated by a static anonymous/shmem set, the
file set might just be a small window of used-once cache. In such
situations, the VM starts swapping heavily when instead it should be
recycling the no longer used cache.
This is a longer-standing problem, but it's more likely to trigger after
commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy")
because file pages can no longer accumulate in a single zone and are
dispersed into smaller fractions among the available zones.
To resolve this, do not force scan anon when file pages are low but
instead rely on the scan/rotation ratios to make the right prediction.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: <stable@kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) If a VXLAN interface is created with no groups, we can crash on
reception of packets. Fix from Mike Rapoport.
2) Missing includes in CPTS driver, from Alexei Starovoitov.
3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
and Dan Carpenter.
4) Missing irq.h include in bnx2x, enic, and qlcnic drivers. From
Josh Boyer.
5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
Borkmann.
6) Byte-Queue-Limit enabled drivers aren't handled properly in
AF_PACKET transmit path, also from Daniel Borkmann.
Same problem exists in pktgen, and Daniel fixed it there too.
7) Fix resource leaks in driver probe error paths of new sxgbe driver,
from Francois Romieu.
8) Truesize of SKBs can gradually get more and more corrupted in NAPI
packet recycling path, fix from Eric Dumazet.
9) Fix uniprocessor netfilter build, from Florian Westphal. In the
longer term we should perhaps try to find a way for ARRAY_SIZE() to
work even with zero sized array elements.
10) Fix crash in netfilter conntrack extensions due to mis-estimation of
required extension space. From Andrey Vagin.
11) Since we commit table rule updates before trying to copy the
counters back to userspace (it's the last action we perform), we
really can't signal the user copy with an error as we are beyond the
point from which we can unwind everything. This causes all kinds of
use after free crashes and other mysterious behavior.
From Thomas Graf.
12) Restore previous behavior of div/mod by zero in BPF filter
processing. From Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net: sctp: wake up all assocs if sndbuf policy is per socket
isdnloop: several buffer overflows
netdev: remove potentially harmful checks
pktgen: fix xmit test for BQL enabled devices
net/at91_ether: avoid NULL pointer dereference
tipc: Let tipc_release() return 0
at86rf230: fix MAX_CSMA_RETRIES parameter
mac802154: fix duplicate #include headers
sxgbe: fix duplicate #include headers
net: filter: be more defensive on div/mod by X==0
netfilter: Can't fail and free after table replacement
xen-netback: Trivial format string fix
net: bcmgenet: Remove unnecessary version.h inclusion
net: smc911x: Remove unused local variable
bonding: Inactive slaves should keep inactive flag's value
netfilter: nf_tables: fix wrong format in request_module()
netfilter: nf_tables: set names cannot be larger than 15 bytes
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
...
Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull more staging patches from Greg KH:
"Here are some more staging patches for 3.15-rc1.
They include a late-submission of a wireless driver that a bunch of
people seem to have the hardware for now. As it's stand-alone, it
should be fine (now passes the 0-day random build bot tests).
There are also some fixes for the unisys drivers, as they were causing
havoc on a number of different machines. To resolve all of those
issues, we just mark the driver as BROKEN now, and we can fix it up
"properly" over time"
* tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723au: The 8723 only has two paths
Staging: unisys: mark drivers as BROKEN
Staging: unisys: verify that a control channel exists
staging: unisys: Add missing close parentheses in filexfer.c
staging: r8723au: Fix build problem when RFKILL is not selected
staging: r8723au: Fix randconfig build errors
staging: r8723au: Turn on build of new driver
staging: r8723au: Additional source patches
staging: r8723au: Add source files for new driver - part 4
staging: r8723au: Add source files for new driver - part 3
staging: r8723au: Add source files for new driver - part 2
staging: r8723au: Add source files for new driver - part 1
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull second set of arm64 updates from Catalin Marinas:
"A second pull request for this merging window, mainly with fixes and
docs clarification:
- Documentation clarification on CPU topology and booting
requirements
- Additional cache flushing during boot (needed in the presence of
external caches or under virtualisation)
- DMA range invalidation fix for non cache line aligned buffers
- Build failure fix with !COMPAT
- Kconfig update for STRICT_DEVMEM"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix DMA range invalidation for cache line unaligned buffers
arm64: Add missing Kconfig for CONFIG_STRICT_DEVMEM
arm64: fix !CONFIG_COMPAT build failures
Revert "arm64: virt: ensure visibility of __boot_cpu_mode"
arm64: Relax the kernel cache requirements for boot
arm64: Update the TCR_EL1 translation granule definitions for 16K pages
ARM: topology: Make it clear that all CPUs need to be described
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull second set of s390 patches from Martin Schwidefsky:
"The second part of Heikos uaccess rework, the page table walker for
uaccess is now a thing of the past (yay!)
The code change to fix the theoretical TLB flush problem allows us to
add a TLB flush optimization for zEC12. This machine has new
instructions that allow CPU-local TLB flushes for single pages
and for all pages of a specific address space.
Plus the usual bug fixing and some more cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: rework uaccess code - fix locking issues
s390/mm,tlb: optimize TLB flushing for zEC12
s390/mm,tlb: safeguard against speculative TLB creation
s390/irq: Use defines for external interruption codes
s390/irq: Add defines for external interruption codes
s390/sclp: add timeout for queued requests
kvm/s390: also set guest pages back to stable on kexec/kdump
lcs: Add missing destroy_timer_on_stack()
s390/tape: Add missing destroy_timer_on_stack()
s390/tape: Use del_timer_sync()
s390/3270: fix crash with multiple reset device requests
s390/bitops,atomic: add missing memory barriers
s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
Daniel Borkmann [Tue, 8 Apr 2014 15:26:13 +0000 (17:26 +0200)]
net: sctp: wake up all assocs if sndbuf policy is per socket
SCTP charges chunks for wmem accounting via skb->truesize in
sctp_set_owner_w(), and sctp_wfree() respectively as the
reverse operation. If a sender runs out of wmem, it needs to
wait via sctp_wait_for_sndbuf(), and gets woken up by a call
to __sctp_write_space() mostly via sctp_wfree().
__sctp_write_space() is being called per association. Although
we assign sk->sk_write_space() to sctp_write_space(), which
is then being done per socket, it is only used if send space
is increased per socket option (SO_SNDBUF), as SOCK_USE_WRITE_QUEUE
is set and therefore not invoked in sock_wfree().
Commit 4c3a5bdae293 ("sctp: Don't charge for data in sndbuf
again when transmitting packet") fixed an issue where in case
sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again
unless it is interrupted by a signal. However, a still
remaining issue is that if net.sctp.sndbuf_policy=0, that is
accounting per socket, and one-to-many sockets are in use,
the reclaimed write space from sctp_wfree() is 'unfairly'
handed back on the server to the association that is the lucky
one to be woken up again via __sctp_write_space(), while
the remaining associations are never woken up again
(unless by a signal).
The effect disappears with net.sctp.sndbuf_policy=1, that
is wmem accounting per association, as it guarantees a fair
share of wmem among associations.
Therefore, if we have reclaimed memory in case of per socket
accounting, wake all related associations to a socket in a
fair manner, that is, traverse the socket association list
starting from the current neighbour of the association and
issue a __sctp_write_space() to everyone until we end up
waking ourselves. This guarantees that no association is
preferred over another and even if more associations are
taken into the one-to-many session, all receivers will get
messages from the server and are not stalled forever on
high load. This setting still keeps the advantage of per
socket accounting intact, as an association can still use
up global limits if unused by others.
Fixes: 4eb701dfc618 ("[SCTP] Fix SCTP sendbuffer accouting.")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
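The traversal described above, starting at the current association's neighbour and stopping after waking ourselves, can be sketched on a plain circular list. struct assoc, wake_assoc() and wake_all_from() are hypothetical names; the kernel uses its own list primitives and calls __sctp_write_space() for each association.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical association node on a socket's circular list. */
struct assoc {
    int id;
    int wakeups;          /* how many times this association was woken */
    struct assoc *next;   /* next association on the same socket */
};

static void wake_assoc(struct assoc *a)
{
    a->wakeups++;
}

/*
 * Wake every association on the list exactly once, beginning with the
 * neighbour of 'self' and finishing with 'self'. Returns the number of
 * associations woken.
 */
static int wake_all_from(struct assoc *self)
{
    assert(self != NULL && self->next != NULL);
    int woken = 0;
    struct assoc *a = self->next;

    for (;;) {
        wake_assoc(a);
        woken++;
        if (a == self)
            break;
        a = a->next;
    }
    return woken;
}
```

Because the walk starts at the neighbour rather than at a fixed list head, the association that freed the space is served last, so no fixed position on the list is consistently favoured.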
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"Highlights:
- drm:
Generic display port aux features, primary plane support, drm
master management fixes, logging cleanups, enforced locking checks
(instead of docs), documentation improvements, minor number
handling cleanup, pseudofs for shared inodes.
- ttm:
add ability to allocate from both ends
- i915:
broadwell features, power domain and runtime pm, per-process
address space infrastructure (not enabled)
- msm:
power management, hdmi audio support
- nouveau:
ongoing GPU fault recovery, initial maxwell support, random fixes
- exynos:
refactored driver to clean up a lot of abstraction, DP support
moved into drm, LVDS bridge support added, parallel panel support
- gma500:
SGX MMU support, SGX irq handling, asle irq work fixes
- radeon:
video engine bringup, ring handling fixes, use dp aux helpers
Dan Carpenter [Tue, 8 Apr 2014 09:23:09 +0000 (12:23 +0300)]
isdnloop: several buffer overflows
There are three buffer overflows addressed in this patch.
1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer. I have made the destination
buffer 64 characters and I have changed the sprintf() to a snprintf().
2) In isdnloop_parse_cmd(), p points 6 characters into a 60
character buffer so we have 54 characters. The ->eazlist[] is 11
characters long. I have modified the code to return if the source
buffer is too long.
3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string can be up to 79 characters. I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.
Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[]. (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters). For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
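The two defensive patterns used in this patch, bounded formatting with snprintf() and rejecting an over-long source before copying, can be sketched as follows. The helper names format_err() and copy_checked() are hypothetical and the buffer sizes in any caller are up to that caller; this is not the driver's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "E<msg>" into dst without writing past dst_len bytes.
 * Returns 0 if the result fit, -1 if it had to be truncated.
 */
static int format_err(char *dst, size_t dst_len, const char *msg)
{
    assert(dst != NULL && dst_len > 0);
    int n = snprintf(dst, dst_len, "E%s", msg);

    return (n >= 0 && (size_t)n < dst_len) ? 0 : -1;
}

/*
 * Copy src into a fixed-size field, refusing input that does not fit
 * (including its terminating NUL) instead of overflowing the field.
 */
static int copy_checked(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && dst_len > 0);
    if (strlen(src) >= dst_len)
        return -1;
    strcpy(dst, src);
    return 0;
}
```

snprintf() always NUL-terminates within dst_len, so even the truncated case leaves a valid string; the return value lets the caller decide whether truncation is acceptable.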
arm64: Fix DMA range invalidation for cache line unaligned buffers
If the buffer needing cache invalidation for inbound DMA does not start or
end on a cache line aligned address, we need to use the non-destructive
clean&invalidate operation. This issue was introduced by commit 7363590d2c46
("arm64: Implement coherent DMA API based on swiotlb").
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jon Medhurst (Tixy) <tixy@linaro.org>
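A small model shows which cache lines need the non-destructive operation. It assumes a 64-byte cache line purely for illustration, and struct inv_plan and plan_invalidate() are hypothetical names; the real fix is in arm64 assembly and only treats the first and last line specially.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64u   /* assumed line size for illustration */

/* Counts of cache-line operations for one inbound-DMA invalidation. */
struct inv_plan {
    unsigned clean_inval;   /* partially covered lines: clean & invalidate */
    unsigned inval_only;    /* fully covered lines: plain invalidate */
};

/* Classify every cache line touched by the byte range [start, end). */
static struct inv_plan plan_invalidate(uintptr_t start, uintptr_t end)
{
    assert(start < end);
    struct inv_plan p = { 0, 0 };
    uintptr_t first = start & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t last = (end - 1) & ~(uintptr_t)(CACHE_LINE - 1);

    for (uintptr_t line = first; ; line += CACHE_LINE) {
        /* A line is partial if the buffer does not cover all of it. */
        int partial = (line < start) || (line + CACHE_LINE > end);

        if (partial)
            p.clean_inval++;
        else
            p.inval_only++;
        if (line == last)
            break;
    }
    return p;
}
```

A partially covered line also holds bytes that belong to neighbouring data, so discarding it outright could lose unrelated dirty data; cleaning before invalidating writes that data back first.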
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
"various cleanups for ext2, ext3, udf, isofs, a documentation update
for quota, and a fix of a race in reiserfs readdir implementation"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: fix race in readdir
ext2: acl: remove unneeded include of linux/capability.h
ext3: explicitly remove inode from orphan list after failed direct io
fs/isofs/inode.c add __init to init_inodecache()
ext3: Speedup WB_SYNC_ALL pass
fs/quota/Kconfig: Update filesystems
ext3: Update outdated comment before ext3_ordered_writepage()
ext3: Update PF_MEMALLOC handling in ext3_write_inode()
ext2/3: use prandom_u32() instead of get_random_bytes()
ext3: remove an unneeded check in ext3_new_blocks()
ext3: remove unneeded check in ext3_ordered_writepage()
fs: Mark function as static in ext3/xattr_security.c
fs: Mark function as static in ext3/dir.c
fs: Mark function as static in ext2/xattr_security.c
ext3: Add __init macro to init_inodecache
ext2: Add __init macro to init_inodecache
udf: Add __init macro to init_inodecache
fs: udf: parse_options: blocksize check
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild changes from Michal Marek:
- cleanups in the main Makefiles and Documentation/DocBook/Makefile
- make O=... directory is automatically created if needed
- mrproper/distclean removes the old include/linux/version.h to make
life easier when bisecting across the commit that moved the version.h
file
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: docbook: fix the include error when executing "make help"
kbuild: create a build directory automatically for out-of-tree build
kbuild: remove redundant '.*.cmd' pattern from make distclean
kbuild: move "quote" to Kbuild.include to be consistent
kbuild: docbook: use $(obj) and $(src) rather than specific path
kbuild: unconditionally clobber include/linux/version.h on distclean
kbuild: docbook: specify KERNELDOC dependency correctly
kbuild: docbook: include cmd files more simply
kbuild: specify build_docproc as a phony target
Merge tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC changes from Vineet Gupta:
- Support for external initrd from Noam
- Fix broken serial console in nsimosci Virtual Platform
- Reuse of ENTRY/END assembler macros across hand asm code
- Other minor fixes here and there
* tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: [nsimosci] Unbork console
ARC: [nsimosci] Change .dts to use generic 8250 UART
ARC: [SMP] General Fixes
ARC: Remove unused DT template file
ARC: [clockevent] simplify timer ISR
ARC: [clockevent] can't be SoC specific
ARC: Remove ARC_HAS_COH_RTSC
ARC: switch to generic ENTRY/END assembler annotations
ARC: support external initrd
ARC: add uImage to .gitignore
ARC: [arcfpga] Fix __initconst data const-correctness
Russell King [Mon, 7 Apr 2014 11:00:17 +0000 (12:00 +0100)]
DRM: armada: fix corruption while loading cursors
Loading cursors to the LCD controller's SRAM can be corrupted when the
configured pixel clock is relatively slow. This seems to be caused
when we write back-to-back to the SRAM registers.
There doesn't appear to be any status register we can read to check
when an access has completed.
Inserting a dummy read between the writes appears to fix the problem.
Cc: <stable@vger.kernel.org> # 3.13 Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dave Airlie <airlied@redhat.com>
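A minimal sketch of the "dummy read between writes" idea, assuming a made-up register block. The struct, field and function names are hypothetical and do not reflect the real armada register layout.

```c
#include <stdint.h>

/* Hypothetical register block; names and layout are invented for the sketch. */
struct model_lcd_regs {
	volatile uint32_t sram_data;
	volatile uint32_t sram_ctrl;
};

/*
 * Write one SRAM entry. Each write is followed by a read of the same
 * register, so two writes are never issued back-to-back.
 */
static void model_sram_write(struct model_lcd_regs *regs,
			     uint32_t data, uint32_t ctrl)
{
	regs->sram_data = data;
	(void)regs->sram_data;	/* dummy read between the writes */
	regs->sram_ctrl = ctrl;
	(void)regs->sram_ctrl;	/* dummy read before any following write */
}
```

Because the fields are volatile, the compiler has to emit the reads even though their values are discarded.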
Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...
Pointer 'usb3' to struct ufs_super_block_third acquired via
ubh_get_usb_third() is never used in function
ufs_read_cylinder_structures(). Thus remove it.
Detected by Coverity: CID 139939.
Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Salter [Mon, 7 Apr 2014 22:39:52 +0000 (15:39 -0700)]
arm64: add early_ioremap support
Add support for early IO or memory mappings which are needed before the
normal ioremap() is usable. This also adds fixmap support for permanent
fixed mappings such as that used by the earlyprintk device register
region.
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Salter [Mon, 7 Apr 2014 22:39:51 +0000 (15:39 -0700)]
arm64: initialize pgprot info earlier in boot
Presently, paging_init() calls init_mem_pgprot() to initialize pgprot
values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc.
The new fixmap and early_ioremap support also needs to use these macros
before paging_init() is called. This patch moves the init_mem_pgprot()
call out of paging_init() and into setup_arch() so that pgprot_default
gets initialized in time for fixmap and early_ioremap.
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Salter [Mon, 7 Apr 2014 22:39:48 +0000 (15:39 -0700)]
mm: create generic early_ioremap() support
This patch creates a generic implementation of early_ioremap() support
based on the existing x86 implementation. early_ioremap() is useful for
early boot code which needs to temporarily map I/O or memory regions
before normal mapping functions such as ioremap() are available.
Some architectures have optional MMU. In the no-MMU case, the remap
functions simply return the passed in physical address and the unmap
functions do nothing.
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
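The no-MMU behaviour described above amounts to an identity mapping. A user-space sketch, with hypothetical function names (the "_nommu" suffix is not from the patch):

```c
#include <stddef.h>
#include <stdint.h>

/* Without an MMU there is nothing to map: hand back the physical address. */
static void *model_early_ioremap_nommu(uintptr_t phys_addr, size_t size)
{
	(void)size;
	return (void *)phys_addr;
}

/* The matching unmap has nothing to undo. */
static void model_early_iounmap_nommu(void *addr, size_t size)
{
	(void)addr;
	(void)size;
}
```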
Dave Young [Mon, 7 Apr 2014 22:39:46 +0000 (15:39 -0700)]
x86/mm: sparse warning fix for early_memremap
This patch series takes the common bits from the x86 early ioremap
implementation and creates a generic implementation which may be used by
other architectures. The early ioremap interfaces are intended for
situations where boot code needs to make temporary virtual mappings
before the normal ioremap interfaces are available. Typically, this
means before paging_init() has run.
This patch (of 6):
There are a lot of sparse warnings for code like the following:
void *a = early_memremap(phys_addr, size);
early_memremap() is intended to map kernel memory using the ioremap
facility, so the returned pointer should be a kernel RAM pointer rather
than an __iomem one.
To make the function clearer and suppress the sparse warnings, this patch
does the following two things:
1. cast to (__force void *) for the return value of early_memremap
2. add early_memunmap function and pass (__force void __iomem *) to iounmap
From Boris:
"Ingo told me yesterday, it makes sense too. I'd guess we can try it.
FWIW, all callers of early_memremap use the memory they get remapped
as normal memory so we should be safe"
Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
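The two casts listed in the message can be sketched in user space as follows. Outside of sparse the __iomem and __force annotations expand to nothing, so they are defined empty here; the model_* functions and the fake_ram buffer are stand-ins invented for the example, not the x86 implementation.

```c
#include <stddef.h>

/* Empty outside of sparse; defined so the sketch builds in user space. */
#ifndef __iomem
#define __iomem
#endif
#ifndef __force
#define __force
#endif

static char fake_ram[64];	/* stand-in for the region being mapped */
static int model_unmap_calls;

/* Stand-ins for the ioremap-style primitives that deal in __iomem pointers. */
static void __iomem *model_early_ioremap(unsigned long phys, size_t size)
{
	(void)phys;
	(void)size;
	return (void __iomem *)fake_ram;
}

static void model_early_iounmap(void __iomem *addr, size_t size)
{
	(void)addr;
	(void)size;
	model_unmap_calls++;
}

/* 1. Return a plain pointer: callers treat the mapping as normal memory. */
static void *model_early_memremap(unsigned long phys, size_t size)
{
	return (__force void *)model_early_ioremap(phys, size);
}

/* 2. Matching unmap that casts back to __iomem for the primitive. */
static void model_early_memunmap(void *addr, size_t size)
{
	model_early_iounmap((__force void __iomem *)addr, size);
}
```

With this pairing, callers never hold an __iomem pointer, so sparse has nothing to complain about at the call sites.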
When the system has only one CPU, lglock is effectively a spinlock; map
it directly to spinlock to eliminate the indirection and duplicate code.
In addition to removing overhead, this drops 1.6k of code with a
defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We define a check function in order to avoid trouble with the include
files. Then the higher level __this_cpu macros are modified to invoke
the preemption check.
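A user-space model of the split between checked and unchecked operations. All model_* names are hypothetical; a plain integer stands in for the preempt count and a plain long for a per-cpu variable.

```c
#include <stdio.h>

static int model_preempt_count;	/* > 0 means preemption is disabled */
static int model_violations;	/* how many times the check fired */
static long model_counter;	/* stand-in for a per-cpu variable */

/* The check lives in a function so the macros below need no extra headers. */
static void model_preemption_check(const char *op)
{
	if (model_preempt_count == 0) {
		model_violations++;
		fprintf(stderr, "%s used in preemptible code\n", op);
	}
}

/* Unchecked primitive: the basis for everything else. */
#define model_raw_cpu_inc(var)	((var)++)

/* Checked variant: run the preemption check, then the raw operation. */
#define model_this_cpu_inc(var)					\
	do {							\
		model_preemption_check("__this_cpu_inc");	\
		model_raw_cpu_inc(var);				\
	} while (0)
```

Code that tolerates races, such as statistics counters, would use the raw variant and never trigger the check.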
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
vm counters are allowed to be racy. Use raw_cpu_ops to avoid the
local_irq_disable overhead and to avoid preemption checks which will be
added to the __this_cpu operations.
[akpm@linux-foundation.org: Add comment. Again.] Signed-off-by: Christoph Lameter <cl@linux.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Statistics are not critical to the operation of the allocation but
should also not cause too much overhead.
When __this_cpu_inc is altered to check if preemption is disabled this
triggers. Use raw_cpu_inc to avoid the checks. Using this_cpu_ops may
cause interrupt disable/enable sequences on various arches which may
significantly impact allocator performance.
[akpm@linux-foundation.org: add comment] Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
net: replace __this_cpu_inc in route.c with raw_cpu_inc
The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.
I do not see any other synchronization that would allow the use of a
__this_cpu operation here. However, in commit dbd2915ce87e ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races. In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high. These counters may be inaccurate
due to race conditions.
The use of __this_cpu op improves the situation already from what commit dbd2915ce87e did since the single instruction emitted on x86 does not
allow the race to occur anymore. However, non x86 platforms could still
experience a race here.
Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
modules: use raw_cpu_write for initialization of per cpu refcount.
The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.
So simply disable the check through the use of raw_cpu ops.
Therefore we need to use raw_cpu_read here to avoid false positives.
Note that this issue has been discussed in prior years. If the process
changes nodes after retrieving the current numa node then that is
acceptable since most uses of numa_node etc are for optimization and not
for correctness.
There were suggestions to implement a raw_numa_node_id in order to do
preempt checks for numa_node_id as well. But I think we better defer
that to another patch since that would mean investigating how
numa_node_id() is used throughout the kernel which would increase the
scope of this patchset significantly. After all preemption was never
checked before when numa_node_id() was used.
The kernel has never been audited to ensure that this_cpu operations are
consistently used throughout the kernel. The code generated in many
places can be improved through the use of this_cpu operations (which
uses a segment register for relocation of per cpu offsets instead of
performing address calculations).
The patch set also addresses various consistency issues in general with
the per cpu macros.
A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only
because checks are skipped. This is typically shown through a raw_
prefix. So this patch set changes the places where __this_cpu_ptr()
is used to raw_cpu_ptr().
B. There has been the long term wish by some that __this_cpu operations
would check for preemption. However, there are cases where preemption
checks need to be skipped. This patch set adds raw_cpu operations that
do not check for preemption and then adds preemption checks to the
__this_cpu operations.
C. The use of __get_cpu_var is always a reference to a percpu variable
that can also be handled via a this_cpu operation. This patch set
replaces all uses of __get_cpu_var with this_cpu operations.
D. We can then use this_cpu RMW operations in various places replacing
sequences of instructions by a single one.
E. The use of this_cpu operations throughout will allow other arches than
x86 to implement optimized references and RMW operations to work with
per cpu local data.
F. The use of this_cpu operations opens up the possibility to
further optimize code that relies on synchronization through
per cpu data.
The patch set works in a couple of stages:
I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr().
Also converts the existing __this_cpu_xx_# primitive in the x86
code to raw_cpu_xx_#.
II. Patch 2-4 use the raw_cpu operations in places that would give
us false positives once they are enabled.
III. Patch 5 adds preemption checks to __this_cpu operations to allow
checking if preemption is properly disabled when these functions
are used.
IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var
with this_cpu_ptr. They do not depend on any changes to the percpu
code. No preemption tests are skipped if they are applied.
V. Patches 21-46 are conversion patches that use this_cpu operations
in various kernel subsystems/drivers or arch code.
VI. Patches 47/48 (not included in this series) remove no longer used
functions (__this_cpu_ptr and __get_cpu_var). These should only be
applied after all the conversion patches have made it and after we
have done additional passes through the kernel to ensure that none of
the uses of these functions remain.
This patch (of 46):
The patches following this one will add preemption checks to __this_cpu
ops so we need to have an alternative way to use this_cpu operations
without preemption checks.
raw_cpu_ops will be the basis for all other ops since these will be the
operations that do not implement any checks.
Primitive operations are renamed by this patch from __this_cpu_xxx to
raw_cpu_xxxx.
Also change the uses of the x86 percpu primitives in preempt.h.
These depend directly on asm/percpu.h (header #include nesting issue).
Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alex Shi <alex.shi@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bryan Wu <cooloney@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Hedi Berriche <hedi@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mike Travis <travis@sgi.com> Cc: Neil Brown <neilb@suse.de> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Robert Richter <rric@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones [Mon, 7 Apr 2014 22:39:32 +0000 (15:39 -0700)]
slub: fix leak of 'name' in sysfs_slab_add
The failure paths of sysfs_slab_add don't release the allocation of
'name' made by create_unique_id() a few lines above the context of the
diff below. Create a common exit path to make it more obvious what
needs freeing.
[vdavydov@parallels.com: free the name only if !unmergeable] Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
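The "common exit path" pattern, including the note that the name is only freed if !unmergeable, can be sketched in user space like this. The model_* helpers and the fail_at parameter are invented for the example and do not reproduce sysfs_slab_add() itself.

```c
#include <stdlib.h>
#include <string.h>

static int model_frees;	/* counts frees so the behaviour can be observed */

/* Stand-in for create_unique_id(): returns a heap-allocated name. */
static char *model_create_unique_id(void)
{
	char *id = malloc(8);

	if (id)
		strcpy(id, ":t-0001");
	return id;
}

static void model_kfree(char *p)
{
	model_frees++;
	free(p);
}

/* Stand-in for a setup step that may fail. */
static int model_step(int fail)
{
	return fail ? -1 : 0;
}

/* fail_at: 0 = no failure, 1 = first step fails, 2 = second step fails. */
static int model_sysfs_slab_add(int unmergeable, int fail_at)
{
	char *name = NULL;
	int err;

	if (!unmergeable) {
		name = model_create_unique_id();
		if (!name)
			return -1;
	}

	err = model_step(fail_at == 1);
	if (err)
		goto out;

	err = model_step(fail_at == 2);
out:
	/* Single exit: the name is released on success and failure alike. */
	if (!unmergeable)
		model_kfree(name);
	return err;
}
```

Every return after the allocation funnels through "out", which makes it hard to add a new failure path that forgets the free.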
Currently, we try to arrange sysfs entries for memcg caches in the same
manner as for global caches. Apart from turning /sys/kernel/slab into a
mess when there are a lot of kmem-active memcgs created, it actually
does not work properly - we won't create more than one link to a memcg
cache in case its parent is merged with another cache. For instance, if
A is a root cache merged with another root cache B, we will have the
following sysfs setup:
X
A -> X
B -> X
where X is some unique id (see create_unique_id()). Now if memcgs M and
N start to allocate from cache A (or B, which is the same), we will get:
X
X:M
X:N
A -> X
B -> X
A:M -> X:M
A:N -> X:N
Since B is an alias for A, we won't get entries B:M and B:N, which is
confusing.
It is more logical to have entries for memcg caches under the
corresponding root cache's sysfs directory. This would allow us to keep
sysfs layout clean, and avoid such inconsistencies like one described
above.
This patch does the trick. It creates a "cgroup" kset in each root
cache kobject to keep its children caches there.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
slub: adjust memcg caches when creating cache alias
Otherwise, kzalloc() called from a memcg won't clear the whole object.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcg, slab: do not destroy children caches if parent has aliases
Currently we destroy children caches at the very beginning of
kmem_cache_destroy(). This is wrong, because the root cache will not
necessarily be destroyed in the end - if it has aliases (refcount > 0),
kmem_cache_destroy() will simply decrement its refcount and return. In
this case, at best we will get a bunch of warnings in dmesg, like this
one:
kmem_cache_destroy kmalloc-32:0: Slab cache still has objects
CPU: 1 PID: 7139 Comm: modprobe Tainted: G B W 3.13.0+ #117
Call Trace:
dump_stack+0x49/0x5b
kmem_cache_destroy+0xdf/0xf0
kmem_cache_destroy_memcg_children+0x97/0xc0
kmem_cache_destroy+0xf/0xf0
xfs_mru_cache_uninit+0x21/0x30 [xfs]
exit_xfs_fs+0x2e/0xc44 [xfs]
SyS_delete_module+0x198/0x1f0
system_call_fastpath+0x16/0x1b
At worst - if kmem_cache_destroy() will race with an allocation from a
memcg cache - the kernel will panic.
This patch fixes this by moving children caches destruction after the
check if the cache has aliases. Plus, it forbids destroying a root
cache if it still has children caches, because each children cache keeps
a reference to its parent.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
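A small model of the ordering this patch describes: check for aliases first, and only then touch the children. The struct and function names are hypothetical, and restoring the refcount when children cannot be destroyed is a choice made for the model, not something stated in the message.

```c
struct model_root_cache {
	int refcount;
	int nr_children;
	int children_busy;	/* non-zero: children cannot be destroyed */
	int destroyed;
};

static int model_destroy_children(struct model_root_cache *s)
{
	if (s->children_busy)
		return -1;
	s->nr_children = 0;
	return 0;
}

static void model_cache_destroy(struct model_root_cache *s)
{
	/* Aliases remain: drop one reference and leave the children alone. */
	if (--s->refcount > 0)
		return;

	/* Children keep a reference to the root, so refuse if any survive. */
	if (model_destroy_children(s) != 0) {
		s->refcount++;	/* model choice: undo the decrement */
		return;
	}

	s->destroyed = 1;
}
```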
memcg, slab: unregister cache from memcg before starting to destroy it
Currently, memcg_unregister_cache(), which deletes the cache being
destroyed from the memcg_slab_caches list, is called after
__kmem_cache_shutdown() (see kmem_cache_destroy()), which starts to
destroy the cache.
As a result, one can access a partially destroyed cache while traversing
a memcg_slab_caches list, which can have deadly consequences (for
instance, cache_show() called for each cache on a memcg_slab_caches list
from mem_cgroup_slabinfo_read() will dereference pointers to already
freed data).
To fix this, let's move memcg_unregister_cache() before the cache
destruction process beginning, issuing memcg_register_cache() on failure.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcg, slab: separate memcg vs root cache creation paths
Memcg-awareness turned kmem_cache_create() into a dirty interweaving of
memcg-only and except-for-memcg calls. To clean this up, let's move the
code responsible for memcg cache creation to a separate function.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch cleans up the memcg cache creation path as follows:
- Move memcg cache name creation to a separate function to be called
from kmem_cache_create_memcg(). This allows us to get rid of the mutex
protecting the temporary buffer used for the name formatting, because
the whole cache creation path is protected by the slab_mutex.
- Get rid of memcg_create_kmem_cache(). This function serves as a proxy
to kmem_cache_create_memcg(). After separating the cache name creation
path, it would be reduced to a function call, so let's inline it.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a kmem cache is created (kmem_cache_create_memcg()), we first try to
find a compatible cache that already exists and can handle requests from
the new cache, i.e. has the same object size, alignment, ctor, etc. If
there is such a cache, we do not create any new caches, instead we simply
increment the refcount of the cache found and return it.
Currently we do this procedure not only when creating root caches, but
also for memcg caches. However, there is no point in that, because, as
every memcg cache has exactly the same parameters as its parent and cache
merging cannot be turned off in runtime (only on boot by passing
"slub_nomerge"), the root caches of any two potentially mergeable memcg
caches should be merged already, i.e. it must be the same root cache, and
therefore we couldn't even get to the memcg cache creation, because it
already exists.
The only exception is boot caches - they are explicitly forbidden to be
merged by setting their refcount to -1. There are currently only two of
them - kmem_cache and kmem_cache_node, which are used in slab internals (I
do not count kmalloc caches as their refcount is set to 1 immediately
after creation). Since they are prevented from merging in the first place, I
guess we should avoid merging their children too.
So let's remove the useless code responsible for merging memcg caches.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Glauber Costa <glommer@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
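For reference, the merging step described at the top of this message can be modelled roughly as below. This is a simplified sketch: it compares only size and alignment (the message also mentions ctor and other parameters), and the struct and function names are assumptions.

```c
#include <stddef.h>

struct model_kcache {
	size_t size;
	size_t align;
	int refcount;	/* negative: merging forbidden, as for boot caches */
};

/*
 * Look for an existing cache that can serve the request. On a hit, take a
 * reference and return it instead of creating a new cache.
 */
static struct model_kcache *model_find_mergeable(struct model_kcache *caches,
						 size_t n, size_t size,
						 size_t align)
{
	for (size_t i = 0; i < n; i++) {
		struct model_kcache *c = &caches[i];

		if (c->refcount < 0)
			continue;
		if (c->size == size && c->align == align) {
			c->refcount++;
			return c;
		}
	}
	return NULL;
}
```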
David Howells [Mon, 7 Apr 2014 22:39:22 +0000 (15:39 -0700)]
asm/system.h: um: arch_align_stack() moved to asm/exec.h
arch_align_stack() moved to asm/exec.h, so change the comment referring to
asm/system.h which no longer exists.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel: use macros from compiler.h instead of __attribute__((...))
To increase compiler portability there is <linux/compiler.h> which
provides convenience macros for various gcc constructs. Eg: __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.
Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
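A minimal sketch of the kind of replacement described above, for a GCC-compatible compiler. The kernel spells the macro __weak; a differently named macro is used here to avoid clashing with toolchains that predefine that name, and model_arch_hook() is a made-up function.

```c
/* Convenience macro in the spirit of <linux/compiler.h>'s __weak. */
#define model_weak __attribute__((weak))

/*
 * Default implementation. A non-weak definition of the same symbol in
 * another object file would take precedence at link time.
 */
model_weak int model_arch_hook(void)
{
	return 0;
}
```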
If the renamed symbol is defined, lib/iomap.c implements ioport_map and
ioport_unmap, and currently (nearly) all platforms define the port
accessor functions outb/inb and friends unconditionally. So
HAS_IOPORT_MAP is the better name for this.
Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.
The motivation for this change is to reintroduce a symbol HAS_IOPORT
that signals if outb/inb et al. are available. I will address that at
least one merge window later though to keep surprises to a minimum and
catch new introductions of (HAS|NO)_IOPORT.
Daniel M. Weeks [Mon, 7 Apr 2014 22:39:16 +0000 (15:39 -0700)]
initramfs: debug detected compression method
This can greatly aid in narrowing down the real source of initramfs
problems such as failures related to the compression of the in-kernel
initramfs when an external initramfs is in use as well. Existing errors
are ambiguous as to which initramfs is a problem and why.
[akpm@linux-foundation.org: use pr_debug()] Signed-off-by: Daniel M. Weeks <dan@danweeks.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones [Mon, 7 Apr 2014 22:39:15 +0000 (15:39 -0700)]
fault-injection: set bounds on what /proc/self/make-it-fail accepts.
/proc/self/make-it-fail is a boolean, but accepts any number, including
negative ones. Change variable to unsigned, and cap upper bound at 1.
[akpm@linux-foundation.org: don't make make_it_fail unsigned] Signed-off-by: Dave Jones <davej@fedoraproject.org> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
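The bounds check can be illustrated in user space as below. This is a sketch using strtol(), not the procfs write handler; the function name and the handling of a trailing newline are assumptions. In line with the bracketed note, the value stays a signed int and is range-checked instead.

```c
#include <errno.h>
#include <stdlib.h>

/* Accept only "0" or "1" (optionally followed by a newline). */
static int model_parse_make_it_fail(const char *buf, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 0);
	if (end == buf || errno)
		return -EINVAL;
	if (*end != '\0' && *end != '\n')
		return -EINVAL;
	if (val < 0 || val > 1)
		return -EINVAL;

	*out = (int)val;
	return 0;
}
```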
x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUG
This ensures that BUG() always has a definition that causes a trap (via
an undefined instruction), and that the compiler still recognizes the
code following BUG() as unreachable, avoiding warnings that would
otherwise appear (such as on non-void functions that don't return a
value after BUG()).
In addition to saving a few bytes over the generic infinite-loop
implementation, this implementation traps rather than looping, which
potentially allows for better error-recovery behavior (such as by
rebooting).
When !CONFIG_BUG and !HAVE_ARCH_BUG, define the generic BUG() as an
infinite loop rather than a no-op. This avoids undefined behavior if
execution ever actually reaches BUG(), and avoids warnings about code
after BUG() (such as on non-void functions calling BUG() and then not
returning).
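The effect on non-void functions can be seen with a small model of the infinite-loop fallback. MODEL_BUG() and the function below are hypothetical names.

```c
/* Model of the generic fallback: loop forever instead of expanding to nothing. */
#define MODEL_BUG() do {} while (1)

static int model_width_for_mode(int mode)
{
	switch (mode) {
	case 0:
		return 4;
	case 1:
		return 8;
	}
	/*
	 * No return statement is needed here: the loop never exits, so the
	 * compiler can see that control does not reach the end of the function.
	 */
	MODEL_BUG();
}
```

With a no-op definition instead, the same function would fall off its end without returning a value.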
bug: when !CONFIG_BUG, simplify WARN_ON_ONCE and family
When !CONFIG_BUG, WARN_ON and family become simple passthroughs of their
condition argument; however, WARN_ON_ONCE and family still have conditions
and a boolean to detect one-time invocation, even though the warning
they'd emit doesn't exist. Make the existing definitions conditional on
CONFIG_BUG, and add definitions for !CONFIG_BUG that map to the
passthrough versions of WARN and WARN_ON.
This saves 4.4k on a minimized configuration (smaller than allnoconfig),
and 20.6k with defconfig plus CONFIG_BUG=n.
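A reduced model of the passthrough mapping, with hypothetical MODEL_ names. It shows the property that matters to callers: the condition is still evaluated exactly once and its truth value is still the result, but no per-call-site "already warned" flag is kept.

```c
/* With warnings compiled out, WARN_ON() is just the truth value of cond. */
#define MODEL_WARN_ON(cond)		(!!(cond))

/* The _ONCE variant needs no static flag when there is nothing to print. */
#define MODEL_WARN_ON_ONCE(cond)	MODEL_WARN_ON(cond)
```

Callers written as "if (WARN_ON_ONCE(x)) return;" therefore behave the same as before, apart from the missing message.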
kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT
"make allnoconfig" exists to ease testing of minimal configurations.
Documentation/SubmitChecklist includes a note to test with allnoconfig.
This helps catch missing dependencies on common-but-not-required
functionality, which might otherwise go unnoticed.
However, allnoconfig still leaves many symbols enabled, because they're
hidden behind CONFIG_EMBEDDED or CONFIG_EXPERT. For instance, allnoconfig
still has CONFIG_PRINTK and CONFIG_BLOCK enabled, so drivers don't
typically get build-tested with those disabled.
To address this, introduce a new Kconfig option "allnoconfig_y", used on
symbols which only exist to hide other symbols. Set it on CONFIG_EMBEDDED
(which then selects CONFIG_EXPERT). allnoconfig will then disable all the
symbols hidden behind those.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ia64: select CONFIG_TTY for use of tty_write_message in unaligned
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/ia64/kernel/unaligned.c uses tty_write_message to print an
unaligned access exception to the TTY of the current user process.
Enable TTY to prevent a build error.
Minimal fix, on the basis that few people on ia64 will care deeply enough
about kernel size to turn off TTY. Ideally, I'd instead suggest
dropping the tty_write_message entirely, and just leaving the printk.
Bonus: no need to sprintf first.
Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, booting without an initrd specified on an 80x25 screen gives a
call trace followed by "atkbd: Spurious ACK". The original message ("VFS:
Unable to mount root fs") is no longer visible. Of course, this could
happen in other situations...
This patch displays the panic reason after the call trace, which could help
a lot of people even if it's not the very last line on screen.
Also, convert all printk(KERN_EMERG ...) calls in panic.c to pr_emerg(...).
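The intended output order can be sketched in userspace as follows.
format_panic_report() is a hypothetical helper, the trace argument stands in
for the call trace dump, and the message strings are illustrative rather
than quoted from the patch.

```c
#include <stdio.h>

/* Hypothetical helper: builds the text a panic would print, in order.
 * The reason is printed first, then the call trace, then the reason
 * again, so it stays on screen after the trace has scrolled by.
 * Returns what snprintf() returns. */
static int format_panic_report(char *out, size_t size,
			       const char *reason, const char *trace)
{
	return snprintf(out, size,
			"Kernel panic - not syncing: %s\n"
			"%s"
			"---[ end Kernel panic - not syncing: %s\n",
			reason, trace, reason);
}
```

Any later console output (such as the keyboard driver message mentioned
above) can still follow, but the reason is now near the bottom of the screen
rather than above the trace.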
[akpm@linux-foundation.org: missed a couple of pr_ conversions] Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
affs: add mount option to avoid filename truncates
Normal behavior for filenames exceeding specific filesystem limits is to
refuse the operation.
Since the AFFS standard name length is only 30 characters, against 255 for
usual Linux filesystems, the original implementation truncates filenames by
default. A define, AFFS_NO_TRUNCATE, can disable truncation, but enabling
it requires recompiling the module.
This patch adds a 'nofilenametruncate' mount option so that users can
easily activate that feature and avoid a lot of problems (e.g. overwriting
files ...)
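The two behaviours can be sketched as a small length check.
affs_name_len(), AFFS_NAME_MAX, and the choice of -ENAMETOOLONG as the error
are assumptions for illustration; only the 30-character limit and the
truncate-or-refuse distinction come from the text above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define AFFS_NAME_MAX 30	/* hypothetical name; the limit is from the text */

/* Hypothetical helper: returns the name length to store, or a negative
 * errno when truncation is disabled and the name does not fit. */
static int affs_name_len(size_t len, bool no_truncate)
{
	if (len > AFFS_NAME_MAX) {
		if (no_truncate)
			return -ENAMETOOLONG;	/* refuse the operation */
		return AFFS_NAME_MAX;		/* silently truncate */
	}
	return (int)len;
}
```

With truncation enabled, two long names that share their first 30
characters map to the same stored name, which is how files can end up
overwritten; refusing the operation avoids that.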
Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/affs/dir.c: unlock/brelse dir on failure + code clean-up
Commit 0edf977d2ae3 ("[readdir] convert affs") returns -EIO directly,
without unlocking the dir inode and releasing the dir bh, when the second
affs_bread sequence fails. This patch restores the initial behaviour. It
also reflows pr_debug and affs_error to fit in 80 columns and removes the
reference to filldir (replaced by dir_emit in the commit above).
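The shape of the fix can be sketched with mock helpers. Everything below is
hypothetical userspace code: the counters stand in for the dir inode lock
and the buffer heads, and the label names are invented.

```c
#include <errno.h>
#include <stdbool.h>

/* Mock state standing in for the dir inode lock and held buffer heads. */
static int lock_depth;
static int bh_held;

static void mock_lock(void)    { lock_depth++; }
static void mock_unlock(void)  { lock_depth--; }
static bool mock_bread(bool ok) { if (ok) bh_held++; return ok; }
static void mock_brelse(void)  { bh_held--; }

/* Hypothetical readdir-like routine: the second read may fail, and the
 * failure path must still release the dir buffer and drop the lock. */
static int read_dir(bool first_ok, bool second_ok)
{
	int error = 0;

	mock_lock();
	if (!mock_bread(first_ok)) {
		error = -EIO;
		goto out_unlock;
	}
	if (!mock_bread(second_ok)) {
		error = -EIO;
		goto out_brelse_dir;	/* not a bare "return -EIO" */
	}
	mock_brelse();			/* release the second buffer */
out_brelse_dir:
	mock_brelse();			/* release the dir buffer */
out_unlock:
	mock_unlock();
	return error;
}
```

Routing every failure through the shared exit labels keeps the lock and
buffer counts balanced no matter which read fails.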
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>