]> git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log
mirror_ubuntu-jammy-kernel.git
13 years agoRapidIO: documentation update
Alexandre Bounine [Wed, 2 Nov 2011 20:39:19 +0000 (13:39 -0700)]
RapidIO: documentation update

Update rapidio.txt to reflect changes from recent patch.
See http://marc.info/?l=linux-kernel&m=131285620113589&w=2 for details.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Liu Gang <Gang.Liu@freescale.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/net/rionet.c: fix ethernet address macros for LE platforms
Alexandre Bounine [Wed, 2 Nov 2011 20:39:15 +0000 (13:39 -0700)]
drivers/net/rionet.c: fix ethernet address macros for LE platforms

Modify Ethernet addess macros to be compatible with BE/LE platforms

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: <stable@kernel.org> [2.6.39+]
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoRapidIO: fix potential null deref in rio_setup_device()
Alexandre Bounine [Wed, 2 Nov 2011 20:39:11 +0000 (13:39 -0700)]
RapidIO: fix potential null deref in rio_setup_device()

The "goto cleanup" path can deference "rswitch" when it is NULL.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Chul Kim <chul.kim@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoRapidIO: add mport driver for Tsi721 bridge
Alexandre Bounine [Wed, 2 Nov 2011 20:39:09 +0000 (13:39 -0700)]
RapidIO: add mport driver for Tsi721 bridge

Add RapidIO mport driver for IDT TSI721 PCI Express-to-SRIO bridge device.
 The driver provides full set of callback functions defined for mport
devices in RapidIO subsystem.  It also is compatible with current version
of RIONET driver (Ethernet over RapidIO messaging services).

This patch is applicable to kernel versions starting from 2.6.39.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Chul Kim <chul.kim@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoarch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port faile...
Liu Gang [Wed, 2 Nov 2011 20:39:07 +0000 (13:39 -0700)]
arch/powerpc/sysdev/fsl_rio.c: release rapidio port I/O region resource if port failed to initialize

The "struct rio_mport" contains a member of master port I/O memory
resource structure "struct resource iores".  This resource will be read
from device tree and be used for rapidio R/W transaction memory space.
Rapidio requests the port I/O memory resource under the root resource
"iomem_resource".

struct rio_mport *port;
port = kzalloc(sizeof(struct rio_mport), GFP_KERNEL);

request_resource(&iomem_resource, &port->iores);

When port failed to initialize, allocated "rio_mport" structure memory
will be freed, and the port I/O memory resource structure pointer
"&port->iores" will be invalid.  If other requests resource under
"iomem_resource", "&port->iores" node may be operated in the child
resources list and this will cause the system to crash.

So the requested port I/O memory resource should be released before
freeing allocated "rio_mport" structure.

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rapidio/rio-scan.c: use discovered bit to test if enumeration is complete
Liu Gang [Wed, 2 Nov 2011 20:39:05 +0000 (13:39 -0700)]
drivers/rapidio/rio-scan.c: use discovered bit to test if enumeration is complete

The discovered bit in PGCCSR register indicates if the device has been
discovered by system host.  In Rapidio systems, some agent devices can also
be master devices.  They can issue requests into the system.

Signed-off-by: Liu Gang <Gang.Liu@freescale.com>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinit: add root=PARTUUID=UUID/PARTNROFF=%d support
Will Drewry [Wed, 2 Nov 2011 20:38:59 +0000 (13:38 -0700)]
init: add root=PARTUUID=UUID/PARTNROFF=%d support

Expand root=PARTUUID=UUID syntax to support selecting a root partition by
integer offset from a known, unique partition.  This approach provides
similar properties to specifying a device and partition number, but using
the UUID as the unique path prior to evaluating the offset.

For example,
  root=PARTUUID=99DE9194-FC15-4223-9192-FC243948F88B/PARTNROFF=1
selects the partition with UUID 99DE.. then select the next
partition.

This change is motivated by a particular usecase in Chromium OS where the
bootloader can easily determine what partition it is on (by UUID) but
doesn't perform general partition table walking.

That said, support for this model provides a direct mechanism for the user
to modify the root partition to boot without specifically needing to
extract each UUID or update the bootloader explicitly when the root
partition UUID is changed (if it is recreated to be larger, for instance).
 Pinning to a /boot-style partition UUID allows the arbitrary root
partition reconfiguration/modifications with slightly less ambiguity than
just [dev][partition] and less stringency than the specific root partition
UUID.

[sfr@canb.auug.org.au: fix init sections warning]
Signed-off-by: Will Drewry <wad@chromium.org>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinclude/linux/sem.h: make sysv_sem empty if SYSVIPC is disabled
Manfred Spraul [Wed, 2 Nov 2011 20:38:56 +0000 (13:38 -0700)]
include/linux/sem.h: make sysv_sem empty if SYSVIPC is disabled

For the sysvsem undo, each task struct contains a sysv_sem structure with
a pointer to the undo information.

This pointer is only necessary if sysvipc is enabled - thus the pointer
can be made conditional on CONFIG_SYSVIPC.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoipc/sem.c: remove private structures from public header file
Manfred Spraul [Wed, 2 Nov 2011 20:38:54 +0000 (13:38 -0700)]
ipc/sem.c: remove private structures from public header file

include/linux/sem.h contains several structures that are only used within
ipc/sem.c.

The patch moves them into ipc/sem.c - there is no need to expose the
structures to the whole kernel.

No functional changes, only whitespace cleanups and 80-char per line
fixes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoipc/sem.c: handle spurious wakeups
Manfred Spraul [Wed, 2 Nov 2011 20:38:52 +0000 (13:38 -0700)]
ipc/sem.c: handle spurious wakeups

semtimedop() does not handle spurious wakeups, it returns -EINTR to user
space.  Most other schedule() users would just loop and not return to user
space.  The patch adds such a loop to semtimedop()

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID)
Manfred Spraul [Wed, 2 Nov 2011 20:38:50 +0000 (13:38 -0700)]
ipc/sem.c: fix return code race with semop vs. semop +semctl(IPC_RMID)

sys_semtimedop() may return -EIDRM although the semaphore operation
completed successfully:

thread 1: thread 2:
semtimedop(), sleeps
semop():
* acquires sem_lock()
semtimedop() woken up due to timeout
sem_lock() loops
* notices that thread 2 could be completed.
* performs the operations that thread 2 is sleeping on.
* marks the semaphore operation as IN_WAKEUP
* drops sem_lock(), does wakeup, sets return code to 0
* thread delayed due to interrupt, whatever
* returns to user space
* thread still delayed
semctl(IPC_RMID)
* acquires sem_lock()
* ipc_rmid(), ipcp->deleted=1
* drops sem_lock()
* thread finally continues - but seem_lock()
  now fails due to ipcp->deleted == 1
* returns -EIDRM instead of 0

The fix is trivial: Always use the return code in queue.status.

In real world, the race probably doesn't matter:
If the semaphore array is destroyed, the app is probably not interested
if the last operation succeeded or was already cancelled.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoida: make ida_simple_get/put() IRQ safe
Tejun Heo [Wed, 2 Nov 2011 20:38:46 +0000 (13:38 -0700)]
ida: make ida_simple_get/put() IRQ safe

It's often convenient to be able to release resource from IRQ context.
Make ida_simple_*() use irqsave/restore spin ops so that they are IRQ
safe.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc: fix races against execve() of /proc/PID/fd**
Vasiliy Kulikov [Wed, 2 Nov 2011 20:38:44 +0000 (13:38 -0700)]
proc: fix races against execve() of /proc/PID/fd**

fd* files are restricted to the task's owner, and other users may not get
direct access to them.  But one may open any of these files and run any
setuid program, keeping opened file descriptors.  As there are permission
checks on open(), but not on readdir() and read(), operations on the kept
file descriptors will not be checked.  It makes it possible to violate
procfs permission model.

Reading fdinfo/* may disclosure current fds' position and flags, reading
directory contents of fdinfo/ and fd/ may disclosure the number of opened
files by the target task.  This information is not sensible per se, but it
can reveal some private information (like length of a password stored in a
file) under certain conditions.

Used existing (un)lock_trace functions to check for ptrace_may_access(),
but instead of using EPERM return code from it use EACCES to be consistent
with existing proc_pid_follow_link()/proc_pid_readlink() return code.  If
they differ, attacker can guess what fds exist by analyzing stat() return
code.  Patched handlers: stat() for fd/*, stat() and read() for fdindo/*,
readdir() and lookup() for fd/ and fdinfo/.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoprocfs: report EISDIR when reading sysctl dirs in proc
Pavel Emelyanov [Wed, 2 Nov 2011 20:38:42 +0000 (13:38 -0700)]
procfs: report EISDIR when reading sysctl dirs in proc

On reading sysctl dirs we should return -EISDIR instead of -EINVAL.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocpusets: avoid looping when storing to mems_allowed if one node remains set
David Rientjes [Wed, 2 Nov 2011 20:38:39 +0000 (13:38 -0700)]
cpusets: avoid looping when storing to mems_allowed if one node remains set

{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES > BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself.  It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example.  In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes.  This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes.  Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Paul Menage <paul@paulmenage.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomm/page_cgroup.c: quiet sparse noise
H Hartley Sweeten [Wed, 2 Nov 2011 20:38:36 +0000 (13:38 -0700)]
mm/page_cgroup.c: quiet sparse noise

warning: symbol 'swap_cgroup_ctrl' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: Fix race condition in memcg_check_events() with this_cpu usage
Steven Rostedt [Wed, 2 Nov 2011 20:38:33 +0000 (13:38 -0700)]
memcg: Fix race condition in memcg_check_events() with this_cpu usage

Various code in memcontrol.c () calls this_cpu_read() on the calculations
to be done from two different percpu variables, or does an open-coded
read-modify-write on a single percpu variable.

Disable preemption throughout these operations so that the writes go to
the correct palces.

[hannes@cmpxchg.org: added this_cpu to __this_cpu conversion]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: close race between charge and putback
Johannes Weiner [Wed, 2 Nov 2011 20:38:29 +0000 (13:38 -0700)]
memcg: close race between charge and putback

There is a potential race between a thread charging a page and another
thread putting it back to the LRU list:

  charge:                         putback:
  SetPageCgroupUsed               SetPageLRU
  PageLRU && add to memcg LRU     PageCgroupUsed && add to memcg LRU

The order of setting one flag and checking the other is crucial, otherwise
the charge may observe !PageLRU while the putback observes !PageCgroupUsed
and the page is not linked to the memcg LRU at all.

Global memory pressure may fix this by trying to isolate and putback the
page for reclaim, where that putback would link it to the memcg LRU again.
 Without that, the memory cgroup is undeletable due to a charge whose
physical page can not be found and moved out.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Cc: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: skip scanning active lists based on individual size
Johannes Weiner [Wed, 2 Nov 2011 20:38:23 +0000 (13:38 -0700)]
memcg: skip scanning active lists based on individual size

Reclaim decides to skip scanning an active list when the corresponding
inactive list is above a certain size in comparison to leave the assumed
working set alone while there are still enough reclaim candidates around.

The memcg implementation of comparing those lists instead reports whether
the whole memcg is low on the requested type of inactive pages,
considering all nodes and zones.

This can lead to an oversized active list not being scanned because of the
state of the other lists in the memcg, as well as an active list being
scanned while its corresponding inactive list has enough pages.

Not only is this wrong, it's also a scalability hazard, because the global
memory state over all nodes and zones has to be gathered for each memcg
and zone scanned.

Make these calculations purely based on the size of the two LRU lists
that are actually affected by the outcome of the decision.

Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: do not expose uninitialized mem_cgroup_per_node to world
Igor Mammedov [Wed, 2 Nov 2011 20:38:21 +0000 (13:38 -0700)]
memcg: do not expose uninitialized mem_cgroup_per_node to world

If somebody is touching data too early, it might be easier to diagnose a
problem when dereferencing NULL at mem->info.nodeinfo[node] than trying to
understand why mem_cgroup_per_zone is [un|partly]initialized.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: fix oom schedule_timeout()
KAMEZAWA Hiroyuki [Wed, 2 Nov 2011 20:38:18 +0000 (13:38 -0700)]
memcg: fix oom schedule_timeout()

Before calling schedule_timeout(), task state should be changed.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: rename mem variable to memcg
Raghavendra K T [Wed, 2 Nov 2011 20:38:15 +0000 (13:38 -0700)]
memcg: rename mem variable to memcg

The memcg code sometimes uses "struct mem_cgroup *mem" and sometimes uses
"struct mem_cgroup *memcg".  Rename all mem variables to memcg in source
file.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroup/kmemleak: Annotate alloc_page() for cgroup allocations
Steven Rostedt [Wed, 2 Nov 2011 20:38:11 +0000 (13:38 -0700)]
cgroup/kmemleak: Annotate alloc_page() for cgroup allocations

When the cgroup base was allocated with kmalloc, it was necessary to
annotate the variable with kmemleak_not_leak().  But because it has
recently been changed to be allocated with alloc_page() (which skips
kmemleak checks) causes a warning on boot up.

I was triggering this output:

 allocated 8388608 bytes of page_cgroup
 please try 'cgroup_disable=memory' option if you don't want memory cgroups
 kmemleak: Trying to color unknown object at 0xf5840000 as Grey
 Pid: 0, comm: swapper Not tainted 3.0.0-test #12
 Call Trace:
  [<c17e34e6>] ? printk+0x1d/0x1f^M
  [<c10e2941>] paint_ptr+0x4f/0x78
  [<c178ab57>] kmemleak_not_leak+0x58/0x7d
  [<c108ae9f>] ? __rcu_read_unlock+0x9/0x7d
  [<c1cdb462>] kmemleak_init+0x19d/0x1e9
  [<c1cbf771>] start_kernel+0x346/0x3ec
  [<c1cbf1b4>] ? loglevel+0x18/0x18
  [<c1cbf0aa>] i386_start_kernel+0xaa/0xb0

After a bit of debugging I tracked the object 0xf840000 (and others) down
to the cgroup code.  The change from allocating base with kmalloc to
alloc_page() has the base not calling kmemleak_alloc() which adds the
pointer to the object_tree_root, but kmemleak_not_leak() adds it to the
crt_early_log[] table.  On kmemleak_init(), the entry is found in the
early_log[] but not the object_tree_root, and this error message is
displayed.

If alloc_page() fails then it defaults back to vmalloc() which still uses
the kmemleak_alloc() which makes us still need the kmemleak_not_leak()
call.  The solution is to call the kmemleak_alloc() directly if the
alloc_page() succeeds.

Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: don't attach task to subsystem if migration failed
Ben Blum [Wed, 2 Nov 2011 20:38:07 +0000 (13:38 -0700)]
cgroups: don't attach task to subsystem if migration failed

If a task has exited to the point it has called cgroup_exit() already,
then we can't migrate it to another cgroup anymore.

This can happen when we are attaching a task to a new cgroup between the
call to ->can_attach_task() on subsystems and the migration that is
eventually tried in cgroup_task_migrate().

In this case cgroup_task_migrate() returns -ESRCH and we don't want to
attach the task to the subsystems because the attachment to the new cgroup
itself failed.

Fix this by only calling ->attach_task() on the subsystems if the cgroup
migration succeeded.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: more safe tasklist locking in cgroup_attach_proc
Ben Blum [Wed, 2 Nov 2011 20:38:05 +0000 (13:38 -0700)]
cgroups: more safe tasklist locking in cgroup_attach_proc

Fix unstable tasklist locking in cgroup_attach_proc.

According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
not sufficient to guarantee the tasklist is stable w.r.t.  de_thread and
exit.  Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
proper exclusion.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agohfs: fix hfs_find_init() sb->ext_tree NULL ptr oops
Phillip Lougher [Wed, 2 Nov 2011 20:38:01 +0000 (13:38 -0700)]
hfs: fix hfs_find_init() sb->ext_tree NULL ptr oops

Clement Lecigne reports a filesystem which causes a kernel oops in
hfs_find_init() trying to dereference sb->ext_tree which is NULL.

This proves to be because the filesystem has a corrupted MDB extent
record, where the extents file does not fit into the first three extents
in the file record (the first blocks).

In hfs_get_block() when looking up the blocks for the extent file
(HFS_EXT_CNID), it fails the first blocks special case, and falls
through to the extent code (which ultimately calls hfs_find_init())
which is in the process of being initialised.

Hfs avoids this scenario by always having the extents b-tree fitting
into the first blocks (the extents B-tree can't have overflow extents).

The fix is to check at mount time that the B-tree fits into first
blocks, i.e.  fail if HFS_I(inode)->alloc_blocks >=
HFS_I(inode)->first_blocks

Note, the existing commit 47f365eb57573 ("hfs: fix oops on mount with
corrupted btree extent records") becomes subsumed into this as a special
case, but only for the extents B-tree (HFS_EXT_CNID), it is perfectly
acceptable for the catalog B-Tree file to grow beyond three extents,
with the remaining extent descriptors in the extents overfow.

This fixes CVE-2011-2203

Reported-by: Clement LECIGNE <clement.lecigne@netasq.com>
Signed-off-by: Phillip Lougher <plougher@redhat.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoisofs: add readpages support
Namjae Jeon [Wed, 2 Nov 2011 20:38:00 +0000 (13:38 -0700)]
isofs: add readpages support

Use mpage_readpages() instead of multiple calls to isofs_readpage() to
reduce the CPU utilization and make performance higher.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agominix: describe usage of different magic numbers
Sami Kerola [Wed, 2 Nov 2011 20:37:58 +0000 (13:37 -0700)]
minix: describe usage of different magic numbers

One can get this information from minix/inode.c, but adding the
explanations at the definition sites is more appropriate.

Signed-off-by: Sami Kerola <kerolasa@iki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-mc13xxx.c: move probe and remove callbacks to .init.text and .exit...
Uwe Kleine-König [Wed, 2 Nov 2011 20:37:56 +0000 (13:37 -0700)]
drivers/rtc/rtc-mc13xxx.c: move probe and remove callbacks to .init.text and .exit.text

The driver is added using platform_driver_probe(), so the callbacks can be
discarded more aggessively.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: add initial support for mcp7941x parts
David Anders [Wed, 2 Nov 2011 20:37:53 +0000 (13:37 -0700)]
rtc: add initial support for mcp7941x parts

Add initial support for the microchip mcp7941x series of real time clocks.

The mcp7941x series is generally compatible with the ds1307 and ds1337 rtc
devices from dallas semiconductor.  minor differences include a backup
battery enable bit, and the polarity of the oscillator enable bit.

Signed-off-by: David Anders <danders.dev@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/class.c: convert idr to ida and use ida_simple_get()
Jonathan Cameron [Wed, 2 Nov 2011 20:37:49 +0000 (13:37 -0700)]
drivers/rtc/class.c: convert idr to ida and use ida_simple_get()

This is the one use of an ida that doesn't retry on receiving -EAGAIN.
I'm assuming do so will cause no harm and may help on a rare occasion.

Signed-off-by: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinit/do_mounts_rd.c: fix ramdisk identification for padded cramfs
Neil Armstrong [Wed, 2 Nov 2011 20:37:47 +0000 (13:37 -0700)]
init/do_mounts_rd.c: fix ramdisk identification for padded cramfs

When a cramfs ramdisk padded with 512 bytes is given to the kernel, the
current identify_ramdisk_image function fails to identify it.

Tested with a padded cramfs image on an ARM based board.

Signed-off-by: Neil Armstrong <narmstrong@neotion.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Davidlohr Bueso <dave@gnu.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoramfs: remove module leftovers
Richard Weinberger [Wed, 2 Nov 2011 20:37:45 +0000 (13:37 -0700)]
ramfs: remove module leftovers

Since ramfs is hard-selected to "y", the module leftovers make no sense.

Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agobinfmt_elf: fix PIE execution with randomization disabled
Jiri Kosina [Wed, 2 Nov 2011 20:37:41 +0000 (13:37 -0700)]
binfmt_elf: fix PIE execution with randomization disabled

The case of address space randomization being disabled in runtime through
randomize_va_space sysctl is not treated properly in load_elf_binary(),
resulting in SIGKILL coming at exec() time for certain PIE-linked binaries
in case the randomization has been disabled at runtime prior to calling
exec().

Handle the randomize_va_space == 0 case the same way as if we were not
supporting .text randomization at all.

Based on original patch by H.J. Lu and Josh Boyer.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: H.J. Lu <hongjiu.lu@intel.com>
Cc: <stable@kernel.org>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agothp: share get_huge_page_tail()
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:36 +0000 (13:37 -0700)]
thp: share get_huge_page_tail()

This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosparc: gup_pte_range() support THP based tail recounting
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:31 +0000 (13:37 -0700)]
sparc: gup_pte_range() support THP based tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
 This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agos390: gup_huge_pmd() return 0 if pte changes
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:28 +0000 (13:37 -0700)]
s390: gup_huge_pmd() return 0 if pte changes

s390 didn't return 0 in that case, if it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agos390: gup_huge_pmd() support THP tail recounting
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:25 +0000 (13:37 -0700)]
s390: gup_huge_pmd() support THP tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopowerpc: gup_huge_pmd() return 0 if pte changes
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:19 +0000 (13:37 -0700)]
powerpc: gup_huge_pmd() return 0 if pte changes

powerpc didn't return 0 in that case, if it's rolling back the *nr pointer
it should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopowerpc: gup_hugepte() support THP based tail recounting
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:15 +0000 (13:37 -0700)]
powerpc: gup_hugepte() support THP based tail recounting

Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopowerpc: gup_hugepte() avoid freeing the head page too many times
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:11 +0000 (13:37 -0700)]
powerpc: gup_hugepte() avoid freeing the head page too many times

We only taken "refs" pins on the head page not "*nr" pins.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopowerpc: get_hugepte() don't put_page() the wrong page
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:08 +0000 (13:37 -0700)]
powerpc: get_hugepte() don't put_page() the wrong page

"page" may have changed to point to the next hugepage after the loop
completed, The references have been taken on the head page, so the
put_page must happen there too.

This is a longstanding issue pre-thp inclusion.

It's totally unclear how these page_cache_add_speculative and
pte_val(pte) != pte_val(*ptep) checks are necessary across all the
powerpc gup_fast code, when x86 doesn't need any of that: there's no way
the page can be freed with irq disabled so we're guaranteed the
atomic_inc will happen on a page with page_count > 0 (so not needing the
speculative check).

The pte check is also meaningless on x86: no need to rollback on x86 if
the pte changed, because the pte can still change a CPU tick after the
check succeeded and it won't be rolled back in that case.  The important
thing is we got a reference on a valid page that was mapped there a CPU
tick ago.  So not knowing the soft tlb refill code of ppc64 in great
detail I'm not removing the "speculative" page_count increase and the
pte checks across all the code, but unless there's a strong reason for
it they should be later cleaned up too.

If a pte can change from huge to non-huge (like it could happen with
THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat
the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs
only so I'm not altering that.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopowerpc: remove superfluous PageTail checks on the pte gup_fast
Andrea Arcangeli [Wed, 2 Nov 2011 20:37:03 +0000 (13:37 -0700)]
powerpc: remove superfluous PageTail checks on the pte gup_fast

This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.

Plus if this wasn't a noop, it would have oopsed because, the insistence
of using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in the page_cache_get_speculative().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomm: thp: tail page refcounting fix
Andrea Arcangeli [Wed, 2 Nov 2011 20:36:59 +0000 (13:36 -0700)]
mm: thp: tail page refcounting fix

Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'next/soc' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 04:08:03 +0000 (21:08 -0700)]
Merge branch 'next/soc' of git://git.linaro.org/people/arnd/arm-soc

* 'next/soc' of git://git.linaro.org/people/arnd/arm-soc: (21 commits)
  MAINTAINERS: add ARM/FREESCALE IMX6 entry
  arm/imx: merge i.MX3 and i.MX6
  arm/imx6q: add suspend/resume support
  arm/imx6q: add device tree machine support
  arm/imx6q: add smp and cpu hotplug support
  arm/imx6q: add core drivers clock, gpc, mmdc and src
  arm/imx: add gic_handle_irq function
  arm/imx6q: add core definitions and low-level debug uart
  arm/imx6q: add device tree source
  ARM: highbank: add suspend support
  ARM: highbank: Add cpu hotplug support
  ARM: highbank: add SMP support
  MAINTAINERS: add Calxeda Highbank ARM platform
  ARM: add Highbank core platform support
  ARM: highbank: add devicetree source
  ARM: l2x0: add empty l2x0_of_init
  picoxcell: add a definition of VMALLOC_END
  picoxcell: remove custom ioremap implementation
  picoxcell: add the DTS for the PC7302 board
  picoxcell: add the DTS for pc3x2 and pc3x3 devices
  ...

Fix up trivial conflicts in arch/arm/Kconfig, and some more header file
conflicts in arch/arm/mach-omap2/board-generic.c (as per an ealier merge
by Arnd).

13 years agoMerge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 04:02:35 +0000 (21:02 -0700)]
Merge branch 'next/dt' of git://git.linaro.org/people/arnd/arm-soc

* 'next/dt' of git://git.linaro.org/people/arnd/arm-soc:
  ARM: gic: use module.h instead of export.h
  ARM: gic: fix irq_alloc_descs handling for sparse irq
  ARM: gic: add OF based initialization
  ARM: gic: add irq_domain support
  irq: support domains with non-zero hwirq base
  of/irq: introduce of_irq_init
  ARM: at91: add at91sam9g20 and Calao USB A9G20 DT support
  ARM: at91: dt: at91sam9g45 family and board device tree files
  arm/mx5: add device tree support for imx51 babbage
  arm/mx5: add device tree support for imx53 boards
  ARM: msm: Add devicetree support for msm8660-surf
  msm_serial: Add devicetree support
  msm_serial: Use relative resources for iomem

Fix up conflicts in arch/arm/mach-at91/{at91sam9260.c,at91sam9g45.c}

13 years agoMerge branch 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:58:25 +0000 (20:58 -0700)]
Merge branch 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc

* 'next/cleanup2' of git://git.linaro.org/people/arnd/arm-soc: (31 commits)
  ARM: OMAP: Warn if omap_ioremap is called before SoC detection
  ARM: OMAP: Move set_globals initialization to happen in init_early
  ARM: OMAP: Map SRAM later on with ioremap_exec()
  ARM: OMAP: Remove calls to SRAM allocations for framebuffer
  ARM: OMAP: Avoid cpu_is_omapxxxx usage until map_io is done
  ARM: OMAP1: Use generic map_io, init_early and init_irq
  arm/dts: OMAP3+: Add mpu, dsp and iva nodes
  arm/dts: OMAP4: Add a main ocp entry bound to l3-noc driver
  ARM: OMAP2+: l3-noc: Add support for device-tree
  ARM: OMAP2+: board-generic: Add i2c static init
  ARM: OMAP2+: board-generic: Add DT support to generic board
  arm/dts: Add support for OMAP3 Beagle board
  arm/dts: Add initial device tree support for OMAP3 SoC
  arm/dts: Add support for OMAP4 SDP board
  arm/dts: Add support for OMAP4 PandaBoard
  arm/dts: Add initial device tree support for OMAP4 SoC
  ARM: OMAP: omap_device: Add a method to build an omap_device from a DT node
  ARM: OMAP: omap_device: Add omap_device_[alloc|delete] for DT integration
  of: Add helpers to get one string in multiple strings property
  ARM: OMAP2+: devices: Remove all omap_device_pm_latency structures
  ...

Fix up trivial header file conflicts in arch/arm/mach-omap2/board-generic.c

13 years agoMerge branch 'next/cross-platform' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:34:22 +0000 (20:34 -0700)]
Merge branch 'next/cross-platform' of git://git.linaro.org/people/arnd/arm-soc

* 'next/cross-platform' of git://git.linaro.org/people/arnd/arm-soc:
  arm/imx: use Kconfig choice for low-level debug UART selection
  ARM: realview: use Kconfig choice for debug UART selection
  ARM: plat-samsung: use Kconfig choice for debug UART selection
  ARM: versatile: convert logical CPU numbers to physical numbers
  ARM: ux500: convert logical CPU numbers to physical numbers
  ARM: shmobile: convert logical CPU numbers to physical numbers
  ARM: msm: convert logical CPU numbers to physical numbers
  ARM: exynos4: convert logical CPU numbers to physical numbers

Fix up trivial conflict (config DEBUG_S3C_UART move/split vs addition of
ARM_KPROBES_TEST option) in arch/arm/Kconfig.debug

13 years agoMerge branch 'next/devel' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:31:25 +0000 (20:31 -0700)]
Merge branch 'next/devel' of git://git.linaro.org/people/arnd/arm-soc

* 'next/devel' of git://git.linaro.org/people/arnd/arm-soc: (50 commits)
  ARM: tegra: update defconfig
  arm/tegra: Harmony: Configure PMC for low-level interrupts
  arm/tegra: device tree support for ventana board
  arm/tegra: add support for ventana pinmuxing
  arm/tegra: prepare Seaboard pinmux code for derived boards
  arm/tegra: pinmux: ioremap registers
  gpio/tegra: Convert to a platform device
  arm/tegra: Convert pinmux driver to a platform device
  arm/dt: Tegra: Add pinmux node to tegra20.dtsi
  arm/tegra: Prep boards for gpio/pinmux conversion to pdevs
  ARM: mx5: fix clock usage for suspend
  ARM i.MX entry-macro.S: remove now unused code
  ARM i.MX boards: use CONFIG_MULTI_IRQ_HANDLER
  ARM i.MX tzic: add handle_irq function
  ARM i.MX avic: add handle_irq function
  ARM: mx25: Add the missing IIM base definition
  ARM i.MX avic: convert to use generic irq chip
  mx31moboard: Add poweroff support
  ARM: mach-qong: Add watchdog support
  ARM: davinci: AM18x: Add wl1271/wlan support
  ...

Fix up conflicts in:
arch/arm/mach-at91/at91sam9g45.c
arch/arm/mach-mx5/devices-imx53.h
arch/arm/plat-mxc/include/mach/memory.h

13 years agoMerge branch 'next/board' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:25:36 +0000 (20:25 -0700)]
Merge branch 'next/board' of git://git.linaro.org/people/arnd/arm-soc

* 'next/board' of git://git.linaro.org/people/arnd/arm-soc: (34 commits)
  ep93xx: add support Vision EP9307 SoM
  ARM: mxs: Add initial support for DENX MX28
  ARM: EXYNOS4: Add support SMDK4412 Board
  ARM: EXYNOS4: Add MCT support for EXYNOS4412
  ARM: EXYNOS4: Add functions for gic interrupt handling
  ARM: EXYNOS4: Add support clock for EXYNOS4412
  ARM: EXYNOS4: Add support new EXYNOS4412 SoC
  ARM: EXYNOS4: Add support MCT PPI for EXYNOS4212
  ARM: EXYNOS4: Add support PPI in external GIC
  ARM: EXYNOS4: convert boot_params to atag_offset
  ixp4xx: support omicron ixp425 based boards
  ARM: EXYNOS4: Add support SMDK4212 Board
  ARM: EXYNOS4: Add support PM for EXYNOS4212
  ARM: EXYNOS4: Add support clock for EXYNOS4212
  ARM: EXYNOS4: Add support new EXYNOS4212 SoC
  at91: USB-A9G20 C01 & C11 board support
  at91: merge board USB-A9260 and USB-A9263 together
  at91: add support for RSIs EWS board
  ARM: SAMSUNG: Fix mask value for S5P64X0 CPU IDs
  ARM: SAMSUNG: Fix mask for S3C64xx CPU IDs
  ...

13 years agoMerge branch 'next/deletion' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:24:30 +0000 (20:24 -0700)]
Merge branch 'next/deletion' of git://git.linaro.org/people/arnd/arm-soc

* 'next/deletion' of git://git.linaro.org/people/arnd/arm-soc:
  ARM: mach-nuc93x: delete

Fix up trivial delete/edit conflicts in
arch/arm/mach-nuc93x/{Makefile.boot,mach-nuc932evb.c,time.c}

13 years agoMerge branch 'next/pm' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:22:01 +0000 (20:22 -0700)]
Merge branch 'next/pm' of git://git.linaro.org/people/arnd/arm-soc

* 'next/pm' of git://git.linaro.org/people/arnd/arm-soc: (66 commits)
  ARM: CSR: PM: use outer_resume to resume L2 cache
  ARM: CSR: call l2x0_of_init to init L2 cache of SiRFprimaII
  ARM: OMAP: voltage: voltage layer present, even when CONFIG_PM=n
  ARM: CSR: PM: add sleep entry for SiRFprimaII
  ARM: CSR: PM: save/restore irq status in suspend cycle
  ARM: CSR: PM: save/restore timer status in suspend cycle
  OMAP4: PM: TWL6030: add cmd register
  OMAP4: PM: TWL6030: fix ON/RET/OFF voltages
  OMAP4: PM: TWL6030: address 0V conversions
  OMAP4: PM: TWL6030: fix uv to voltage for >0x39
  OMAP4: PM: TWL6030: fix voltage conversion formula
  omap: voltage: add a stub header file for external/regulator use
  OMAP2+: VC: more registers are per-channel starting with OMAP5
  OMAP3+: voltage: update nominal voltage in voltdm_scale() not VC post-scale
  OMAP3+: voltage: rename omap_voltage_get_nom_volt -> voltdm_get_voltage
  OMAP3+: voltdm: final removal of omap_vdd_info
  OMAP3+: voltage: move/rename curr_volt from vdd_info into struct voltagedomain
  OMAP3+: voltage: rename scale and reset functions using voltdm_ prefix
  OMAP3+: VP: combine setting init voltage into common function
  OMAP3+: VP: remove unused omap_vp_get_curr_volt()
  ...

Fix up trivial conflict in arch/arm/mach-prima2/l2x0.c (code removal vs
edit)

13 years agoMerge branch 'next/timer' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:18:05 +0000 (20:18 -0700)]
Merge branch 'next/timer' of git://git.linaro.org/people/arnd/arm-soc

* 'next/timer' of git://git.linaro.org/people/arnd/arm-soc:
  clocksource: fixup ux500 build problems
  ARM: omap: use __devexit_p in dmtimer driver
  ARM: ux500: Reprogram timers upon resume
  ARM: plat-nomadik: timer: Export reset functions
  ARM: plat-nomadik: timer: Add support for periodic timers
  ARM: ux500: Move timer code to separate file
  ARM: ux500: add support for clocksource DBX500 PRCMU
  clocksource: add DBX500 PRCMU Timer support
  ARM: plat-nomadik: MTU sched_clock as an option
  ARM: OMAP: dmtimer: add error handling to export APIs
  ARM: OMAP: dmtimer: low-power mode support
  ARM: OMAP: dmtimer: skip reserved timers
  ARM: OMAP: dmtimer: pm_runtime support
  ARM: OMAP: dmtimer: switch-over to platform device driver
  ARM: OMAP: dmtimer: platform driver
  ARM: OMAP2+: dmtimer: convert to platform devices
  ARM: OMAP1: dmtimer: conversion to platform devices
  ARM: OMAP2+: dmtimer: add device names to flck nodes
  ARM: OMAP: Add support for dmtimer v2 ip

13 years agoMerge branch 'next/driver' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:16:43 +0000 (20:16 -0700)]
Merge branch 'next/driver' of git://git.linaro.org/people/arnd/arm-soc

* 'next/driver' of git://git.linaro.org/people/arnd/arm-soc:
  hw_random: add driver for atmel true hardware random number generator
  ARM: at91: at91sam9g45: add trng clock and platform device
  MX53 Enable the AHCI SATA on MX53 SMD board
  MX53 Enable the AHCI SATA on MX53 LOCO board
  MX53 Enable the AHCI SATA on MX53 ARD board
  AHCI Add the AHCI SATA feature on the MX53 platforms
  Fix pata imx resource
  ARM: imx: Define functions for registering PATA
  ARM: imx: Add PATA clock support
  ARM: imx: Add PATA resources for other i.MX processors
  imx: efika: Enable pata.
  imx51: add pata clock
  imx51: add pata device

Fix up trivial conflict (new selects next to each other from separate
branches for EFIKA_COMMON) in arch/arm/mach-mx5/Kconfig

13 years agoMerge branch 'next/cleanup' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 03:11:00 +0000 (20:11 -0700)]
Merge branch 'next/cleanup' of git://git.linaro.org/people/arnd/arm-soc

* 'next/cleanup' of git://git.linaro.org/people/arnd/arm-soc: (125 commits)
  ARM: mach-mxs: fix machines' initializers order
  mmc: mxcmmc: explicitly includes mach/hardware.h
  arm/imx: explicitly includes mach/hardware.h in pm-imx27.c
  arm/imx: remove mx27_setup_weimcs() from mx27.h
  arm/imx: explicitly includes mach/hardware.h in mach-kzm_arm11_01.c
  arm/imx: remove mx31_setup_weimcs() from mx31.h
  ARM: tegra: devices.c should include devices.h
  ARM: tegra: cpu-tegra: unexport two functions
  ARM: tegra: cpu-tegra: sparse type fix
  ARM: tegra: dma: staticify some tables and functions
  ARM: tegra: tegra2_clocks: don't export some tables
  ARM: tegra: tegra_powergate_is_powered should be static
  ARM: tegra: tegra_rtc_read_ms should be static
  ARM: tegra: tegra_init_cache should be static
  ARM: tegra: pcie: 0 -> NULL changes
  ARM: tegra: pcie: include board.h
  ARM: tegra: pcie: don't cast __iomem pointers
  ARM: tegra: tegra2_clocks: 0 -> NULL changes
  ARM: tegra: tegra2_clocks: don't cast __iomem pointers
  ARM: tegra: timer: don't cast __iomem pointers
  ...

Fix up trivial conflicts in
  arch/arm/mach-omap2/Makefile,
  arch/arm/mach-u300/{Makefile.boot,core.c}
  arch/arm/plat-{mxc,omap}/devices.c

13 years agoMerge branch 'next/fixes' of git://git.linaro.org/people/arnd/arm-soc
Linus Torvalds [Wed, 2 Nov 2011 02:55:06 +0000 (19:55 -0700)]
Merge branch 'next/fixes' of git://git.linaro.org/people/arnd/arm-soc

* 'next/fixes' of git://git.linaro.org/people/arnd/arm-soc: (28 commits)
  ARM: pxa/cm-x300: properly set bt_reset pin
  ARM: mmp: rename SHEEVAD to GPLUGD
  ARM: imx: Fix typo 'MACH_MX31_3DS_MXC_NAND_USE_BBT'
  ARM: i.MX28: shift frac value in _CLK_SET_RATE
  plat-mxc: iomux-v3.h: implicitly enable pull-up/down when that's desired
  ARM: mx5: fix clock usage for suspend
  ARM: pxa: use correct __iomem annotations
  ARM: pxa: sharpsl pm needs SPI
  ARM: pxa: centro and treo680 need palm27x
  ARM: pxa: make pxafb_smart_*() empty when not enabled
  ARM: pxa: select POWER_SUPPLY on raumfeld
  ARM: pxa: pxa95x is incompatible with earlier pxa
  ARM: pxa: CPU_FREQ_TABLE is needed for CPU_FREQ
  ARM: pxa: pxa95x/saarb depends on pxa3xx code
  ARM: pxa: allow selecting just one of TREO680/CENTRO
  ARM: pxa: export symbols from pxa3xx-ulpi
  ARM: pxa: make zylonite_pxa*_init declaration match code
  ARM: pxa/z2: fix building error of pxa27x_cpu_suspend() no longer available
  ARM: at91: add defconfig for at91sam9g45 family
  ARM: at91: remove dependency for Atmel PWM driver selector in Kconfig
  ...

13 years agoMerge branch 'imx/imx6q' into next/soc
Arnd Bergmann [Wed, 2 Nov 2011 01:46:55 +0000 (02:46 +0100)]
Merge branch 'imx/imx6q' into next/soc

Conflicts:
Documentation/devicetree/bindings/arm/fsl.txt
arch/arm/Kconfig
arch/arm/Kconfig.debug
arch/arm/plat-mxc/include/mach/common.h

13 years agoMerge branch 'picoxcell/soc' into next/soc
Arnd Bergmann [Wed, 2 Nov 2011 01:46:17 +0000 (02:46 +0100)]
Merge branch 'picoxcell/soc' into next/soc

13 years agoMerge branch 'highbank/soc' into next/soc
Arnd Bergmann [Wed, 2 Nov 2011 01:46:10 +0000 (02:46 +0100)]
Merge branch 'highbank/soc' into next/soc

Conflicts:
arch/arm/mach-mxs/include/mach/gpio.h
arch/arm/mach-omap2/board-generic.c
arch/arm/plat-mxc/include/mach/gpio.h

13 years agoMerge branch 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux
Linus Torvalds [Tue, 1 Nov 2011 22:07:19 +0000 (15:07 -0700)]
Merge branch 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux

* 'for-linus/i2c-3.2' of git://git.fluff.org/bjdooks/linux: (47 commits)
  i2c-s3c2410: Add device tree support
  i2c-s3c2410: Keep a copy of platform data and use it
  i2c-nomadik: cosmetic coding style corrections
  i2c-au1550: dev_pm_ops conversion
  i2c-au1550: increase timeout waiting for master done
  i2c-au1550: remove unused ack_timeout
  i2c-au1550: remove usage of volatile keyword
  i2c-tegra: __iomem annotation fix
  i2c-eg20t: Add initialize processing in case i2c-error occurs
  i2c-eg20t: Fix flag setting issue
  i2c-eg20t: add stop sequence in case wait-event timeout occurs
  i2c-eg20t: Separate error processing
  i2c-eg20t: Fix 10bit access issue
  i2c-eg20t: Modify returned value s32 to long
  i2c-eg20t: Fix bus-idle waiting issue
  i2c-designware: Fix PCI core warning on suspend/resume
  i2c-designware: Add runtime power management support
  i2c-designware: Add support for Designware core behind PCI devices.
  i2c-designware: Push all register reads/writes into the core code.
  i2c-designware: Support multiple cores using same ISR
  ...

13 years agoMerge branch 'for-linus' of git://opensource.wolfsonmicro.com/regulator
Linus Torvalds [Tue, 1 Nov 2011 22:06:20 +0000 (15:06 -0700)]
Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regulator

* 'for-linus' of git://opensource.wolfsonmicro.com/regulator: (22 commits)
  regulator: Constify constraints name
  regulator: Fix possible nullpointer dereference in regulator_enable()
  regulator: gpio-regulator add dependency on GENERIC_GPIO
  regulator: Add module.h include to gpio-regulator
  regulator: Add driver for gpio-controlled regulators
  regulator: remove duplicate REG_CTRL2 defines in tps65023
  regulator: Clarify documentation for regulator-regulator supplies
  regulator: Fix some bitrot in the machine driver documentation
  regulator: tps65023: Added support for the similiar TPS65020 chip
  regulator: tps65023: Setting correct core regulator for tps65021
  regulator: tps65023: Set missing bit for update core-voltage
  regulator: tps65023: Fixes i2c configuration issues
  regulator: Add debugfs file showing the supply map table
  regulator: tps6586x: add SMx slew rate setting
  regulator: tps65023: Fixes i2c configuration issues
  regulator: tps6507x: Remove num_voltages array
  regulator: max8952: removed unused mutex.
  regulator: fix regulator/consumer.h kernel-doc warning
  regulator: Ensure enough enable time for max8649
  regulator: 88pm8607: Fix off-by-one value range checking in the case of no id is matched
  ...

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Tue, 1 Nov 2011 22:02:38 +0000 (15:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: ohci: fix isochronous DMA synchronization
  firewire: ohci: work around selfID junk due to wrong gap count
  firewire: net: Use posted writes
  firewire: use clamp and min3 macros
  firewire: ohci: optimize TSB41BA3D detection
  firewire: ohci: TSB41BA3D support tweaks
  firewire: ohci: Add support for TSB41BA3D phy
  firewire: ohci: Move code from the bus reset tasklet into a workqueue
  firewire: sbp2: fold two functions into one
  firewire: sbp2: move some code to more sensible places
  firewire: sbp2: remove obsolete reference counting

13 years agoMerge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Linus Torvalds [Tue, 1 Nov 2011 17:52:29 +0000 (10:52 -0700)]
Merge branch 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore: make pstore write function return normal success/fail value
  pstore: change mutex locking to spin_locks
  pstore: defer inserting OOPS entries into pstore

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland...
Linus Torvalds [Tue, 1 Nov 2011 17:51:38 +0000 (10:51 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...

13 years agoMerge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc...
Roland Dreier [Tue, 1 Nov 2011 16:37:08 +0000 (09:37 -0700)]
Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next

13 years agoMerge git://github.com/herbertx/crypto
Linus Torvalds [Tue, 1 Nov 2011 16:24:41 +0000 (09:24 -0700)]
Merge git://github.com/herbertx/crypto

* git://github.com/herbertx/crypto: (48 commits)
  crypto: user - Depend on NET instead of selecting it
  crypto: user - Add dependency on NET
  crypto: talitos - handle descriptor not found in error path
  crypto: user - Initialise match in crypto_alg_match
  crypto: testmgr - add twofish tests
  crypto: testmgr - add blowfish test-vectors
  crypto: Make hifn_795x build depend on !ARCH_DMA_ADDR_T_64BIT
  crypto: twofish-x86_64-3way - fix ctr blocksize to 1
  crypto: blowfish-x86_64 - fix ctr blocksize to 1
  crypto: whirlpool - count rounds from 0
  crypto: Add userspace report for compress type algorithms
  crypto: Add userspace report for cipher type algorithms
  crypto: Add userspace report for rng type algorithms
  crypto: Add userspace report for pcompress type algorithms
  crypto: Add userspace report for nivaead type algorithms
  crypto: Add userspace report for aead type algorithms
  crypto: Add userspace report for givcipher type algorithms
  crypto: Add userspace report for ablkcipher type algorithms
  crypto: Add userspace report for blkcipher type algorithms
  crypto: Add userspace report for ahash type algorithms
  ...

13 years agosysfs: Make sysfs_rename safe with sysfs_dirents in rbtrees.
Eric W. Biederman [Tue, 1 Nov 2011 14:06:17 +0000 (07:06 -0700)]
sysfs: Make sysfs_rename safe with sysfs_dirents in rbtrees.

In sysfs_rename we need to remove the optimization of not calling
sysfs_unlink_sibling and sysfs_link_sibling if the renamed parent
directory is not changing.  This optimization is no longer valid now
that sysfs dirents are stored in an rbtree sorted by name.

Move the assignment of s_ns before the call of sysfs_link_sibling.  With
no sysfs_dirent fields changing after the call of sysfs_link_sibling
this allows sysfs_link_sibling to take any of the directory entries into
account when it builds the rbtrees, and s_ns looks like a prime canidate
to be used in the rbtree in the future.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Greg KH <gregkh@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge branch 'imx/devel' into next/dt
Arnd Bergmann [Tue, 1 Nov 2011 16:12:22 +0000 (17:12 +0100)]
Merge branch 'imx/devel' into next/dt

The board changes in the imx/devel branch conflict with other changes in
the device imx/dt branch.

Conflicts:
arch/arm/mach-mx5/board-mx53_loco.c
arch/arm/mach-mx5/board-mx53_smd.c
arch/arm/plat-mxc/include/mach/common.h
arch/arm/plat-mxc/include/mach/memory.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
13 years agoMerge Qualcom Hexagon architecture
Linus Torvalds [Tue, 1 Nov 2011 14:48:13 +0000 (07:48 -0700)]
Merge Qualcom Hexagon architecture

This is the fifth version of the patchset (with one tiny whitespace fix)
to the Linux kernel to support the Qualcomm Hexagon architecture.

Between now and the next pull requests, Richard Kuo should have his key
signed, etc., and should be back on kernel.org.  In the meantime, this
got merged as a emailed patch-series.

* Hexagon: (36 commits)
  Add extra arch overrides to asm-generic/checksum.h
  Hexagon: Add self to MAINTAINERS
  Hexagon: Add basic stacktrace functionality for Hexagon architecture.
  Hexagon: Add configuration and makefiles for the Hexagon architecture.
  Hexagon: Comet platform support
  Hexagon: kgdb support files
  Hexagon: Add page-fault support.
  Hexagon: Add page table header files & etc.
  Hexagon: Add ioremap support
  Hexagon: Provide DMA implementation
  Hexagon: Implement basic TLB management routines for Hexagon.
  Hexagon: Implement basic cache-flush support
  Hexagon: Provide basic implementation and/or stubs for I/O routines.
  Hexagon: Add user access functions
  Hexagon: Add locking types and functions
  Hexagon: Add SMP support
  Hexagon: Provide basic debugging and system trap support.
  Hexagon: Add ptrace support
  Hexagon: Add time and timer functions
  Hexagon: Add interrupts
  ...

13 years agoAdd extra arch overrides to asm-generic/checksum.h
Linas Vepstas [Mon, 31 Oct 2011 23:56:59 +0000 (18:56 -0500)]
Add extra arch overrides to asm-generic/checksum.h

There are plausible reasons for architectures to provide their own
versions of csum_partial_copy_nocheck and csum_tcpudp_magic.
By protecting these, the architecture can still re-use the
asm-generic checksum.h, instead of copying it.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add self to MAINTAINERS
Richard Kuo [Mon, 31 Oct 2011 23:56:38 +0000 (18:56 -0500)]
Hexagon: Add self to MAINTAINERS

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add basic stacktrace functionality for Hexagon architecture.
Richard Kuo [Mon, 31 Oct 2011 23:56:19 +0000 (18:56 -0500)]
Hexagon: Add basic stacktrace functionality for Hexagon architecture.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add configuration and makefiles for the Hexagon architecture.
Richard Kuo [Mon, 31 Oct 2011 23:55:58 +0000 (18:55 -0500)]
Hexagon: Add configuration and makefiles for the Hexagon architecture.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Comet platform support
Richard Kuo [Mon, 31 Oct 2011 23:55:25 +0000 (18:55 -0500)]
Hexagon: Comet platform support

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: kgdb support files
Linas Vepstas [Mon, 31 Oct 2011 23:54:43 +0000 (18:54 -0500)]
Hexagon: kgdb support files

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add page-fault support.
Richard Kuo [Mon, 31 Oct 2011 23:54:08 +0000 (18:54 -0500)]
Hexagon: Add page-fault support.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add page table header files & etc.
Richard Kuo [Mon, 31 Oct 2011 23:53:38 +0000 (18:53 -0500)]
Hexagon: Add page table header files & etc.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add ioremap support
Richard Kuo [Mon, 31 Oct 2011 23:52:53 +0000 (18:52 -0500)]
Hexagon: Add ioremap support

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Provide DMA implementation
Richard Kuo [Mon, 31 Oct 2011 23:52:22 +0000 (18:52 -0500)]
Hexagon: Provide DMA implementation

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Implement basic TLB management routines for Hexagon.
Richard Kuo [Mon, 31 Oct 2011 23:52:00 +0000 (18:52 -0500)]
Hexagon: Implement basic TLB management routines for Hexagon.

Mostly all stubs, as the TLB is managed by the hypervisor.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Implement basic cache-flush support
Richard Kuo [Mon, 31 Oct 2011 23:50:51 +0000 (18:50 -0500)]
Hexagon: Implement basic cache-flush support

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Provide basic implementation and/or stubs for I/O routines.
Richard Kuo [Mon, 31 Oct 2011 23:48:50 +0000 (18:48 -0500)]
Hexagon: Provide basic implementation and/or stubs for I/O routines.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add user access functions
Richard Kuo [Mon, 31 Oct 2011 23:48:07 +0000 (18:48 -0500)]
Hexagon: Add user access functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add locking types and functions
Richard Kuo [Mon, 31 Oct 2011 23:47:33 +0000 (18:47 -0500)]
Hexagon: Add locking types and functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add SMP support
Richard Kuo [Mon, 31 Oct 2011 23:46:34 +0000 (18:46 -0500)]
Hexagon: Add SMP support

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Provide basic debugging and system trap support.
Richard Kuo [Mon, 31 Oct 2011 23:44:34 +0000 (18:44 -0500)]
Hexagon: Provide basic debugging and system trap support.

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add ptrace support
Richard Kuo [Mon, 31 Oct 2011 23:43:44 +0000 (18:43 -0500)]
Hexagon: Add ptrace support

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add time and timer functions
Richard Kuo [Mon, 31 Oct 2011 23:43:24 +0000 (18:43 -0500)]
Hexagon: Add time and timer functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add interrupts
Richard Kuo [Mon, 31 Oct 2011 23:42:51 +0000 (18:42 -0500)]
Hexagon: Add interrupts

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add startup code
Richard Kuo [Mon, 31 Oct 2011 23:42:28 +0000 (18:42 -0500)]
Hexagon: Add startup code

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add init_task and process functions
Richard Kuo [Mon, 31 Oct 2011 23:41:49 +0000 (18:41 -0500)]
Hexagon: Add init_task and process functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add signal functions
Richard Kuo [Mon, 31 Oct 2011 23:41:21 +0000 (18:41 -0500)]
Hexagon: Add signal functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Support dynamic module loading.
Linas Vepstas [Mon, 31 Oct 2011 23:40:46 +0000 (18:40 -0500)]
Hexagon: Support dynamic module loading.

Modules should be compiled as ordinary .o's; shared objects are not
supported.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Export ksyms defined in assembly files.
Linas Vepstas [Mon, 31 Oct 2011 23:40:19 +0000 (18:40 -0500)]
Hexagon: Export ksyms defined in assembly files.

Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add hypervisor interface
Richard Kuo [Mon, 31 Oct 2011 23:39:14 +0000 (18:39 -0500)]
Hexagon: Add hypervisor interface

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add memcpy and memset accelerated functions
Richard Kuo [Mon, 31 Oct 2011 23:38:38 +0000 (18:38 -0500)]
Hexagon: Add memcpy and memset accelerated functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add checksum functions
Richard Kuo [Mon, 31 Oct 2011 23:38:04 +0000 (18:38 -0500)]
Hexagon: Add checksum functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add delay functions
Richard Kuo [Mon, 31 Oct 2011 23:37:20 +0000 (18:37 -0500)]
Hexagon: Add delay functions

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add threadinfo
Richard Kuo [Mon, 31 Oct 2011 23:36:46 +0000 (18:36 -0500)]
Hexagon: Add threadinfo

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: Add processor and system headers
Richard Kuo [Mon, 31 Oct 2011 23:36:04 +0000 (18:36 -0500)]
Hexagon: Add processor and system headers

Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Linas Vepstas <linas@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>