Johannes Weiner [Mon, 24 May 2010 21:32:10 +0000 (14:32 -0700)]
mincore: break do_mincore() into logical pieces
Split out functions to handle hugetlb ranges, pte ranges and unmapped
ranges, to improve readability but also to prepare the file structure for
nested page table walks.
No semantic changes intended.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:09 +0000 (14:32 -0700)]
mincore: cleanups
This fixes some minor issues that bugged me while going over the code:
o adjust argument order of do_mincore() to match the syscall
o simplify range length calculation
o drop superfluous shift in huge tlb calculation, address is page aligned
o drop dead nr_huge calculation
o check pte_none() before pte_present()
o comment and whitespace fixes
No semantic changes intended.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miao Xie [Mon, 24 May 2010 21:32:08 +0000 (14:32 -0700)]
cpuset,mm: fix no node to alloc memory when changing cpuset's mems
Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later. But in the way, the allocator may find that
there is no node to alloc memory.
The reason is that cpuset rebinds the task's mempolicy, it cleans the
nodes which the allocater can alloc pages on, for example:
(mpol: mempolicy)
task1 task1's mpol task2
alloc page 1
alloc on node0? NO 1
1 change mems from 1 to 0
1 rebind task1's mpol
0-1 set new bits
0 clear disallowed bits
alloc on node1? NO 0
...
can't alloc page
goto oom
This patch fixes this problem by expanding the nodes range first(set newly
allowed bits) and shrink it lazily(clear newly disallowed bits). So we
use a variable to tell the write-side task that read-side task is reading
nodemask, and the write-side task clears newly disallowed nodes after
read-side task ends the current memory allocation.
[akpm@linux-foundation.org: fix spello] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin reported that the allocator may see an empty nodemask when
changing cpuset's mems[1]. It happens only on the kernel that do not do
atomic nodemask_t stores. (MAX_NUMNODES > BITS_PER_LONG)
But I found that there is also a problem on the kernel that can do atomic
nodemask_t stores. The problem is that the allocator can't find a node to
alloc page when changing cpuset's mems though there is a lot of free
memory. The reason is like this:
(mpol: mempolicy)
task1 task1's mpol task2
alloc page 1
alloc on node0? NO 1
1 change mems from 1 to 0
1 rebind task1's mpol
0-1 set new bits
0 clear disallowed bits
alloc on node1? NO 0
...
can't alloc page
goto oom
I can use the attached program reproduce it by the following step:
several hours later, oom will happen though there is a lot of free memory.
This patchset fixes this problem by expanding the nodes range first(set
newly allowed bits) and shrink it lazily(clear newly disallowed bits). So
we use a variable to tell the write-side task that read-side task is
reading nodemask, and the write-side task clears newly disallowed nodes
after read-side task ends the current memory allocation.
This patch:
In order to fix no node to alloc memory, when we want to update mempolicy
and mems_allowed, we expand the set of nodes first (set all the newly
nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the
mempolicy's rebind functions may breaks the expanding.
So we restructure the mempolicy's rebind functions and split the rebind
work to two steps, just like the update of cpuset's mems: The 1st step:
expand the set of the mempolicy's nodes. The 2nd step: shrink the set of
the mempolicy's nodes. It is used when there is no real lock to protect
the mempolicy in the read-side. Otherwise we can do rebind work at once.
If the mempolicy needn't be updated by two steps, we can pass
MPOL_REBIND_ONCE to the rebind functions. Or we can pass
MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
MPOL_REBIND_STEP2 to do the second step work.
Besides that, it maybe long time between these two step and we have to
release the lock that protects mempolicy and mems_allowed. If we hold the
lock once again, we must check whether the current mempolicy is under the
rebinding (the first step has been done) or not, because the task may
alloc a new mempolicy when we don't hold the lock. So we defined the
following flag to identify it:
#define MPOL_F_REBINDING (1 << 2)
The new functions will be used in the next patch.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:03 +0000 (14:32 -0700)]
mempolicy: lose unnecessary loop variable in mpol_parse_str()
We don't really need the extra variable 'i' in mpol_parse_str(). The only
use is as the the loop variable. Then, it's assigned to 'mode'. Just use
mode, and loose the 'uninitialized_var()' macro.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:02 +0000 (14:32 -0700)]
mempolicy: don't call mpol_set_nodemask() when no_context
No need to call mpol_set_nodemask() when we have no context for the
mempolicy. This can occur when we're parsing a tmpfs 'mpol' mount option.
Just save the raw nodemask in the mempolicy's w.user_nodemask member for
use when a tmpfs/shmem file is created. mpol_shared_policy_init() will
"contextualize" the policy for the new file based on the creating task's
context.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 24 May 2010 21:32:01 +0000 (14:32 -0700)]
mempolicy: remove redundant check
Lee's patch "mempolicy: use MPOL_PREFERRED for system-wide default policy"
has made the MPOL_DEFAULT only used in the memory policy APIs. So, no
need to check in __mpol_equal also. Also get rid of mpol_match_intent()
and move its logic directly into __mpol_equal().
Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 24 May 2010 21:32:00 +0000 (14:32 -0700)]
mempolicy: remove case MPOL_INTERLEAVE from policy_zonelist()
In policy_zonelist() mode MPOL_INTERLEAVE shouldn't happen, so fall
through to BUG() instead of break to return. I also fixed the comment.
Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Corrado Zoccolo [Mon, 24 May 2010 21:31:54 +0000 (14:31 -0700)]
page allocator: reduce fragmentation in buddy allocator by adding buddies that are merging to the tail of the free lists
In order to reduce fragmentation, this patch classifies freed pages in two
groups according to their probability of being part of a high order merge.
Pages belonging to a compound whose next-highest buddy is free are more
likely to be part of a high order merge in the near future, so they will
be added at the tail of the freelist. The remaining pages are put at the
front of the freelist.
In this way, the pages that are more likely to cause a big merge are kept
free longer. Consequently there is a tendency to aggregate the
long-living allocations on a subset of the compounds, reducing the
fragmentation.
This heuristic was tested on three machines, x86, x86-64 and ppc64 with
3GB of RAM in each machine. The tests were kernbench, netperf, sysbench
and STREAM for performance and a high-order stress test for huge page
allocations.
KernBench X86
Elapsed mean 374.77 ( 0.00%) 375.10 (-0.09%)
User mean 649.53 ( 0.00%) 650.44 (-0.14%)
System mean 54.75 ( 0.00%) 54.18 ( 1.05%)
CPU mean 187.75 ( 0.00%) 187.25 ( 0.27%)
KernBench X86-64
Elapsed mean 94.45 ( 0.00%) 94.01 ( 0.47%)
User mean 323.27 ( 0.00%) 322.66 ( 0.19%)
System mean 36.71 ( 0.00%) 36.50 ( 0.57%)
CPU mean 380.75 ( 0.00%) 381.75 (-0.26%)
KernBench PPC64
Elapsed mean 173.45 ( 0.00%) 173.74 (-0.17%)
User mean 587.99 ( 0.00%) 587.95 ( 0.01%)
System mean 60.60 ( 0.00%) 60.57 ( 0.05%)
CPU mean 373.50 ( 0.00%) 372.75 ( 0.20%)
The tests are run to have confidence limits within 1%. Results marked
with a * were not confident although in this case, it's only outside by
small amounts. Even with some results that were not confident, the
netperf UDP results were generally positive.
There was a distinct lack of confidence in the X86* figures so I included
what the devation was where the results were not confident. Many of the
results, whether gains or losses were within the standard deviation so no
solid conclusion can be reached on performance impact. Looking at the
figures, only the X86-64 ones look suspicious with a few losses that were
outside the noise. However, the results were so unstable that without
knowing why they vary so much, a solid conclusion cannot be reached.
Again, there is little to conclude here. While there are a few losses,
the results vary by +/- 8% in some cases. They are the results of most
concern as there are some large losses but it's also within the variance
typically seen between kernel releases.
The STREAM results varied so little and are so verbose that I didn't
include them here.
The final test stressed how many huge pages can be allocated. The
absolute number of huge pages allocated are the same with or without the
page. However, the "unusability free space index" which is a measure of
external fragmentation was slightly lower (lower is better) throughout the
lifetime of the system. I also measured the latency of how long it took
to successfully allocate a huge page. The latency was slightly lower and
on X86 and PPC64, more huge pages were allocated almost immediately from
the free lists. The improvement is slight but there.
[mel@csn.ul.ie: Tested, reworked for less branches]
[czoccolo@gmail.com: fix oops by checking pfn_valid_within()] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Mon, 24 May 2010 21:31:48 +0000 (14:31 -0700)]
tmpfs: insert tmpfs cache pages to inactive list at first
Shaohua Li reported parallel file copy on tmpfs can lead to OOM killer.
This is regression of caused by commit 9ff473b9a7 ("vmscan: evict
streaming IO first"). Wow, It is 2 years old patch!
Currently, tmpfs file cache is inserted active list at first. This means
that the insertion doesn't only increase numbers of pages in anon LRU, but
it also reduces anon scanning ratio. Therefore, vmscan will get totally
confused. It scans almost only file LRU even though the system has plenty
unused tmpfs pages.
Historically, lru_cache_add_active_anon() was used for two reasons.
1) Intend to priotize shmem page rather than regular file cache.
2) Intend to avoid reclaim priority inversion of used once pages.
But we've lost both motivation because (1) Now we have separate anon and
file LRU list. then, to insert active list doesn't help such priotize.
(2) In past, one pte access bit will cause page activation. then to
insert inactive list with pte access bit mean higher priority than to
insert active list. Its priority inversion may lead to uninteded lru
chun. but it was already solved by commit 645747462 (vmscan: detect
mapped file pages used only once). (Thanks Hannes, you are great!)
Thus, now we can use lru_cache_add_anon() instead.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Mon, 24 May 2010 21:31:45 +0000 (14:31 -0700)]
xtensa: set ARCH_KMALLOC_MINALIGN
Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the
buffer doesn't share a cache with the others.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
cmd640: fix kernel oops in test_irq() method
pdc202xx_old: ignore "FIFO empty" bit in test_irq() method
pdc202xx_old: wire test_irq() method for PDC2026x
IDE: pass IRQ flags to the IDE core
ide: fix comment typo in ide.h
Linus Torvalds [Mon, 24 May 2010 15:02:58 +0000 (08:02 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: (30 commits)
Blackfin: SMP: fix continuation lines
Blackfin: acvilon: fix timeout usage for I2C
Blackfin: fix typo in BF537 IRQ comment
Blackfin: unify duplicate MEM_MT48LC32M8A2_75 kconfig options
Blackfin: set ARCH_KMALLOC_MINALIGN
Blackfin: use atomic kmalloc in L1 alloc so it too can be atomic
Blackfin: another year of changes (update copyright in boot log)
Blackfin: optimize strncpy a bit
Blackfin: isram: clean up ITEST_COMMAND macro and improve the selftests
Blackfin: move string functions to normal lib/ assembly
Blackfin: SIC: cut down on IAR MMR reads a bit
Blackfin: bf537-minotaur: fix build errors due to header changes
Blackfin: kgdb: pass up the CC register instead of a 0 stub
Blackfin: handle HW errors in the new "FAULT" printing code
Blackfin: show the whole accumulator in the pseudo DBG insn
Blackfin: support all possible registers in the pseudo instructions
Blackfin: add support for the DBG (debug output) pseudo insn
Blackfin: change the BUG opcode to an unused 16-bit opcode
Blackfin: allow NMI watchdog to be used w/RETN as a scratch reg
Blackfin: add support for the DBGA (debug assert) pseudo insn
...
Linus Torvalds [Mon, 24 May 2010 15:01:10 +0000 (08:01 -0700)]
Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
uml: Pushdown the bkl from harddog_kern ioctl
sunrpc: Pushdown the bkl from sunrpc cache ioctl
sunrpc: Pushdown the bkl from ioctl
autofs4: Pushdown the bkl from ioctl
uml: Convert to unlocked_ioctls to remove implicit BKL
ncpfs: BKL ioctl pushdown
coda: Clean-up whitespace problems in pioctl.c
coda: BKL ioctl pushdown
drivers: Push down BKL into various drivers
isdn: Push down BKL into ioctl functions
scsi: Push down BKL into ioctl functions
dvb: Push down BKL into ioctl functions
smbfs: Push down BKL into ioctl function
coda/psdev: Remove BKL from ioctl function
um/mmapper: Remove BKL usage
sn_hwperf: Kill BKL usage
hfsplus: Push down BKL into ioctl function
Linus Torvalds [Mon, 24 May 2010 15:00:13 +0000 (08:00 -0700)]
Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
ds2760_battery: Document ABI change
ds2760_battery: Make charge_now and charge_full writeable
power_supply: Add support for writeable properties
power_supply: Use attribute groups
power_supply: Add test_power driver
tosa_battery: Fix build error due to direct driver_data usage
wm97xx_battery: Quieten sparse warning (bat_set_pdata not declared)
ds2782_battery: Get rid of magic numbers in driver_data
ds2782_battery: Add support for ds2786 battery gas gauge
pda_power: Add function callbacks for suspend and resume
wm831x_power: Use genirq
Driver for Zipit Z2 battery chip
ds2782_battery: Fix clientdata on removal
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (25 commits)
sh: fix up sh7785lcr_32bit_defconfig.
arch/sh/lib/strlen.S: Checkpatch cleanup
sh: fix up sh7786 dmaengine build.
sh: guard cookie consistency across termination in the DMA driver
sh: prevent the DMA driver from unloading, while in use
sh: fix Oops in the serial SCI driver
sh: allow platforms to specify SD-card supported voltages
mmc: let MFD's provide supported Vdd card voltages to tmio_mmc
sh: disable SD-card write-protection detection on kfr2r09
mfd: pass platform flags down to the tmio_mmc driver
tmio: add a platform flag to disable card write-protection detection
sh: Add SDHI DMA support to migor
sh: Add SDHI DMA support to kfr2r09
sh: Add SDHI DMA support to ms7724se
sh: Add SDHI DMA support to ecovec
mmc: add DMA support to tmio_mmc driver, when used on SuperH
sh: prepare the SDHI MFD driver to pass DMA configuration to tmio_mmc.c
mmc: prepare tmio_mmc for passing of DMA configuration from the MFD cell
sh: add DMA slave definitions to sh7724
sh: add DMA slaves for two SDHI controllers to sh7722
...
Linus Torvalds [Mon, 24 May 2010 14:57:41 +0000 (07:57 -0700)]
Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: confusion between kmap() and kmap_atomic() api
exofs: Add default address_space_operations
Linus Torvalds [Mon, 24 May 2010 14:45:43 +0000 (07:45 -0700)]
Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea972a82d343fa5c2528b3952fa9e615d5, since it
breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
NetworkManager thinks everything is ok, but it can't connect to the AP
to get an IP address after the resume.
In fact, it even breaks resume for non-ath9k chipsets: reverting it also
fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes
says:
"Indeed, this patch needs to be reverted. That mac80211 change is wrong
and completely unnecessary."
Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: Daniel Yingqiang Ma <yma.cool@gmail.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: convert to unlocked_ioctl
fat: Cleanup nls_unload() usage
fat: use pack_hex_byte() instead of custom one
Linus Torvalds [Mon, 24 May 2010 14:37:52 +0000 (07:37 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits)
ceph: reuse mon subscribe message instead of allocated anew
ceph: avoid resending queued message to monitor
ceph: Storage class should be before const qualifier
ceph: all allocation functions should get gfp_mask
ceph: specify max_bytes on readdir replies
ceph: cleanup pool op strings
ceph: Use kzalloc
ceph: use common helper for aborted dir request invalidation
ceph: cope with out of order (unsafe after safe) mds reply
ceph: save peer feature bits in connection structure
ceph: resync headers with userland
ceph: use ceph. prefix for virtual xattrs
ceph: throw out dirty caps metadata, data on session teardown
ceph: attempt mds reconnect if mds closes our session
ceph: clean up send_mds_reconnect interface
ceph: wait for mds OPEN reply to indicate reconnect success
ceph: only send cap releases when mds is OPEN|HUNG
ceph: dicard cap releases on mds restart
ceph: make mon client statfs handling more generic
ceph: drop src address(es) from message header [new protocol feature]
...
Linus Torvalds [Mon, 24 May 2010 14:37:38 +0000 (07:37 -0700)]
Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
of: change of_match_device to work with struct device
of: Remove duplicate fields from of_platform_driver
drivercore: Add of_match_table to the common device drivers
arch/microblaze: Move dma_mask from of_device into pdev_archdata
arch/powerpc: Move dma_mask from of_device into pdev_archdata
of: eliminate of_device->node and dev_archdata->{of,prom}_node
of: Always use 'struct device.of_node' to get device node pointer.
i2c/of: Allow device node to be passed via i2c_board_info
driver-core: Add device node pointer to struct device
of: protect contents of of_platform.h and of_device.h
of/flattree: Make unflatten_device_tree() safe to call from any arch
of/flattree: make of_fdt.h safe to unconditionally include.
Linus Torvalds [Mon, 24 May 2010 14:33:43 +0000 (07:33 -0700)]
Merge branch 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: Use alloc_pages_exact_node() for page allocation
slub: __kmalloc_node_track_caller should trace kmalloc_large_node case
slub: Potential stack overflow
crypto: Use ARCH_KMALLOC_MINALIGN for CRYPTO_MINALIGN now that it's exposed
mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slub_def.h>
mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slob_def.h>
mm: Move ARCH_SLAB_MINALIGN and ARCH_KMALLOC_MINALIGN to <linux/slab_def.h>
slab: Fix missing DEBUG_SLAB last user
slab: add memory hotplug support
slab: Fix continuation lines
Ben Hutchings [Mon, 24 May 2010 00:02:30 +0000 (17:02 -0700)]
fusion: fix kernel-doc notation
The function name must be followed by a space, hypen, space, and a
short description.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Eric Moore <Eric.Moore@lsi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/vm: use better value for MAP_HUGETLB
documentation: slightly more correct value for MAP_HUGETLB in map_hugetlb.c
still not correct for alpha, mips, parisc or xtensa but working out of
the box in the most common architectures without having to deal with
complicated macros or including architecture specific headers.
Signed-off-by: Carlo Marcelo Arenas Belon <carenas@sajinet.com.pe> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Chua [Sun, 23 May 2010 23:16:24 +0000 (07:16 +0800)]
timers: Fix slack calculation for expired timers
commit 3bbb9ec946 (timers: Introduce the concept of timer slack for
legacy timers) does not take the case into account when the timer is
already expired. This broke wireless drivers.
The solution is not to apply slack to already expired timers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@linux.intel.com>
Thomas Gleixner [Sun, 23 May 2010 06:14:45 +0000 (08:14 +0200)]
timekeeping: Fix timezone update
commit 64ce4c2f (time: Clean up warp_clock()) breaks the timezone
update in a very subtle way. To avoid the direct access to timekeeping
internals it adds the timezone delta to the current time with
timespec_add_safe(). This works nicely when the timezone delta is > 0.
If timezone delta is < 0 then the wrap check in timespec_add_safe()
triggers and timespec_add_safe() returns TIME_MAX and screws up
timekeeping completely.
The comment above timespec_add_safe() says:
It's assumed that both values are valid (>= 0)
Add the timezone seconds adjustment directly.
Reported-by: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
FUJITA Tomonori [Thu, 20 May 2010 03:21:38 +0000 (23:21 -0400)]
Blackfin: set ARCH_KMALLOC_MINALIGN
Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe:
the buffer doesn't share a cache with the others.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Robin Getz [Tue, 4 May 2010 14:59:21 +0000 (14:59 +0000)]
Blackfin: optimize strncpy a bit
Add a little strncpy optimization which can easily cut boot time by 20%.
When the kernel is booting with initramfs, it builds up the filesystem
from a cpio archive by calling strncpy_from_user() via fs/namei.c's
do_getname() on every file in the archive (which can be lots) with a
length of PATH_MAX (1024). This causes the dest of the strncpy to be
padded with many NUL bytes.
This optimization mostly causes these NUL bytes to be padded with a call
to memset() which is already optimized for filling memory quickly, but
the hardware loop helps a little bit as well.
Boot time measured with 'loglevel=0' so UART speed doesn't get in the way.
Signed-off-by: Robin Getz <robin.getz@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Robin Getz [Mon, 3 May 2010 17:23:20 +0000 (17:23 +0000)]
Blackfin: move string functions to normal lib/ assembly
Since 'extern inline' doesn't work correctly in the context of the Linux
kernel (too many overriding defines), move the string functions to normal
lib/ assembly files (like the existing mem funcs). This avoids the forced
inline all over the kernel and allows us to place them constantly in L1.
This also avoids some module failures when gcc inserts calls to string
functions but the kernel build system doesn't fully consult the library
archives.
Signed-off-by: Robin Getz <robin.getz@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Thu, 22 Apr 2010 21:15:00 +0000 (21:15 +0000)]
Blackfin: SIC: cut down on IAR MMR reads a bit
Tweak the for loops that operate on the SIC IAR system MMRs to avoid
re-reading them multiple times in a row. System MMRs are a little
slower to access, so avoid the penalty when possible.
Mike Frysinger [Mon, 12 Apr 2010 05:53:35 +0000 (05:53 +0000)]
Blackfin: kgdb: pass up the CC register instead of a 0 stub
While the CC pseudo register can be deduced from the ASTAT register, make
sure we set its value correctly instead of always stubbing it out as 0.
GDB itself looks at this pseudo register instead of ASTAT, so we have to
supply the right value.
Graf Yang [Wed, 17 Mar 2010 09:00:32 +0000 (09:00 +0000)]
Blackfin: allow NMI watchdog to be used w/RETN as a scratch reg
NMIs are not safe to return from because many anomaly workarounds are
implemented by disabling interrupts. The NMI obviously violates this
assumption. Since the NMI watchdog never returns, we don't have to
worry about it clobbering RETN when it is being used as a scratch register
with the exception stack.
Signed-off-by: Graf Yang <graf.yang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Robin Getz [Tue, 16 Mar 2010 14:40:17 +0000 (14:40 +0000)]
Blackfin: add support for the DBGA (debug assert) pseudo insn
A few pseudo debug insns exist to make testing of simulators easier.
Since these don't actually exist in the hardware, we have to have the
exception handler take care of emulating these. This allows sim test
cases to be executed unmodified under Linux and thus simplify debugging
greatly.
Signed-off-by: Robin Getz <robin.getz@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
For lookup if we get ENOENT error from the server we still
instantiate the dentry. We need to make sure we have dentry
operations set in that case so that a later dput on the dentry
does the expected. Without the patch we get the below error
#ln -sf abc abclink
ln: creating symbolic link `abclink': No such file or directory
Now on the host do
$ touch abclink
Guest now gives ENOENT error.
# ls
ls: cannot access abclink: No such file or directory
Debugged-by:Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
sh: guard cookie consistency across termination in the DMA driver
If all descriptors on a channel are terminated or the channel is released,
update the completed cookie counter to match the last cookie. This prevents
inconsistency warning on resumed DMA operation.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
tmio: add a platform flag to disable card write-protection detection
Write-protection status is not always available, e.g., micro-SD cards do not
have a write-protection switch at all. This patch adds a flag to let platforms
force tmio_mmc to consider the card writable.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Minchan Kim [Wed, 14 Apr 2010 14:58:36 +0000 (23:58 +0900)]
slub: Use alloc_pages_exact_node() for page allocation
The alloc_slab_page() in SLUB uses alloc_pages() if node is '-1'. This means
that node validity check in alloc_pages_node is unnecessary and we can use
alloc_pages_exact_node() to avoid comparison and branch as commit 6484eb3e2a81807722 ("page allocator: do not check NUMA node ID when the caller
knows the node is valid") did for the page allocator.
Cc: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
slub: __kmalloc_node_track_caller should trace kmalloc_large_node case
commit 94b528d (kmemtrace: SLUB hooks for caller-tracking functions)
missed tracing kmalloc_large_node in __kmalloc_node_track_caller. We
should trace it same as __kmalloc_node.
Acked-by: David Rientjes <rientjes@google.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Eric Dumazet [Wed, 24 Mar 2010 21:25:47 +0000 (22:25 +0100)]
slub: Potential stack overflow
I discovered that we can overflow stack if CONFIG_SLUB_DEBUG=y and use slabs
with many objects, since list_slab_objects() and process_slab() use
DECLARE_BITMAP(map, page->objects).
With 65535 bits, we use 8192 bytes of stack ...
Switch these allocations to dynamic allocations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
mmc: add DMA support to tmio_mmc driver, when used on SuperH
SDHI controllers on SuperH, served by the tmio_mmc driver, can use slave DMA
for data transfer. This patch adds support for the dmaengine API to the
tmio_mmc driver.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
mmc: prepare tmio_mmc for passing of DMA configuration from the MFD cell
After this patch, if the "dma" pointer in struct tmio_mmc_data is not NULL, it
points to a struct, containing two tokens, that have to be passed to the
dmaengine driver for channel configuration.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
sh: define DMA slaves per CPU type, remove now redundant header
Now that DMA slave IDs are only used used in platform specific code and have
become opaque cookies for the rest of the code, we can make the, CPU specific
too.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>