Linus Torvalds [Thu, 6 Nov 2008 23:53:47 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Block: use round_jiffies_up()
Add round_jiffies_up and related routines
block: fix __blkdev_get() for removable devices
generic-ipi: fix the smp_mb() placement
blk: move blk_delete_timer call in end_that_request_last
block: add timer on blkdev_dequeue_request() not elv_next_request()
bio: define __BIOVEC_PHYS_MERGEABLE
block: remove unused ll_new_mergeable()
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] SAM9 watchdog - supported on all SAM9 and CAP9 processors
[WATCHDOG] SAM9 watchdog - update for moved headers
Linus Torvalds [Thu, 6 Nov 2008 23:50:11 +0000 (15:50 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: linear: Fix a division by zero bug for very small arrays.
md: fix bug in raid10 recovery.
md: revert the recent addition of a call to the BLKRRPART ioctl.
Linus Torvalds [Thu, 6 Nov 2008 23:45:57 +0000 (15:45 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
net/9p: fix printk format warnings
unsigned fid->fid cannot be negative
9p: rdma: remove duplicated #include
p9: Fix leak of waitqueue in request allocation path
9p: Remove unneeded free of fcall for Flush
9p: Make all client spin locks IRQ safe
9p: rdma: Set trans prior to requesting async connection ops
David Howells [Wed, 5 Nov 2008 17:38:47 +0000 (17:38 +0000)]
Fix accidental implicit cast in HR-timer conversion
Fix the hrtimer_add_expires_ns() function. It should take a 'u64 ns' argument,
but rather takes an 'unsigned long ns' argument - which might only be 32-bits.
On FRV, this results in the kernel locking up because hrtimer_forward() passes
the result of a 64-bit multiplication to this function, for which the compiler
discards the top 32-bits - something that didn't happen when ktime_add_ns() was
called directly.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:55 +0000 (12:53 -0800)]
fat: Fix ATTR_RO for directory
FAT has the ATTR_RO (read-only) attribute. But on Windows, the ATTR_RO
of the directory will be just ignored actually, and is used by only
applications as flag. E.g. it's setted for the customized folder by
Explorer.
This adds "rodir" option. If user specified it, ATTR_RO is used as
read-only flag even if it's the directory. Otherwise, inode->i_mode
is not used to hold ATTR_RO (i.e. fat_mode_can_save_ro() returns 0).
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:54 +0000 (12:53 -0800)]
fat: Fix ATTR_RO in the case of (~umask & S_WUGO) == 0
If inode->i_mode doesn't have S_WUGO, current code assumes it means
ATTR_RO. However, if (~[ufd]mask & S_WUGO) == 0, inode->i_mode can't
hold S_WUGO. Therefore the updated directory entry will always have
ATTR_RO.
This adds fat_mode_can_hold_ro() to check it. And if inode->i_mode
can't hold, uses -i_attrs to hold ATTR_RO instead.
With this, we don't set ATTR_RO unless users change it via ioctl() if
(~[ufd]mask & S_WUGO) == 0.
And on FAT_IOCTL_GET_ATTRIBUTES path, this adds ->i_mutex to it for
not returning the partially updated attributes by FAT_IOCTL_SET_ATTRIBUTES
to userland.
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:54 +0000 (12:53 -0800)]
fat: Cleanup FAT attribute stuff
This adds three helpers:
fat_make_attrs() - makes FAT attributes from inode.
fat_make_mode() - makes mode_t from FAT attributes.
fat_save_attrs() - saves FAT attributes to inode.
Then this replaces: MSDOS_MKMODE() by fat_make_mode(), fat_attr() by
fat_make_attrs(), ->i_attrs = attr & ATTR_UNUSED by fat_save_attrs().
And for root inode, those is used with ATTR_DIR instead of bogus
ATTR_NONE.
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:52 +0000 (12:53 -0800)]
fat: Kill d_invalidate() in vfat_lookup()
d_invalidate() for positive dentry doesn't work in some cases
(vfsmount, nfsd, and maybe others). shrink_dcache_parent() by
d_invalidate() is pointless for vfat usage at all.
So, this kills it, and intead of it uses d_move().
To save old behavior, this returns alias simply for directory (don't
change pwd, etc..). the directory lookup shouldn't be important for
performance.
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:51 +0000 (12:53 -0800)]
fat: Fix/Cleanup dcache handling for vfat
- Add comments for handling dcache of vfat.
- Separate case-sensitive case and case-insensitive to
vfat_revalidate() and vfat_ci_revalidate().
vfat_revalidate() doesn't need to drop case-insensitive negative
dentry on creation path.
- Current code is missing to set ->d_revalidate to the negative dentry
created by unlink/etc..
This sets ->d_revalidate always, and returns 1 for positive
dentry. Now, we don't need to change ->d_op dynamically anymore,
so this just uses sb->s_root->d_op to set ->d_op.
- d_find_alias() may return DCACHE_DISCONNECTED dentry. It's not
the interesting dentry there. This checks it.
- Add missing LOOKUP_PARENT check. We don't need to drop the valid
negative dentry for (LOOKUP_CREATE | LOOKUP_PARENT) lookup.
- For consistent filename on creation path, this drops negative dentry
if we can't see intent.
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:47 +0000 (12:53 -0800)]
fat: use generic_file_llseek() for directory
Since fat_dir_ioctl() was already fixed (i.e. called under ->i_mutex),
and __fat_readdir() doesn't take BKL anymore. So, BKL for ->llseek()
is pointless, and we have to use generic_file_llseek().
OGAWA Hirofumi [Thu, 6 Nov 2008 20:53:47 +0000 (12:53 -0800)]
fat: Fix and cleanup timestamp conversion
This cleans date_dos2unix()/fat_date_unix2dos() up. New code should be
much more readable.
And this fixes those old functions. Those doesn't handle 2100
correctly. 2100 isn't leap year, but old one handles it as leap year.
Also, with this, centi sec is handled and is fixed.
Andrew Victor [Thu, 6 Nov 2008 20:53:42 +0000 (12:53 -0800)]
SAM9 watchdog: update for moved headers
The architecture header files were recently moved from
include/asm-arm/mach-at91/ to arch/arm/mach-at91/include/mach/. The SAM9
watchdog driver still includes a header from the old location.
Signed-off-by: Andrew Victor <linux@maxim.org.za> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Frans Pop [Thu, 6 Nov 2008 20:53:41 +0000 (12:53 -0800)]
rtc-cmos: fix boot log message
-rtc0: alarms up to one month, y3k, 114 bytes nvram, , hpet irqs irqs
+rtc0: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
Signed-off-by: Frans Pop <elendil@planet.nl> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Thu, 6 Nov 2008 20:53:40 +0000 (12:53 -0800)]
atmel_serial: keep clock off when it's not needed
The atmel_serial driver is mismanaging its clock by leaving it on at all
times ... the whole point of clock management is to leave it off unless
it's actively needed, which conserves power!!
Although the kernel doesn't actually hang without my fix, it does
discard quite a lot of early console output.
The result still looks correct:
usart users= 1 on 35000000 Hz, for atmel_usart.0
usart users= 0 off 35000000 Hz, for atmel_usart.2
when using ttyS0 as serial console.
[haavard.skinnemoen@atmel.com: Make sure clock is enabled early for console] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 3e680aae4e53ab54cdbb0c29257dae0cbb158e1c ("fb: convert
lock/unlock_kernel() into local fb mutex") introduced several deadlocks
in the fb_compat_ioctl() path, as mutex_lock() doesn't allow recursion,
unlike lock_kernel(). This broke frame buffer applications on 64-bit
systems with a 32-bit userland.
This patch fixes the remaining deadlocks:
- Revert commit 120a37470c2831fea49fdebaceb5a7039f700ce6,
- Extract the core logic of fb_ioctl() into a new function do_fb_ioctl(),
- Change all callsites of fb_ioctl() where info->lock is already held to
call do_fb_ioctl() instead,
- Add sparse annotations to all routines that take info->lock.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerald Schaefer [Thu, 6 Nov 2008 20:53:36 +0000 (12:53 -0800)]
memory hotplug: fix page_zone() calculation in test_pages_isolated()
My last bugfix here (adding zone->lock) introduced a new problem: Using
page_zone(pfn_to_page(pfn)) to get the zone after the for() loop is wrong.
pfn will then be >= end_pfn, which may be in a different zone or not
present at all. This may lead to an addressing exception in page_zone()
or spin_lock_irqsave().
Now I use __first_valid_page() again after the loop to find a valid page
for page_zone().
Arthur Jones [Thu, 6 Nov 2008 20:53:35 +0000 (12:53 -0800)]
ext3: wait on all pending commits in ext3_sync_fs
In ext3_sync_fs, we only wait for a commit to finish if we started it, but
there may be one already in progress which will not be synced.
In the case of a data=ordered umount with pending long symlinks which are
delayed due to a long list of other I/O on the backing block device, this
causes the buffer associated with the long symlinks to not be moved to the
inode dirty list in the second phase of fsync_super. Then, before they
can be dirtied again, kjournald exits, seeing the UMOUNT flag and the
dirty pages are never written to the backing block device, causing long
symlink corruption and exposing new or previously freed block data to
userspace.
This can be reproduced with a script created
by Eric Sandeen <sandeen@redhat.com>:
#!/bin/bash
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
rm -f /mnt/test2/*
dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512
touch
/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
ln -s
/mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename
/mnt/test2/link
umount /mnt/test2
mount /dev/sdb4 /mnt/test2
ls /mnt/test2/
umount /mnt/test2
To ensure all commits are synced, we flush all journal commits now when
sync_fs'ing ext3.
Signed-off-by: Arthur Jones <ajones@riverbed.com> Cc: Eric Sandeen <sandeen@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> [2.6.everything] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: dann frazier <dannf@hp.com> Acked-by: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 6 Nov 2008 20:53:32 +0000 (12:53 -0800)]
cgroups: fix invalid cgrp->dentry before cgroup has been completely removed
This fixes an oops when reading /proc/sched_debug.
A cgroup won't be removed completely until finishing cgroup_diput(), so we
shouldn't invalidate cgrp->dentry in cgroup_rmdir(). Otherwise, when a
group is being removed while cgroup_path() gets called, we may trigger
NULL dereference BUG.
The bug can be reproduced:
# cat test.sh
#!/bin/sh
mount -t cgroup -o cpu xxx /mnt
for (( ; ; ))
{
mkdir /mnt/sub
rmdir /mnt/sub
}
# ./test.sh &
# cat /proc/sched_debug
It really does not matter when we flush the lru. The system is free to
add pages onto the lru even during migration which will make the page
migration either skip the page (mbind, migrate_pages) or return a busy
state (move_pages).
Fixes this lockdep warning (and potential deadlock):
Some VM place has
mmap_sem -> kevent_wq via lru_add_drain_all()
net/core/dev.c::dev_ioctl() has
rtnl_lock -> mmap_sem (*) the ioctl has copy_from_user() and it can do page fault.
linkwatch_event has
kevent_wq -> rtnl_lock
Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fbdev: add new framebuffer driver for Fujitsu MB862xx GDCs
Add a framebuffer driver for the Fujitsu Carmine/Coral-P(A)/Lime graphics
controllers. Lime GDC support is known to work on PPC440EPx based lwmon5
and MPC8544E based socrates embedded boards, both equipped with Lime GDC.
Carmine/Coral-P PCI GDC support is known to work on PPC440EPx based
Sequoia board and also on x86 platform.
Signed-off-by: Anatolij Gustschin <agust@denx.de> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Matteo Fortini <m.fortini@selcomgroup.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Thu, 6 Nov 2008 20:53:29 +0000 (12:53 -0800)]
oom: do not dump task state for non thread group leaders
When /proc/sys/vm/oom_dump_tasks is enabled, it's only necessary to dump
task state information for thread group leaders. The kernel log gets
quickly overwhelmed on machines with a massive number of threads by
dumping non-thread group leaders.
Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Thu, 6 Nov 2008 20:53:27 +0000 (12:53 -0800)]
hugetlb: pull gigantic page initialisation out of the default path
As we can determine exactly when a gigantic page is in use we can optimise
the common regular page cases by pulling out gigantic page initialisation
into its own function. As gigantic pages are never released to buddy we
do not need a destructor. This effectivly reverts the previous change to
the main buddy allocator. It also adds a paranoid check to ensure we
never release gigantic pages from hugetlbfs to the main buddy.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Whitcroft [Thu, 6 Nov 2008 20:53:26 +0000 (12:53 -0800)]
hugetlbfs: handle pages higher order than MAX_ORDER
When working with hugepages, hugetlbfs assumes that those hugepages are
smaller than MAX_ORDER. Specifically it assumes that the mem_map is
contigious and uses that to optimise access to the elements of the mem_map
that represent the hugepage. Gigantic pages (such as 16GB pages on
powerpc) by definition are of greater order than MAX_ORDER (larger than
MAX_ORDER_NR_PAGES in size). This means that we can no longer make use of
the buddy alloctor guarentees for the contiguity of the mem_map, which
ensures that the mem_map is at least contigious for maximmally aligned
areas of MAX_ORDER_NR_PAGES pages.
This patch adds new mem_map accessors and iterator helpers which handle
any discontiguity at MAX_ORDER_NR_PAGES boundaries. It then uses these to
implement gigantic page versions of copy_huge_page and clear_huge_page,
and to allow follow_hugetlb_page handle gigantic pages.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a regression where the controller firmware version is not
displayed in procfs. The previous patch would be called anytime something
changed. This will get called only once for each controller.
Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes a broken symlink in sysfs that was introduced by the
above commit. We broke it in 2.6.27-rc on or about 20080804. Some
installers are broken if this symlink does not exist and they may not
detect the logical drives configured on the controller. It does not
require being backported into 2.6.26.x or earlier kernels.
Signed-off-by: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> [2.6.27.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Thu, 6 Nov 2008 20:53:23 +0000 (12:53 -0800)]
autofs4: collect version check return
The function check_dev_ioctl_version() returns an error code upon fail but
it isn't captured and returned in validate_dev_ioctl() as it should be.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Thu, 6 Nov 2008 20:53:22 +0000 (12:53 -0800)]
autofs4: correct offset mount expire check
When checking a directory tree in autofs_tree_busy() we can incorrectly
decide that the tree isn't busy. This happens for the case of an active
offset mount as autofs4_follow_mount() follows past the active offset
mount, which has an open file handle used for expires, causing the file
handle not to count toward the busyness check.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henrik Rydberg [Thu, 6 Nov 2008 20:53:20 +0000 (12:53 -0800)]
hwmon: applesmc: add support for Macbook 5
Add accelerometer, backlight and temperature sensor support for the new
unibody Macbook 5.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Tested-by: David M. Lary <dmlary@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Thu, 6 Nov 2008 20:53:18 +0000 (12:53 -0800)]
rtc: fix handling of missing tm_year data when reading alarms
When fixing up invalid years rtc_read_alarm() was calling rtc_valid_tm()
as a boolean but rtc_valid_tm() returns zero on success or a negative
number if the time is not valid so the test was inverted.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Problem 1 (see patch below):
vc_tab_stop is declared as an array of 8 unsigned ints in struct
vc_data in include/linux/console_struct.h .
In drivers/char/vt.c only 5 of these 8 unsigned ints get initialized
leading to unintended tabulator placement on displays with more than
160 columns text.
Problem 2 (open):
Upcoming displays will have more than 256 columns of text leading to
invalid memory access in drivers/char/vt.c during tabulator
calculations:
if (vc->vc_tab_stop[vc->vc_x >> 5] & (1 << (vc->vc_x & 31)))
break;
Signed-off-by: Wolfgang Kroworsch <wolfgang@kroworsch.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Miller [Thu, 6 Nov 2008 08:37:40 +0000 (00:37 -0800)]
net: Fix recursive descent in __scm_destroy().
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Victor [Wed, 5 Nov 2008 20:36:35 +0000 (22:36 +0200)]
[WATCHDOG] SAM9 watchdog - supported on all SAM9 and CAP9 processors
The SAM9 watchdog driver is usable on the whole family of AT91SAM9 and
CAP9 processors.
Update the configuration to indicate this and allow the driver to be selected.
Signed-off-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Victor [Wed, 5 Nov 2008 20:18:41 +0000 (22:18 +0200)]
[WATCHDOG] SAM9 watchdog - update for moved headers
The architecture header files were recently moved from
include/asm-arm/mach-at91/ to arch/arm/mach-at91/include/mach/.
The SAM9 watchdog driver still includes a header from the old location.
Signed-off-by: Andrew Victor <linux@maxim.org.za> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andre Noll [Thu, 6 Nov 2008 08:41:24 +0000 (19:41 +1100)]
md: linear: Fix a division by zero bug for very small arrays.
We currently oops with a divide error on starting a linear software
raid array consisting of at least two very small (< 500K) devices.
The bug is caused by the calculation of the hash table size which
tries to compute sector_div(sz, base) with "base" being zero due to
the small size of the component devices of the array.
Fix this by requiring the hash spacing to be at least one which
implies that also "base" is non-zero.
This bug has existed since about 2.6.14.
Cc: stable@kernel.org Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
Alan Stern [Thu, 6 Nov 2008 07:42:49 +0000 (08:42 +0100)]
Block: use round_jiffies_up()
This patch (as1159b) changes the timeout routines in the block core to
use round_jiffies_up(). There's no point in rounding the timer
deadline down, since if it expires too early we will have to restart
it.
The patch also removes some unnecessary tests when a request is
removed from the queue's timer list.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Alan Stern [Thu, 6 Nov 2008 07:42:48 +0000 (08:42 +0100)]
Add round_jiffies_up and related routines
This patch (as1158b) adds round_jiffies_up() and friends. These
routines work like the analogous round_jiffies() functions, except
that they will never round down.
The new routines will be useful for timeouts where we don't care
exactly when the timer expires, provided it doesn't expire too soon.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Wed, 5 Nov 2008 09:21:06 +0000 (10:21 +0100)]
block: fix __blkdev_get() for removable devices
Commit 0762b8bde9729f10f8e6249809660ff2ec3ad735 moved disk_get_part()
in front of recursive get on the whole disk, which caused removable
devices to try disk_get_part() before rescanning after a new media is
inserted, which might fail legit open attempts or give the old
partition.
This patch fixes the problem by moving disk_get_part() after
__blkdev_get() on the whole disk.
Suresh Siddha [Thu, 30 Oct 2008 17:28:41 +0000 (18:28 +0100)]
generic-ipi: fix the smp_mb() placement
smp_mb() is needed (to make the memory operations visible globally) before
sending the ipi on the sender and the receiver (on Alpha atleast) needs
smp_read_barrier_depends() in the handler before reading the call_single_queue
list in a lock-free fashion.
On x86, x2apic mode register accesses for sending IPI's don't have serializing
semantics. So the need for smp_mb() before sending the IPI becomes more
critical in x2apic mode.
Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
doing any thing special (like memory fence) after clearing the CSD_FLAG_WAIT.
Mike Anderson [Thu, 30 Oct 2008 09:16:20 +0000 (02:16 -0700)]
blk: move blk_delete_timer call in end_that_request_last
Move the calling blk_delete_timer to later in end_that_request_last to
address an issue where blkdev_dequeue_request may have add a timer for the
request.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Tejun Heo [Thu, 30 Oct 2008 07:32:29 +0000 (08:32 +0100)]
block: add timer on blkdev_dequeue_request() not elv_next_request()
Block queue supports two usage models - one where block driver peeks
at the front of queue using elv_next_request(), processes it and
finishes it and the other where block driver peeks at the front of
queue, dequeue the request using blkdev_dequeue_request() and finishes
it. The latter is more flexible as it allows the driver to process
multiple commands concurrently.
These two inconsistent usage models affect the block layer
implementation confusing. For some, elv_next_request() is considered
the issue point while others consider blkdev_dequeue_request() the
issue point.
Till now the inconsistency mostly affect only accounting, so it didn't
really break anything seriously; however, with block layer timeout,
this inconsistency hits hard. Block layer considers
elv_next_request() the issue point and adds timer but SCSI layer
thinks it was just peeking and when the request can't process the
command right away, it's just left there without further processing.
This makes the request dangling on the timer list and, when the timer
goes off, the request which the SCSI layer and below think is still on
the block queue ends up in the EH queue, causing various problems - EH
hang (failed count goes over busy count and EH never wakes up),
WARN_ON() and oopses as low level driver trying to handle the unknown
command, etc. depending on the timing.
As SCSI midlayer is the only user of block layer timer at the moment,
moving blk_add_timer() to elv_dequeue_request() fixes the problem;
however, this two usage models definitely need to be cleaned up in the
future.
Define __BIOVEC_PHYS_MERGEABLE as the default implementation of
BIOVEC_PHYS_MERGEABLE, so that its available for reuse within an
arch-specific definition of BIOVEC_PHYS_MERGEABLE.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
NeilBrown [Thu, 6 Nov 2008 06:28:20 +0000 (17:28 +1100)]
md: fix bug in raid10 recovery.
Adding a spare to a raid10 doesn't cause recovery to start.
This is due to an silly type in
commit 6c2fce2ef6b4821c21b5c42c7207cb9cf8c87eda
and so is a bug in 2.6.27 and .28-rc.
Thanks to Thomas Backlund for bisecting to find this.
Cc: Thomas Backlund <tmb@mandriva.org> Cc: stable@kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 6 Nov 2008 06:28:01 +0000 (17:28 +1100)]
md: revert the recent addition of a call to the BLKRRPART ioctl.
It turns out that it is only safe to call blkdev_ioctl when the device
is actually open (as ->bd_disk is set to NULL on last close). And it
is quite possible for do_md_stop to be called when the device is not
open. So discard the call to blkdev_ioctl(BLKRRPART) which was
added in
commit 934d9c23b4c7e31840a895ba4b7e88d6413c81f3
It is just as easy to call this ioctl from userspace when needed (on
mdadm -S) so leave it out of the kernel
[JFFS2] fix race condition in jffs2_lzo_compress()
deflate_mutex protects the globals lzo_mem and lzo_compress_buf. However,
jffs2_lzo_compress() unlocks deflate_mutex _before_ it has copied out the
compressed data from lzo_compress_buf. Correct this by moving the mutex
unlock after the copy.
In addition, document what deflate_mutex actually protects.
Cc: stable@kernel.org Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Richard Purdie <rpurdie@openedhand.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Randy Dunlap [Wed, 5 Nov 2008 04:46:46 +0000 (20:46 -0800)]
net/9p: fix printk format warnings
Fix printk format warnings in net/9p.
Built cleanly on 7 arches.
net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:820: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:867: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:932: warning: format '%llx' expects type 'long long unsigned int', but argument 6 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:982: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'u64'
net/9p/client.c:1025: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1227: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 7 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 12 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 8 has type 'u64'
net/9p/client.c:1252: warning: format '%llx' expects type 'long long unsigned int', but argument 13 has type 'u64'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Tom Tucker [Thu, 23 Oct 2008 21:33:25 +0000 (16:33 -0500)]
p9: Fix leak of waitqueue in request allocation path
If a T or R fcall cannot be allocated, the function returns an error
but neglects to free the wait queue that was successfully allocated.
If it comes through again a second time this wq will be overwritten
with a new allocation and the old allocation will be leaked.
Also, if the client is subsequently closed, the close path will
attempt to clean up these allocations, so set the req fields to
NULL to avoid duplicate free.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Tom Tucker [Thu, 23 Oct 2008 21:30:13 +0000 (16:30 -0500)]
9p: rdma: Set trans prior to requesting async connection ops
The RDMA connection manager is fundamentally asynchronous.
Since the async callback context is the client pointer, the
transport in the client struct needs to be set prior to calling
the first async op.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Ingo Molnar [Wed, 5 Nov 2008 15:52:08 +0000 (16:52 +0100)]
sched: re-tune balancing
Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
balancing defaults on NUMA and SMP systems.
Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
why we would not want to have wakeup affinity across nodes as well.
(we already do this in the standard NUMA template.)
lat_ctx on a NUMA box is particularly happy about this change:
[MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4)
For "unlock" cycles to 16bit devices in 8bit compatibility mode we need
to use the byte addresses 0xaaa and 0x555. These effectively match
the word address 0x555 and 0x2aa, except the latter has its low bit set.
Most chips don't care about the value of the 'A-1' pin in x8 mode,
but some -- like the ST M29W320D -- do. So we need to be careful to
set it where appropriate.
cfi_send_gen_cmd is only ever passed addresses where the low byte
is 0x00, 0x55 or 0xaa. Of those, only addresses ending 0xaa are
affected by this patch, by masking in the extra low bit when the device
is known to be in compatibility mode.
[dwmw2: Do it only when (cmd_ofs & 0xff) == 0xaa]
v4: Fix stupid typo in cfi_build_cmd_addr that failed to compile
I'm writing this patch way to late at night.
v3: Bring all of the work back into cfi_build_cmd_addr
including calling of map_bankwidth(map) and cfi_interleave(cfi)
So every caller doesn't need to.
v2: Only modified the address if we our device_type is larger than our
bus width.
Cc: stable@kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Peter Zijlstra [Tue, 4 Nov 2008 20:25:10 +0000 (21:25 +0100)]
sched: fix buddies for group scheduling
Impact: scheduling order fix for group scheduling
For each level in the hierarchy, set the buddy to point to the right entity.
Therefore, when we do the hierarchical schedule, we have a fair chance of
ending up where we meant to.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently we only have a forward looking buddy, that is, we prefer to
schedule to the task we last woke up, under the presumption that its
going to consume the data we just produced, and therefore will have
cache hot benefits.
This allows co-waking producer/consumer task pairs to run ahead of the
pack for a little while, keeping their cache warm. Without this, we
would interleave all pairs, utterly trashing the cache.
This patch introduces a backward looking buddy, that is, suppose that
in the above scenario, the consumer preempts the producer before it
can go to sleep, we will therefore miss the wakeup from consumer to
producer (its already running, after all), breaking the cycle and
reverting to the cache-trashing interleaved schedule pattern.
The backward buddy will try to schedule back to the task that woke us
up in case the forward buddy is not available, under the assumption
that the last task will be the one with the most cache hot task around
barring current.
This will basically allow a task to continue after it got preempted.
In order to avoid starvation, we allow either buddy to get wakeup_gran
ahead of the pack.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
powerpc: Fix "unused variable" warning in pci_dlpar.c
This gets rid of this build warning:
arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'
This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Alexey Dobriyan [Sun, 2 Nov 2008 10:21:57 +0000 (10:21 +0000)]
powerpc/cell: Fix compile error in ras.c
This fixes this error on Cell when CONFIG_KEXEC = n:
arch/powerpc/platforms/cell/ras.c:299: error: implicit declaration of function 'crash_shutdown_register'
We have to include <asm/kexec.h> because it contains the dummy
definition of crash_shutdown_register that is used when
CONFIG_KEXEC=n, but <linux/kexec.h> doesn't include <asm/kexec.h> in
that case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Alexey Dobriyan [Sun, 2 Nov 2008 07:26:51 +0000 (07:26 +0000)]
powerpc/ps3: Fix compile error in ps3-lpm.c
Compiling with CONFIG_SMP = n and CONFIG_PS3_LPM != n gives this error:
drivers/ps3/ps3-lpm.c:838: error: implicit declaration of function 'get_hard_smp_processor_id'
This fixes it. We have to include <asm/smp.h> rather than
<linux/smp.h> because the UP definition of get_hard_smp_processor_id()
is in <asm/smp.h>, and <linux/smp.h> only includes <asm/smp.h> if
CONFIG_SMP = y.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
xfrm: Fix xfrm_policy_gc_lock handling.
niu: Use pci_ioremap_bar().
bnx2x: Version Update
bnx2x: Calling netif_carrier_off at the end of the probe
bnx2x: PCI configuration bug on big-endian
bnx2x: Removing the PMF indication when unloading
mv643xx_eth: fix SMI bus access timeouts
net: kconfig cleanup
fs_enet: fix polling
XFRM: copy_to_user_kmaddress() reports local address twice
SMC91x: Fix compilation on some platforms.
udp: Fix the SNMP counter of UDP_MIB_INERRORS
udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS
drivers/net/smc911x.c: Fix lockdep warning on xmit.
Linus Torvalds [Tue, 4 Nov 2008 16:19:01 +0000 (08:19 -0800)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: mask off DET when restoring SControl for detach
libata: implement ATA_HORKAGE_ATAPI_MOD16_DMA and apply it
libata: Fix a potential race condition in ata_scsi_park_show()
sata_nv: fix generic, nf2/3 detection regression
sata_via: restore vt*_prepare_host error handling
sata_promise: add ATA engine reset to reset ops
Tejun Heo [Mon, 3 Nov 2008 10:27:07 +0000 (19:27 +0900)]
libata: mask off DET when restoring SControl for detach
libata restores SControl on detach; however, trying to restore
non-zero DET can cause undeterministic behavior including PMP device
going offline till power cycling. Mask off DET when restoring
SControl.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Mon, 3 Nov 2008 10:01:09 +0000 (19:01 +0900)]
libata: implement ATA_HORKAGE_ATAPI_MOD16_DMA and apply it
libata always uses PIO for ATAPI commands when the number of bytes to
transfer isn't multiple of 16 but quantum DAT72 chokes on odd bytes
PIO transfers. Implement a horkage to skip the mod16 check and apply
it to the quantum device.
This is reported by John Clark in the following thread.
http://thread.gmane.org/gmane.linux.ide/34748
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John Clark <clarkjc@runbox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Elias Oltmanns [Mon, 3 Nov 2008 10:01:08 +0000 (19:01 +0900)]
libata: Fix a potential race condition in ata_scsi_park_show()
Peter Moulder has pointed out that there is a slight chance that a
negative value might be passed to jiffies_to_msecs() in
ata_scsi_park_show(). This is fixed by saving the value of jiffies in a
local variable, thus also reducing code since the volatile variable
jiffies is accessed only once.
Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Tejun Heo <tj.kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Tejun Heo [Mon, 3 Nov 2008 03:37:49 +0000 (12:37 +0900)]
sata_nv: fix generic, nf2/3 detection regression
All three flavors of sata_nv's are different in how their hardreset
behaves.
* generic: Hardreset is not reliable. Link often doesn't come online
after hardreset.
* nf2/3: A little bit better - link comes online with longer debounce
timing. However, nf2/3 can't reliable wait for the first D2H
Register FIS, so it can't wait for device readiness or classify the
device after hardreset. Follow-up SRST required.
* ck804: Hardreset finally works.
The core layer change to prefer hardreset and follow up changes
exposed the above issues and caused various detection regressions for
all three flavors. This patch, hopefully, fixes all the known issues
and should make sata_nv error handling more reliable.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
catched by gcc:
drivers/ata/sata_via.c: In function 'svia_init_one':
drivers/ata/sata_via.c:567: warning: 'host' may be used uninitialized in this function
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Joseph Chan <JosephChan@via.com.tw> Cc: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Promise ATA engines need to be reset when errors occur.
That's currently done for errors detected by sata_promise itself,
but it's not done for errors like timeouts detected outside of
the low-level driver.
The effect of this omission is that a timeout tends to result
in a sequence of failed COMRESETs after which libata EH gives
up and disables the port. At that point the port's ATA engine
hangs and even reloading the driver will not resume it.
To fix this, make sata_promise override ->hardreset on SATA
ports with code which calls pdc_reset_port() on the port in
question before calling libata's hardreset. PATA ports don't
use ->hardreset, so for those we override ->softreset instead.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>