* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6:
PCI Hotplug: Fix small mem leak in IBM Hot Plug Controller Driver
PCI: rename DECLARE_PCI_DEVICE_TABLE to DEFINE_PCI_DEVICE_TABLE
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6:
drivers: fix dma_get_required_mask
firmware: provide stubs for the FW_LOADER=n case
nozomi: fix initialization and early flow control access
sysdev: fix problem with sysdev_class being re-registered
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
lguest: Do not append space to guests kernel command line
lguest: Revert 1ce70c4fac3c3954bd48c035f448793867592bc0, fix real problem.
lguest: Sanitize the lguest clock.
lguest: fix __get_vm_area usage.
lguest: make sure cpu is initialized before accessing it
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
[WATCHDOG] make watchdog/hpwdt.c:asminline_call() static
[WATCHDOG] Remove volatiles from watchdog device structures
[WATCHDOG] replace remaining __FUNCTION__ occurrences
[WATCHDOG] hpwdt: Use dmi_walk() instead of own copy
[WATCHDOG] Fix return value warning in hpwdt
[WATCHDOG] Fix declaration of struct smbios_entry_point in hpwdt
[WATCHDOG] it8712f_wdt support for 16-bit timeout values, WDIOC_GETSTATUS
Krzysztof Helt [Mon, 10 Mar 2008 18:44:00 +0000 (11:44 -0700)]
mbxfb: fix incorrect argument type
Fix wrong pointer type passed into the dev_dbg() function.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Acked-by: Mike Rapoport <mike@compulab.co.il> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Mon, 10 Mar 2008 18:43:59 +0000 (11:43 -0700)]
iov_iter_advance() fix
iov_iter_advance() skips over zero-length iovecs, however it does not properly
terminate at the end of the iovec array. Fix this by checking against
i->count before we skip a zero-length iov.
The bug was reproduced with a test program that continually randomly creates
iovs to writev. The fix was also verified with the same program and also it
could verify that the correct data was contained in the file after each
writev.
Paul E. McKenney [Mon, 10 Mar 2008 18:43:57 +0000 (11:43 -0700)]
rcu: move PREEMPT_RCU config option back under PREEMPT
The original preemptible-RCU patch put the choice between classic and
preemptible RCU into kernel/Kconfig.preempt, which resulted in build failures
on machines not supporting CONFIG_PREEMPT. This choice was therefore moved to
init/Kconfig, which worked, but placed the choice between classic and
preemptible RCU at the top level, a very obtuse choice indeed.
This patch changes from the Kconfig "choice" mechanism to a pair of booleans,
only one of which (CONFIG_PREEMPT_RCU) is user-visible, and is located in
kernel/Kconfig.preempt, where one would expect it to be. The other
(CONFIG_CLASSIC_RCU) is in init/Kconfig so that it is available to all
architectures, hopefully avoiding build breakage. Thanks to Roman Zippel for
suggesting this approach.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Josh Triplett <josh@freedesktop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Mon, 10 Mar 2008 18:43:53 +0000 (11:43 -0700)]
modules: warn about suspicious return values from module's ->init() hook
Return value convention of module's init functions is 0/-E. Sometimes,
e.g. during forward-porting mistakes happen and buggy module created,
where result of comparison "workqueue != NULL" is propagated all the way up
to sys_init_module. What happens is that some other module created
workqueue in question, our module created it again and module was
successfully loaded.
Or it could be some other bug.
Let's make such mistakes much more visible. In retrospective, such
messages would noticeably shorten some of my head-scratching sessions.
Note, that dump_stack() is just a way to get attention from user. Sample
message:
sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 4223, comm: modprobe Not tainted 2.6.24-25f666300625d894ebe04bac2b4b3aadb907c861 #5
Rusty Russell [Mon, 10 Mar 2008 18:43:52 +0000 (11:43 -0700)]
modules: fix module waiting for dependent modules' init
Commit c9a3ba55 (module: wait for dependent modules doing init.) didn't quite
work because the waiter holds the module lock, meaning that the state of the
module it's waiting for cannot change.
Fortunately, it's fairly simple to update the state outside the lock and do
the wakeup.
Thanks to Jan Glauber for tracking this down and testing (qdio and qeth).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adam Litke [Mon, 10 Mar 2008 18:43:50 +0000 (11:43 -0700)]
hugetlb: correct page count for surplus huge pages
Free pages in the hugetlb pool are free and as such have a reference count of
zero. Regular allocations into the pool from the buddy are "freed" into the
pool which results in their page_count dropping to zero. However, surplus
pages can be directly utilized by the caller without first being freed to the
pool. Therefore, a call to put_page_testzero() is in order so that such a
page will be handed to the caller with a correct count.
This has not affected end users because the bad page count is reset before the
page is handed off. However, under CONFIG_DEBUG_VM this triggers a BUG when
the page count is validated.
Thanks go to Mel for first spotting this issue and providing an initial fix.
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnaud Patard [Mon, 10 Mar 2008 18:43:48 +0000 (11:43 -0700)]
gpio/pca953x bugfix: mark as can_sleep
The pca953x driver is an I2C driver so gpio_chip->can_sleep should be set.
This lets upper layers know they should use the gpio_*_cansleep() calls to
access values, and may not access them from nonsleeping contexts.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: "eric miao" <eric.y.miao@gmail.com> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Mon, 10 Mar 2008 18:43:48 +0000 (11:43 -0700)]
md: reduce CPU wastage on idle md array with a write-intent bitmap
Recent patch titled
Reduce CPU wastage on idle md array with a write-intent bitmap.
would sometimes leave the array with dirty bitmap bits that stay dirty. A
subsequent write would sort things out so it isn't a big problem, but should
be fixed nonetheless.
We need to make sure that when the bitmap becomes not "allclean", the
daemon_sleep really does get set to a sensible value.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Mon, 10 Mar 2008 18:43:47 +0000 (11:43 -0700)]
md: fix formatting error in /proc/mdstat
If an md array is "auto-read-only", then this appears in /proc/mdstat as
/dev/md0: active(auto-read-only)
whereas if it is truely readonly, it appears as
/dev/md0: active (read-only)
The difference being a space.
One program known to parse this file expects the space and gets badly
confused. It will be fixed, but it would be best if what the kernel generates
is more consistent too.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 10 Mar 2008 18:43:45 +0000 (11:43 -0700)]
mempolicy: fix reference counting bugs
Address 3 known bugs in the current memory policy reference counting method.
I have a series of patches to rework the reference counting to reduce overhead
in the allocation path. However, that series will require testing in -mm once
I repost it.
1) alloc_page_vma() does not release the extra reference taken for
vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in
leaking mempolicy structures. This is probably occurring, but not being
noticed.
Fix: add the conditional release of the reference.
2) hugezonelist unconditionally releases a reference on the mempolicy when
mode == MPOL_INTERLEAVE. This can result in decrementing the reference
count for system default policy [should have no ill effect] or premature
freeing of task policy. If this occurred, the next allocation using task
mempolicy would use the freed structure and probably BUG out.
Fix: add the necessary check to the release.
3) The current reference counting method assumes that vma 'get_policy()'
methods automatically add an extra reference a non-NULL returned mempolicy.
This is true for shmem_get_policy() used by tmpfs mappings, including
regular page shm segments. However, SHM_HUGETLB shm's, backed by
hugetlbfs, just use the vma policy without the extra reference. This
results in freeing of the vma policy on the first allocation, with reuse of
the freed mempolicy structure on subsequent allocations.
Fix: Rather than add another condition to the conditional reference
release, which occur in the allocation path, just add a reference when
returning the vma policy in shm_get_policy() to match the assumptions.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Cc: <eric.whitney@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Dubov [Mon, 10 Mar 2008 18:43:40 +0000 (11:43 -0700)]
tifm: fix memorystick host initialization code
Instead of assuming that host is powered on only once at card insertion, allow
for the possibility that memstick layer may need to cycle card's power to get
it out from some unhealthy states.
Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Dubov [Mon, 10 Mar 2008 18:43:40 +0000 (11:43 -0700)]
tifm: fix the MemoryStick host fifo handling code
Additional input received from JMicron on MemoryStick host interfaces showed
that some assumtions in fifo handling code were incorrect. This patch also
fixes data corruption used to occure during PIO transfers.
Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Mon, 10 Mar 2008 18:43:37 +0000 (11:43 -0700)]
tridentfb: fix memory size detection
Fix memory size multiplier during detection.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Mon, 10 Mar 2008 18:43:36 +0000 (11:43 -0700)]
tridentfb: register should be left in non-locked state
Remove locking registers after they are unlocked during switch to/from MMIO
mode. This fixes regression on the Blade3D (Trident 9880) caused by the
previous patch (probe fixes).
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Boyer [Mon, 10 Mar 2008 18:43:35 +0000 (11:43 -0700)]
of_serial: fix section mismatch warnings
Fix the following section mismatches:
WARNING: drivers/built-in.o(.exit.text+0x5a): Section mismatch in reference from the function of_platform_serial_exit() to the variable .devinit.data:of_platform_serial_driver
The function __exit of_platform_serial_exit() references
a variable __devinitdata of_platform_serial_driver.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Sat, 8 Mar 2008 01:16:07 +0000 (02:16 +0100)]
PCI Hotplug: Fix small mem leak in IBM Hot Plug Controller Driver
In drivers/pci/hotplug/ibmphp_ebda.c::ebda_rsrc_controller(), storage is
allocated with kzalloc() and assigned to 'tmp_slot'. Then lots of
stuff, like ->flag, ->supported_speed etc is set in tmp_slot. A bit
further down there's then this test :
if (!bus_info_ptr1) {
rc = -ENODEV;
goto error;
}
At this point, tmp_slot has not been assigned to anything, so when
erroring-out we want to free it, but nothing at the 'error:' label
free's 'tmp_slot' - and we can't really free 'tmp_slot' at 'error:'
since we may jump to that label later when 'tmp_slot' *has* been used
and we do not want it freed. So, the only sane option left seems to be
to kfree(tmp_slot) just before jumping to the 'error:' label in the one
place where this is what actually makes sense. The following patch does
just that and thus kills off a tiny potential memory leak.
James Bottomley [Sun, 9 Mar 2008 16:57:56 +0000 (11:57 -0500)]
drivers: fix dma_get_required_mask
There's a bug in the current implementation of dma_get_required_mask()
where it ands the returned mask with the current device mask. This
rather defeats the purpose if you're using the call to determine what
your mask should be (since you will at that time have the default
DMA_32BIT_MASK). This bug results in any driver that uses this function
*always* getting a 32 bit mask, which is wrong.
James Bottomley [Fri, 7 Mar 2008 14:57:54 +0000 (08:57 -0600)]
firmware: provide stubs for the FW_LOADER=n case
libsas has a case where it uses the firmware loader to provide services,
but doesn't want to select it all the time. This currently causes a
compile failure in libsas if FW_LOADER=n. Fix this by providing error
stubs for the firmware loader API in the FW_LOADER=n case.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Frank Seidel [Thu, 6 Mar 2008 20:45:57 +0000 (21:45 +0100)]
nozomi: fix initialization and early flow control access
Due to some flaws in the initialization and flow control
code kernel oopses could be triggered e.g. when accessing
the card too early after insertion.
See e.g. kernel.org bug #10077.
The main part of the fix is a trivial state management
making sure the card is realy ready to use before allowing
any access.
Signed-off-by: Frank Seidel <fseidel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Bolle [Mon, 10 Mar 2008 15:39:03 +0000 (16:39 +0100)]
lguest: Do not append space to guests kernel command line
The lguest launcher appends a space to the kernel command line (if kernel
arguments are specified on its command line). This space is unneeded. More
importantly, this appended space will make Red Hat's nash script interpreter
(used in a Fedora style initramfs) add an empty argument to init's command
line. This empty argument will make kernel arguments like "init=/bin/bash"
fail (because the shell will try to execute a script with an empty name).
This could be considered a bug in nash, but is easily fixed in the lguest
launcher too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ahmed managed to crash the Host in release_pgd(), which cannot be a Guest
bug, and indeed it wasn't.
The bug was that handing a 0 as the address of the toplevel page table
being manipulated can cause the lookup code in find_pgdir() to return
an uninitialized cache entry (we shadow up to 4 top level page tables
for each Guest).
The patch which he submitted (which removed the /4 from the index
calculation) simply ensured that these high-indexed entries hit the
early exit path of guest_set_pmd(). But you get lots of segfaults in
guest userspace as the PMDs aren't being updated.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 11 Mar 2008 14:35:56 +0000 (09:35 -0500)]
lguest: Sanitize the lguest clock.
Now the TSC code handles a zero return from calculate_cpu_khz(),
lguest can simply pass through the value it gets from the Host: if
non-zero, all the normal TSC code applies.
Otherwise (or if the Host really doesn't support TSC), the clocksource
code will fall back to the slower but reasonable lguest clock.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
lguest should pass the exact "end" it wants, not some random constant
(it was possible previously that it would actually get an address
different from SWITCHER_ADDR).
Also, Fabio Checconi pointed out that we should make sure we're not
hitting the fixmap area.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Robert Bragg <robert@sixbynine.org>
Eugene Teo [Sat, 9 Feb 2008 15:53:17 +0000 (23:53 +0800)]
lguest: make sure cpu is initialized before accessing it
If req is LHREQ_INITIALIZE, and the guest has been initialized before
(unlikely), it will attempt to access cpu->tsk even though cpu is not yet
initialized.
Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tao Ma [Wed, 5 Mar 2008 07:49:55 +0000 (15:49 +0800)]
ocfs2: Fix NULL pointer dereferences in o2net
In some situations, ocfs2_set_nn_state might get called with sc = NULL and
valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node
member. Instead, do what o2net_initialize_handshake does and use NULL when
calling o2net_reconnect_delay and o2net_idle_timeout.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Sunil Mushran [Sat, 1 Mar 2008 22:04:21 +0000 (14:04 -0800)]
ocfs2/dlm: Add missing dlm_lockres_put()s in migration path
During migration, the recovery master node may be asked to master a lockres
it may not know about. In that case, it would not only have to create a
lockres and add it to the hash, but also remember to to do the _put_
corresponding to the kref_init in dlm_init_lockres(), as soon as the migration
is completed. Yes, we don't wait for the dlm_purge_lockres() to do that
matching put. Note the ref added for it being in the hash protects the lockres
from being freed prematurely.
This patch adds that missing put, as described above, to plug a memleak.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Sunil Mushran [Sat, 1 Mar 2008 22:04:20 +0000 (14:04 -0800)]
ocfs2/dlm: Add missing dlm_lock_put()s
Normally locks for remote nodes are freed when that node sends an UNLOCK
message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action
to do an extra put on the lock at the end.
However, there are times when the master node has to free the locks for the
remote nodes forcibly.
Two cases when this happens are:
1. When the master has migrated the lockres plus all locks to another node.
2. When the master is clearing all the locks of a dead node.
It was in the above two conditions that the dlm was missing the extra put.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Tao Ma [Mon, 3 Mar 2008 02:53:02 +0000 (10:53 +0800)]
ocfs2: Fix an endian bug in online resize.
In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we
were putting cpu byteorder values into it. Swap things to the right endian
before storing.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Joel Becker [Tue, 12 Feb 2008 22:56:25 +0000 (14:56 -0800)]
ocfs2: Fix endian bug in o2dlm protocol negotiation.
struct dlm_query_join_packet is made up of four one-byte fields. They
are effectively in big-endian order already. However, little-endian
machines swap them before putting the packet on the wire (because
query_join's response is a status, and that status is treated as a u32
on the wire). Thus, a big-endian and little-endian machines will
treat this structure differently.
The solution is to have little-endian machines swap the structure when
converting from the structure to the u32 representation.
Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Tao Ma [Thu, 28 Feb 2008 02:41:55 +0000 (10:41 +0800)]
ocfs2: Use dlm_print_one_lock_resource for lock resource print
__dlm_print_one_lock_resource must be called with spin_lock
the res->spinlock. While in some cases, we use it without this
precondition and lead to the failure of assert_spin_locked.
So call dlm_print_one_lock_resource instead.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels':
fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
time: remove obsolete CLOCK_TICK_ADJUST
time: don't touch an offlined CPU's ts->tick_stopped in tick_cancel_sched_timer()
time: prevent the loop in timespec_add_ns() from being optimised away
ntp: use unsigned input for do_div()
Ivan Kokshaysky [Sun, 9 Mar 2008 13:31:17 +0000 (16:31 +0300)]
alpha: fix iommu-related boot panic
This fixes a boot panic due to a typo in the recent iommu patchset from
FUJITA Tomonori <tomof@acm.org> - the code used dma_get_max_seg_size()
instead of dma_get_seg_boundary().
It also removes a couple of unnecessary BUG_ON() and ALIGN() macros.
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reported-and-tested-by: Bob Tracy <rct@frus.com> Acked-by: FUJITA Tomonori <tomof@acm.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gregory Haskins [Sat, 8 Mar 2008 05:10:15 +0000 (00:10 -0500)]
cpu hotplug: adjust root-domain->online span in response to hotplug event
We currently set the root-domain online span automatically when the
domain is added to the cpu if the cpu is already a member of
cpu_online_map.
This was done as a hack/bug-fix for s2ram, but it also causes a problem
with hotplug CPU_DOWN transitioning. The right way to fix the original
problem is to actually respond to CPU_UP events, instead of CPU_ONLINE,
which is already too late.
This solves the hung reboot regression reported by Andrew Morton and
others.
While the fix did greatly improve the situation, it was correctly pointed out
by Roman that it does have a small bug: If the users change clocksources after
the system has been running and NTP has made corrections, the correctoins made
against the old clocksource will be applied against the new clocksource,
causing error.
The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH
definition has also made it up-stream as commit e13a2e61dd5152f5499d2003470acf9c838eab84
Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated
based on the PIT's frequency, and isn't really relevant to non-PIT
driven clocksources (that is, clocksources other then jiffies and pit).
This patch reverts both of those changes, and simply removes
CLOCK_TICK_ADJUST.
This does remove the granularity error correction for users of PIT and Jiffies
clocksource users, but the granularity error but for the majority of users, it
should be within the 500ppm range NTP can accommodate for.
For systems that have granularity errors greater then 500ppm, the
"ntp_tick_adj=" boot option can be used to compensate.
[johnstul@us.ibm.com: provided changelog]
[mattilinnanvuori@yahoo.com: maek ntp_tick_adj static] Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Karsten Wiese [Tue, 4 Mar 2008 22:59:55 +0000 (14:59 -0800)]
time: don't touch an offlined CPU's ts->tick_stopped in tick_cancel_sched_timer()
Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz(), which appeared
before caused by (repeated) calls to:
$ echo 0 > /sys/devices/system/cpu/cpu1/online
$ echo 1 > /sys/devices/system/cpu/cpu1/online
Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Cc: johnstul@us.ibm.com Cc: Rafael Wysocki <rjw@sisk.pl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
David Howells [Mon, 25 Feb 2008 17:31:57 +0000 (18:31 +0100)]
ntp: use unsigned input for do_div()
The kernel NTP code shouldn't hand 64-bit *signed* values to do_div(). Make it
instead hand 64-bit unsigned values. This gets rid of a couple of warnings.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Sat, 8 Mar 2008 19:41:22 +0000 (11:41 -0800)]
Fix waitid si_code regression
In commit ee7c82da830ea860b1f9274f1f0cdf99f206e7c2 ("wait_task_stopped:
simplify and fix races with SIGCONT/SIGKILL/untrace"), the magic (short)
cast when storing si_code was lost in wait_task_stopped. This leaks the
in-kernel CLD_* values that do not match what userland expects.
Herbert Xu [Sat, 8 Mar 2008 12:29:43 +0000 (20:29 +0800)]
[CRYPTO] skcipher: Fix section mismatches
The previous patch to move chainiv and eseqiv into blkcipher created
a section mismatch for the chainiv exit function which was also called
from __init. This patch removes the __exit marking on it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Roland McGrath [Fri, 7 Mar 2008 22:56:02 +0000 (14:56 -0800)]
x86_64: make ptrace always sign-extend orig_ax to 64 bits
This makes 64-bit ptrace calls setting the 64-bit orig_ax field for a
32-bit task sign-extend the low 32 bits up to 64. This matches what a
64-bit debugger expects when tracing a 32-bit task.
This follows on my "x86_64 ia32 syscall restart fix". This didn't
matter until that was fixed.
The debugger ignores or zeros the high half of every register slot it
sets (including the orig_rax pseudo-register) uniformly. It expects
that the setting of the low 32 bits always has the same meaning as a
32-bit debugger setting those same 32 bits with native 32-bit
facilities.
This never arose before because the syscall restart check never
matched any -ERESTART* values due to lack of sign extension. Before
that fix, even 32-bit ptrace setting orig_eax to -1 failed to trigger
the restart check anyway. So this was never noticed as a regression
of 64-bit debuggers vs 32-bit debuggers on the same 64-bit kernel.
Signed-off-by: Roland McGrath <roland@redhat.com>
[ Changed to just do the sign-extension unconditionally on x86-64,
since orig_ax is always just a small integer and doesn't need
the full 64-bit range ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 7 Mar 2008 21:49:32 +0000 (13:49 -0800)]
Merge branch 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
slub: fix typo in Documentation/vm/slub.txt
slab: NUMA slab allocator migration bugfix
slub: Do not cross cacheline boundaries for very small objects
slab - use angle brackets for include of kmalloc_sizes.h
slab numa fallback logic: Do not pass unfiltered flags to page allocator
slub statistics: Fix check for DEACTIVATE_REMOTE_FREES
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
ide: update references to Documentation/ide/ide.txt (v2)
ide: move ide.txt to Documentation/ide/
ide: fix buggy code in ide_register_hw()
ide: fix enabling DMA on it821x in "smart" mode
ide-cd: mark REQ_TYPE_ATA_PC write requests with REQ_RW flag
ide-cd: mark REQ_TYPE_ATA_PC write requests with REQ_RW flag
On Thursday 06 March 2008, walt wrote:
> For me, this commit causes the problem it's intended to fix:
>
> commit 9f10d9ee0ac6d79d7bc8b9a158bf4a29322d84d3
> Author: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date: Tue Feb 26 21:50:35 2008 +0100
>
> ide-cd: fix 'ireason' handling for REQ_TYPE_ATA_PC requests
>
> This fixes some hangs caused by not finishing the transfer before ending
> the request and also makes use of 'ireason == 1' quirk for spurious IRQs.
>
> When I mount a CD there is a long delay, and I see this error message:
>
> hdc: ide_cd_check_ireason: wrong transfer direction!
> cdrom: failed setting lba address space
> hdc: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> ide: failed opcode was: unknown
> hdc: drive not ready for command
> <repeated many times>
>
> When I revert this commit everything works properly again, including
> CD burning.
It turned out that REQ_TYPE_ATA_PC write requests were not marked as such
(the previous commit assumed them to be).
Reported-by: walt <w41ter@gmail.com> Tested-by: walt <w41ter@gmail.com> Reviewed-by: Borislav Petkov <petkovbb@googlemail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Linus Torvalds [Fri, 7 Mar 2008 20:08:07 +0000 (12:08 -0800)]
Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
NFS: Fix the fsid revalidation in nfs_update_inode()
SUNRPC: Fix a nfs4 over rdma transport oops
NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
Trond Myklebust [Thu, 6 Mar 2008 17:34:59 +0000 (12:34 -0500)]
NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.
Trond Myklebust [Thu, 6 Mar 2008 17:34:50 +0000 (12:34 -0500)]
NFS: Fix the fsid revalidation in nfs_update_inode()
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().
Tilman Schmidt [Fri, 7 Mar 2008 18:47:08 +0000 (19:47 +0100)]
gigaset: fix Oops on module unload regression
The card state mutex was only initialized when a device was connected,
but used during unload unconditionally, leading to an Oops if a driver
was loaded and unloaded again without ever connecting a device.
Fix this by initializing the mutex as soon as the structure is allocated.
Also add a missing mutex unlock revealed in the same execution path.
Linus Torvalds [Fri, 7 Mar 2008 18:08:17 +0000 (10:08 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel:
sched: don't allow rt_runtime_us to be zero for groups having rt tasks
sched: rt-group: fixup schedulability constraints calculation
sched: fix the wrong time slice value for SCHED_FIFO tasks
sched: export task_nice
sched: balance RT task resched only on runqueue
sched: retain vruntime
Ingo Molnar [Fri, 7 Mar 2008 09:47:43 +0000 (10:47 +0100)]
drivers/char/esp.c: fix bootup lockup
randconfig testing found a bootup lockup in drivers/char/esp.c because
of a spinlock that wasn't correctly initialized.
I'm not sure why it became more prominent in 2.6.25-rc4, the bug seems
rather old and i've been doing allyesconfig bootups for ages with
CONFIG_ESP enabled.
This fixes this bootup lockup:
PM: Adding info for No Bus:ttyP63
ttyP32 at 0x0240 (irq = 0) is an ESP primary port
BUG: spinlock lockup on CPU#0, swapper/1, f56dd004
Pid: 1, comm: swapper Not tainted 2.6.25-rc4-sched-devel.git-x86-latest.git #402 [<c03ac6f4>] _raw_spin_lock+0x134/0x140
[<c08649be>] _spin_lock_irqsave+0x5e/0x80
[<c0b9fbfe>] ? espserial_init+0x2be/0x6e0
[<c0b9fbfe>] espserial_init+0x2be/0x6e0
[<c0b877a3>] kernel_init+0x83/0x260
[<c0b9f940>] ? espserial_init+0x0/0x6e0
[<c010416a>] ? restore_nocheck_notrace+0x0/0xe
[<c0b87720>] ? kernel_init+0x0/0x260
[<c0b87720>] ? kernel_init+0x0/0x260
[<c0104507>] kernel_thread_helper+0x7/0x10
=======================
kzalloc() is not the way to initialize spinlocks anymore.
Dhaval Giani [Thu, 28 Feb 2008 09:51:56 +0000 (15:21 +0530)]
sched: don't allow rt_runtime_us to be zero for groups having rt tasks
This patch checks if we can set the rt_runtime_us to 0. If there is a
realtime task in the group, we don't want to set the rt_runtime_us as 0
or bad things will happen. (that task wont get any CPU time despite
being TASK_RUNNNG)
it was only possible to configure the rt-group scheduling parameters
beyond the default value in a very small range.
that's because div64_64() has a different calling convention than
do_div() :/
fix a few untidies while we are here; sysctl_sched_rt_period may overflow
due to that multiplication, so cast to u64 first. Also that RUNTIME_INF
juggling makes little sense although its an effective NOP.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
this is due to a place that can reschedule a task without holding
the tasks runqueue lock. This was caused by the RT balancing code
that pulls RT tasks to the current run queue and will reschedule the
current task.
There's a slight chance that the pulling of the RT tasks will release
the current runqueue's lock and retake it (in the double_lock_balance).
During this time that the runqueue is released, the current task can
migrate to another runqueue.
In the prio_changed_rt code, after the pull, if the current task is of
lesser priority than one of the RT tasks pulled, resched_task is called
on the current task. If the current task had migrated in that small
window, resched_task will be called without holding the runqueue lock
for the runqueue that the task is on.
This race condition also exists in the mainline kernel and this patch
adds a check to make sure the task hasn't migrated before calling
resched_task.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Sripathi Kodi <sripathik@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 29 Feb 2008 20:21:01 +0000 (15:21 -0500)]
sched: retain vruntime
Kei Tokunaga reported an interactivity problem when moving tasks
between control groups.
Tasks would retain their old vruntime when moved between groups, this
can cause funny lags. Re-set the vruntime on group move to fit within
the new tree.
Peter Korsgaard [Thu, 6 Mar 2008 09:56:45 +0000 (10:56 +0100)]
x86-boot: don't request VBE2 information
The new x86 setup code (4fd06960f120) broke booting on an old P3/500MHz
with an onboard Voodoo3 of mine. After debugging it, it turned out
to be caused by the fact that the vesa probing now asks for VBE2 data.
Disassembing the video BIOS shows that it overflows the vesa_general_info
structure when VBE2 data is requested because the source addresses for the
information strings which get strcpy'ed to the buffer lie outside the 32K
BIOS code (and hence contain long sequences of 0xff's).
(The full BIOS can be found at http://peter.korsgaard.com/vgabios.bin
if interested).
The old setup code didn't ask for VBE2 info, and the new code doesn't
actually do anything with the extra information, so the fix is to simply
not request it. Other BIOS'es might have the same problem.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich [Wed, 5 Mar 2008 08:35:14 +0000 (08:35 +0000)]
x86: fix merge mistake in i387.c
convert_fxsr_to_user() in 2.6.24's i387_32.c did this, and
convert_to_fxsr() also does the inverse, so I assume it's an oversight
that it is no longer being done.
[ mingo@elte.hu:
we encode it this way because there's no space for the 'FPU Last
Instruction Opcode' (->fop) field in the legacy user_i387_ia32_struct
that PTRACE_GETFPREGS/PTRACE_SETFPREGS uses.
it's probably pure legacy - i'd be surprised if any user-space relied on
the FPU Last Opcode in any way. But indeed we used to do it previously
so the most conservative thing is to preserve that piece of information.
]
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>