[POWERPC] Fix performance regression in IRQ radix tree locking
When reworking the powerpc irq code, I figured out that we were using
the radix tree in a racy way. As a temporary fix, I put a spinlock in
there. However, this can have a significant impact on performances. This
patch reworks that to use a smarter technique based on the fact that
what we need is in fact a rwlock with extremely rare writers (thus
optimized for the read path).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Kim Phillips [Fri, 25 Aug 2006 16:59:07 +0000 (11:59 -0500)]
[POWERPC] Adapt ipic driver to new host_ops interface, add set_irq_type to set IRQ sense
This converts ipic code to Benh's IRQ mods. For the IPIC, IRQ sense values in the device tree equal those in include/linux/irq.h; that's 8 for low assertion (most internal IRQs on mpc83xx), and 2 for high-to-low change.
spinlocks added to [un]mask, ack operations; default handler and type now set in host_map; and redundant condition check eliminated.
Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Matt Porter [Fri, 4 Aug 2006 16:41:51 +0000 (11:41 -0500)]
[POWERPC] Remove flush_dcache_all export
Removes the flush_dcache_all export for non coherent platforms.
We removed the last in-kernel user of this years ago in arch/ppc
so it no longer serves a purpose. Plus, it breaks the build
at the moment.
Signed-off-by: Matt Porter <mporter@embeddedalley.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Adam Litke [Fri, 18 Aug 2006 18:22:21 +0000 (11:22 -0700)]
[POWERPC] hugepage BUG fix
On Tue, 2006-08-15 at 08:22 -0700, Dave Hansen wrote:
> kernel BUG in cache_free_debugcheck at mm/slab.c:2748!
Alright, this one is only triggered when slab debugging is enabled. The
slabs are assumed to be aligned on a HUGEPTE_TABLE_SIZE boundary. The free
path makes use of this assumption and uses the lowest nibble to pass around
an index into an array of kmem_cache pointers. With slab debugging turned
on, the slab is still aligned, but the "working" object pointer is not.
This would break the assumption above that a full nibble is available for
the PGF_CACHENUM_MASK.
The following patch reduces PGF_CACHENUM_MASK to cover only the two least
significant bits, which is enough to cover the current number of 4 pgtable
cache types. Then use this constant to mask out the appropriate part of
the huge pte pointer.
Signed-off-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Paul Mackerras [Wed, 23 Aug 2006 06:58:39 +0000 (16:58 +1000)]
[POWERPC] Correct masks used in emulating some instructions
When we get an illegal instruction exception, we check to see whether
the instruction is one that we emulate for the user program. Some of
the masks we use in checking whether the offending instruction is one
we care about didn't have the top bit set, which is the MSB of the
major opcode. Thus some undefined opcodes could get emulated as other
(defined but unimplemented) instructions. This corrects the masks.
The bootx_init.c trampoline didn't properly add the ramdisk to the
"reserve map" (list of reserved areas of memory), thus causing all sorts
of failures when using BootX with an initrd. Also fixes a possible
problem if the ramdisk is located before the device-tree passed by
BootX.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Nathan Lynch [Wed, 23 Aug 2006 01:36:05 +0000 (20:36 -0500)]
[POWERPC] Fix gettimeofday inaccuracies
There are two problems in the powerpc gettimeofday code which can
cause incorrect results to be returned.
The first is that there is a race between do_gettimeofday and the
timer interrupt:
1. do_gettimeofday does get_tb()
2. decrementer exception on boot cpu which runs timer_recalc_offset,
which also samples the timebase and updates the do_gtod structure
with a greater timebase value.
3. do_gettimeofday calls __do_gettimeofday, which leads to the
negative result from tb_val - temp_varp->tb_orig_stamp.
The second is caused by taking the boot cpu offline, which can cause
the value of tb_last_jiffy to be increased past the currently
available timebase, causing the same underflow as above.
[paulus@samba.org - define and use data_barrier() instead of mb().]
Signed-off-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Jon Loeliger [Fri, 18 Aug 2006 19:30:35 +0000 (14:30 -0500)]
[POWERPC] Rewrite the PPC 86xx IRQ handling to use Flat Device Tree
IRQ setup now comes from the Flat Device Tree and use the new generic
IRQ code. Fixed the fsl_soc.c IRQ OF interrupt node parsing.
Removed some unused MPC86xx macro definition.
Signed-off-by: Zhang Wei <wei.zhang@freescale.com> Signed-off-by: Jon Loeliger <jdl@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
(cherry picked from 919fede6edab94cccb3ca8c1c0b32fa62c9369a5 commit)
Andy Fleming [Fri, 18 Aug 2006 01:24:48 +0000 (20:24 -0500)]
[POWERPC] Fix CDS IRQ handling and PCI code
* Fix IRQ support in the 85xx CDS boards so it uses the new
generic stuff
* Fix PCI IRQ mapping to use the device tree
* Disabled i8259 support to allow the CDS to boot. This will be
fixed soon, but the current code doesn't even compile, so this
is a vast improvement
Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Andy Fleming [Fri, 18 Aug 2006 23:03:08 +0000 (18:03 -0500)]
[POWERPC] Fix interrupts on 8540 ADS board
* Fixed 8540 ADS support for the new irq layer
* Fixed 8540 ADS support for mapping PCI interrupts
* Updated 8540 ADS to use device tree for interrupt assignment
and sense values
Jon Loeliger [Thu, 17 Aug 2006 13:42:35 +0000 (08:42 -0500)]
[POWERPC] Convert to mac-address for ethernet MAC address data.
Also accept "local-mac-address". However the old "address"
is now obsolete, but accepted for backwards compatibility.
It should be removed after all device trees have been
converted to use "mac-address".
Signed-off-by: Jon Loeliger <jdl@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
The code for using the radix tree for reverse mapping of interrupts has
a typo that causes it to create incorrect mappings if the software and
hardware numbers happen to be different. This would, among others, cause
the IDE interrupt to fail on js20's. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
[POWERPC] kprobes: Fix possible system crash during out-of-line single-stepping
- On archs that have no-exec support, we vmalloc() a executable scratch
area of PAGE_SIZE and divide it up into an array of slots of maximum
instruction size for that arch
- On a kprobe registration, the original instruction is copied to the
first available free slot, so if multiple kprobes are registered, chances
are, they get contiguous slots
- On POWER4, due to not having coherent icaches, we could hit a situation
where a probe that is registered on one processor, is hit immediately on
another. This second processor could have fetched the stream of text from
the out-of-line single-stepping area *before* the probe registration
completed, possibly due to an earlier (and a different) kprobe hit and
hence would see stale data at the slot.
Executing such an arbitrary instruction lead to a problem as reported
in LTC bugzilla 23555.
The correct solution is to call flush_icache_range() as soon as the
instruction is copied for out-of-line single-stepping, so the correct
instruction is seen on all processors.
Thanks to Will Schmidt who tracked this down.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
To compile kexec on 32-bit we need a few more bits and pieces. Rather
than add empty definitions, we can make crash.c work on 32-bit, with
only a couple of kludges.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
[POWERPC] Move some kexec logic into machine_kexec.c
We're missing a few functions for kexec to compile on 32-bit. There's
nothing really 64-bit specific about the 64-bit versions, so make them
generic rather than adding empty definitions for 32-bit.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Will Schmidt [Tue, 8 Aug 2006 14:40:00 +0000 (09:40 -0500)]
[POWERPC] update {g5,iseries,pseries}_defconfigs
Updating the defconfigs for iseries, pseries, and G5. Sticking with
the defaults, with the following exceptions: I've turned off HW_RANDOM
for all three configs. For G5, I've enabled SND_AOA and friends as
modules; this includes the FABRIC_LAYOUT, ONYX, TAS, TOONIE and
SOUNDBUS* config options.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Wilder [Thu, 29 Jun 2006 22:17:30 +0000 (15:17 -0700)]
[POWERPC] Make secondary CPUs call into kdump on reset exception
In the case of a system hang, the user will invoke soft-reset to
initiate the kdump boot. If xmon is enabled, the CPU(s) enter into the
xmon debugger. Unfortunately, the secondary CPU(s) will return to the
hung state when they exit from the debugger (returned from die() ->
system_reset_exception()). This causes a problem in kdump since the
hung CPU(s) will not respond to the IPI sent from kdump. This patch
fixes the issue by calling crash_kexec_secondary() directly from
system_reset_exception() without returning to the previous state. These
secondary CPUs wait 5ms until the kdump boot is started by the primary
CPU. In the case we exited from the debugger to "recover" (command 'x'
in xmon) the primary and the secondary CPUs will all return from die()
-> system_reset_exception() ->crash_kexec_secondary() wait 5ms, then
return to the previous state. A kdump boot is not started in this case.
Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Haren Myneni [Thu, 27 Jul 2006 21:29:00 +0000 (14:29 -0700)]
[POWERPC] Fix might-sleep warning on removing cpus
Noticing the following might_sleep warning (dump_stack()) during kdump
testing when CONFIG_DEBUG_SPINLOCK_SLEEP is enabled. All secondary CPUs
will be calling rtas_set_indicator with interrupts disabled to remove
them from global interrupt queue.
To fix this issue, rtas_set_indicator_fast() is added so that will not
wait for RTAS 'busy' delay and this new function is used for kdump (in
xics_teardown_cpu()) and for CPU hotplug ( xics_migrate_irqs_away() and
xics_setup_cpu()).
Note that the platform architecture spec says that set-indicator
on the indicator we're using here is not permitted to return the
busy or extended busy status codes.
Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Sonny Rao [Wed, 2 Aug 2006 04:20:09 +0000 (00:20 -0400)]
[POWERPC] fix PMU initialization on pseries lpar
We should not be calling power4_enable_pmcs() in
pseries_lpar_enable_pmcs(); just doing the hypercall is sufficient.
Prior to 2.6.15 we did not call power4_enable_pmcs() for an lpar.
power4_enable_pmcs() tries to read the hid0 register which is no
longer legal for an lpar in newer Power processors.
* master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa:
[ALSA] Don't reject O_RDWR at opening PCM OSS with read/write-only device
[ALSA] snd-emu10k1: Implement support for Audigy 2 ZS [SB0353]
[ALSA] add MAINTAINERS entry for snd-aoa
[ALSA] aoa: platform function gpio: ignore errors from functions that don't exist
[ALSA] make snd-powermac load even when it can't bind the device
[ALSA] aoa: fix toonie codec
[ALSA] aoa: feature gpio layer: fix IRQ access
[ALSA] Conversions from kmalloc+memset to k(z|c)alloc
[ALSA] snd-emu10k1: Fixes ALSA bug#2190
While busy-waiting for completion, check the hardware after scheduling;
don't schedule and then immediately check the _timeout_. If the yield()
took a long time (as it does on my OLPC prototype board when it's busy),
we'd report a timeout even though the hardware was now ready.
This fixes it, and also switches the yield() for a cond_resched() because
we don't actually want to be _that_ nice about it. I see nice
tightly-packed SMBus transactions now, rather than waiting for milliseconds
between successive phases.
Actually, we shouldn't be busy-waiting here at all. We should be using
interrupts. That's an exercise for another day though.
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Cc: Christer Weinigel <wingel@nano-system.com> Cc: <Jordan.Crouse@amd.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
so if we initialize UDF_I_LENEXTENTS(inode) = 0 earlier in udf_new_inode,
we won't try to free the (not) preallocated blocks, since this will match
the i_size = 0 set when the inode was initialized.
Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
matthieu castet [Sat, 5 Aug 2006 19:15:12 +0000 (12:15 -0700)]
[PATCH] pnpacpi: reject ACPI_PRODUCER resources
A patch in -mm kernel correct the parsing of "address resources" of pnpacpi.
Before we assumed it was memory only, but it could be also IO.
But this change show an hidden bug : some resources could be producer type
that are not handled by pnp layer. So we should ignore the producer
resources.
This patch fixes bug 6292 (http://bugzilla.kernel.org/show_bug.cgi?id=6292).
Some devices like PNP0A03 have 0xd00-0xffff and 0x0-0xcf7 as IO producer
resources.
Before correcting "address resources" parsing, it was seen as memory and was
harmless, because nobody tried to reserve this memory range as it should be
IO.
With the correction it become IO resources, and make failed all others device
that want to register IO in this range and use pnp layer (like a ISA sound
card).
Chris Mason [Sat, 5 Aug 2006 19:15:10 +0000 (12:15 -0700)]
[PATCH] reiserfs_write_full_page() should not get_block past eof
reiserfs_write_full_page does zero bytes in the file past eof, but it may
call get_block on those buffers as well. On machines where the page size
is larger than the blocksize, this can result in mmaped files incorrectly
growing up to a block boundary during writepage.
The fix is to avoid calling get_block for any blocks that are entirely past
eof
Signed-off-by: Chris Mason <mason@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is for collision check enhancement for memory hot add.
It's better to do resouce collision check before doing memory hot add,
which will touch memory management structures.
And add_section() should check section exists or not before calling
sparse_add_one_section(). (sparse_add_one_section() will do another
check anyway. but checking in memory_hotplug.c will be easy to understand.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: keith mannthey <kmannth@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
add_memory() does all necessary check to avoid collision. then, acpi layer
doesn't have to check region by itself.
(*) pfn_valid() just returns page struct is valid or not. It returns 0
if a section has been already added even is ioresource is not added.
ioresource collision check in mm/memory_hotplug.c can do more precise
collistion check.
added enabled bit check just for sanity check..
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Keith Mannthey <kmannth@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] memory hotadd fixes: change find_next_system_ram's return value manner
find_next_system_ram() returns valid memory range which meets requested area,
only used by memory-hot-add.
This function always rewrite requested resource even if returned area is not
fully fit in requested one. And sometimes the returnd resource is larger than
requested area. This annoyes the caller. This patch changes the returned
value to fit in requested area.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Keith Mannthey <kmannth@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ioresouce handling code in memory hotplug allows not-aligned memory hot add.
But when memmap and other memory structures are initialized, parameters should
be aligned. (if not aligned, initialization of mem_map will do wrong, it
assumes parameters are aligned.) This patch fix it.
And this patch allows ioresource collision check to handle -EEXIST.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Keith Mannthey <kmannth@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diego Calleja [Sat, 5 Aug 2006 19:14:55 +0000 (12:14 -0700)]
[PATCH] Fix BeFS slab corruption
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com> Cc: Jens Kilian <jjk@acm.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stefan Richter [Sat, 5 Aug 2006 19:14:53 +0000 (12:14 -0700)]
[PATCH] ieee1394: sbp2: enable auto spin-up for Maxtor disks
At least Maxtor OneTouch III require a "start stop unit" command after auto
spin-down before the next access can proceed. This patch activates the
responsible code in scsi_mod for all Maxtor SBP-2 disks.
https://bugzilla.novell.com/show_bug.cgi?id=183011
Maybe that should be done for all SBP-2 disks, but better be cautious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Jody McIntyre <scjody@modernduck.com> Cc: Ben Collins <bcollins@ubuntu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steven Rostedt [Sat, 5 Aug 2006 19:14:50 +0000 (12:14 -0700)]
[PATCH] Add stable branch to maintainers file
While helping someone to submit a patch to the stable branch, I noticed
that the stable branch is not listed in the MAINTAINERS file. This was
after I went there to look for the email addresses for the stable branch
list (stable@kernel.org).
This patch adds the stable branch to the maintainers file so that people
can find where to send patches when they have a fix for the stable team.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Neil Horman [Sat, 5 Aug 2006 19:14:45 +0000 (12:14 -0700)]
[PATCH] sh: fix proc file removal for superh store queue module
Clean up proc file removal in sq module for superh arch. currently on a
failed module load or on module unload a proc file is left registered which
can cause a random memory execution or oopses if read after unload. This
patch cleans up that deregistration.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alexey Dobriyan [Sat, 5 Aug 2006 19:14:43 +0000 (12:14 -0700)]
[PATCH] eicon: fix define conflict with ptrace
* MODE_MASK is unused in eicon driver.
* Conflicts with a ptrace stuff on arm.
drivers/isdn/hardware/eicon/divasync.h:259:1: warning: "MODE_MASK" redefined
include2/asm/ptrace.h:48:1: warning: this is the location of the previous definition
A set of tty line discipline cleanup patches were introduced before the
dawn of time, in kernel version 2.4.21. This patch performs that cleanup
for the hvsi driver.
The hvsi driver is used only on IBM pSeries PowerPC boxes. The driver was
originally written by Hollis Blanchard, who has delegated maintainership to
me. So this my first and maybe only patch in this official new role,
because this driver is otherwise bug-free :-)
Alan: "Actually its also a bug fix, tty->ldisc should be locked by refcounting
and the helpers do this for you."
Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Under certain rare circumstances, it appears that there can be be a
NULL-pointer deref when a user fiddles with terminal emeulation programs while
outpu is being sent to the console. This patch checks for and avoids a
NULL-pointer deref.
Signed-off-by: Hollis Blanchard <hollisbl@austin.ibm.com> Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Cc: Paul Fulghum <paulkf@microgate.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Neil Brown [Sat, 5 Aug 2006 19:14:29 +0000 (12:14 -0700)]
[PATCH] knfsd: fix race related problem when adding items to and svcrpc auth cache
If we don't find the item we are lookng for, we allocate a new one, and
then grab the lock again and search to see if it has been added while we
did the alloc. If it had been added we need to 'cache_put' the newly
created item that we are never going to use. But as it hasn't been
initialised properly, putting it can cause an oops.
So move the ->init call earlier to that it will always be fully initilised
if we have to put it.
Thanks to Philipp Matthias Hahn <pmhahn@svs.Informatik.Uni-Oldenburg.de>
for reporting the problem.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sat, 5 Aug 2006 19:14:25 +0000 (12:14 -0700)]
[PATCH] fadvise() make POSIX_FADV_NOREUSE a no-op
The POSIX_FADV_NOREUSE hint means "the application will use this range of the
file a single time". It seems to be intended that the implementation will use
this hint to perform drop-behind of that part of the file when the application
gets around to reading or writing it.
However for reasons which aren't obvious (or sane?) I mapped
POSIX_FADV_NOREUSE onto POSIX_FADV_WILLNEED. ie: it does readahead.
That's daft. So for now, make POSIX_FADV_NOREUSE a no-op.
This is a non-back-compatible change. If someone was using POSIX_FADV_NOREUSE
to perform readahead, they lose. The likelihood is low.
If/when we later implement POSIX_FADV_NOREUSE things will get interesting - to
do it fully we'll need to maintain file offset/length ranges and peform all
sorts of complex tricks, and managing the lifetime of those ranges' data
structures will be interesting..
A sensible implementation would probably ignore the file range and would
simply mark the entire file as needing some form of drop-behind treatment.
Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This can occur when release_console_sem() is called but the log
buffer still has contents that need to be flushed. The console drivers
are called while the console_may_schedule flag is still true. The
might_sleep() is triggered when fbcon calls console_conditional_schedule().
Fix by setting console_may_schedule to zero earlier, before the call to the
console drivers.
Jan Blunck [Sat, 5 Aug 2006 19:14:14 +0000 (12:14 -0700)]
[PATCH] fix vmstat per cpu usage
The per cpu variables are used incorrectly in vmstat.h.
Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Christoph Lameter <clameter@engr.sgi.com> Acked-by: Steve Fox <drfickle@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chuck Ebbert [Sat, 5 Aug 2006 19:14:11 +0000 (12:14 -0700)]
[PATCH] ptrace: make pid of child process available for PTRACE_EVENT_VFORK_DONE
When delivering PTRACE_EVENT_VFORK_DONE, provide pid of the child process
when tracer calls ptrace(PTRACE_GETEVENTMSG). This is already
(accidentally) available when the tracer is tracing VFORK in addition to
VFORK_DONE.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Cc: Daniel Jacobowitz <dan@debian.org> Cc: Albert Cahalan <acahalan@gmail.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Sat, 5 Aug 2006 19:14:07 +0000 (12:14 -0700)]
[PATCH] md: Fix a bug that recently crept into md/linear
A recent patch that allowed linear arrays to be reconfigured on-line
allowed in a bug which results in divide by zero - not all
mddev->array_size were converted to conf->array_size.
This patch finished the conversion and fixed the bug.
Thanks to Simon Kirby <sim@netnation.com> for the bug report.
Cc: Simon Kirby <sim@netnation.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs_get_locked_page is called twice in ufs code, one time in ufs_truncate
path(we allocated last block), and another time when fragments are
reallocated. In ideal world in the second case on allocation/free block
layer we should not know that things like `truncate' exists, but now with
such crutch like ufs_get_locked_page we can (or should?) skip truncated
pages.
As discussed earlier:
http://lkml.org/lkml/2006/6/28/136
this patch fixes such issue:
`ufs_get_locked_page' takes page from cache
after that `vmtruncate' takes page and deletes it from cache
`ufs_get_locked_page' locks page, and reports about EIO error.
Also because of find_lock_page always return valid page or NULL, we have no
need to check it if page not NULL.
static int unqueue_me(struct futex_q *q)
{
int ret = 0;
spinlock_t *lock_ptr;
/* In the common case we don't take the spinlock, which is nice. */
retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr);
/*
* q->lock_ptr can change between reading it and
* spin_lock(), causing us to take the wrong lock. This
* corrects the race condition.
[...]
and my compiler (gcc 4.1.0) makes the following out of it:
00000000000003c8 <unqueue_me>:
3c8: eb bf f0 70 00 24 stmg %r11,%r15,112(%r15)
3ce: c0 d0 00 00 00 00 larl %r13,3ce <unqueue_me+0x6>
3d0: R_390_PC32DBL .rodata+0x2a
3d4: a7 f1 1e 00 tml %r15,7680
3d8: a7 84 00 01 je 3da <unqueue_me+0x12>
3dc: b9 04 00 ef lgr %r14,%r15
3e0: a7 fb ff d0 aghi %r15,-48
3e4: b9 04 00 b2 lgr %r11,%r2
3e8: e3 e0 f0 98 00 24 stg %r14,152(%r15)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
/* write q->lock_ptr in r12 */
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: a7 84 00 4b je 48e <unqueue_me+0xc6>
/* if r12 is zero then jump over the code.... */
3fc: e3 20 b0 28 00 04 lg %r2,40(%r11)
/* write q->lock_ptr in r2 */
402: c0 e5 00 00 00 00 brasl %r14,402 <unqueue_me+0x3a>
404: R_390_PC32DBL _spin_lock+0x2
/* use r2 as parameter for spin_lock */
So the code becomes more or less:
if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)
Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: b9 04 00 2c lgr %r2,%r12
3fc: a7 84 00 48 je 48c <unqueue_me+0xc4>
400: c0 e5 00 00 00 00 brasl %r14,400 <unqueue_me+0x38>
402: R_390_PC32DBL _spin_lock+0x2
As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q->lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q->lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@timesys.com> Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Sat, 5 Aug 2006 19:13:45 +0000 (12:13 -0700)]
[PATCH] drivers/edac/edac_mc.h must #include <linux/platform_device.h>
With CONFIG_PCI=n:
CC drivers/edac/edac_mc.o
drivers/edac/edac_mc.c: In function â\80\98add_mc_to_global_listâ\80\99:
drivers/edac/edac_mc.c:1362: error: implicit declaration of function â\80\98to_platform_deviceâ\80\99
drivers/edac/edac_mc.c:1362: error: invalid type argument of â\80\98->â\80\99
drivers/edac/edac_mc.c: In function â\80\98edac_mc_add_mcâ\80\99:
drivers/edac/edac_mc.c:1467: error: invalid type argument of â\80\98->â\80\99
drivers/edac/edac_mc.c: In function â\80\98edac_mc_del_mcâ\80\99:
drivers/edac/edac_mc.c:1504: error: invalid type argument of â\80\98->â\80\99
Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Make suspend possible with a traced process at a breakpoint
It should be possible to suspend, either to RAM or to disk, if there's a
traced process that has just reached a breakpoint. However, this is a
special case, because its parent process might have been frozen already and
then we are unable to deliver the "freeze" signal to the traced process.
If this happens, it's better to cancel the freezing of the traced process.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diego Calleja [Sun, 6 Aug 2006 04:15:58 +0000 (21:15 -0700)]
[LAPB]: Fix windowsize check
In bug #6954, Norbert Reinartz reported the following issue:
"Function lapb_setparms() in file net/lapb/lapb_iface.c checks if the given
parameters are valid. If the given window size is in the range of 8 .. 127,
lapb_setparms() fails and returns an error value of LAPB_INVALUE, even if bit
LAPB_EXTENDED in parms->mode is set.
If bit LAPB_EXTENDED in parms->mode is set and the window size is in the range
of 8 .. 127, the first check "(parms->mode & LAPB_EXTENDED)" results true and
the second check "(parms->window < 1 || parms->window > 127)" results false.
Both checks in conjunction result to false, thus the third check "(parms->window
< 1 || parms->window > 7)" is done by fault.
This third check results true, so that we leave lapb_setparms() by 'goto out_put'.
Seems that this bug doesn't cause any problems, because lapb_setparms() isn't
used to change the default values of LAPB. We are using kernel lapb in our
software project and also change the default parameters of lapb, so we found
this bug"
He also pasted a fix, that I've transformated into a patch:
Signed-off-by: Diego Calleja <diegocg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Fri, 4 Aug 2006 23:36:18 +0000 (16:36 -0700)]
[PKT_SCHED] RED: Fix overflow in calculation of queue average
Overflow can occur very easily with 32 bits, e.g., with 1 second
us_idle is approx. 2^20, which leaves only 11-Wlog bits for queue
length. Since the EWMA exponent is typically around 9, queue
lengths larger than 2^2 cause overflow. Whether the affected
branch is taken when us_idle is as high as 1 second, depends on
Scell_log, but with rather reasonable configuration Scell_log is
large enough to cause p->Stab to have zero index, which always
results zero shift (typically also few other small indices result
in zero shift).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
The datagram interface of LLC is broken in a couple of ways.
These were discovered when trying to use it to build an out-of-kernel
version of STP.
First it didn't pass the source address of the received packet
in recvfrom(). It needs to copy the source address of received LLC packets
into the socket control block. At the same time fix a security issue
because there was uninitialized data leakage. Every recvfrom call
was just copying out old data.
Second, LLC should not merge multiple packets in one receive call
on datagram sockets. LLC should preserve packet boundaries on
SOCK_DGRAM.
This fix goes against the old historical comments about UNIX98 semantics
but without this fix SOCK_DGRAM is broken and useless. So either ANK's
interpretation was incorect or UNIX98 standard was wrong.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Torokhov [Sat, 5 Aug 2006 02:53:24 +0000 (22:53 -0400)]
Input: ati_remote - add missing input_sync()
When emulating button toggle drivers need to send input_sync()
between 'down' and 'up' events, otherwise some users might miss
keypress because device's state is only considered finalized
after EV_SYN/SYN_REPORT is received.
MX300 does not have an EXTRA_BTN - it is a simple wheel mouse with
an additional task-switcher button, which is reported as side button
(and not task button).
Signed-off-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dean Nelson [Wed, 28 Jun 2006 18:50:09 +0000 (13:50 -0500)]
[IA64] make uncached allocator more node aware
The uncached allocator has a function, uncached_get_new_chunk(), that needs
to be serialized on a per node basis. It also has a global variable,
allocated_granules, which should be defined on a per node basis and protected
by that serialization. Additionally, all error returns from functions called
(like ia64_pal_mc_drain()) should be handled appropriately.
Signed-off-by: Dean Nelson <dcn@sgi.com> Acked-by: Jes Sorenson <jes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Fri, 4 Aug 2006 16:56:40 +0000 (09:56 -0700)]
Merge branch 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6
* 'fixes' of git://git.linux-nfs.org/pub/linux/nfs-2.6:
SUNRPC: Fix obvious refcounting bugs in rpc_pipefs.
RPC: Ensure that we disconnect TCP socket when client requests error out
NLM/lockd: remove b_done
NFS: make 2 functions static
NFS: Release dcache_lock in an error path of nfs_path
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Propagate acpi_processor_preregister_performance return value.
[CPUFREQ] [2/2] demand load governor modules.
[CPUFREQ] [1/2] add __find_governor helper and clean up some error handling.
[CPUFREQ] Longhaul - Rename & fix multipliers table
[CPUFREQ] Longhaul - Fix power state test to do something more useful
[CPUFREQ] Longhaul - Readd accidentally dropped line
[CPUFREQ] Make longhaul_walk_callback() static
[CPUFREQ] X86_GX_SUSPMOD must depend on PCI
[CPUFREQ] Longhaul - Initialise later.
[CPUFREQ] Longhaul - Workaround issues with APIC.
[CPUFREQ] Longhaul - Hook into ACPI C states.
[CPUFREQ] return error when failing to set minfreq
Linus Torvalds [Fri, 4 Aug 2006 00:33:59 +0000 (17:33 -0700)]
Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[PATCH] ahci: skip protocol test altogether in spurious interrupt code
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6:
PNP: Add missing casts in printk() arguments
PCI: docking station: remove dock uevents
PCI: Unhide the SMBus on Asus PU-DLS
PCI Hotplug: add acpiphp to MAINTAINERS
PCI: pci/search: EXPORTs cannot be __devinit
PCIE: cleanup on probe error
pcie: fix warnings when CONFIG_PM=n