1) Disable Interrupts during DMA memcpy to avoid raise conditions.
2) Mark MDMA channel 0 as reserved, since were using it internally.
3) Add DMA based equivalents for insX and outsX.
4) Our insX and outsX only handles len <= 2^16.
Alexey Dobriyan [Sun, 20 May 2007 21:22:52 +0000 (01:22 +0400)]
Detach sched.h from mm.h
First thing mm.h does is including sched.h solely for can_do_mlock() inline
function which has "current" dereference inside. By dealing with can_do_mlock()
mm.h can be detached from sched.h which is good. See below, why.
This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
getting them indirectly
Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
they don't need sched.h
b) sched.h stops being dependency for significant number of files:
on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
after patch it's only 3744 (-8.3%).
Cross-compile tested on
all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
alpha alpha-up
arm
i386 i386-up i386-defconfig i386-allnoconfig
ia64 ia64-up
m68k
mips
parisc parisc-up
powerpc powerpc-up
s390 s390-up
sparc sparc-up
sparc64 sparc64-up
um-x86_64
x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
Herbert Xu [Sat, 19 May 2007 04:51:00 +0000 (14:51 +1000)]
[CRYPTO] api: Read module pointer before freeing algorithm
The function crypto_mod_put first frees the algorithm and then drops
the reference to its module. Unfortunately we read the module pointer
which after freeing the algorithm and that pointer sits inside the
object that we just freed.
So this patch reads the module pointer out before we free the object.
Thanks to Luca Tettamanti for reporting this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Gerald Britton [Mon, 14 May 2007 17:53:01 +0000 (13:53 -0400)]
cciss: Fix pci_driver.shutdown while device is still active
Fix an Oops in the cciss driver caused by system shutdown while a filesystem
on a cciss device is still active. The cciss_remove_one function only
properly removes the device if the device has been cleanly released by its
users, which is not the case when the pci_driver.shutdown method is called.
This patch adds a new cciss_shutdown function to better match the pattern
used by various SCSI drivers: deactivate device interrupts and flush caches.
It also alters the cciss_remove_one function to match and readds the
__devexit annotation that was removed when cciss_remove_one was serving as
the pci_driver.shutdown method.
Signed-off-by: Gerald Britton <gbritton@alum.mit.edu> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H. Peter Anvin [Thu, 17 May 2007 22:50:47 +0000 (15:50 -0700)]
Further update of the i386 boot documentation
A number of items in the i386 boot documentation have been either
vague, outdated or hard to read. This text revision adds several more
examples, including a memory map for a modern kernel load. It also
adds a field-by-field detailed description of the setup header, and a
bootloader ID for Qemu.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
[CRYPTO] tcrypt: Add missing error check
[CRYPTO] padlock: Make CRYPTO_DEV_PADLOCK a tristate again
1 is a power of two, therefore roundup_pow_of_two(1) should return 1. It does
in case the argument is a variable but in case it's a constant it behaves
wrong and returns 0. Probably nobody ever did it so this was never noticed.
Linus Torvalds [Fri, 18 May 2007 15:26:28 +0000 (08:26 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (32 commits)
[POWERPC] Remove build warnings in windfarm_core
[POWERPC] Pass per-file CFLAGs for platform specific op codes
[POWERPC] Correct #endif comment
[POWERPC] Fix ppc_rtas_progress_show()
[POWERPC] Fix sed command lines for zlib source construction
[POWERPC] Specify GNUTARGET on $(AR) invocations
[POWERPC] Make sure device node type/name is not NULL on hot-added nodes
[POWERPC] Small fixes for the Ebony device tree
[POWERPC] Fix warning on UP
[POWERPC] cell_defconfig: Disable cpufreq and pmi
[POWERPC] Fix IO space on PCI buses created from of_platform
[POWERPC] Add spinlock to request_phb_iospace()
[POWERPC] Fix make rules for treeImage.initrd
[POWERPC] Remove warning in mpic.c
[POWERPC] Update pasemi_defconfig
[POWERPC] pasemi: CONFIG_GENERIC_TBSYNC no longer needed
[POWERPC] Update iseries_defconfig
[POWERPC] Wire up some more syscalls
[POWERPC] Fix bug adding properties with flatdevtree.c's ft_set_prop()
[POWERPC] Remove fixup_bigphys_addr() for arch/powerpc to avoid link error
...
It turns out the kernel was correct, and the gcc complaint was a gcc
bug. The preferred stack boundary is expressed not in bytes, but in the
the log2() of the preferred boundary, so "-mpreferred-stack-boundary=2"
is in fact exactly what we want, but a gcc that is compiled for x86-64
will consider it an error (because the 64-bit calling sequence says that
the stack should be 16-byte aligned) even if we are then using "-m32" to
generate 32-bit code.
Tejun Heo [Thu, 17 May 2007 14:43:26 +0000 (16:43 +0200)]
libata: remove libata.spindown_compat
With STANDBYDOWN tracking added, libata.spindown_compat isn't
necessary anymore. If userspace shutdown(8) issues STANDBYNOW, libata
warns. If userspace shutdown(8) doesn't issue STANDBYNOW, libata does
the right thing. Userspace can tell whether kernel supports spindown
by testing whether sysfs node manage_start_stop exists as before.
Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Tejun Heo [Thu, 17 May 2007 11:13:57 +0000 (13:13 +0200)]
sata_nv: fix fallout of devres conversion
As with all other drivers, sata_nv's hpriv is allocated with
devm_kzalloc() and there's no need to free it explicitly. Kill
nv_remove_one() which incorrectly used kfree() instead of devm_kfree()
and use ata_pci_remove_one() directly.
Peer Chen [Fri, 11 May 2007 05:48:49 +0000 (22:48 -0700)]
drivers/ata: remove the wildcard from sata_nv driver
Because nvidia SATA controllers onward base on AHCI, so wildcard in sata_nv
driver is unnecessary. Also the wildcard sometimes cause sata_nv driver to
be loaded for AHCI controllers,which is not as expected.
Signed-off-by: Peer Chen <pchen@nvidia.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
pci_enable_msi failure is a normal event so we should not print any error.
Going over the code I spotted a missing pci_disable_msi() leak when irq
allocation fails. The whole code also needed a cleanup, so I combined the
two different calls to pci_request_irq into a single call making this
look a lot better. All #ifdef CONFIG_PCI_MSI's have been removed.
Compile tested with both CONFIG_PCI_MSI enabled and disabled.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Auke Kok [Thu, 17 May 2007 22:29:07 +0000 (15:29 -0700)]
ixgb: don't print error if pci_enable_msi() fails, cleanup minor leak
pci_enable_msi calls can fail for normal operational reasons. Driver
should not print an error message in that case. Fix a leak that leaves
msi enabled if pci_request_irq fails. We can remove CONFIG_PCI_MSI
ifdefs alltogether
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Scott Wood [Wed, 16 May 2007 20:06:59 +0000 (15:06 -0500)]
gianfar: Add I/O barriers when touching buffer descriptor ownership.
The hardware must not see that is given ownership of a buffer until it is
completely written, and when the driver receives ownership of a buffer,
it must ensure that any other reads to the buffer reflect its final
state. Thus, I/O barriers are added where required.
Without this patch, I have observed GCC reordering the setting of
bdp->length and bdp->status in gfar_new_skb. Hardware reordering
was also theoretically possible.
Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Eugene Surovegin [Wed, 16 May 2007 18:59:48 +0000 (11:59 -0700)]
ibm_emac: improved PHY support
Original patch is from Jeff Haran <jharan@brocade.com> with my minor style
fixes. His comments follow:
The first problem was in the function that configures the PHY for
autonegotiation, genmii_setup_aneg(). The original code does a
read/modify/write of the autonegotiation advertizement register (reg 4),
followed by a read/modify/write of the control register (reg 0). While
the original code follows the proper procedure as per reading the IEEE
specs, what I found is that on at least one PHY model (National DP83843)
the read of the control register comes back with the soft reset bit set
(bit 15). Because of the read/modify/write operation, this causes the
write to write a 1 back to the reset bit, which initiates a software
reset of the PHY. This software reset causes the PHY to return to its
power up state which advertizes all modes of operation, thus negating
the write to the autoneg advertizement register. The modification is to
spin reading the control register until the soft reset bit is clear
before doing the modify/write.
The second problem was in the function that configures the PHY for
forced operation, genmii_setup_forced(). The original code initiates a
software reset operation via a write of a 1 to bit 15 of the control
register (reg 0), but then proceeds to do a second write to that same
register without waiting until that reset bit is cleared by the PHY
itself (which according to the IEEE specs indicates that the PHY reset
is complete). This is a violation of how one is supposed to use this
software reset feature of these PHYs and I believe was the cause of
mysterious, difficult to reproduce link failures that we've observed on
some of our systems that use this driver. The fix is to modify the
function so that it spins waiting for the reset bit to clear after doing
the soft reset and before doing the subsequent write.
Signed-off-by: Jeff Haran <jharan@brocade.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Jeff Garzik <jeff@garzik.org>
This workaround was added to deal with NAPI core and how
it affected dual port shared polling. It turned out not to
be necessary. Stopping device 0 only doesn't stop NAPI from
working completely after that.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Align the PHY setup of the sky2 driver with the vendor sk98lin (10.0.4.3)
driver. The PHY register settings are mostly black magic, even with access
to the documentation it isn't clear what the right values are. The changes
are mostly comments, the code change only affects the Yukon FE (100 mbit only)
version.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>
Dave Jones [Thu, 17 May 2007 22:02:21 +0000 (15:02 -0700)]
[IPV4]: Correct rp_filter help text.
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.
Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Satyam Sharma [Thu, 17 May 2007 06:50:16 +0000 (23:50 -0700)]
[BLUETOOTH]: Fix locking in hci_sock_dev_event().
We presently use lock_sock() to acquire a lock on a socket in
hci_sock_dev_event(), but this goes BUG because lock_sock()
can sleep and we're already holding a read-write spinlock at
that point. So, we must use the non-sleeping BH version,
bh_lock_sock().
However, hci_sock_dev_event() is called from user context and
hence using simply bh_lock_sock() will deadlock against a
concurrent softirq that tries to acquire a lock on the same
socket. Hence, disabling BH's before acquiring the socket lock
and enable them afterwards, is the proper solution to fix
socket locking in hci_sock_dev_event().
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski [Wed, 16 May 2007 05:46:18 +0000 (22:46 -0700)]
[NET]: lockdep classes in register_netdevice
After initializing dev->_xmit_lock register_netdevice()
sets lockdep class according to dev->type.
Idea of this patch - by David Miller.
Reported & tested by: "Yuriy N. Shkandybin" <jura@netams.com> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
David Woodhouse [Thu, 17 May 2007 10:48:12 +0000 (18:48 +0800)]
Fix incorrect prototype for ipxrtr_route_packet()
The function ipxrtr_route_packet() takes a 'len' argument of type
size_t. However, its prototype in af_ipx.c incorrectly suggests that the
corresponding argument is of type 'int' instead.
Discovered by building with --combine and letting the compiler see it
all at once.
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Woodhouse [Thu, 17 May 2007 06:27:39 +0000 (14:27 +0800)]
NS16550A: Restore HS settings in EXCR2 on resume
After a suspend/resume cycle, the UART may have been reset into
low-speed mode -- either because it's actually been reset, or because
the firmware pokes at the old-style divisor registers. If we detected it
as a NS16550A SuperIO chip in the first place and set baud_base to
921600, then we should do so again in the resume path.
This patch adds that code to serial8250_resume_port(), and also makes
serial8250_resume() actually call serial8250_resume_port() for each port
instead of just calling uart_resume_port() directly. And thus fixes
serial port operation after suspend/resume.
It also fixes a bogus comment where we write the EXCR2 register with a
comment saying /* EXCR1 */
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 17 May 2007 05:11:21 +0000 (22:11 -0700)]
mm: more rmap checking
Re-introduce rmap verification patches that Hugh removed when he removed
PG_map_lock. PG_map_lock actually isn't needed to synchronise access to
anonymous pages, because PG_locked and PTL together already do.
These checks were important in discovering and fixing a rare rmap corruption
in SLES9.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
grow_dev_page() simply passes GFP_NOFS to find_or_create_page. This means
the allocation of radix tree nodes is done with GFP_NOFS and the allocation
of a new page is done using GFP_NOFS.
The mapping has a flags field that contains the necessary allocation flags
for the page cache allocation. These need to be consulted in order to get
DMA and HIGHMEM allocations etc right. And yes a blockdev could be
allowing Highmem allocations if its a ramdisk.
Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sysfs files /sys/power/disk and /sys/power/state do not work as
documented, since they allow the user to write only a few initial
characters of the input string to trigger the option (eg. 'echo pl >
/sys/power/disk' activates the platform mode of hibernation). Fix it.
Special thanks to Peter Moulder <Peter.Moulder@infotech.monash.edu.au> for
pointing out the problem.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 17 May 2007 05:11:19 +0000 (22:11 -0700)]
circular locking dependency found in QUOTA OFF
i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
call under dqonoff_mutex and save some problems with races...
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 17 May 2007 05:11:18 +0000 (22:11 -0700)]
i386: don't check_pgt_cache in flush_tlb_mm
No other architecture calls check_pgt_cache() from within flush_tlb_mm(),
and i386 is already calling check_pgt_cache() from the usual places,
tlb_finish_mmu() and cpu_idle() (the latter being odd, but not unusual).
flush_tlb_mm() has no business to be freeing pages: remove that line, which
sneaked in with slub's i386 support.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: William Lee Irwin III <wli@holomorphy.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Aloni [Thu, 17 May 2007 05:11:16 +0000 (22:11 -0700)]
make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size
Make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core
filename size and change it to 128, so that extensive patterns such as
'/local/cores/%e-%h-%s-%t-%p.core' won't result in truncated filename
generation.
Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Thu, 17 May 2007 05:11:13 +0000 (22:11 -0700)]
gpio interface loosens call restrictions
Loosen gpio_{request,free}() and gpio_direction_{in,out}put() call context
restrictions slightly, so a common idiom is no longer an error: board init
code setting up spinlock-safe GPIOs before tasking is enabled.
The issue was caught by some paranoid code with might_sleep() checks. The
legacy platform-specific GPIO interfaces stick to spinlock-safe GPIOs, so this
change reflects current implementations and won't break anything.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 17 May 2007 05:11:12 +0000 (22:11 -0700)]
docbook: make kernel-locking table readable
Andi Kleen pointed out to me that the kernel locking cheat sheet
table entries are unreadable.
Make table entries smaller so that pdf and ps output is readable
(columns were being overwritten and garbled) by using abbreviations.
This allows the tables to fit on one page cleanly.
Add a Legend for the abbreviations:
SLIS: spin_lock_irqsave
SLI: spin_lock_irq
SL: spin_lock
SLBH: spin_lock_bh
DI: down_interruptible
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 17 May 2007 05:11:12 +0000 (22:11 -0700)]
parport: mailing list is subscribers-only
linux-parport is subscribers-only:
Your mail to 'Linux-parport' with the subject
Re: [QUESTION] parallel console configuration
Is being held until the list moderator can review it for approval.
The reason it is being held:
Post by non-member to a members-only list
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gerd Hoffmann [Thu, 17 May 2007 05:11:09 +0000 (22:11 -0700)]
Refine SCREEN_INFO sanity check for vgacon initialization
Refine SCREEN_INFO sanity check for vgacon initialization.
Checking video mode field only to see whenever SCREEN_INFO is
initialized is not enougth, in some cases it is zero although
a vga card is present. Lets additionally check cols and lines.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Alan <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Thu, 17 May 2007 05:11:09 +0000 (22:11 -0700)]
Let smp_call_function_single return -EBUSY on UP
All architectures that have an implementation of smp_call_function_single
let it return -EBUSY if it is asked to execute func on the current cpu.
(akpm: except for x86_64). Therefore the UP version must always return
-EBUSY.
Bernhard Walle [Thu, 17 May 2007 05:11:06 +0000 (22:11 -0700)]
i386/x86-64: fix section mismatch
WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to
.init.text:mtrr_bp_init from .text between 'id entify_cpu' (at offset 0x6571)
and 'IRQ0x20_interrupt'
It's because identify_cpu() which is __cpuinit calls mtrr_bp_init() which is
__init(). __cpuinit() expands to nothing if CONFIG_HOTPLUG_CPU=y and so the
call is illegal.
Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Durbin [Thu, 17 May 2007 05:11:06 +0000 (22:11 -0700)]
acpi: fix potential call to a freed memory section.
Strip __cpuinit[data] from Node <-> PXM routines and supporting data
structures. Also make pxm_to_node_map and node_to_pxm_map local to the
numa acpi module.
This fixes a bug triggered by the following conditions:
- boot on a machine with a SLIT table defined
- kernel is configured w/ CONFIG_HOTPLUG_CPU=n
- cat /sys/devices/system/node/node*/distance
This will cause an oops by calling into a freed memory section.
In particular, on x86_64, __node_distance calls node_to_pxm().
Signed-off-by: Aaron Durbin <adurbin@google.com> Cc: Len Brown <lenb@kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we have a maze of configuration variables that determine the
maximum slab size. Worst of all it seems to vary between SLAB and SLUB.
So define a common maximum size for kmalloc. For conveniences sake we use
the maximum size ever supported which is 32 MB. We limit the maximum size
to a lower limit if MAX_ORDER does not allow such large allocations.
For many architectures this patch will have the effect of adding large
kmalloc sizes. x86_64 adds 5 new kmalloc sizes. So a small amount of
memory will be needed for these caches (contemporary SLAB has dynamically
sizeable node and cpu structure so the waste is less than in the past)
Most architectures will then be able to allocate object with sizes up to
MAX_ORDER. We have had repeated breakage (in fact whenever we doubled the
number of supported processors) on IA64 because one or the other struct
grew beyond what the slab allocators supported. This will avoid future
issues and f.e. avoid fixes for 2k and 4k cpu support.
CONFIG_LARGE_ALLOCS is no longer necessary so drop it.
It fixes sparc64 with SLAB.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@ucw.cz> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLUB: Do our own flags based on PG_active and PG_error
The atomicity when handling flags in SLUB is not necessary since both flags
used by SLUB are not updated in a racy way. Flag updates are either done
during slab creation or destruction or under slab_lock. Some of these flags
do not have the non atomic variants that we need. So define our own.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 17 May 2007 05:10:54 +0000 (22:10 -0700)]
slub: fix handling of oversized slabs
I'm getting zillions of undefined references to __kmalloc_size_too_large on
alpha. For some reason alpha is building out-of-line copies of kmalloc_slab()
into lots of compilation units.
It turns out that gcc just isn't smart enough to work out that
__builtin_contant_p(size)==true implies that __builtin_contant_p(index)==true.
So let's give it a bit of help.
Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no user of destructors left. There is no reason why we should keep
checking for destructors calls in the slab allocators.
The RFC for this patch was discussed at
http://marc.info/?l=linux-kernel&m=117882364330705&w=2
Destructors were mainly used for list management which required them to take a
spinlock. Taking a spinlock in a destructor is a bit risky since the slab
allocators may run the destructors anytime they decide a slab is no longer
needed.
Patch drops destructor support. Any attempt to use a destructor will BUG().
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nick Piggin [Thu, 17 May 2007 05:10:49 +0000 (22:10 -0700)]
slob: implement RCU freeing
The SLOB allocator should implement SLAB_DESTROY_BY_RCU correctly, because
even on UP, RCU freeing semantics are not equivalent to simply freeing
immediately. This also allows SLOB to be used on SMP.
Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Rothwell [Thu, 17 May 2007 01:22:15 +0000 (11:22 +1000)]
[POWERPC] Remove build warnings in windfarm_core
drivers/macintosh/windfarm_core.c: In function 'wf_register_control':
drivers/macintosh/windfarm_core.c:219: warning: ignoring return value of 'device_create_file', declared with attribute warn_unused_result
drivers/macintosh/windfarm_core.c: In function 'wf_register_sensor':
drivers/macintosh/windfarm_core.c:329: warning: ignoring return value of 'device_create_file', declared with attribute warn_unused_result
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>