Chuck Ebbert [Thu, 18 Jun 2009 11:24:10 +0000 (19:24 +0800)]
crypto: padlock-aes - work around Nano CPU errata in ECB mode
The VIA Nano processor has a bug that makes it prefetch extra data
during encryption operations, causing spurious page faults. Extend
existing workarounds for ECB mode to copy the data to a temporary
buffer to avoid the problem.
Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Acked-by: Harald Welte <HaraldWelte@viatech.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
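The bounce-buffer idea described above can be sketched in userspace C. This is an illustration only, not the driver code: PAGE_SIZE_BYTES, FETCH_BYTES, crosses_page_end() and process_block_safely() are hypothetical names, and the prefetch distance is an assumed value chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096u               /* assumed page size */
#define BLOCK_BYTES     16u                 /* AES block size */
#define FETCH_BYTES     (8u * BLOCK_BYTES)  /* assumed prefetch distance (hypothetical) */

/* Return 1 if a read of FETCH_BYTES starting at addr would run past
 * the end of the page that contains addr, 0 otherwise. */
int crosses_page_end(uintptr_t addr)
{
    return (addr & (PAGE_SIZE_BYTES - 1u)) + FETCH_BYTES > PAGE_SIZE_BYTES;
}

/* Hypothetical stand-in for the hardware operation on one block. */
typedef void (*block_fn)(const uint8_t *in, uint8_t *out);

/* Process one block. If the input sits close enough to a page end that
 * an over-eager prefetch could touch the next page, copy the block into
 * a temporary buffer that is FETCH_BYTES long, so the extra reads stay
 * inside memory we own. */
void process_block_safely(block_fn fn, const uint8_t *in, uint8_t *out)
{
    if (crosses_page_end((uintptr_t)in)) {
        uint8_t tmp[FETCH_BYTES] = { 0 };

        memcpy(tmp, in, BLOCK_BYTES);
        fn(tmp, out);
    } else {
        fn(in, out);
    }
}
```

The copy is only taken on the rare path near a page boundary, so the common case keeps operating directly on the caller's buffer.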
Matthew Wilcox [Wed, 17 Jun 2009 20:33:36 +0000 (16:33 -0400)]
ia64: Fix resource assignment for root busses
ia64 was assigning resources to root busses after allocations had
been made for child busses. Calling pcibios_setup_root_windows() from
pcibios_fixup_bus() solves this problem by assigning the resources to
the root bus before child busses are scanned.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Wilcox [Wed, 17 Jun 2009 20:33:33 +0000 (16:33 -0400)]
Fix pci_claim_resource
Instead of starting from the iomem or ioport roots, start from the
parent bus' resources. This fixes a bug where child resources would
appear above their parents' resources if they had the same size.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 17 Jun 2009 18:53:48 +0000 (11:53 -0700)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Convert ia64 to use int-ll64.h
[IA64] Fix build error in paravirt_patchlist.c
[IA64] ia64 does not need umount2() syscall
[IA64] hook up new rt_tgsigqueueinfo syscall
[IA64] msi_ia64.c dmar_msi_type should be static
[IA64] remove obsolete hw_interrupt_type
[IA64] remove obsolete irq_desc_t typedef
[IA64] remove obsolete no_irq_type
[IA64] unexport fpswa.h
Linus Torvalds [Wed, 17 Jun 2009 17:42:21 +0000 (10:42 -0700)]
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Fix some typos in comments
kmemleak: Rename kmemleak_panic to kmemleak_stop
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Samuel Ortiz [Mon, 15 Jun 2009 16:04:54 +0000 (18:04 +0200)]
mfd: early init for MFD running regulators
For MFDs running regulator cores, we really want them to be brought up early
during boot.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: Mike Rapoport <mike@compulab.co.il>
Philipp Zabel [Fri, 5 Jun 2009 16:31:02 +0000 (18:31 +0200)]
mfd: asic3: add clock handling for MFD cells
Since ASIC3 has to work on both PXA and S3C and since their
struct clk implementations differ, we can't register our
clocks with the clkdev mechanism (yet?).
For now we have to keep clock handling internal to this
driver and enable/disable the clocks via the
mfd_cell->enable/disable functions.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Daniel Ribeiro [Thu, 28 May 2009 18:43:37 +0000 (15:43 -0300)]
mfd: add PCAP driver
The PCAP Asic as present on EZX phones is a multi function device with
voltage regulators, ADC, touch screen controller, RTC, USB transceiver,
leds controller, and audio codec.
It has two SPI ports; typically one is connected to the application
processor and the other to the baseband. This driver provides read/write
functions to its registers, an irq demultiplexer, and ADC
queueing/abstraction.
This chip is used on a lot of Motorola phones. It was manufactured by TI
as a custom product with the name PTWL93017; this design later evolved
into the ATLAS PMIC from Freescale (MC13783).
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Linus Walleij [Thu, 21 May 2009 21:17:06 +0000 (23:17 +0200)]
mfd: add U300 AB3100 core support
This adds a core driver for the AB3100 mixed-signal circuit
found in the ST-Ericsson U300 series platforms. This driver
is a singleton proxy for all accesses to the AB3100
sub-drivers which will be merged on top of this one, RTC,
regulators, battery and system power control, vibrator,
LEDs, and an ALSA codec.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Reviewed-by: Mike Rapoport <mike@compulab.co.il> Reviewed-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Thomas Gleixner [Tue, 12 May 2009 20:45:15 +0000 (13:45 -0700)]
drivers/mfd: remove obsolete irq_desc_t typedef
The defines and typedefs (hw_interrupt_type, no_irq_type, irq_desc_t) have
been kept around for migration reasons. After more than two years it's
time to remove them finally.
This patch cleans up one of the remaining users. When all such patches
hit mainline we can remove the defines and typedefs finally.
Impact: cleanup
Convert the last remaining users and remove the typedef.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Adrian Bunk [Tue, 12 May 2009 20:45:14 +0000 (13:45 -0700)]
mfd/pcf50633-gpio.c: add MODULE_LICENSE
Add the missing MODULE_LICENSE("GPL").
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Balaji Rao <balajirrao@openmoko.org> Cc: Andy Green <andy@openmoko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
mfd: Mark clocks_init as non-init in twl4030-core.c
Impact: Fix section mismatch.
clocks_init() is called from twl4030_probe(), which is a non-init
function. Since probing can happen at any time, clocks_init() can be
called at any time too, so we mark clocks_init() as non-init.
LD drivers/mfd/built-in.o
WARNING: drivers/mfd/built-in.o(.text+0x8dd9): Section mismatch in
reference from the function twl4030_probe() to the function
.init.text:clocks_init()
The function twl4030_probe() references
the function __init clocks_init().
This is often because twl4030_probe lacks a __init
annotation or the annotation of clocks_init is wrong.
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Catalin Marinas [Wed, 17 Jun 2009 17:29:02 +0000 (18:29 +0100)]
kmemleak: Only use GFP_KERNEL|GFP_ATOMIC for the internal allocations
Kmemleak allocates memory for pointer tracking and it tries to avoid
using GFP_ATOMIC if the caller doesn't require it. However other gfp
flags may be passed by the caller which aren't required by kmemleak.
This patch filters the gfp flags so that only GFP_KERNEL | GFP_ATOMIC
are used.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
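The filtering described above amounts to a single bitwise AND. A minimal sketch follows, assuming made-up flag bit values: the SKETCH_* names and kmemleak_filter_gfp() are hypothetical and only mirror the shape of the kernel's gfp flags.

```c
typedef unsigned int sketch_gfp_t;

/* Hypothetical flag bits for illustration; the kernel's values differ. */
#define SKETCH_GFP_WAIT 0x01u
#define SKETCH_GFP_HIGH 0x02u
#define SKETCH_GFP_IO   0x04u
#define SKETCH_GFP_FS   0x08u
#define SKETCH_GFP_DMA  0x10u
#define SKETCH_GFP_ZERO 0x20u

#define SKETCH_GFP_ATOMIC (SKETCH_GFP_HIGH)
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | SKETCH_GFP_IO | SKETCH_GFP_FS)

/* Only the bits that make up GFP_KERNEL or GFP_ATOMIC survive. */
#define SKETCH_KMEMLEAK_MASK (SKETCH_GFP_KERNEL | SKETCH_GFP_ATOMIC)

/* Strip every caller flag that the tracker's own allocations do not need. */
sketch_gfp_t kmemleak_filter_gfp(sketch_gfp_t caller_flags)
{
    return caller_flags & SKETCH_KMEMLEAK_MASK;
}
```

A caller passing extra bits such as the DMA or zeroing flags gets them dropped, while the sleeping or atomic context of the original request is preserved.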
Linus Torvalds [Wed, 17 Jun 2009 16:51:50 +0000 (09:51 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
[CPUFREQ] powernow-k8: get drv data for correct CPU
[CPUFREQ] powernow-k8: read P-state from HW
[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
[CPUFREQ] minor correction to cpu-freq documentation
[CPUFREQ] powernow-k8.c: mess cleanup
[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic79xx: make driver respect nvram for IU and QAS settings
[SCSI] don't attach ULD to Dell Universal Xport
[SCSI] lpfc 8.3.3 : Update driver version to 8.3.3
[SCSI] lpfc 8.3.3 : Add support for Target Reset handler entrypoint
[SCSI] lpfc 8.3.3 : Fix a couple of spin_lock and memory issues and a crash
[SCSI] lpfc 8.3.3 : FC/FCOE discovery fixes
[SCSI] lpfc 8.3.3 : Fix various SLI-3 vs SLI-4 differences
[SCSI] qla2xxx: Resolve a performance issue in interrupt
[SCSI] cnic, bnx2i: Fix build failure when CONFIG_PCI is not set.
[SCSI] nsp_cs: time_out reaches -1
[SCSI] qla2xxx: fix printk format warnings
[SCSI] ncr53c8xx: div reaches -1
[SCSI] compat: don't perform unneeded copy in sg_io code
[SCSI] zfcp: Update FC pass-through support
[SCSI] zfcp: Add FC pass-through support
[SCSI] FC Pass Thru support
Linus Torvalds [Wed, 17 Jun 2009 16:48:30 +0000 (09:48 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6
* 'linux-next' of git://git.infradead.org/ubi-2.6: (21 commits)
UBI: add reboot notifier
UBI: handle more error codes
UBI: fix multiple spelling typos
UBI: fix kmem_cache_free on error patch
UBI: print amount of reserved PEBs
UBI: improve messages in the WL worker
UBI: make gluebi a separate module
UBI: remove built-in gluebi
UBI: add notification API
UBI: do not switch to R/O mode on read errors
UBI: fix and clean-up error paths in WL worker
UBI: introduce new constants
UBI: fix race condition
UBI: minor serialization fix
UBI: do not panic if volume check fails
UBI: add dump_stack in checking code
UBI: fix races in I/O debugging checks
UBI: small debugging code optimization
UBI: improve debugging messages
UBI: re-name volumes_mutex to device_mutex
...
Linus Torvalds [Wed, 17 Jun 2009 16:46:33 +0000 (09:46 -0700)]
Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: start using hrtimers
hrtimer: export ktime_add_safe
UBIFS: do not forget to register BDI device
UBIFS: allow sync option in rootflags
UBIFS: remove dead code
UBIFS: use anonymous device
UBIFS: return proper error code if the compr is not present
UBIFS: return error if link and unlink race
UBIFS: reset no_space flag after inode deletion
Matthew Wilcox [Fri, 22 May 2009 20:49:49 +0000 (13:49 -0700)]
[IA64] Convert ia64 to use int-ll64.h
It is generally agreed that it would be beneficial for u64 to be an
unsigned long long on all architectures. ia64 (in common with several
other 64-bit architectures) currently uses unsigned long. Migrating
piecemeal is too painful; this giant patch fixes all compilation warnings
and errors that come as a result of switching to use int-ll64.h.
Note that userspace will still see __u64 defined as unsigned long. This
is important as it affects C++ name mangling.
[Updated by Tony Luck to change efi.h:efi_freemem_callback_t to use
u64 for start/end rather than unsigned long]
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
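One practical effect of the switch described above is that a 64-bit value can be printed with the same format string on every architecture. A small userspace sketch, assuming a hypothetical typedef u64_sketch and helper format_u64() that are not part of the patch:

```c
#include <stdio.h>

/* With int-ll64.h the kernel-side u64 is unsigned long long everywhere;
 * this typedef is a userspace stand-in for illustration. */
typedef unsigned long long u64_sketch;

/* Format v as decimal into buf. "%llu" matches unsigned long long, so no
 * per-architecture cast or format variant is needed. Returns the number
 * of characters that the full output requires, as snprintf() does. */
int format_u64(char *buf, size_t len, u64_sketch v)
{
    return snprintf(buf, len, "%llu", v);
}
```

When u64 is unsigned long on some architectures and unsigned long long on others, the same call needs either a cast or a different format to avoid compiler warnings, which is the kind of churn the giant patch cleans up.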
Linus Torvalds [Wed, 17 Jun 2009 16:13:52 +0000 (09:13 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (47 commits)
MIPS: Add hibernation support
MIPS: Move Cavium CP0 hwrena impl bits to cpu-feature-overrides.h
MIPS: Allow CPU specific overriding of CP0 hwrena impl bits.
MIPS: Kconfig Add SYS_SUPPORTS_HUGETLBFS and enable it for some systems.
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
MIPS: TLB support for hugetlbfs.
MIPS: Add hugetlbfs page defines.
MIPS: Add support files for hugetlbfs.
MIPS: Remove unused parameters from iPTE_LW.
Staging: Add octeon-ethernet driver files.
MIPS: Export erratum function needed by octeon-ethernet driver.
MIPS: Cavium-Octeon: Add more chip specific feature tests.
MIPS: Cavium-Octeon: Add more board type constants.
MIPS: Export cvmx_sysinfo_get needed by octeon-ethernet driver.
MIPS: Add named alloc functions to OCTEON boot monitor memory allocator.
MIPS: Alchemy: devboards: Convert to gpio calls.
MIPS: Alchemy: xxs1500: use linux gpio api.
MIPS: Alchemy: MTX-1: Use linux gpio api.
MIPS: Alchemy: Rewrite GPIO support.
MIPS: Alchemy: Remove unused au1000_gpio.h header
...
Jes Sorensen [Wed, 17 Jun 2009 16:04:40 +0000 (09:04 -0700)]
[IA64] Fix build error in paravirt_patchlist.c
Andrew cleaned up some #include tangles in:
commit 0d9c25dde878a636ee9a9b53923569171bf9a55b
headers: move module_bug_finalize()/module_bug_cleanup() definitions into module.h
which resulted in this build error for ia64:
CC arch/ia64/kernel/paravirt_patchlist.o
arch/ia64/kernel/paravirt_patchlist.c:43: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__initdata'
arch/ia64/kernel/paravirt_patchlist.c:54: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'paravirt_get_gate_patchlist'
arch/ia64/kernel/paravirt_patchlist.c:76: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'paravirt_get_gate_section'
make[1]: *** [arch/ia64/kernel/paravirt_patchlist.o] Error 1
The problem was that paravirt_patchlist.c was relying on some of the
nested includes (specifically that linux/bug.h included linux/module.h).
Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 17 Jun 2009 15:46:57 +0000 (08:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
get rid of BKL in fs/sysv
get rid of BKL in fs/minix
get rid of BKL in fs/efs
befs ->put_super() doesn't need BKL
Cleanup of adfs headers
9P doesn't need BKL in ->umount_begin()
fuse doesn't need BKL in ->umount_begin()
No instance of ->bmap() needs BKL
remove unlock_kernel() left accidentally
ext4: avoid unnecessary spinlock in critical POSIX ACL path
ext3: avoid unnecessary spinlock in critical POSIX ACL path
Wu Zhangjin [Thu, 4 Jun 2009 12:27:10 +0000 (20:27 +0800)]
MIPS: Add hibernation support
[Ralf: SMP support requires CPU hotplugging which MIPS currently doesn't
support. As implemented in this patch cache and tlb flushing will also be
invoked with interrupts disabled so smp_call_function() will blow up in
charming ways. So limit to !SMP.]
Reviewed-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Yan Hua <yanh@lemote.com> Reviewed-by: Arnaud Patard <apatard@mandriva.com> Reviewed-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Wu Zhangjin <wuzj@lemote.com> Signed-off-by: Hu Hongbing <huhb@lemote.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 28 May 2009 00:47:45 +0000 (17:47 -0700)]
Hugetlbfs: Enable hugetlbfs for more systems in Kconfig.
As part of adding hugetlbfs support for MIPS, I am adding a new
kconfig variable 'SYS_SUPPORTS_HUGETLBFS'. Since some mips cpu
variants don't yet support it, we can enable selection of HUGETLBFS on
a system by system basis from the arch/mips/Kconfig.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> CC: William Irwin <wli@holomorphy.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 6 May 2009 00:35:21 +0000 (17:35 -0700)]
Staging: Add octeon-ethernet driver files.
The octeon-ethernet driver supports the sgmii, rgmii, spi, and xaui
ports present on the Cavium OCTEON family of SOCs. These SOCs are
multi-core mips64 processors with existing support over in arch/mips.
The driver files can be categorized into three basic groups:
1) Register definitions, these are named cvmx-*-defs.h
2) Main driver code, these have names that don't start cvmx-.
3) Interface specific functions and other utility code, names starting
with cvmx-
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Manuel Lauss [Sat, 6 Jun 2009 12:09:55 +0000 (14:09 +0200)]
MIPS: Alchemy: Rewrite GPIO support.
The current in-kernel Alchemy GPIO support is far too inflexible for
all my use cases. To address this, the following changes are made:
* create generic functions which deal with manipulating the on-chip
GPIO1/2 blocks. Such functions are universally useful.
* Macros for GPIO2 shared interrupt management and block control.
* support for both built-in CONFIG_GPIOLIB and fast, inlined GPIO macros.
If CONFIG_GPIOLIB is not enabled, provide linux gpio framework
compatibility by directly inlining the GPIO1/2 functions. GPIO access
is limited to on-chip ones and they can be accessed as documented in
the datasheets (GPIO0-31 and 200-215).
If CONFIG_GPIOLIB is selected, two (2) gpio_chip-s, one for GPIO1 and
one for GPIO2, are registered. GPIOs can still be accessed by using
the numberspace established in the databooks.
However this is not yet flexible enough for my uses: My Alchemy
systems have a documented "external" gpio interface (fixed, different
numberspace) and can support a variety of baseboards, some of which
are equipped with I2C gpio expanders. I want to be able to provide
the default 16 GPIOs of the CPU board numbered as 0..15 and also
support gpio expanders, if present, starting as gpio16.
To achieve this, a new Kconfig symbol for Alchemy is introduced,
CONFIG_ALCHEMY_GPIO_INDIRECT, which boards can enable to signal
that they don't want the Alchemy numberspace exposed to the outside
world, but instead want to provide their own. Boards are now
responsible for providing the linux gpio interface glue code (either in a
custom gpio.h header (in board include directory) or with gpio_chips).
To make the board-specific inlined gpio functions work, the MIPS
Makefile must be changed so that the mach-au1x00/gpio.h header is
included _after_ the board headers, by moving the inclusion of
the mach-au1x00/ to the end of the header list.
See arch/mips/include/asm/mach-au1x00/gpio.h for more info.
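The board-private numberspace described above (GPIOs 0..15 on the CPU board, expanders from gpio16 up) could be routed roughly as follows. This is a sketch under stated assumptions: the mapping table, the type names and board_gpio_route() are all hypothetical, and the on-chip numbers in the table are placeholders picked from the documented ranges (0-31 and 200-215).

```c
#define BOARD_CPU_GPIOS 16 /* board GPIOs 0..15 are provided by the CPU board */

enum gpio_backend {
    GPIO_BACKEND_ALCHEMY,  /* on-chip GPIO1/GPIO2 block */
    GPIO_BACKEND_EXPANDER, /* external I2C gpio expander */
    GPIO_BACKEND_INVALID
};

struct gpio_route {
    enum gpio_backend backend;
    int line; /* on-chip number, or line index on the expander */
};

/* Hypothetical wiring: board gpio N -> Alchemy on-chip gpio number. */
static const int board_to_alchemy[BOARD_CPU_GPIOS] = {
    0, 1, 2, 3, 4, 5, 6, 7,
    200, 201, 202, 203, 204, 205, 206, 207
};

/* Translate a board-level gpio number into a backend and a line.
 * expander_lines is the number of expander lines present (0 if none). */
struct gpio_route board_gpio_route(int board_gpio, int expander_lines)
{
    struct gpio_route r = { GPIO_BACKEND_INVALID, -1 };

    if (board_gpio < 0)
        return r;

    if (board_gpio < BOARD_CPU_GPIOS) {
        r.backend = GPIO_BACKEND_ALCHEMY;
        r.line = board_to_alchemy[board_gpio];
    } else if (board_gpio - BOARD_CPU_GPIOS < expander_lines) {
        r.backend = GPIO_BACKEND_EXPANDER;
        r.line = board_gpio - BOARD_CPU_GPIOS;
    }
    return r;
}
```

The point of the indirection is that users of the board only ever see the fixed external numbering, while the Alchemy numberspace stays an implementation detail of the glue code.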
Ralf Baechle [Wed, 17 Jun 2009 10:06:28 +0000 (11:06 +0100)]
MIPS: ioctl.h: Cleanup.
o Rewrite to use <asm-generic/ioctl.h>. Cuts down the file from 40 to
16 lines.
o Delete _IOC_VOID, _IOC_OUT, _IOC_IN and _IOC_INOUT. They were added
for 2.1.14 but I was not able to find any user - not even historical
ones.
Imre Kaloz [Tue, 2 Jun 2009 12:22:06 +0000 (14:22 +0200)]
MIPS: Sibyte: Remove standalone kernel support
CFE is the only supported and used bootloader on the SiByte boards,
the standalone kernel support has never been used outside Broadcom.
Remove it and make the kernel use CFE by default.
Signed-off-by: Imre Kaloz <kaloz@openwrt.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Thu, 21 May 2009 17:49:47 +0000 (19:49 +0200)]
MIPS: RB532: Check irq number when handling GPIO interrupts
This patch makes sure that we do not clear or change the
interrupt status of a GPIO interrupt numbered higher than 13,
as this is the highest GPIO interrupt source
(p.232 of the RC32434 reference manual).
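The range check can be illustrated with a small sketch. The names RB532_GPIO_IRQ_MAX_SKETCH and gpio_irq_ack_sketch() are hypothetical, and the status register is modelled as a plain integer rather than a hardware access.

```c
/* Highest valid GPIO interrupt source, as stated in the commit message. */
#define RB532_GPIO_IRQ_MAX_SKETCH 13u

/* Return istat with the bit for `source` cleared. Sources above the
 * maximum are ignored and the status is returned unchanged. */
unsigned int gpio_irq_ack_sketch(unsigned int istat, unsigned int source)
{
    if (source > RB532_GPIO_IRQ_MAX_SKETCH)
        return istat;

    return istat & ~(1u << source);
}
```

Without the guard, an out-of-range interrupt number would modify status bits that do not correspond to any GPIO interrupt source.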
David Daney [Tue, 12 May 2009 19:41:53 +0000 (12:41 -0700)]
MIPS: Allow R2 CPUs to turn off generation of 'ehb' instructions.
Some CPUs do not need ehb instructions after writing CP0 registers.
By allowing ehb generation to be overridden in
cpu-feature-overrides.h, we can save a few instructions in the TLB
handler hot paths.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 20 May 2009 18:40:59 +0000 (11:40 -0700)]
MIPS: Fold the TLB refill at the vmalloc path if possible.
Try to fold the 64-bit TLB refill handler opportunistically at the
beginning of the vmalloc path so as to avoid splitting execution flow in
half and wasting cycles for a branch required at that point then. Resort
to doing the split if either of the newly created parts would not fit into
its designated slot.
Original-patch-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Wed, 20 May 2009 18:40:58 +0000 (11:40 -0700)]
MIPS: Replace some magic numbers with symbolic values in tlbex.c
The logic used to split the r4000 refill handler is liberally
sprinkled with magic numbers. We attempt to explain what they are and
normalize them against a new symbolic value (MIPS64_REFILL_INSNS).
CC: David VomLehn <dvomlehn@cisco.com> Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Wed, 17 Jun 2009 10:06:24 +0000 (11:06 +0100)]
MIPS: SB1250: Sort out merge mistake.
A wrong resolution of a merge conflict resurrected the recently deleted,
incorrect error check in sb1250_set_affinity. Send the zombie back to the
empire of the undead.
were incorrectly merged.
The former removes one pair of lock/unlock_kernel(), but the latter adds
several unlock_kernel(). As a result, a few unlock_kernel() calls were left behind.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Theodore Ts'o [Mon, 8 Jun 2009 19:22:25 +0000 (15:22 -0400)]
ext4: avoid unnecessary spinlock in critical POSIX ACL path
If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.
That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACL's associated with the files in
question.
ext4 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. Which is a noticeable overhead even if it's a private
lock, especially on CPU's where the serialization is expensive (eg Intel
Netburst aka 'P4').
For the special case of not actually having any ACL's, all that locking is
unnecessary. Even if somebody else were to be changing the ACL's on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.
So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.
(This commit was ported from a patch originally authored by Linus for
ext3.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
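The speculative load described above can be sketched in portable C11. This is not the filesystem code: struct inode_sketch, struct acl_sketch, the spinlock helpers and get_acl_sketch() are hypothetical stand-ins, and locks_taken exists only so the fast path can be observed.

```c
#include <stdatomic.h>
#include <stddef.h>

struct acl_sketch {
    int id;
};

struct inode_sketch {
    atomic_flag lock;                /* stand-in for inode->i_lock */
    _Atomic(struct acl_sketch *) acl; /* NULL means "no ACL" */
    unsigned long locks_taken;       /* instrumentation for the sketch */
};

void inode_sketch_init(struct inode_sketch *inode, struct acl_sketch *acl)
{
    atomic_flag_clear(&inode->lock);
    atomic_init(&inode->acl, acl);
    inode->locks_taken = 0;
}

static void sketch_lock(struct inode_sketch *inode)
{
    while (atomic_flag_test_and_set_explicit(&inode->lock, memory_order_acquire))
        ; /* spin until the flag is released */
    inode->locks_taken++;
}

static void sketch_unlock(struct inode_sketch *inode)
{
    atomic_flag_clear_explicit(&inode->lock, memory_order_release);
}

struct acl_sketch *get_acl_sketch(struct inode_sketch *inode)
{
    /* Fast path: peek without the lock. A NULL result is usable as-is. */
    struct acl_sketch *acl =
        atomic_load_explicit(&inode->acl, memory_order_relaxed);

    if (acl == NULL)
        return NULL;

    /* Slow path: anything non-NULL (a cached entry, or a marker for a
     * cache that has not been filled yet) is re-read under the lock. */
    sketch_lock(inode);
    acl = atomic_load_explicit(&inode->acl, memory_order_relaxed);
    sketch_unlock(inode);
    return acl;
}
```

In the common no-ACL case the function returns without touching the lock at all, which is where the saving comes from.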
Linus Torvalds [Mon, 8 Jun 2009 19:22:24 +0000 (15:22 -0400)]
ext3: avoid unnecessary spinlock in critical POSIX ACL path
If a filesystem supports POSIX ACL's, the VFS layer expects the filesystem
to do POSIX ACL checks on any files not owned by the caller, and it does
this for every single pathname component that it looks up.
That obviously can be pretty expensive if the filesystem isn't careful
about it, especially with locking. That's doubly sad, since the common
case tends to be that there are no ACL's associated with the files in
question.
ext3 already caches the ACL data so that it doesn't have to look it up
over and over again, but it does so by taking the inode->i_lock spinlock
on every lookup. Which is a noticeable overhead even if it's a private
lock, especially on CPU's where the serialization is expensive (eg Intel
Netburst aka 'P4').
For the special case of not actually having any ACL's, all that locking is
unnecessary. Even if somebody else were to be changing the ACL's on
another CPU, we simply don't care - if we've seen a NULL ACL, we might as
well use it.
So just load the ACL speculatively without any locking, and if it was
NULL, just use it. If it's non-NULL (either because we had a cached
entry, or because the cache hasn't been filled in at all), it means that
we'll need to get the lock and re-load it properly.
This is noticeable even on Nehalem, which does locking quite well (much
better than P4). From lmbench:
Processor, Processes - times in microseconds - smaller is better
--------------------------------------------------------------------
Host                 OS  Mhz null null      open slct fork exec sh
                             call  I/O stat clos TCP  proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ----
- before:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.45 2.18 69.1 273. 1141
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.95 1.48 2.28 69.9 253. 1140
nehalem.l Linux 2.6.30- 3193 0.04 0.10 0.95 1.42 2.19 68.6 284. 1141
- after:
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.44 2.12 68.3 282. 1094
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.20 67.0 308. 1123
nehalem.l Linux 2.6.30- 3193 0.04 0.09 0.92 1.39 2.36 67.4 293. 1148
where you can see what appears to be a roughly 3% improvement in stat
and open/close latencies from just the removal of the locking overhead.
Of course, this only matters for files you don't own (the owner never
needs to do the ACL checks), but that's the common case for libraries,
header files, and executables. As well as for the base components of any
absolute pathname, even if you are the owner of the final file.
[ At some point we probably want to move this ACL caching logic entirely
into the VFS layer (and only call down to the filesystem when
uncached), but in the meantime this improves ext3 a bit.
A similar fix to btrfs makes a much bigger difference (15x improvement
in lmbench) due to broken caching. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
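The "roughly 3%" figure quoted in the ext3 commit above can be checked against the lmbench rows with a few lines of arithmetic. The helper names mean3() and improvement_percent() are hypothetical; the inputs are the stat and open/close columns from the table.

```c
/* Arithmetic mean of three benchmark runs. */
double mean3(double a, double b, double c)
{
    return (a + b + c) / 3.0;
}

/* Relative improvement of `after` over `before`, in percent.
 * Positive means the latency went down. */
double improvement_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the table's numbers, improvement_percent(0.95, 0.92) gives about 3.2% for stat, and improvement_percent(mean3(1.45, 1.48, 1.42), mean3(1.44, 1.39, 1.39)) gives about 3.0% for open/close, consistent with the claim in the message.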