Linus Torvalds [Tue, 15 Feb 2011 23:25:11 +0000 (15:25 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
pci: use security_capable() when checking capablities during config space read
Andrea Arcangeli [Tue, 15 Feb 2011 18:02:45 +0000 (19:02 +0100)]
thp: prevent hugepages during args/env copying into the user stack
Transparent hugepages can only be created if rmap is fully
functional. So we must prevent hugepages to be created while
is_vma_temporary_stack() is true.
This also optmizes away some harmless but unnecessary setting of
khugepaged_scan.address and it switches some BUG_ON to VM_BUG_ON.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 15 Feb 2011 23:19:45 +0000 (15:19 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI / Video: Probe for output switch method when searching video devices.
ACPI / Wakeup: Enable button GPEs unconditionally during initialization
ACPI / ACPICA: Avoid crashing if _PRW is defined for the root object
ACPI: Fix acpi_os_read_memory() and acpi_os_write_memory() (v2)
Linus Torvalds [Tue, 15 Feb 2011 20:07:35 +0000 (12:07 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (21 commits)
dmaengine: add slave-dma maintainer
dma: ipu_idmac: do not lose valid received data in the irq handler
dmaengine: imx-sdma: fix up param for the last BD in sdma_prep_slave_sg()
dmaengine: imx-sdma: correct sdmac->status in sdma_handle_channel_loop()
dmaengine: imx-sdma: return sdmac->status in sdma_tx_status()
dmaengine: imx-sdma: set sdmac->status to DMA_ERROR in err_out of sdma_prep_slave_sg()
dmaengine: imx-sdma: remove IMX_DMA_SG_LOOP handling in sdma_prep_slave_sg()
dmaengine i.MX dma: initialize dma capabilities outside channel loop
dmaengine i.MX DMA: do not initialize chan_id field
dmaengine i.MX dma: check sg entries for valid addresses and lengths
dmaengine i.MX dma: set maximum segment size for our device
dmaengine i.MX SDMA: reserve channel 0 by not registering it
dmaengine i.MX SDMA: initialize dma capabilities outside channel loop
dmaengine i.MX SDMA: do not initialize chan_id field
dmaengine i.MX sdma: check sg entries for valid addresses and lengths
dmaengine i.MX sdma: set maximum segment size for our device
DMA: PL08x: fix channel pausing to timeout rather than lockup
DMA: PL08x: fix infinite wait when terminating transfers
dmaengine: imx-sdma: fix inconsistent naming in sdma_assign_cookie()
dmaengine: imx-sdma: propagate error in sdma_probe() instead of returning 0
...
Linus Torvalds [Tue, 15 Feb 2011 20:06:38 +0000 (12:06 -0800)]
Merge branch 'for-2.6.38' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.38' of git://linux-nfs.org/~bfields/linux:
nfsd: break lease on unlink due to rename
nfsd4: acquire only one lease per file
nfsd4: modify fi_delegations under recall_lock
nfsd4: remove unused deleg dprintk's.
nfsd4: split lease setting into separate function
nfsd4: fix leak on allocation error
nfsd4: add helper function for lease setup
nfsd4: split up nfsd_break_deleg_cb
NFSD: memory corruption due to writing beyond the stat array
NFSD: use nfserr for status after decode_cb_op_status
nfsd: don't leak dentry count on mnt_want_write failure
Linus Torvalds [Tue, 15 Feb 2011 18:19:18 +0000 (10:19 -0800)]
Merge branches 'core-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "lockdep, timer: Fix del_timer_sync() annotation"
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timer debug: Hide kernel addresses via %pK in /proc/timer_list
Linus Torvalds [Tue, 15 Feb 2011 18:18:48 +0000 (10:18 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Fix text_poke_smp_batch() deadlock
perf tools: Fix thread_map event synthesizing in top and record
watchdog, nmi: Lower the severity of error messages
ARM: oprofile: Fix backtraces in timer mode
oprofile: Fix usage of CONFIG_HW_PERF_EVENTS for oprofile_perf_init and friends
Linus Torvalds [Tue, 15 Feb 2011 18:18:29 +0000 (10:18 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, dmi, debug: Log board name (when present) in dmesg/oops output
x86, ioapic: Don't warn about non-existing IOAPICs if we have none
x86: Fix mwait_usable section mismatch
x86: Readd missing irq_to_desc() in fixup_irq()
x86: Fix section mismatch in LAPIC initialization
Linus Torvalds [Tue, 15 Feb 2011 16:06:36 +0000 (08:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu()
drop out of RCU in return_reval
split do_revalidate() into RCU and non-RCU cases
in do_lookup() split RCU and non-RCU cases of need_revalidate
nothing in do_follow_link() is going to see RCU
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world readable file, it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.
Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Wright [Tue, 15 Feb 2011 01:21:49 +0000 (17:21 -0800)]
pci: use security_capable() when checking capablities during config space read
This reintroduces commit 47970b1b which was subsequently reverted
as f00eaeea. The original change was broken and caused X startup
failures and generally made privileged processes incapable of reading
device dependent config space. The normal capable() interface returns
true on success, but the LSM interface returns 0 on success. This thinko
is now fixed in this patch, and has been confirmed to work properly.
So, once again...Eric Paris noted that commit de139a3 ("pci: check caps
from sysfs file open to read device dependent config space") caused the
capability check to bypass security modules and potentially auditing.
Rectify this by calling security_capable() when checking the open file's
capabilities for config space reads.
Reported-by: Eric Paris <eparis@redhat.com> Tested-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: James Morris <jmorris@namei.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Alex Riesen <raa.lkml@gmail.com> Cc: Sedat Dilek <sedat.dilek@googlemail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: James Morris <jmorris@namei.org>
Naga Chumbalkar [Mon, 14 Feb 2011 22:47:17 +0000 (22:47 +0000)]
x86, dmi, debug: Log board name (when present) in dmesg/oops output
The "Type 2" SMBIOS record that contains Board Name is not
strictly required and may be absent in the SMBIOS on some
platforms.
( Please note that Type 2 is not listed in Table 3 in Sec 6.2
("Required Structures and Data") of the SMBIOS v2.7
Specification. )
Use the Manufacturer Name (aka System Vendor) name.
Print Board Name only when it is present.
Before the fix:
(i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6
After the fix:
(i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul Bolle [Mon, 14 Feb 2011 21:52:38 +0000 (22:52 +0100)]
x86, ioapic: Don't warn about non-existing IOAPICs if we have none
mp_find_ioapic() prints errors like:
ERROR: Unable to locate IOAPIC for GSI 13
if it can't find the IOAPIC that manages that specific GSI. I
see errors like that at every boot of a laptop that apparently
doesn't have any IOAPICs.
But if there are no IOAPICs it doesn't seem to be an error that
none can be found. A solution that gets rid of this message is
to directly return if nr_ioapics (still) is zero. (But keep
returning -1 in that case, so nothing breaks from this change.)
The call chain that generates this error is:
pnpacpi_allocated_resource()
case ACPI_RESOURCE_TYPE_IRQ:
pnpacpi_parse_allocated_irqresource()
acpi_get_override_irq()
mp_find_ioapic()
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Mon, 14 Feb 2011 22:49:29 +0000 (14:49 -0800)]
Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm
* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6657/1: hw_breakpoint: fix ptrace breakpoint advertising on unsupported arch
ARM: 6656/1: hw_breakpoint: avoid UNPREDICTABLE behaviour when reading DBGDSCR
ARM: 6658/1: collie: do actually pass locomo_info to locomo driver
ARM: 6659/1: Thumb-2: Make CONFIG_OABI_COMPAT depend on !CONFIG_THUMB2_KERNEL
ARM: 6654/1: perf/oprofile: fix off-by-one in stack check
ARM: fixup SMP alternatives in modules
ARM: make SWP emulation explicit on !CPU_USE_DOMAINS
ARM: Avoid building unsafe kernels on OMAP2 and MX3
ARM: pxa: Properly configure PWM period for palm27x
ARM: pxa: only save/restore registers when pm functions are defined
ARM: pxa/colibri: use correct SD detect pin
ARM: pxa: fix mfpr_sync to read from valid offset
Tsutomu Itoh [Mon, 14 Feb 2011 00:45:29 +0000 (00:45 +0000)]
Btrfs: check return value of alloc_extent_map()
I add the check on the return value of alloc_extent_map() to several places.
In addition, alloc_extent_map() returns only the address or NULL.
Therefore, check by IS_ERR() is unnecessary. So, I remove IS_ERR() checking.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
space_args.space_slots is an unsigned 64-bit type controlled by a
possibly unprivileged caller. The comparison as a signed int type
allows providing values that are treated as negative and cause the
subsequent allocation size calculation to wrap, or be truncated to 0.
By providing a size that's truncated to 0, kmalloc() will return
ZERO_SIZE_PTR. It's also possible to provide a value smaller than the
slot count. The subsequent loop ignores the allocation size when
copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.
The fix changes the slot count type and comparison typecast to u64,
which prevents truncation or signedness errors, and also ensures that we
don't copy more data than we've allocated in the subsequent loop. Note
that zero-size allocations are no longer possible since there is already
an explicit check for space_args.space_slots being 0 and truncation of
this value is no longer an issue.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Signed-off-by: Josef Bacik <josef@redhat.com> Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Mon, 14 Feb 2011 18:10:07 +0000 (10:10 -0800)]
Merge branch 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
RTC: Fix minor compile warning
RTC: Convert rtc drivers to use the alarm_irq_enable method
RTC: Fix rtc driver ioctl specific shortcutting
Chris Mason [Mon, 14 Feb 2011 17:52:08 +0000 (12:52 -0500)]
Btrfs: don't release pages when we can't clear the uptodate bits
Btrfs tracks uptodate state in an rbtree as well as in the
page bits. This is supposed to enable us to use block sizes other than
the page size, but there are a few parts still missing before that
completely works.
But, our readpage routine trusts this additional range based tracking
of uptodateness, much in the same way the buffer head up to date bits
are trusted for the other filesystems.
The problem is that sometimes we need to allocate memory in order to
split records in the rbtree, even when we are just clearing bits. This
can be difficult when our clearing function is called GFP_ATOMIC, which
can happen in the releasepage path.
So, what happens today looks like this:
releasepage called with GFP_ATOMIC
btrfs_releasepage calls clear_extent_bit
clear_extent_bit fails to allocate ram, leaving the up to date bit set
btrfs_releasepage returns success
The end result is the page being gone, but btrfs thinking the range is
up to date. Later on if someone tries to read that same page, the
btrfs readpage code will return immediately thinking the page is already
up to date.
This commit fixes things to fail the releasepage when we can't clear the
extent state bits. It covers both data pages and metadata tree blocks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Thu, 10 Feb 2011 17:35:00 +0000 (12:35 -0500)]
Btrfs: fix page->private races
There is a race where btrfs_releasepage can drop the
page->private contents just as alloc_extent_buffer is setting
up pages for metadata. Because of how the Btrfs page flags work,
this results in us skipping the crc on the page during IO.
This patch sovles the race by waiting until after the extent buffer
is inserted into the radix tree before it sets page private.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
J. Bruce Fields [Sun, 6 Feb 2011 21:46:30 +0000 (16:46 -0500)]
nfsd: break lease on unlink due to rename
4795bb37effb7b8fe77e2d2034545d062d3788a8 "nfsd: break lease on unlink,
link, and rename", only broke the lease on the file that was being
renamed, and didn't handle the case where the target path refers to an
already-existing file that will be unlinked by a rename--in that case
the target file should have any leases broken as well.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 1 Feb 2011 00:20:39 +0000 (19:20 -0500)]
nfsd4: acquire only one lease per file
Instead of acquiring one lease each time another client opens a file,
nfsd can acquire just one lease to represent all of them, and reference
count it to determine when to release it.
This fixes a regression introduced by c45821d263a8a5109d69a9e8942b8d65bcd5f31a "locks: eliminate fl_mylease
callback": after that patch, only the struct file * is used to determine
who owns a given lease. But since we recently converted the server to
share a single struct file per open, if we acquire multiple leases on
the same file from nfsd, it then becomes impossible on unlocking a lease
to determine which of those leases (all of whom share the same struct
file *) we meant to remove.
Thanks to Takashi Iwai <tiwai@suse.de> for catching a bug in a previous
version of this patch.
Tested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSD: memory corruption due to writing beyond the stat array
If nfsd fails to find an exported via NFS file in the readahead cache, it
should increment corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.
In a kernel with NFSDv4 compiled in the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.
In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Dan Williams [Mon, 14 Feb 2011 08:42:08 +0000 (00:42 -0800)]
dmaengine: add slave-dma maintainer
Slave-dma has become the predominant usage model for dmaengine and needs
special attention. Memory-to-memory dma usage cases will continue to be
maintained by Dan.
Cc: Alan Cox <alan@linux.intel.com> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
dma: ipu_idmac: do not lose valid received data in the irq handler
Currently when two or more buffers are queued by the camera driver
and so the double buffering is enabled in the idmac, we lose one
frame comming from CSI since the reporting of arrival of the first
frame is deferred by the DMAIC_7_EOF interrupt handler and reporting
of the arrival of the last frame is not done at all. So when requesting
N frames from the image sensor we actually receive N - 1 frames in
user space.
The reason for this behaviour is that the DMAIC_7_EOF interrupt
handler misleadingly assumes that the CUR_BUF flag is pointing to the
buffer used by the IDMAC. Actually it is not the case since the
CUR_BUF flag will be flipped by the FSU when the FSU is sending the
<TASK>_NEW_FRM_RDY signal when new frame data is delivered by the CSI.
When sending this singal, FSU updates the DMA_CUR_BUF and the
DMA_BUFx_RDY flags: the DMA_CUR_BUF is flipped, the DMA_BUFx_RDY
is cleared, indicating that the frame data is beeing written by
the IDMAC to the pointed buffer. DMA_BUFx_RDY is supposed to be
set to the ready state again by the MCU, when it has handled the
received data. DMAIC_7_CUR_BUF flag won't be flipped here by the
IPU, so waiting for this event in the EOF interrupt handler is wrong.
Actually there is no spurious interrupt as described in the comments,
this is the valid DMAIC_7_EOF interrupt indicating reception of the
frame from CSI.
The patch removes code that waits for flipping of the DMAIC_7_CUR_BUF
flag in the DMAIC_7_EOF interrupt handler. As the comment in the
current code denotes, this waiting doesn't help anyway. As a result
of this removal the reporting of the first arrived frame is not
deferred to the time of arrival of the next frame and the drivers
software flag 'ichan->active_buffer' is in sync with DMAIC_7_CUR_BUF
flag, so the reception of all requested frames works.
This has been verified on the hardware which is triggering the
image sensor by the programmable state machine, allowing to
obtain exact number of frames. On this hardware we do not tolerate
losing frames.
This patch also removes resetting the DMA_BUFx_RDY flags of
all channels in ipu_disable_channel() since transfers on other
DMA channels might be triggered by other running tasks and the
buffers should always be ready for data sending or reception.
Signed-off-by: Anatolij Gustschin <agust@denx.de> Reviewed-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
David Miller [Mon, 14 Feb 2011 00:37:07 +0000 (16:37 -0800)]
klist: Fix object alignment on 64-bit.
Commit c0e69a5bbc6f ("klist.c: bit 0 in pointer can't be used as flag")
intended to make sure that all klist objects were at least pointer size
aligned, but used the constant "4" which only works on 32-bit.
Use "sizeof(void *)" which is correct in all cases.
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Cc: stable <stable@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 13 Feb 2011 15:59:48 +0000 (07:59 -0800)]
Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
devicetree-discuss is moderated for non-subscribers
MAINTAINERS: Add entry for GPIO subsystem
dt: add documentation of ARM dt boot interface
dt: Remove obsolete description of powerpc boot interface
dt: Move device tree documentation out of powerpc directory
spi/spi_sh_msiof: fix wrong address calculation, which leads to an Oops
Linus Torvalds [Sun, 13 Feb 2011 15:58:50 +0000 (07:58 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - add quirk for Ordissimo EVE using a realtek ALC662
ALSA: hrtimer: remove superfluous tasklet invocation
ALSA: hrtimer: handle delayed timer interrupts
ALSA: HDA: Add subwoofer quirk for Acer Aspire 8942G
ALSA: hda - Don't handle empty patch files
ALSA: hda - Fix missing CA initialization for HDMI/DP
ALSA: usbaudio - Enable the E-MU 0204 USB
ALSA: hda - switch lfe with side in mixer for 4930g
ASoC: Improve WM8994 digital power sequencing
ASoC: Create an AIF1ADCDAT signal widget to match AIF2
asoc: davinci: da830/omap-l137: correct cpu_dai_name
ASoC: fill in snd_soc_pcm_runtime.card before calling snd_soc_dai_link.init()
It turns out it breaks several distributions. Looks like the stricter
selinux checks fail due to selinux policies not being set to allow the
access - breaking X, but also lspci.
So while the change was clearly the RightThing(tm) to do in theory, in
practice we have backwards compatibility issues making it not work.
Reported-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: David Airlie <airlied@linux.ie> Acked-by: Alex Riesen <raa.lkml@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 12 Feb 2011 17:10:24 +0000 (09:10 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: call __jbd2_log_start_commit with j_state_lock write locked
ext4: serialize unaligned asynchronous DIO
ext4: make grpinfo slab cache names static
ext4: Fix data corruption with multi-block writepages support
ext4: fix up ext4 error handling
ext4: unregister features interface on module unload
ext4: fix panic on module unload when stopping lazyinit thread
Theodore Ts'o [Sat, 12 Feb 2011 13:18:24 +0000 (08:18 -0500)]
jbd2: call __jbd2_log_start_commit with j_state_lock write locked
On an SMP ARM system running ext4, I've received a report that the
first J_ASSERT in jbd2_journal_commit_transaction has been triggering:
J_ASSERT(journal->j_running_transaction != NULL);
While investigating possible causes for this problem, I noticed that
__jbd2_log_start_commit() is getting called with j_state_lock only
read-locked, in spite of the fact that it's possible for it might
j_commit_request. Fix this by grabbing the necessary information so
we can test to see if we need to start a new transaction before
dropping the read lock, and then calling jbd2_log_start_commit() which
will grab the write lock.
Eric Sandeen [Sat, 12 Feb 2011 13:17:34 +0000 (08:17 -0500)]
ext4: serialize unaligned asynchronous DIO
ext4 has a data corruption case when doing non-block-aligned
asynchronous direct IO into a sparse file, as demonstrated
by xfstest 240.
The root cause is that while ext4 preallocates space in the
hole, mappings of that space still look "new" and
dio_zero_block() will zero out the unwritten portions. When
more than one AIO thread is going, they both find this "new"
block and race to zero out their portion; this is uncoordinated
and causes data corruption.
Dave Chinner fixed this for xfs by simply serializing all
unaligned asynchronous direct IO. I've done the same here.
The difference is that we only wait on conversions, not all IO.
This is a very big hammer, and I'm not very pleased with
stuffing this into ext4_file_write(). But since ext4 is
DIO_LOCKING, we need to serialize it at this high level.
I tried to move this into ext4_ext_direct_IO, but by then
we have the i_mutex already, and we will wait on the
work queue to do conversions - which must also take the
i_mutex. So that won't work.
This was originally exposed by qemu-kvm installing to
a raw disk image with a normal sector-63 alignment. I've
tested a backport of this patch with qemu, and it does
avoid the corruption. It is also quite a lot slower
(14 min for package installs, vs. 8 min for well-aligned)
but I'll take slow correctness over fast corruption any day.
Mingming suggested that we can track outstanding
conversions, and wait on those so that non-sparse
files won't be affected, and I've implemented that here;
unaligned AIO to nonsparse files won't take a perf hit.
[tytso@mit.edu: Keep the mutex as a hashed array instead
of bloating the ext4 inode]
[tytso@mit.edu: Fix up namespace issues so that global
variables are protected with an "ext4_" prefix.]
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Sat, 12 Feb 2011 13:12:18 +0000 (08:12 -0500)]
ext4: make grpinfo slab cache names static
In 2.6.37 I was running into oopses with repeated module
loads & unloads. I tracked this down to:
fb1813f4 ext4: use dedicated slab caches for group_info structures
(this was in addition to the features advert unload problem)
The kstrdup & subsequent kfree of the cache name was causing
a double free. In slub, at least, if I read it right it allocates
& frees the name itself, slab seems to do something different...
so in slub I think we were leaking -our- cachep->name, and double
freeing the one allocated by slub.
After getting lost in slab/slub/slob a bit, I just looked at other
sized-caches that get allocated. jbd2, biovec, sgpool all do it
more or less the way jbd2 does. Below patch follows the jbd2
method of dynamically allocating a cache at mount time from
a list of static names.
(This might also possibly fix a race creating the caches with
parallel mounts running).
[Folded in a fix from Dan Carpenter which fixed an off-by-one error in
the original patch]
Cc: stable@kernel.org Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Thomas Gleixner [Sat, 12 Feb 2011 10:51:03 +0000 (11:51 +0100)]
x86: Readd missing irq_to_desc() in fixup_irq()
commit a3c08e5d(x86: Convert irq_chip access to new functions)
accidentally zapped desc = irq_to_desc(irq); in the vector loop.
So we lock some random irq descriptor.
Add it back.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@kernel.org> # .37
Michael Karcher [Sat, 12 Feb 2011 00:40:16 +0000 (01:40 +0100)]
ACPI / Video: Probe for output switch method when searching video devices.
This patch reverts one hunk of 677bd810eedce61edf15452491781ff046b92edc
"ACPI video: remove output switching control", namely the removal of
probing for _DOS/_DOD when searching for video devices.
This is needed on some Fujitsu Laptops (at least S7110, P8010) for the
ACPI backlight interface to work, as an these machines, neither ROM nor
posting methods are available, and after removal of output switching,
none of the caps triggers, which prevents the backlight search from
being entered.
Tested on a Fujitsu Lifebook S7110 and Fujitsu Lifebook P8010.
This probably fixes https://bugzilla.kernel.org/show_bug.cgi?id=27312
for the people who have no entry in /sys/class/backlight.
This is the complete list of public (starting with "_") methods implemented
on the S7110, BIOS rev 1.34:
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Acked-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
ACPI / Wakeup: Enable button GPEs unconditionally during initialization
Commit 9630bdd (ACPI: Use GPE reference counting to support shared
GPEs) introduced a suspend regression where boxes resume immediately
after being suspended due to the lid or sleep button wakeup status
not being cleared properly. This happens if the GPEs corresponding
to those devices are not enabled all the time, which apparently is
expected by some BIOSes.
To fix this problem, enable button and lid GPEs unconditionally
during initialization and keep them enabled all the time, regardless
of whether or not the ACPI button driver is used.
References: https://bugzilla.kernel.org/show_bug.cgi?id=27372 Reported-and-tested-by: Ferenc Wágner <wferi@niif.hu> Cc: stable@kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
ACPI / ACPICA: Avoid crashing if _PRW is defined for the root object
Some ACPI BIOSes define _PRW for the root object which causes
acpi_setup_gpe_for_wake() to crash when trying to dereference the
bogus device_node pointer. Avoid the crash by checking if
wake_device is not the root object before attempting to set up the
"implicit notify" mechanism for it.
The problem was introduced by commit bba63a296ffab20e08d9e8252d2f0d99
(ACPICA: Implicit notify support) that added the wake_device argument
to acpi_setup_gpe_for_wake().
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: don't always drop malformed replies on the floor (try #3)
cifs: clean up checks in cifs_echo_request
[CIFS] Do not send SMBEcho requests on new sockets until SMBNegotiate
Linus Torvalds [Sat, 12 Feb 2011 00:16:03 +0000 (16:16 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
pci: use security_capable() when checking capablities during config space read
security: add cred argument to security_capable()
tpm_tis: Use timeouts returned from TPM
Linus Torvalds [Sat, 12 Feb 2011 00:15:15 +0000 (16:15 -0800)]
Merge branch 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung
* 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h
ARM: S5PV310: Cleanup System MMU
ARM: S5PV310: Add support System MMU on SMDKV310
This code makes two calls to clk_get, then test both return values and
fails if either failed.
The problem is that in the first inner if, where the first call to
clk_get has failed, it don't know if the second call has failed as well.
So it don't know whether clk_get should be called on the result of the
second call. Of course, it would be possible to test that value again.
A simpler solution is just to test the result of calling clk_get
directly after each call.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
position p1,p2;
expression e;
statement S;
@@
Johannes Weiner [Thu, 10 Feb 2011 23:01:34 +0000 (15:01 -0800)]
vmscan: fix zone shrinking exit when scan work is done
Commit 3e7d34497067 ("mm: vmscan: reclaim order-0 and use compaction
instead of lumpy reclaim") introduced an indefinite loop in
shrink_zone().
It meant to break out of this loop when no pages had been reclaimed and
not a single page was even scanned. The way it would detect the latter
is by taking a snapshot of sc->nr_scanned at the beginning of the
function and comparing it against the new sc->nr_scanned after the scan
loop. But it would re-iterate without updating that snapshot, looping
forever if sc->nr_scanned changed at least once since shrink_zone() was
invoked.
This is not the sole condition that would exit that loop, but it
requires other processes to change the zone state, as the reclaimer that
is stuck obviously can not anymore.
This is only happening for higher-order allocations, where reclaim is
run back to back with compaction.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Michal Hocko <mhocko@suse.cz> Tested-by: Kent Overstreet<kent.overstreet@gmail.com> Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the page is going to be written to, __do_page needs to break COW.
However, the old page (before breaking COW) was never mapped mapped into
the current pte (__do_fault is only called when the pte is not present),
so vmscan can't have marked the old page as PageMlocked due to being
mapped in __do_fault's VMA. Therefore, __do_fault() does not need to
worry about clearing PageMlocked() on the old page.
Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mlock: fix race when munlocking pages in do_wp_page()
vmscan can lazily find pages that are mapped within VM_LOCKED vmas, and
set the PageMlocked bit on these pages, transfering them onto the
unevictable list. When do_wp_page() breaks COW within a VM_LOCKED vma,
it may need to clear PageMlocked on the old page and set it on the new
page instead.
This change fixes an issue where do_wp_page() was clearing PageMlocked
on the old page while the pte was still pointing to it (as well as
rmap). Therefore, we were not protected against vmscan immediately
transfering the old page back onto the unevictable list. This could
cause pages to get stranded there forever.
I propose to move the corresponding code to the end of do_wp_page(),
after the pte (and rmap) have been pointed to the new page.
Additionally, we can use munlock_vma_page() instead of
clear_page_mlock(), so that the old page stays mlocked if there are
still other VM_LOCKED vmas mapping it.
Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Thu, 10 Feb 2011 23:01:30 +0000 (15:01 -0800)]
memblock: don't adjust size in memblock_find_base()
While applying patch to use memblock to find aperture for 64bit x86.
Ingo found system with 1g + force_iommu
> No AGP bridge found
> Node 0: aperture @ 38000000 size 32 MB
> Aperture pointing to e820 RAM. Ignoring.
> Your BIOS doesn't leave a aperture memory hole
> Please enable the IOMMU option in the BIOS setup
> This costs you 64 MB of RAM
> Cannot allocate aperture memory hole (0,65536K)
fails because memblock core code align the size with 512M. That could
make size way too big.
So don't align the size in that case.
actually __memblock_alloc_base, the another caller already align that
before calling that function.
BTW. x86 does not use __memblock_alloc_base...
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Soren Hansen [Thu, 10 Feb 2011 23:01:28 +0000 (15:01 -0800)]
nbd: remove module-level ioctl mutex
Commit 2a48fc0ab242417 ("block: autoconvert trivial BKL users to private
mutex") replaced uses of the BKL in the nbd driver with mutex
operations. Since then, I've been been seeing these lock ups:
Instrumenting the nbd module's ioctl handler with some extra logging
clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
ioctl in the sense that it doesn't return until another ioctl asks the
driver to disconnect. However, that other ioctl blocks, waiting for the
module-level mutex that replaced the BKL, and then we're stuck.
This patch removes the module-level mutex altogether. It's clearly
wrong, and as far as I can see, it's entirely unnecessary, since the nbd
driver maintains per-device mutexes, and I don't see anything that would
require a module-level (or kernel-level, for that matter) mutex.
Roland Stigge [Thu, 10 Feb 2011 23:01:23 +0000 (15:01 -0800)]
drivers/gpio/pca953x.c: add a mutex to fix race condition
Add a mutex to register communication and handling. Without the mutex,
GPIOs didn't switch as expected when toggled in a fast sequence of
status changes of multiple outputs.
Signed-off-by: Roland Stigge <stigge@antcom.de> Acked-by: Eric Miao <eric.y.miao@gmail.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Marc Zyngier <maz@misterjones.org> Cc: Ben Gardner <bgardner@wabtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 10 Feb 2011 23:01:22 +0000 (15:01 -0800)]
ptrace: use safer wake up on ptrace_detach()
The wake_up_process() call in ptrace_detach() is spurious and not
interlocked with the tracee state. IOW, the tracee could be running or
sleeping in any place in the kernel by the time wake_up_process() is
called. This can lead to the tracee waking up unexpectedly which can be
dangerous.
The wake_up is spurious and should be removed but for now reduce its
toxicity by only waking up if the tracee is in TRACED or STOPPED state.
This bug can possibly be used as an attack vector. I don't think it
will take too much effort to come up with an attack which triggers oops
somewhere. Most sleeps are wrapped in condition test loops and should
be safe but we have quite a number of places where sleep and wakeup
conditions are expected to be interlocked. Although the window of
opportunity is tiny, ptrace can be used by non-privileged users and with
some loading the window can definitely be extended and exploited.
Boaz Harrosh [Thu, 10 Feb 2011 23:01:20 +0000 (15:01 -0800)]
vfs: call rcu_barrier after ->kill_sb()
In commit fa0d7e3de6d6 ("fs: icache RCU free inodes"), we use rcu free
inode instead of freeing the inode directly. It causes a crash when we
rmmod immediately after we umount the volume[1].
So we need to call rcu_barrier after we kill_sb so that the inode is
freed before we do rmmod. The idea is inspired by Aneesh Kumar.
rcu_barrier will wait for all callbacks to end before preceding. The
original patch was done by Tao Ma, but synchronize_rcu() is not enough
here.
Tested-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 Feb 2011 23:53:38 +0000 (15:53 -0800)]
Fix possible filp_cachep memory corruption
In commit 31e6b01f4183 ("fs: rcu-walk for path lookup") we started doing
path lookup using RCU, which then falls back to a careful non-RCU lookup
in case of problems (LOOKUP_REVAL). So do_filp_open() has this "re-do
the lookup carefully" looping case.
However, that means that we must not release the open-intent file data
if we are going to loop around and use it once more!
Fix this by moving the release of the open-intent data to the function
that allocates it (do_filp_open() itself) rather than the helper
functions that can get called multiple times (finish_open() and
do_last()). This makes the logic for the lifetime of that field much
more obvious, and avoids the possible double free.
Reported-by: J. R. Okajima <hooanon05@yahoo.co.jp> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Fri, 11 Feb 2011 15:01:42 +0000 (16:01 +0100)]
ARM: 6657/1: hw_breakpoint: fix ptrace breakpoint advertising on unsupported arch
The ptrace debug information register was advertising breakpoint and
watchpoint resources for unsupported debug architectures. This meant
that setting breakpoints on these architectures would appear to succeed,
although they would never fire in reality.
This patch fixes the breakpoint slot probing so that it returns 0 when
running on an unsupported debug architecture.
Reported-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Dave Martin [Fri, 11 Feb 2011 15:41:20 +0000 (16:41 +0100)]
ARM: 6659/1: Thumb-2: Make CONFIG_OABI_COMPAT depend on !CONFIG_THUMB2_KERNEL
rmk says: "You might as well make OABI_COMPAT depend on !THUMB2_KERNEL.
OABI userland is useless without FPA support."
nwfpe doesn't work with Thumb-2 anyway and will probably never get
ported, so I can't argue with that.
This patch implements the dependency change.
Signed-off-by: Dave Martin <dave.martin@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
David Teigland [Fri, 11 Feb 2011 22:44:31 +0000 (16:44 -0600)]
dlm: use single thread workqueues
The recent commit to use cmwq for send and recv threads dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems,
apparently due to multiple workqueue threads. Single threads
make the problems go away, so return to that until we fully
understand the concurrency issues with multiple threads.
Signed-off-by: David Teigland <teigland@redhat.com>
Dmitry Torokhov [Fri, 4 Feb 2011 08:37:26 +0000 (00:37 -0800)]
Input: ads7846 - check proper condition when freeing gpio
When driver uses custom pendown detection method gpio_pendown is not
set up and so we should not try to free it, otherwise we are presented
with:
------------[ cut here ]------------
WARNING: at drivers/gpio/gpiolib.c:1258 gpio_free+0x100/0x12c()
Modules linked in:
[<c0061208>] (unwind_backtrace+0x0/0xe4) from [<c0091f58>](warn_slowpath_common+0x4c/0x64)
[<c0091f58>] (warn_slowpath_common+0x4c/0x64) from [<c0091f88>](warn_slowpath_null+0x18/0x1c)
[<c0091f88>] (warn_slowpath_null+0x18/0x1c) from [<c024e610>](gpio_free+0x100/0x12c)
[<c024e610>] (gpio_free+0x100/0x12c) from [<c03e9fbc>](ads7846_probe+0xa38/0xc5c)
[<c03e9fbc>] (ads7846_probe+0xa38/0xc5c) from [<c02cff14>](spi_drv_probe+0x18/0x1c)
[<c02cff14>] (spi_drv_probe+0x18/0x1c) from [<c028bca4>](driver_probe_device+0xc8/0x184)
[<c028bca4>] (driver_probe_device+0xc8/0x184) from [<c028bdc8>](__driver_attach+0x68/0x8c)
[<c028bdc8>] (__driver_attach+0x68/0x8c) from [<c028b4c8>](bus_for_each_dev+0x48/0x74)
[<c028b4c8>] (bus_for_each_dev+0x48/0x74) from [<c028ae08>](bus_add_driver+0xa0/0x220)
[<c028ae08>] (bus_add_driver+0xa0/0x220) from [<c028c0c0>](driver_register+0xa8/0x134)
[<c028c0c0>] (driver_register+0xa8/0x134) from [<c0050550>](do_one_initcall+0xcc/0x1a4)
[<c0050550>] (do_one_initcall+0xcc/0x1a4) from [<c00084e4>](kernel_init+0x14c/0x214)
[<c00084e4>] (kernel_init+0x14c/0x214) from [<c005b494>](kernel_thread_exit+0x0/0x8)
---[ end trace 4053287f8a5ec18f ]---
Also rearrange ads7846_setup_pendown() to have only one exit point
returning success.
Reported-by: Sourav Poddar <sourav.poddar@ti.com> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Reviewed-by: Charulatha V <charu@ti.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Chris Wright [Thu, 10 Feb 2011 23:58:56 +0000 (15:58 -0800)]
pci: use security_capable() when checking capablities during config space read
Eric Paris noted that commit de139a3 ("pci: check caps from sysfs file
open to read device dependent config space") caused the capability check
to bypass security modules and potentially auditing. Rectify this by
calling security_capable() when checking the open file's capabilities
for config space reads.
Reported-by: Eric Paris <eparis@redhat.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: James Morris <jmorris@namei.org>