Linus Torvalds [Sun, 30 May 2010 16:02:02 +0000 (09:02 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kyle/parisc-2.6:
parisc: Call pagefault_disable/pagefault_enable in kmap_atomic/kunmap_atomic
parisc: Remove unnecessary macros from entry.S
parisc: LWS fixes for syscall.S
parisc: Delete unnecessary nop's in entry.S
parisc: Avoid interruption in critical region in entry.S
parisc: invoke oom-killer from page fault
parisc: clear floating point exception flag on SIGFPE signal
parisc: Use of align_frame provides stack frame.
Linus Torvalds [Sun, 30 May 2010 16:00:03 +0000 (09:00 -0700)]
Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"
This reverts commit 0ac0c0d0f837c499afd02a802f9cf52d3027fa3b, which
caused cross-architecture build problems for all the wrong reasons.
IA64 already added its own version of __node_random(), but the fact is,
there is nothing architectural about the function, and the original
commit was just badly done. Revert it, since no fix is forthcoming.
Requested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 30 May 2010 15:56:39 +0000 (08:56 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: clean up on forwarded aborted mds request
ceph: fix leak of osd authorizer
ceph: close out mds, osd connections before stopping auth
ceph: make lease code DN specific
fs/ceph: Use ERR_CAST
ceph: renew auth tickets before they expire
ceph: do not resend mon requests on auth ticket renewal
ceph: removed duplicated #includes
ceph: avoid possible null dereference
ceph: make mds requests killable, not interruptible
sched: add wait_for_completion_killable_timeout
parisc: Call pagefault_disable/pagefault_enable in kmap_atomic/kunmap_atomic
Based on the generic implementation of kmap_atomic and kunmap_atomic,
we should call pagefault_disable and pagefault_enable in our PA8000
implementation.
The define for kmap_atomic_prot was also missing, and I updated
kmap_atomic_pfn to use the generic implementation because of the
change to kmap_atomic.
I believe that this change is needed to fix the fork copy-on-write
bug.
Signed-off-by: John David Anglin <dave.anglin@nrc-cnrc.gc.ca> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
The EXTR, DEP and DEPI macros are unnecessary. There are PA 1.X
pneumonics available with the same functionality, and the DEP and DEPI
macros conflict with assembler pneumonics.
Tested on a variety of 32 and 64-bit systems.
Signed-off-by: John David Anglin <dave.anglin@nrc-cnrc.gc.ca> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
1) Gate immediately and save a branch.
2) Fix off by one error in checking entry number.
3) Use sr7 instead of sr3 in error return path as sr3 might not
contain correct value.
4) Enable locking on UP systems to prevent incorrect operation of
the cas_action critical region on page faults.
Tested on several systems, including UP c3750 with 2.6.33.2 kernel.
Signed-off-by: John David Anglin <dave.anglin@nrc-cnrc.gc.ca> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Nick Piggin [Thu, 22 Apr 2010 16:06:23 +0000 (16:06 +0000)]
parisc: invoke oom-killer from page fault
As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.
Cc: linux-parisc@vger.kernel.org Cc: linux-arch@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
Helge Deller [Mon, 3 May 2010 20:44:21 +0000 (20:44 +0000)]
parisc: clear floating point exception flag on SIGFPE signal
Clear the floating point exception flag before returning to
user space. This is needed, else the libc trampoline handler
may hit the same SIGFPE again while building up a trampoline
to a signal handler.
Linus Torvalds [Sat, 29 May 2010 22:31:57 +0000 (15:31 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (26 commits)
ALSA: snd-usb-caiaq: Bump version number to 1.3.21
ALSA: Revert "ALSA: snd-usb-caiaq: Set default input mode of A4DJ"
ALSA: snd-usb-caiaq: Simplify single case to an 'if'
ALSA: snd-usb-caiaq: Restore 'Control vinyl' input mode on A4DJ
ALSA: hda: Use LPIB for a Shuttle device
ALSA: hda: Add support for another Lenovo ThinkPad Edge in conexant codec
ALSA: hda: Use LPIB for Sony VPCS11V9E
ALSA: usb-audio: fix feature unit parser for UAC2
ALSA: asihpi - Minor code cleanup
ALSA: asihpi - Add support for new ASI8800 family
ALSA: asihpi - Fix bug preventing outstream_write preload from happening
ALSA: asihpi - Fix imbalanced lock path in hw_message
ALSA: asihpi - Remove support for old ASI8800 family
ALSA: asihpi - Add hd radio blend functions
ALSA: asihpi - Remove unused io map functions
ALSA: usb-audio: add support for UAC2 pitch control
ALSA: usb-audio: parse UAC2 endpoint descriptors correctly
ALSA: usb-audio: fix return values
ALSA: usb-audio: parse more format descriptors with structs
sound: Add missing spin_unlock
...
Mark Hills [Sat, 29 May 2010 15:53:23 +0000 (16:53 +0100)]
ALSA: snd-usb-caiaq: Restore 'Control vinyl' input mode on A4DJ
This feature was undocumented on early A4DJ units. It is indicated
by lighting both the 'line' and 'phono' lamps at the same time.
Newer units document this and the newer Windows drivers enable this
for all units, so restore the functionality.
This patch simplifies the code and changes the mode mapping to match
the A8DJ, favouring simpler code and consistency over keeping the
existing mapping.
Both 'Control vinyl' and 'Phono' input modes enable the hardware
preamp. The difference is the input impedance.
Daniel T Chen [Sat, 29 May 2010 15:04:11 +0000 (11:04 -0400)]
ALSA: hda: Use LPIB for a Shuttle device
BugLink: https://launchpad.net/bugs/551949
Symptom: On the reporter's Shuttle device, using PulseAudio in Ubuntu
10.04 LTS results in "popping clicking" audio with the PA crashing
shortly thereafter.
Test case: Using Ubuntu 10.04 LTS (Linux 2.6.32.12), Linux 2.6.33, or
Linux 2.6.34, adjust the HDA device's volume with PulseAudio.
Resolution: add SSID for this machine to the position_fix quirk table,
explicitly specifying the LPIB method.
Reported-and-Tested-By: Christian Mehlis <mehlis@inf.fu-berlin.de> Cc: <stable@kernel.org> Signed-off-by: Daniel T Chen <crimsun@ubuntu.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Sage Weil [Fri, 28 May 2010 23:43:16 +0000 (16:43 -0700)]
ceph: clean up on forwarded aborted mds request
If an mds request is aborted (timeout, SIGKILL), it is left registered to
keep our state in sync with the mds. If we get a forward notification,
though, we know the request didn't succeed and we can unregister it
safely. We were trying to resend it, but then bailing out (and not
unregistering) in __do_request.
Sage Weil [Sat, 29 May 2010 16:41:23 +0000 (09:41 -0700)]
ceph: close out mds, osd connections before stopping auth
The auth module (part of the mon_client) is needed to free any
ceph_authorizer(s) used by the mds and osd connections. Flush the msgr
workqueue before stopping monc to ensure that the destroy_authorizer
auth op is available when those connections are closed out.
Sage Weil [Tue, 25 May 2010 23:45:25 +0000 (16:45 -0700)]
ceph: make lease code DN specific
The lease code includes a mask in the CEPH_LOCK_* namespace, but that
namespace is changing, and only one mask (formerly _DN == 1) is used, so
hard code for that value for now.
If we ever extend this code to handle leases over different data types we
can extend it accordingly.
Sage Weil [Tue, 25 May 2010 22:38:06 +0000 (15:38 -0700)]
ceph: do not resend mon requests on auth ticket renewal
We only want to send pending mon requests when we successfully
authenticate. If we are already authenticated, like when we renew our
ticket, there is no need to resend pending requests.
Andrea Gelmini [Sun, 23 May 2010 19:47:58 +0000 (21:47 +0200)]
ceph: removed duplicated #includes
fs/ceph/auth.c: linux/slab.h is included more than once.
fs/ceph/super.h: linux/slab.h is included more than once.
Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 24 May 2010 18:15:51 +0000 (11:15 -0700)]
ceph: make mds requests killable, not interruptible
The underlying problem is that many mds requests can't be restarted. For
example, a restarted create() would return -EEXIST if the original request
succeeds. However, we do not want a hung MDS to hang the client too. So,
use the _killable wait_for_completion variants to abort on SIGKILL but
nothing else.
Sage Weil [Sat, 29 May 2010 16:12:30 +0000 (09:12 -0700)]
sched: add wait_for_completion_killable_timeout
Add missing _killable_timeout variant for wait_for_completion that will
return when a timeout expires or the task is killed.
CC: Ingo Molnar <mingo@elte.hu> CC: Andreas Herrmann <andreas.herrmann3@amd.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Sage Weil <sage@newdream.net>
Linus Torvalds [Fri, 28 May 2010 23:14:17 +0000 (16:14 -0700)]
Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
intel_idle: native hardware cpuidle driver for latest Intel processors
ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
sched: clarify commment for TS_POLLING
ACPI: allow a native cpuidle driver to displace ACPI
cpuidle: make cpuidle_curr_driver static
cpuidle: add cpuidle_unregister_driver() error check
cpuidle: fail to register if !CONFIG_CPU_IDLE
ACPI: Don't let acpi_pad needlessly mark TSC unstable
acpi pad driver kind of aggressively marks TSC as unstable at init
time, on mwait capable and non X86_FEATURE_NONSTOP_TSC systems. This is
irrespective of whether pad driver is ever going to be used on the
system or deep C-states are supported/used. This will affect every user
who just happens to compile in (or get a kernel version which
compiles in) acpi pad driver.
Move mark_tsc_unstable() out of init to the actual idle invocation path
of the pad driver.
There is also another bug/missing_feature in the code that it does not
support 'always running apic timer' and switches to broadcast mode
unconditionally. Shaohua, can you take a look at that please.
Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Mon, 8 Mar 2010 19:07:30 +0000 (14:07 -0500)]
intel_idle: native hardware cpuidle driver for latest Intel processors
This EXPERIMENTAL driver supersedes acpi_idle on
Intel Atom Processors, Intel Core i3/i5/i7 Processors
and associated Intel Xeon processors.
It does not support the Intel Core2 processor or earlier.
For kernels configured with ACPI, CONFIG_INTEL_IDLE=y
allows intel_idle to probe before the ACPI processor driver.
Booting with "intel_idle.max_cstate=0" disables intel_idle
and the system will fall back on ACPI's "acpi_idle".
Typical Linux distributions load ACPI processor module early,
making CONFIG_INTEL_IDLE=m not easily useful on ACPI platforms.
intel_idle probes all processors at module_init time.
Processors that are hot-added later will be limited
to using C1 in idle.
Len Brown [Mon, 24 May 2010 18:27:44 +0000 (14:27 -0400)]
ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
commit d306ebc28649b89877a22158fe0076f06cc46f60
(ACPI: Be in TS_POLLING state during mwait based C-state entry)
fixed an important power & performance issue where ACPI c2 and c3 C-states
were clearing TS_POLLING even when using MWAIT (ACPI_STATE_FFH).
That bug had been causing us to receive redundant scheduling interrups
when we had already been woken up by MONITOR/MWAIT.
Following up on that...
In the MWAIT case, we don't have to subsequently
check need_resched(), as that c heck was there
for the TS_POLLING-clearing case.
Note that not only does the cpuidle calling function
already check need_resched() before calling us, the
low-level entry into monitor/mwait calls it twice --
guaranteeing that a write to the trigger address
can not go un-noticed.
Also, in this case, we don't have to set TS_POLLING
when we wake, because we never cleared it.
Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Venkatesh Pallipadi <venki@google.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
netlink: bug fix: wrong size was calculated for vfinfo list blob
netlink: bug fix: don't overrun skbs on vf_port dump
xt_tee: use skb_dst_drop()
netdev/fec: fix ifconfig eth0 down hang issue
cnic: Fix context memory init. on 5709.
drivers/net: Eliminate a NULL pointer dereference
drivers/net/hamradio: Eliminate a NULL pointer dereference
be2net: Patch removes redundant while statement in loop.
ipv6: Add GSO support on forwarding path
net: fix __neigh_event_send()
vhost: fix the memory leak which will happen when memory_access_ok fails
vhost-net: fix to check the return value of copy_to/from_user() correctly
vhost: fix to check the return value of copy_to/from_user() correctly
vhost: Fix host panic if ioctl called with wrong index
net: fix lock_sock_bh/unlock_sock_bh
net/iucv: Add missing spin_unlock
net: ll_temac: fix checksum offload logic
net: ll_temac: fix interrupt bug when interrupt 0 is used
sctp: dubious bitfields in sctp_transport
ipmr: off by one in __ipmr_fill_mroute()
...
Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is
DMA-safe: the buffer doesn't share a cache with the others.
Linus Torvalds [Fri, 28 May 2010 17:07:48 +0000 (10:07 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
remove detritus left by "mm: make read_cache_page synchronous"
fix fs/sysv s_dirt handling
fat: convert to use the new truncate convention.
ext2: convert to use the new truncate convention.
tmpfs: convert to use the new truncate convention
fs: convert simple fs to new truncate
kill spurious reference to vmtruncate
fs: introduce new truncate sequence
fs/super: fix kernel-doc warning
fs/minix: bugfix, number of indirect block ptrs per block depends on block size
rename the generic fsync implementations
drop unused dentry argument to ->fsync
fs: Add missing mutex_unlock
Fix racy use of anon_inode_getfd() in perf_event.c
get rid of the magic around f_count in aio
VFS: fix recent breakage of FS_REVAL_DOT
Revert "anon_inode: set S_IFREG on the anon_inode"
Al Viro [Fri, 28 May 2010 15:34:50 +0000 (11:34 -0400)]
remove detritus left by "mm: make read_cache_page synchronous"
gets minix get_dir_page() in sync with its analogs; back in 2007
Nick has switched read_cache_page() and friends to sync behaviour
(i.e. they wait for the page to get unlocked, check if it's uptodate
and if it isn't return ERR_PTR(-EIO) instead) and removed the
duplicate logics from the callers. In case of fs/minix/dir.c he'd
removed only half of that...
Scott Feldman [Fri, 28 May 2010 10:42:43 +0000 (03:42 -0700)]
netlink: bug fix: wrong size was calculated for vfinfo list blob
The wrong size was being calculated for vfinfo. In one case, it was over-
calculating using nlmsg_total_size on attrs, in another case, it was
under-calculating by assuming ifla_vf_* structs are packed together, but
each struct is it's own attr w/ hdr (and padding).
Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Fri, 28 May 2010 10:42:18 +0000 (03:42 -0700)]
netlink: bug fix: don't overrun skbs on vf_port dump
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer. Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.
Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 28 May 2010 10:41:17 +0000 (03:41 -0700)]
xt_tee: use skb_dst_drop()
After commit 7fee226a (net: add a noref bit on skb dst), its wrong to
use : dst_release(skb_dst(skb)), since we could decrement a refcount
while skb dst was not refcounted.
We should use skb_dst_drop(skb) instead.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Wu [Fri, 28 May 2010 10:40:39 +0000 (03:40 -0700)]
netdev/fec: fix ifconfig eth0 down hang issue
BugLink: http://bugs.launchpad.net/bugs/559065
In fec open/close function, we need to use phy_connect and phy_disconnect
operation before we start/stop phy. Otherwise it will cause system hang.
Only call fec_enet_mii_probe() in open function, because the first open
action will cause NULL pointer error.
Signed-off-by: Bryan Wu <bryan.wu@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the structure to avoid the following warning:
WARNING: drivers/serial/built-in.o(.data+0x534): Section mismatch in reference from the variable s5p_serial_drv to the function .devexit.text:s3c24xx_serial_remove()
The variable s5p_serial_drv references
the function __devexit s3c24xx_serial_remove()
If the reference is valid then annotate the
variable with __exit* (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console,
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Thomas Abraham [Fri, 28 May 2010 02:41:16 +0000 (11:41 +0900)]
ARM: S5P: Regoster clk_xusbxti clock for hsotg driver
The clk_xusbxti clock is added to the list of clocks to be
registred during boot time clock registration.
Signed-off-by: Thomas Abraham <thomas.ab@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
[ben-linux@fluff.org: edited title] Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Michael Chan [Thu, 27 May 2010 23:31:41 +0000 (16:31 -0700)]
cnic: Fix context memory init. on 5709.
We need to zero context memory on 5709 in the function cnic_init_context().
Without this, iscsid restart on 5709 will not work because of stale data.
TX context blocks should not be initialized by cnic_init_context() because
of the special remapping on 5709.
Update version to 2.1.2.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 27 May 2010 23:14:30 +0000 (16:14 -0700)]
ipv6: Add GSO support on forwarding path
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.
Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
when they really should be checking gso_size + header instead.
I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message. Someone
should add that to be 100% correct.
Reported-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 27 May 2010 23:09:39 +0000 (16:09 -0700)]
net: fix __neigh_event_send()
commit 7fee226ad23 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued, with a possibly not refcounted dst entry.
__neigh_event_send() inserts skb into arp_queue, so we must make sure
dst entry is refcounted, or dst entry can be freed by garbage collector
after caller exits from rcu protected section.
Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andreas Herrmann [Fri, 28 May 2010 07:57:12 +0000 (09:57 +0200)]
ALSA: hda: Add support for another Lenovo ThinkPad Edge in conexant codec
On a Thinkpad Edge 13 "01972NG" I had the problem that speakers played
sound although headphones were plugged in. Using model=ideapad with
latest alsa-git kernel fixed this. So adding this quirk to use ideapad
for another Thinkpad Edge variant seems sensible.
Cc: Jerone Young <jerone.young@canonical.com> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Daniel T Chen [Thu, 27 May 2010 22:32:18 +0000 (18:32 -0400)]
ALSA: hda: Use LPIB for Sony VPCS11V9E
BugLink: https://launchpad.net/bugs/586347
Symptom: On the Sony VPCS11V9E, using GStreamer-based applications with
PulseAudio in Ubuntu 10.04 LTS results in stuttering audio. It appears
to worsen with increased I/O.
Test case: use Rhythmbox under increased I/O pressure. This symptom is
reproducible in the current daily stable alsa-driver snapshots (at least
up until 21 May 2010; later snapshots fail to build from source due to
missing preprocessor directives when compiled against 2.6.32).
Resolution: add SSID for this machine to the position_fix quirk table,
explicitly specifying the LPIB method.
Reported-and-Tested-By: Lauri Kainulainen <lauri@sokkelo.net> Cc: <stable@kernel.org> Signed-off-by: Daniel T Chen <crimsun@ubuntu.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Daniel Mack [Thu, 27 May 2010 18:15:14 +0000 (20:15 +0200)]
ALSA: usb-audio: fix feature unit parser for UAC2
Fix a small off-by-one bug which causes the feature unit to announce a
wrong number of channels. This leads to illegal requests sent to the
firmware eventually.
Signed-off-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
npiggin@suse.de [Wed, 26 May 2010 15:05:37 +0000 (01:05 +1000)]
ext2: convert to use the new truncate convention.
I also have commented a possible bug in existing ext2 code, marked with XXX.
Cc: linux-ext4@vger.kernel.org Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
npiggin@suse.de [Wed, 26 May 2010 15:05:34 +0000 (01:05 +1000)]
kill spurious reference to vmtruncate
Lots of filesystems calls vmtruncate despite not implementing the old
->truncate method. Switch them to use simple_setsize and add some
comments about the truncate code where it seems fitting.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
npiggin@suse.de [Wed, 26 May 2010 15:05:33 +0000 (01:05 +1000)]
fs: introduce new truncate sequence
Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.
simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.
simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).
To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
the event of write failure after allocation, so this must be performed
in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.
Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed. This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).
Cc: Christoph Hellwig <hch@lst.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs/minix: bugfix, number of indirect block ptrs per block depends on block size
The MINIX filesystem driver used a constant number of indirect block
pointers in an indirect block. This worked only for filesystems with 1kb
block, while the MINIX default block size is now 4kb. As a consequence,
large files were read incorrectly on such filesystems and writing a
large file would cause the filesystem to become corrupted. This patch
computes the number of indirect block pointers based on the block size,
making the driver work for each block size.
I would like to thank Feiran Zheng ('Fam') for pointing out the cause
of the corruption.
Signed-off-by: Erik van der Kouwe <vdkouwe@cs.vu.nl> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
the the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.
This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect. In addition add some documentation for both methods.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 26 May 2010 21:40:29 +0000 (17:40 -0400)]
Fix racy use of anon_inode_getfd() in perf_event.c
once anon_inode_getfd() is called, you can't expect *anything* about
struct file that descriptor points to - another thread might be doing
whatever it likes with descriptor table at that point.
Cc: stable <stable@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Wed, 26 May 2010 19:13:55 +0000 (15:13 -0400)]
get rid of the magic around f_count in aio
__aio_put_req() plays sick games with file refcount. What
it wants is fput() from atomic context; it's almost always
done with f_count > 1, so they only have to deal with delayed
work in rare cases when their reference happens to be the
last one. Current code decrements f_count and if it hasn't
hit 0, everything is fine. Otherwise it keeps a pointer
to struct file (with zero f_count!) around and has delayed
work do __fput() on it.
Better way to do it: use atomic_long_add_unless( , -1, 1)
instead of !atomic_long_dec_and_test(). IOW, decrement it
only if it's not the last reference, leave refcount alone
if it was. And use normal fput() in delayed work.
I've made that atomic_long_add_unless call a new helper -
fput_atomic(). Drops a reference to file if it's safe to
do in atomic (i.e. if that's not the last one), tells if
it had been able to do that. aio.c converted to it, __fput()
use is gone. req->ki_file *always* contributes to refcount
now. And __fput() became static.