git.proxmox.com Git - mirror_ubuntu-hirsute-kernel.git/log

SUNRPC: display human-readable procedure name in rpc_iostats output

Add fields to the rpc_procinfo struct that allow the display of a
human-readable name for each procedure in the rpc_iostats output.

Also fix it so that the NFSv4 stats are broken up correctly by
sub-procedure number. NFSv4 uses only two real RPC procedures:
NULL, and COMPOUND.

Test plan:
Mount with NFSv2, NFSv3, and NFSv4, and do "cat /proc/self/mountstats".

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: add RPC I/O statistics to /proc/self/mountstats

NFS client now shows various RPC I/O metrics in /proc/self/mountstats.

Test plan:
Mount/umount while doing "cat /proc/self/mountstats", multiple iterations
of connectathon locking suite. Test with NFS version 2, 3, and 4.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: provide a mechanism for collecting stats in the RPC client

Add a simple mechanism for collecting stats in the RPC client.  Stats are
tabulated during xprt_release.  Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.

Test plan:
Compile kernel with CONFIG_NFS enabled.  Basic performance regression
testing with high-speed networking and high performance server.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: introduce per-task RPC iostats

Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: add a handful of per-xprt counters

Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: track length of RPC wait queues

RPC wait queue length will eventually be exported to userland via the RPC
iostats interface.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: report how long an NFS file system has been mounted

Add a field in nfs_server to record a timestamp when a mount succeeds.
Report the number of seconds the file system has been mounted via
nfs_show_stats().

Test plan:
Mount an NFS file system, watch the mountstats reports and compare with
clock time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: add hooks to account for NFSERR_JUKEBOX errors

Make an inode or an nfs_server struct available in the logic that handles
JUKEBOX/DELAY type errors so the NFS client can account for them.

This patch is split out from the main nfs iostat patch to highlight minor
architectural changes required to support this statistic.

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: add I/O performance counters

Invoke the byte and event counter macros where we want to count bytes and
events.

Clean-up: fix a possible NULL dereference in nfs_lock, and simplify
nfs_file_open.

Test-plan:
fsx and iozone on UP and SMP systems, with and without pre-emption. Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: introduce mechanism for tracking NFS client metrics

Add a per-superblock performance counter facility to the NFS client.  This
facility mimics the counters available for block devices and for
networking.  Expose these new counters via the new /proc/self/mountstats
interface.

Thanks to Andrew Morton and Trond Myklebust for their review and comments.

Test plan:
fsx and iozone on UP and SMP systems, with and without pre-emption.  Watch
for memory overwrite bugs, and performance loss (significantly more CPU
required per op).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: clean up some mount options

Get rid of "lock" and "posix", and spell out "vers=".

Test plan:
None.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: show retransmit settings when displaying mount options

Sometimes it's important to know the exact RPC retransmit settings the
kernel is using for an NFS mount point. Add this facility to the NFS
client's show_options method.

Test plan:
Set various retransmit settings via the mount command, and check that the
settings are reflected in /proc/mounts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

VFS: New /proc file /proc/self/mountstats

Create a new file under /proc/self, called mountstats, where mounted file
systems can export information (configuration options, performance counters,
and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.

This mechanism does not violate namespace security, and is safe to use while
other processes are unmounting file systems.

Thanks to Mike Waychison for his review and comments.

Test-plan:
Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: more verbose output for rpc auth weak error

This patch adds server ip address to be printed out when "server
requires stronger authentication" error occured.

Signed-off-by: Levent Serinol <lserinol@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Code comments update in NFS

read_cache_mtime is no longer used in nfs_inode. This patch removes
references of read_cache_mtime in the code comments.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: sem2mutex idmap.c

semaphore to mutex conversion.

the conversion was generated via scripts, and the result was validated
automatically via a script as well.

build and boot tested.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: kzalloc conversion in fs/nfs

this converts fs/nfs to kzalloc() usage.
compile tested with make allyesconfig

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Kill braindead gcc warnings

nfs4_open_revalidate: 'res' may be used uninitialized
nfs4_callback_compound: ‘hdr_res.nops’ may be used uninitialized
'op_nr’ may be used uninitialized
encode_getattr_res: ‘savep’ may be used uninitialized

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Do not call rpciod_down() before call to destroy_nfsv4_state()

The reason is that the idmapper cleanup may call flush_workqueue() on
rpciod_workqueue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Ensure that rpc_mkpipe returns a refcounted dentry

If not, we cannot guarantee that idmap->idmap_dentry, gss_auth->dentry and
clnt->cl_dentry are valid dentries.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Run rpci->queue_timeout on the rpciod workqueue instead of generic

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Auto-load RPC authentication kernel modules

This patch adds a request_module call to rpcauth_create which will try
to auto-load the kernel module for the requested authentication flavor.
For kernels with modular sunrpc, this reduces the admin overhead for
the user.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: reduce the number of false cache invalidations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: "const static" vs "static const" in nfs4

My previous "const static" vs "static const" cleanup missed a single case,
patch below takes care of it.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFSv4: Don't invalidate cached attributes if change attribute is unchanged

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: writes should not clobber utimes() calls

Ensure that we flush out writes in the case when someone calls utimes() in
order to set the file times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

lockd: Don't expose the process pid to the NLM server

Instead we use the nlm_lockowner->pid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NLM: nlm_alloc_call should not immediately fail on signal

Currently, nlm_alloc_call tests for a signal before it even tries to
allocate memory.
Fix it so that it tries at least once.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

VFS: Fix __posix_lock_file() copy of private lock area

The struct file_lock->fl_u area must be copied using the fl_copy_lock()
operation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix buglet in fs/nfs/write.c

I've been reading through fs/nfs/write.c trying to track down a bug
that seems to be related to pages loosing a refcount and getting
freed too early (you interested in detail??) and I spotted a little
bug which the following patch should fix.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Avoid races between writebacks and truncation

Currently, there is no serialisation between NFS asynchronous writebacks
and truncation at the page level due to the fact that nfs_sync_inode()
cannot lock the pages that it is about to write out.

This means that it is possible to be flushing out data (and calling something
like set_page_writeback()) while the page cache is busy evicting the page.
Oops...

Use the hooks provided in try_to_release_page() to ensure that dirty pages
are always written back to storage before we evict them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

NFS: Fix a busy inodes issue...

The nfs_open_context may live longer than the file descriptor that spawned
it, so it needs to carry a reference to the vfsmount. If not, then
generic_shutdown_super() may end up being called before reads and writes
have been flushed out.

Make a couple of functions static while we're at it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Linux 2.6.16

[PATCH] Remove obsolete CREDITS address

This address is going to be obsolete, so I should update it.

Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.
  [MIPS] Sibyte: Fix race in sb1250_gettimeoffset().
  [MIPS] Sibyte: Fix interrupt timer off by one bug.
  [MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.
  [MIPS] Protect more of timer_interrupt() by xtime_lock.
  [MIPS] Work around bad code generation for <asm/io.h>.
  [MIPS] Simple patch to power off DBAU1200
  [MIPS] Fix DBAu1550 software power off.
  [MIPS] local_r4k_flush_cache_page fix
  [MIPS] SB1: Fix interrupt disable hazard.
  [MIPS] Get rid of the IP22-specific code in arclib.
  Update MAINTAINERS entry for MIPS.

[TG3]: 40-bit DMA workaround part 2

The 40-bit DMA workaround recently implemented for 5714, 5715, and
5780 needs to be expanded because there may be other tg3 devices
behind the EPB Express to PCIX bridge in the 5780 class device.

For example, some 4-port card or mother board designs have 5704 behind
the 5714.

All devices behind the EPB require the 40-bit DMA workaround.

Thanks to Chris Elmquist again for reporting the problem and testing
the patch.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[AX.25]: Fix potencial memory hole.

If the AX.25 dialect chosen by the sysadmin is set to DAMA master / 3
(or DAMA slave / 2, if CONFIG_AX25_DAMA_SLAVE=n) ax25_kick() will fall
through the switch statement without calling ax25_send_iframe() or any
other function that would eventually free skbn thus leaking the packet.

Fix by restricting the sysctl inferface to allow only actually supported
AX.25 dialects.

The system administration mistake needed for this to happen is rather
unlikely, so this is an uncritical hole.

Coverity #651.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PATCH] Kconfig: swap VIDEO_CX88_ALSA and VIDEO_CX88_DVB

VIDEO_CX88_ALSA should not be between VIDEO_CX88_DVB and
VIDEO_CX88_DVB_ALL_FRONTENDS

When cx88-alsa was added to cx88/Kconfig, it was added in between
VIDEO_CX88_DVB and VIDEO_CX88_DVB_ALL_FRONTENDS. This caused
undesireable effects to the appearance of the menu options in
menuconfig.

This fix reorders cx88-alsa and cx88-dvb in Kconfig, to match saa7134,
and restore the correct menuconfig appearance.

Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Fixed em28xx based system lockup

Fixed em28xx based system lockup, device needs to be initialized before
starting the isoc transfer otherwise the system will completly lock up.

Signed-off-by: Markus Rechberger <mrechberger@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] disable unshare(CLONE_VM) for now

sys_unshare() does mmput(new_mm).  This is not enough if we have
mm->core_waiters.

This patch is a temporary fix for soon to be released 2.6.16.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
[ Checked with Uli: "I'm not planning to use unshare(CLONE_VM).  It's
  not needed for any functionality planned so far.  What we (as in Red
  Hat) need unshare() for now is the filesystem side." ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[MIPS] SB1: Check for -mno-sched-prolog if building corelis debug kernel.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Sibyte: Fix race in sb1250_gettimeoffset().

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

sb1250_gettimeoffset() simply reads the current cpu 0 timer remaining
value, however once this counter reaches 0 and the interrupt is raised,
it immediately resets and begins to count down again.

If sb1250_gettimeoffset() is called on cpu 1 via do_gettimeofday() after
the timer has reset but prior to cpu 0 processing the interrupt and
taking write_seqlock() in timer_interrupt() it will return a full value
(or close to it) causing time to jump backwards 1ms. Once cpu 0 handles
the interrupt and timer_interrupt() gets far enough along it will jump
forward 1ms.

Fix this problem by implementing mips_hpt_*() on sb1250 using a spare
timer unrelated to the existing periodic interrupt timers. It runs at
1Mhz with a full 23bit counter. This eliminated the custom
do_gettimeoffset() for sb1250 and allowed use of the generic
fixed_rate_gettimeoffset() using mips_hpt_*() and timerhi/timerlo.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Sibyte: Fix interrupt timer off by one bug.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

The timers need to be loaded with 1 less than the desired interval not
the interval itself.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Sibyte: Fix M_SCD_TIMER_INIT and M_SCD_TIMER_CNT wrong field width.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

Field width should be 23 bits not 20 bits.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Protect more of timer_interrupt() by xtime_lock.

From Dave Johnson <djohnson+linuxmips@sw.starentnetworks.com>:

* do_timer() expects the arch-specific handler to take the lock as it
modifies jiffies[_64] and xtime.
* writing timerhi/lo in timer_interrupt() will mess up
fixed_rate_gettimeoffset() which reads timerhi/lo.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Work around bad code generation for <asm/io.h>.

If a call to set_io_port_base() was being followed by usage of
mips_io_port_base in the same function gcc was possibly using the old
value due to some clever abuse of const. Adding a barrier will keep
the optimization and result in correct code with latest gcc.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Simple patch to power off DBAU1200

Signed-off-by: Matej Kupljen <matej.kupljen@ultra.si>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix DBAu1550 software power off.

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] local_r4k_flush_cache_page fix

If dcache_size != icache_size or dcache_size != scache_size, or
set-associative cache, icache/scache does not flushed properly. Make
blast_?cache_page_indexed() masks its index value correctly. Also,
use physical address for physically indexed pcache/scache.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] SB1: Fix interrupt disable hazard.

The SB1 core has a three cycle interrupt disable hazard but we were
wrongly treating it as fully interlocked.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Get rid of the IP22-specific code in arclib.

This breaks the kernel build if sgiwd93 was configured as a module.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Update MAINTAINERS entry for MIPS.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[NET]: Fix race condition in sk_wait_event().

It is broken, the condition is checked out of socket lock. It is
wonderful the bug survived for so long time.

[ This fixes bugzilla #6233:
race condition in tcp_sendmsg when connection became established ]

Signed-off-by: David S. Miller <davem@davemloft.net>

[PATCH] fix free swap cache latency

Lee Revell reported 28ms latency when process with lots of swapped memory
exits.

2.6.15 introduced a latency regression when unmapping: in accounting the
zap_work latency breaker, pte_none counted 1, pte_present PAGE_SIZE, but a
swap entry counted nothing at all. We think of pages present as the slow
case, but Lee's trace shows that free_swap_and_cache's radix tree lookup
can make a lot of work - and we could have been doing it many thousands of
times without a latency break.

Move the zap_work update up to account swap entries like pages present.
This does account non-linear pte_file entries, and unmap_mapping_range
skipping over swap entries, by the same amount even though they're quick:
but neither of those cases deserves complicating the code (and they're
treated no worse than they were in 2.6.14).

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] kbuild: fix buffer overflow in modpost

Jiri Benc <jbenc@suse.cz> reported that modpost would stop with SIGABRT if
used with long filepaths.
The error looked like:
>   Building modules, stage 2.
>   MODPOST
> *** glibc detected *** scripts/mod/modpost: realloc(): invalid next size:
+0x0809f588 ***
> [...]

Fix this by allocating at least the required memory + SZ bytes each time.
Before we sometimes ended up allocating too little memory resuting in the
glibc detected bug above.  Based on patch originally submitted by: Jiri
Benc <jbenc@suse.cz>

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] nfsservctl(): remove user-triggerable printk

A user can use nfsservctl() to spam the logs.

This can happen because the arguments to the nfsservctl() system call are
versioned. This is a good thing. However, when a bad version is detected,
the kernel prints a message and then returns an error.

Signed-off-by: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix race in pagevec_strip?

We can call try_to_release_page() with PagePrivate off and a valid
page->mapping This may cause all sorts of trouble for the filesystem
*_releasepage() handlers. XFS bombs out in that case.

Lock the page before checking for page private.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] dm stripe: Fix bounds

The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size.  Under certain conditions, this can
lead to I/O requests going off the end of an underlying device.  This
test-case shows one example.

echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
   dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k

This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out

And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100

The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.

This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.

The size of a stripe device must be a multiple of its chunk-size.

(akpm: that typecast is quite gratuitous)

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] x86: check for online cpus before bringing them up

Bryce reported a bug wherein offlining CPU0 (on x86 box) and then
subsequently onlining it resulted in a lockup.

On x86, CPU0 is never offlined.  The subsequent attempt to online CPU0
doesn't take that into account.  It actually tries to bootup the already
booted CPU.  Following patch fixes the problem (as acknowledged by Bryce).
Please consider for inclusion in 2.6.16.

Check if cpu is already online.

Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] v9fs: fix overzealous dropping of dentry which breaks dcache

There is a d_drop in dir_release which caused problems as it invalidates
dcache entries too soon. This was likely a part of the wierd cwd behavior
folks were seeing.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] posix-timers: fix requeue accounting when signal is ignored

When the posix-timer signal is ignored then the timer is rearmed by the
callback function. The requeue pending accounting has to be fixed up else
the state might be wrong.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] time_interpolator: add __read_mostly

The pointer to the current time interpolator and the current list of time
interpolators are typically only changed during bootup. Adding
__read_mostly takes them away from possibly hot cachelines.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] page migration: Fail with error if swap not setup

Currently the migration of anonymous pages will silently fail if no swap is
setup. This patch makes page migration functions check for available swap
and fail with -ENODEV if no swap space is available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] unshare: Use rcu_assign_pointer when setting sighand

The sighand pointer only needs the rcu_read_lock on the
read side. So only depending on task_lock protection
when setting this pointer is not enough. We also need
a memory barrier to ensure the initialization is seen first.

Use rcu_assign_pointer as it does this for us, and clearly
documents that we are setting an rcu readable pointer.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[netdrvr] fix array overflows in Chelsio driver

Adrian Bunk wrote:
> The Coverity checker spotted the following two array overflows in
> drivers/net/chelsio/sge.c (in both cases, the arrays contain 3
> elements):
[snip]

This is a bug. The array should contain 2 elements. Here is the fix.

Signed-off-by: Scott Bardone <sbardone@chelsio.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

[PATCH] e1000 endianness bugs

return -E_NO_BIG_ENDIAN_TESTING;

[E1000]: Fix 4 missed endianness conversions on RX descriptor fields.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6

Merge branch 'e100-fixes' of git://198.78.49.142/~jbrandeb/linux-2.6

Merge git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge

* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc-merge:
  powerpc: update defconfigs
  [PATCH] powerpc: properly configure DDR/P5IOC children devs
  [PATCH] powerpc: remove duplicate EXPORT_SYMBOLS
  [PATCH] powerpc: RTC memory corruption
  [PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption
  [PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option
  [PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105
  [PATCH] powerpc: correct cacheflush loop in zImage
  powerpc: Fix problem with time going backwards
  powerpc: Disallow lparcfg being a module

powerpc: update defconfigs

Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: properly configure DDR/P5IOC children devs

The dynamic add path for PCI Host Bridges can fail to configure children
adapters under P5IOC controllers. It fails to properly fixup bus/device
resources, and it fails to properly enable EEH. Both of these steps
need to occur before any children devices are enabled in
pci_bus_add_devices().

Signed-off-by: John Rose <johnrose@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: remove duplicate EXPORT_SYMBOLS

remove warnings when building a 64bit kernel.
smp_call_function triggers also with 32bit kernel.

WARNING: vmlinux: duplicate symbol 'smp_call_function' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:164:EXPORT_SYMBOL(smp_call_function);
arch/powerpc/kernel/smp.c:300:EXPORT_SYMBOL(smp_call_function);

WARNING: vmlinux: duplicate symbol 'ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:113:EXPORT_SYMBOL(ioremap);
arch/powerpc/mm/pgtable_64.c:321:EXPORT_SYMBOL(ioremap);

WARNING: vmlinux: duplicate symbol '__ioremap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:117:EXPORT_SYMBOL(__ioremap);
arch/powerpc/mm/pgtable_64.c:322:EXPORT_SYMBOL(__ioremap);

WARNING: vmlinux: duplicate symbol 'iounmap' previous definition was in vmlinux
arch/powerpc/kernel/ppc_ksyms.c:118:EXPORT_SYMBOL(iounmap);
arch/powerpc/mm/pgtable_64.c:323:EXPORT_SYMBOL(iounmap);

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: RTC memory corruption

We should be memset'ing the data we are pointing to, not the pointer
itself. This is in an error path so we probably don't hit it much.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: enable NAP only on cpus who support it to avoid memory corruption

This patch fixes incorrect setting of powersave_nap to 1 on all
PowerMacs, potentially causing memory corruption on some models. This
bug was introuced by me during the 32/64 bits merge.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: Clarify wording for CRASH_DUMP Kconfig option

The wording of the CRASH_DUMP Kconfig option is not very clear. It gives you a
kernel that can be used _as_ the kdump kernel, not a kernel that can boot into
a kdump kernel.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc/64: enable CONFIG_BLK_DEV_SL82C105

Enable the onboard IDE driver for p610, p615 and p630.
They have the CD connected to this card. All other RS/6000 systems with this
controller have no connectors and dont need this option.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[PATCH] powerpc: correct cacheflush loop in zImage

Correct the loop for cacheflush. No idea where I copied the code from,
but the original does not work correct. Maybe the flush is not needed.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>

powerpc: Fix problem with time going backwards

The recent changes to keep gettimeofday in sync with xtime had the side
effect that it was occasionally possible for the time reported by
gettimeofday to go back by a microsecond. There were two reasons:
(1) when we recalculated the offsets used by gettimeofday every 2^31
timebase ticks, we lost an accumulated fractional microsecond, and
(2) because the update is done some time after the notional start of
jiffy, if ntp is slowing the clock, it is possible to see time go backwards
when the timebase factor gets reduced.

This fixes it by (a) slowing the gettimeofday clock by about 1us in
2^31 timebase ticks (a factor of less than 1 in 3.7 million), and (b)
adjusting the timebase offsets in the rare case that the gettimeofday
result could possibly go backwards (i.e. when ntp is slowing the clock
and the timer interrupt is late). In this case the adjustment will
reduce to zero eventually because of (a).

Signed-off-by: Paul Mackerras <paulus@samba.org>

Merge master.kernel.org:/home/rmk/linux-2.6-arm

* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 3362/1: [cleanup] - duplicate decleration of mem_fclk_21285
  [ARM] 3365/1: [cleanup] header for compat.c exported functions
  [ARM] 3364/1: [cleanup] warning fix - definitions for enable_hlt and disable_hlt
  [ARM] 3363/1: [cleanup] process.c - fix warnings
  [ARM] 3358/1: [S3C2410] add missing SPI DMA resources
  [ARM] 3357/1: enable frontlight on collie
  [ARM] Fix "thead" typo

[PATCH] Fix ext2 readdir f_pos re-validation logic

This fixes not one, but _two_, silly (but admittedly hard to hit) bugs
in the ext2 filesystem "readdir()" function. It also cleans up the code
to avoid the unnecessary goto mess.

The bugs were related to re-valiating the f_pos value after somebody had
either done an "lseek()" on the directory to an invalid offset, or when
the offset had become invalid due to a file being unlinked in the
directory. The code would not only set the f_version too eagerly, it
would also not update f_pos appropriately for when the offset fixup took
place.

When that happened, we'd occasionally subsequently fail the readdir()
even when we shouldn't (no real harm done, but an ugly printk, and
obviously you would end up not necessarily seeing all entries).

Thanks to Masoud Sharbiani <masouds@google.com> who noticed the problem
and had a test-case for it, and also fixed up a thinko in the first
version of this patch.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Masoud Sharbiani <masouds@google.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[ARM] 3362/1: [cleanup] - duplicate decleration of mem_fclk_21285

Patch from Ben Dooks

arch/arm/kernel/setup.c declares mem_fclk_21285 when
this is already declared in include/asm-arm/system.h

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3365/1: [cleanup] header for compat.c exported functions

Patch from Ben Dooks

arch/arm/kernel/compat.c exports two functions,
convert_to_tag_list and squash_mem_tags which
are not defined in any header files, and not
used outside arch/arm/kernel.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3364/1: [cleanup] warning fix - definitions for enable_hlt and disable_hlt

Patch from Ben Dooks

The enable_hlt and disable_hlt should be declared in
include/asm/setup.h. This fixes sparse errors from
arch/arm/kernel/process.c

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3363/1: [cleanup] process.c - fix warnings

Patch from Ben Dooks

Fix the following warnings from sparse:

arch/arm/kernel/process.c:86:6: warning: symbol 'default_idle' was not declared. Should it be static?
arch/arm/kernel/process.c:378:5: warning: symbol 'dump_fpu' was not declared. Should it be static?

Include <linux/elfcore.h> for dump_fpu() decleration, and
make default_idle() static as it is not used outside the file.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[PATCH] ieee80211: Fix QoS is not active problem

Fix QoS is not active even the network and the card is QOS enabled.
The problem is we pass the wrong ieee80211_network address to
ipw_handle_beacon/ipw_handle_probe_response, thus the
ieee80211_network->qos_data.active will not be set, causing the driver
not sending QoS frames at all.

Signed-off-by: Hong Liu <hong.liu@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

[PATCH] ieee80211: Fix CCMP decryption problem when QoS is enabled

Use the correct STYPE for Qos data.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

e100: fix eeh on pseries during ethtool -t

Olaf Hering reported a problem on pseries with e100 where ethtool -t would
cause a bus error, and the e100 driver would stop working. Due to the new
load ucode command the cb list must be allocated before calling
e100_init_hw, so remove the call and just let e100_up take care of it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

[PATCH] fs/namespace.c:dup_namespace(): fix a use after free

The Coverity checker spotted the following bug in dup_namespace():

<--  snip  -->

        if (!new_ns->root) {
                up_write(&namespace_sem);
                kfree(new_ns);
                goto out;
        }
...
out:
        return new_ns;

<--  snip  -->

Callers expect a non-NULL result to not be freed.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[ARM] 3358/1: [S3C2410] add missing SPI DMA resources

Patch from Albrecht Dreß

Add DMA resources to s3c2410 spi platform devices - dma_(alloc|free)_coherent should now work as expected.

Signed-off-by: Albrecht Dreß <albrecht.dress@lios-tech.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] 3357/1: enable frontlight on collie

Patch from Pavel Machek

Enable frontlight during collie bootup, so that display is actually
readable in anything other than bright sunlight.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[ARM] Fix "thead" typo

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

[PATCH] Consistent capabilites associated with MPOL_MOVE_ALL

It seems that setting scheduling policy and priorities is also the kind of
thing that might be performed in apps that also use the NUMA API, so it
would seem consistent to use CAP_SYS_NICE for NUMA also.

So use CAP_SYS_NICE for controlling migration permissions.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Page migration documentation update

Update the documentation for page migration.

- Fix up bits and pieces in cpusets.txt

- Rework text in vm/page-migration to be clearer and reflect the final
  version of page migration in 2.6.16. Mention Andi Kleen's numactl
  package that contains user space tools for page migration via
  libnuma. Add reference to numa_maps and to the manpage in numactl.

- Add todo list for outstanding issues

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] page migration: fail if page is in a vma flagged VM_LOCKED

page migration currently simply retries a couple of times if try_to_unmap()
fails without inspecting the return code.

However, SWAP_FAIL indicates that the page is in a vma that has the
VM_LOCKED flag set (if ignore_refs ==1). We can check for that return code
and avoid retrying the migration.

migrate_page_remove_references() now needs to return a reason why the
failure occured. So switch migrate_page_remove_references to use -Exx
style error messages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] "s390: multiple subchannel sets support" fix

It seems this patch got dropped (it was in addition to the `s390:
improve response code handling in chsc_enable_facility()' patch).

Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Merge git://oss.sgi.com:8090/oss/git/rc-fixes

* git://oss.sgi.com:8090/oss/git/rc-fixes:
Fix a direct I/O locking issue revealed by the new mutex code.

Fix a direct I/O locking issue revealed by the new mutex code.
Affects only XFS (i.e. DIO_OWN_LOCKING case) - currently it is
not possible to get i_mutex locking correct when using DIO_OWN
direct I/O locking in a filesystem due to indeterminism in the
possible return code/lock/unlock combinations. This can cause
a direct read to attempt a double i_mutex unlock inside XFS.

We're now ensuring __blockdev_direct_IO always exits with the
inode i_mutex (still) held for a direct reader.

Tested with the three different locking modes (via direct block
device access, ext3 and XFS) - both reading and writing; cannot
find any regressions resulting from this change, and it clearly
fixes the mutex_unlock warning originally reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114189068126253&w=2

Signed-off-by: Nathan Scott <nathans@sgi.com>
Acked-by: Christoph Hellwig <hch@lst.de>

[PATCH] JFS: Take logsync lock before testing mp->lsn

This fixes a race where lsn could be cleared before taking the lock

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] zfcp: fix device registration issues
  [SCSI] scsi_transport_fc: fix FC_HOST_NUM_ATTRS
  [SCSI] scsi: aha152x pcmcia driver needs spi transport
  [SCSI] zfcp: correctly set this_id for hosts
  [SCSI] Add Brownie to blacklist

[PATCH] Plug kdump shutdown race window

lapic_shutdown() re-enables interrupts which is un-desirable for panic
case, so use local_irq_save() and local_irq_restore() to keep the irqs
disabled for kexec on panic case, and close a possible race window while
kdump shutdown as shown in this stack trace

   -- BUG: spinlock lockup on CPU#1, bash/4396, c52781a0
   [<c01c1870>] _raw_spin_lock+0xb7/0xd2
   [<c029e148>] _spin_lock+0x6/0x8
   [<c011b33f>] scheduler_tick+0xe7/0x328
   [<c0128a7c>] update_process_times+0x51/0x5d
   [<c0114592>] smp_apic_timer_interrupt+0x4f/0x58
   [<c01141ff>] lapic_shutdown+0x76/0x7e
   [<c0104d7c>] apic_timer_interrupt+0x1c/0x30
   [<c01141ff>] lapic_shutdown+0x76/0x7e
   [<c0116659>] machine_crash_shutdown+0x83/0xaa
   [<c013cc36>] crash_kexec+0xc1/0xe3
   [<c029e148>] _spin_lock+0x6/0x8
   [<c013cc22>] crash_kexec+0xad/0xe3
   [<c0215280>] __handle_sysrq+0x84/0xfd
   [<c018d937>] write_sysrq_trigger+0x2c/0x35
   [<c015e47b>] vfs_write+0xa2/0x13b
   [<c015ea73>] sys_write+0x3b/0x64
   [<c0103c69>] syscall_call+0x7/0xb

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>