git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log

md/raid1: store behind-write pages in bi_vecs.

When performing write-behind we allocate pages to store the data
during write.
Previously we just keep a list of pages. Now we keep a list of
bi_vec which includes offset and size.
This means that the r1bio has complete information to create a new
bio which will be needed for retrying after write errors.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid1: clear bad-block record when write succeeds.

If we succeed in writing to a block that was recorded as
being bad, we clear the bad-block record.

This requires some delayed handling as the bad-block-list update has
to happen in process-context.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid1: avoid writing to known-bad blocks on known-bad drives.

If we have seen any write error on a drive, then don't write to
any known-bad blocks on that drive.
If necessary, we divide the write request up into pieces just
like we do for reads, so each piece is either all written or
all not written to any given drive.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md: update documentation for md/rdev/state sysfs interface

Previous patches in the bad block series extended behavior of
rdev's 'state' interface but lacked documentation update.
Fix it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: make it easier to wait for bad blocks to be acknowledged.

It is only safe to choose not to write to a bad block if that bad
block is safely recorded in metadata - i.e. if it has been
'acknowledged'.

If it hasn't we need to wait for the acknowledgement.

We support that using rdev->blocked wait and
md_wait_for_blocked_rdev by introducing a new device flag
'BlockedBadBlock'.

This flag is only advisory.
It is cleared whenever we acknowledge a bad block, so that a waiter
can re-check the particular bad blocks that it is interested it.

It should be set by a caller when they find they need to wait.
This (set after test) is inherently racy, but as
md_wait_for_blocked_rdev already has a timeout, losing the race will
have minimal impact.

When we clear "Blocked" was also clear "BlockedBadBlocks" incase it
was set incorrectly (see above race).

We also modify the way we manage 'Blocked' to fit better with the new
handling of 'BlockedBadBlocks' and to make it consistent between
externally managed and internally managed metadata.   This requires
that each raidXd loop checks if the metadata needs to be written and
triggers a write (md_check_recovery) if needed.  Otherwise a queued
write request might cause raidXd to wait for the metadata to write,
and only that thread can write it.

Before writing metadata, we set FaultRecorded for all devices that
are Faulty, then after writing the metadata we clear Blocked for any
device for which the Fault was certainly Recorded.

The 'faulty' device flag now appears in sysfs if the device is faulty
*or* it has unacknowledged bad blocks.  So user-space which does not
understand bad blocks can continue to function correctly.
User space which does, should not assume a device is faulty until it
sees the 'faulty' flag, and then sees the list of unacknowledged bad
blocks is empty.

Signed-off-by: NeilBrown <neilb@suse.de>

md: add 'write_error' flag to component devices.

If a device has ever seen a write error, we will want to handle
known-bad-blocks differently.
So create an appropriate state flag and export it via sysfs.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid1: avoid reading known bad blocks during resync

When performing resync/etc, keep the size of the request
small enough that it doesn't overlap any known bad blocks.
Devices with badblocks at the start of the request are completely
excluded.
If there is nowhere to read from due to bad blocks, record
a bad block on each target device.

Now that we never read from known-bad-blocks we can allow devices with
known-bad-blocks into a RAID1.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: avoid reading from known bad blocks.

Now that we have a bad block list, we should not read from those
blocks.
There are several main parts to this:
  1/ read_balance needs to check for bad blocks, and return not only
     the chosen device, but also how many good blocks are available
     there.
  2/ fix_read_error needs to avoid trying to read from bad blocks.
  3/ read submission must be ready to issue multiple reads to
     different devices as different bad blocks on different devices
     could mean that a single large read cannot be served by any one
     device, but can still be served by the array.
     This requires keeping count of the number of outstanding requests
     per bio.  This count is stored in 'bi_phys_segments'
  4/ retrying a read needs to also be ready to submit a smaller read
     and queue another request for the rest.

This does not yet handle bad blocks when reading to perform resync,
recovery, or check.

'md_trim_bio' will also be used for RAID10, so put it in md.c and
export it.

Signed-off-by: NeilBrown <neilb@suse.de>

md: Disable bad blocks and v0.90 metadata.

v0.90 metadata cannot record bad blocks, so when loading metadata
for such a device, set shift to -1.

Signed-off-by: NeilBrown <neilb@suse.de>

md: load/store badblock list from v1.x metadata

Space must have been allocated when array was created.
A feature flag is set when the badblock list is non-empty, to
ensure old kernels don't load and trust the whole device.

We only update the on-disk badblocklist when it has changed.
If the badblocklist (or other metadata) is stored on a bad block, we
don't cope very well.

If metadata has no room for bad block, flag bad-blocks as disabled,
and do the same for 0.90 metadata.

Signed-off-by: NeilBrown <neilb@suse.de>

md: don't allow arrays to contain devices with bad blocks.

As no personality understand bad block lists yet, we must
reject any device that is known to contain bad blocks.
As the personalities get taught, these tests can be removed.

This only applies to raid1/raid5/raid10.
For linear/raid0/multipath/faulty the whole concept of bad blocks
doesn't mean anything so there is no point adding the checks.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md: add documentation for bad block log

Previous patch in the bad block series added new sysfs interfaces
([unacknowledged_]bad_blocks) for each rdev without documentation.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/bad-block-log: add sysfs interface for accessing bad-block-log.

This can show the log (providing it fits in one page) and
allows bad blocks to be 'acknowledged' meaning that they
have safely been recorded in metadata.

Clearing bad blocks is not allowed via sysfs (except for
code testing). A bad block can only be cleared when
a write to the block succeeds.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md: beginnings of bad block management.

This the first step in allowing md to track bad-blocks per-device so
that we can fail individual blocks rather than the whole device.

This patch just adds a data structure for recording bad blocks, with
routines to add, remove, search the list.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
  Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors
  Btrfs: use the commit_root for reading free_space_inode crcs
  Btrfs: reduce extent_state lock contention for metadata
  Btrfs: remove lockdep magic from btrfs_next_leaf
  Btrfs: make a lockdep class for each root
  Btrfs: switch the btrfs tree locks to reader/writer
  Btrfs: fix deadlock when throttling transactions
  Btrfs: stop using highmem for extent_buffers
  Btrfs: fix BUG_ON() caused by ENOSPC when relocating space
  Btrfs: tag pages for writeback in sync
  Btrfs: fix enospc problems with delalloc
  Btrfs: don't flush delalloc arbitrarily
  Btrfs: use find_or_create_page instead of grab_cache_page
  Btrfs: use a worker thread to do caching
  Btrfs: fix how we merge extent states and deal with cached states
  Btrfs: use the normal checksumming infrastructure for free space cache
  Btrfs: serialize flushers in reserve_metadata_bytes
  Btrfs: do transaction space reservation before joining the transaction
  Btrfs: try to only do one btrfs_search_slot in do_setxattr

md: remove suspicious size_of()

When calling bioset_create we pass the size of the front_pad as
   sizeof(mddev)
which looks suspicious as mddev is a pointer and so it looks like a
common mistake where
   sizeof(*mddev)
was intended.
The size is actually correct as we want to store a pointer in the
front padding of the bios created by the bioset, so make the intent
more explicit by using
   sizeof(mddev_t *)

Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: optimize the negative xattr caching
  xfs: prevent against ioend livelocks in xfs_file_fsync
  xfs: flag all buffers as metadata
  xfs: encapsulate a block of debug code

Merge branch 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

* 'nfs-for-3.1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
  NFSv4: Don't use the delegation->inode in nfs_mark_return_delegation()
  nfs: don't use d_move in nfs_async_rename_done
  RDMA: Increasing RPCRDMA_MAX_DATA_SEGS
  SUNRPC: Replace xprt->resend and xprt->sending with a priority queue
  SUNRPC: Allow caller of rpc_sleep_on() to select priority levels
  SUNRPC: Support dynamic slot allocation for TCP connections
  SUNRPC: Clean up the slot table allocation
  SUNRPC: Initalise the struct xprt upon allocation
  SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot
  pnfs: simplify pnfs files module autoloading
  nfs: document nfsv4 sillyrename issues
  NFS: Convert nfs4_set_ds_client to EXPORT_SYMBOL_GPL
  SUNRPC: Convert the backchannel exports to EXPORT_SYMBOL_GPL
  SUNRPC: sunrpc should not explicitly depend on NFS config options
  NFS: Clean up - simplify the switch to read/write-through-MDS
  NFS: Move the pnfs write code into pnfs.c
  NFS: Move the pnfs read code into pnfs.c
  NFS: Allow the nfs_pageio_descriptor to signal that a re-coalesce is needed
  NFS: Use the nfs_pageio_descriptor->pg_bsize in the read/write request
  NFS: Cache rpc_ops in struct nfs_pageio_descriptor
  ...

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
  kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
  iscsi-target: Add iSCSI fabric support for target v4.1
  iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
  iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
  iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi

Merge branch 'integration' into for-linus

Btrfs: make sure reserve_metadata_bytes doesn't leak out strange errors

The btrfs transaction code will return any errors that come from
reserve_metadata_bytes. We need to make sure we don't return funny
things like 1 or EAGAIN.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked()

sys_ssetmask(), sys_rt_sigsuspend() and compat_sys_rt_sigsuspend()
change ->blocked directly. This is not correct, see the changelog in
e6fa16ab "signal: sigprocmask() should do retarget_shared_pending()"

Change them to use set_current_blocked().

Another change is that now we are doing ->saved_sigmask = ->blocked
lockless, it doesn't make any sense to do this under ->siglock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Matt Fleming <matt.fleming@linux.intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sparc: rename atomic_add_unless

Should have been done in commit 1af08a1407f4 ("This is in preparation
for more generic atomic").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Arun Sharma <asharma@fb.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Hans-Christian Egtvedt" <hans-christian.egtvedt@atmel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

proc: make struct proc_dir_entry::name a terminal array rather than a pointer

Since __proc_create() appends the name it is given to the end of the PDE
structure that it allocates, there isn't a need to store a name pointer.
Instead we can just replace the name pointer with a terminal char array of
_unspecified_ length.  The compiler will simply append the string to statically
defined variables of PDE type overlapping any hole at the end of the structure
and, unlike specifying an explicitly _zero_ length array, won't give a warning
if you try to statically initialise it with a string of more than zero length.

Also, whilst we're at it:

(1) Move namelen to end just prior to name and reduce it to a single byte
     (name shouldn't be longer than NAME_MAX).

(2) Move pde_unload_lock two places further on so that if it's four bytes in
     size on a 64-bit machine, it won't cause an unused hole in the PDE struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sound: oss: rename local change_bits to avoid powerpc bitsops.h definition

This collides with powerpc exported functions from bitops.h. Rename the
local copy in the oss soundblaster mixer and ad1848 driver.

Signed-off-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Btrfs: use the commit_root for reading free_space_inode crcs

Now that we are using regular file crcs for the free space cache,
we can deadlock if we try to read the free_space_inode while we are
updating the crc tree.

This commit fixes things by using the commit_root to read the crcs. This is
safe because we the free space cache file would already be loaded if
that block group had been changed in the current transaction.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: reduce extent_state lock contention for metadata

For metadata buffers that don't straddle pages (all of them), btrfs
can safely use the page uptodate bits and extent_buffer uptodate bit
instead of needing to use the extent_state tree.

This greatly reduces contention on the state tree lock.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: remove lockdep magic from btrfs_next_leaf

Before the reader/writer locks, btrfs_next_leaf needed to keep
the path blocking to avoid making lockdep upset.

Now that btrfs_next_leaf only takes read locks, this isn't required.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: make a lockdep class for each root

This patch was originally from Tejun Heo. lockdep complains about the btrfs
locking because we sometimes take btree locks from two different trees at the
same time. The current classes are based only on level in the btree, which
isn't enough information for lockdep to figure out if the lock is safe.

This patch makes a class for each type of tree, and lumps all the FS trees that
actually have files and directories into the same class.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: switch the btrfs tree locks to reader/writer

The btrfs metadata btree is the source of significant
lock contention, especially in the root node.   This
commit changes our locking to use a reader/writer
lock.

The lock is built on top of rw spinlocks, and it
extends the lock tracking to remember if we have a
read lock or a write lock when we go to blocking.  Atomics
count the number of blocking readers or writers at any
given time.

It removes all of the adaptive spinning from the old code
and uses only the spinning/blocking hints inside of btrfs
to decide when it should continue spinning.

In read heavy workloads this is dramatically faster.  In write
heavy workloads we're still faster because of less contention
on the root node lock.

We suffer slightly in dbench because we schedule more often
during write locks, but all other benchmarks so far are improved.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix deadlock when throttling transactions

Hit this nice little deadlock.  What happens is this

__btrfs_end_transaction with throttle set, --use_count so it equals 0
  btrfs_commit_transaction
    <somebody else actually manages to start the commit>
    btrfs_end_transaction --use_count so now its -1 <== BAD
      we just return and wait on the transaction

This is bad because we just return after our use_count is -1 and don't let go
of our num_writer count on the transaction, so the guy committing the
transaction just sits there forever.  Fix this by inc'ing our use_count if we're
going to call commit_transaction so that if we call btrfs_end_transaction it's
valid.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: stop using highmem for extent_buffers

The extent_buffers have a very complex interface where
we use HIGHMEM for metadata and try to cache a kmap mapping
to access the memory.

The next commit adds reader/writer locks, and concurrent use
of this kmap cache would make it even more complex.

This commit drops the ability to use HIGHMEM with extent buffers,
and rips out all of the related code.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix BUG_ON() caused by ENOSPC when relocating space

When we balanced the chunks across the devices, BUG_ON() in
__finish_chunk_alloc() was triggered.

------------[ cut here ]------------
kernel BUG at fs/btrfs/volumes.c:2568!
[SNIP]
Call Trace:
[<ffffffffa049525e>] btrfs_alloc_chunk+0x8e/0xa0 [btrfs]
[<ffffffffa04546b0>] do_chunk_alloc+0x330/0x3a0 [btrfs]
[<ffffffffa045c654>] btrfs_reserve_extent+0xb4/0x1f0 [btrfs]
[<ffffffffa045c86b>] btrfs_alloc_free_block+0xdb/0x350 [btrfs]
[<ffffffffa048a8d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs]
[<ffffffffa04476fd>] __btrfs_cow_block+0x14d/0x5e0 [btrfs]
[<ffffffffa044660d>] ? read_block_for_search+0x14d/0x4d0 [btrfs]
[<ffffffffa0447c9b>] btrfs_cow_block+0x10b/0x240 [btrfs]
[<ffffffffa044dd5e>] btrfs_search_slot+0x49e/0x7a0 [btrfs]
[<ffffffffa044f07d>] btrfs_insert_empty_items+0x8d/0xf0 [btrfs]
[<ffffffffa045e973>] insert_with_overflow+0x43/0x110 [btrfs]
[<ffffffffa045eb0d>] btrfs_insert_dir_item+0xcd/0x1f0 [btrfs]
[<ffffffffa0489bd0>] ? map_extent_buffer+0xb0/0xc0 [btrfs]
[<ffffffff812276ad>] ? rb_insert_color+0x9d/0x160
[<ffffffffa046cc40>] ? inode_tree_add+0xf0/0x150 [btrfs]
[<ffffffffa0474801>] btrfs_add_link+0xc1/0x1c0 [btrfs]
[<ffffffff811dacac>] ? security_inode_init_security+0x1c/0x30
[<ffffffffa04a28aa>] ? btrfs_init_acl+0x4a/0x180 [btrfs]
[<ffffffffa047492f>] btrfs_add_nondir+0x2f/0x70 [btrfs]
[<ffffffffa046af16>] ? btrfs_init_inode_security+0x46/0x60 [btrfs]
[<ffffffffa0474ac0>] btrfs_create+0x150/0x1d0 [btrfs]
[<ffffffff81159c63>] ? generic_permission+0x23/0xb0
[<ffffffff8115b415>] vfs_create+0xa5/0xc0
[<ffffffff8115ce6e>] do_last+0x5fe/0x880
[<ffffffff8115dc0d>] path_openat+0xcd/0x3d0
[<ffffffff8115e029>] do_filp_open+0x49/0xa0
[<ffffffff8116a965>] ? alloc_fd+0x95/0x160
[<ffffffff8114f0c7>] do_sys_open+0x107/0x1e0
[<ffffffff810bcc3f>] ? audit_syscall_entry+0x1bf/0x1f0
[<ffffffff8114f1e0>] sys_open+0x20/0x30
[<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b
[SNIP]
RIP  [<ffffffffa049444a>] __finish_chunk_alloc+0x20a/0x220 [btrfs]

The reason is:
Task1 Space balance task
do_chunk_alloc()
  __finish_chunk_alloc()
    update device info
    in the chunk tree
      alloc system metadata block
relocate system metadata block group
  set system metadata block group
  readonly, This block group is the
  only one that can allocate space. So
  there is no free space that can be
  allocated now.
        find no space and don't try
        to alloc new chunk, and then
        return ENOSPC
  BUG_ON() in __finish_chunk_alloc()
  was triggered.

Fix this bug by allocating a new system metadata chunk before relocating the
old one if we find there is no free space which can be allocated after setting
the old block group to be read-only.

Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: tag pages for writeback in sync

Everybody else does this, we need to do it too. If we're syncing, we need to
tag the pages we're going to write for writeback so we don't end up writing the
same stuff over and over again if somebody is constantly redirtying our file.
This will keep us from having latencies with heavy sync workloads. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix enospc problems with delalloc

So I had this brilliant idea to use atomic counters for outstanding and reserved
extents, but this turned out to be a bad idea. Consider this where we have 1
outstanding extent and 1 reserved extent

Reserver Releaser
atomic_dec(outstanding) now 0
atomic_read(outstanding)+1 get 1
atomic_read(reserved) get 1
don't actually reserve anything because
they are the same
atomic_cmpxchg(reserved, 1, 0)
atomic_inc(outstanding)
atomic_add(0, reserved)
free reserved space for 1 extent

Then the reserver now has no actual space reserved for it, and when it goes to
finish the ordered IO it won't have enough space to do it's allocation and you
get those lovely warnings.

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: don't flush delalloc arbitrarily

Kill the check to see if we have 512mb of reserved space in delalloc and
shrink_delalloc if we do. This causes unexpected latencies and we have other
logic to see if we need to throttle. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: use find_or_create_page instead of grab_cache_page

grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we
need GFP_NOFS so we don't deadlock. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: use a worker thread to do caching

A user reported a deadlock when copying a bunch of files.  This is because they
were low on memory and kthreadd got hung up trying to migrate pages for an
allocation when starting the caching kthread.  The page was locked by the person
starting the caching kthread.  To fix this we just need to use the async thread
stuff so that the threads are already created and we don't have to worry about
deadlocks.  Thanks,

Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Josef Bacik <josef@redhat.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
jfs: clean up some compiler warnings

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
GFS2: Fix mount hang caused by certain access pattern to sysfs files

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (22 commits)
  ALSA: hda - Cirrus Logic CS421x support
  ALSA: Make pcm.h self-contained
  ALSA: hda - Allow codec-specific set_power_state ops
  ALSA: hda - Add post_suspend patch ops
  ALSA: hda - Make CONFIG_SND_HDA_POWER_SAVE depending on CONFIG_PM
  ALSA: hda - Make sure mute led reflects master mute state
  ALSA: hda - Fix invalid mute led state on resume of IDT codecs
  ASoC: Revert "ASoC: SAMSUNG: Add I2S0 internal dma driver"
  ALSA: hda - Add support of the 4 internal speakers on certain HP laptops
  ALSA: Make snd_pcm_debug_name usable outside pcm_lib
  ALSA: hda - Fix DAC filling for multi-connection pins in Realtek parser
  ASoC: dapm - Add methods to retrieve snd_card and soc_card from dapm context.
  ASoC: SAMSUNG: Add I2S0 internal dma driver
  ASoC: SAMSUNG: Modify I2S driver to support idma
  ASoC: davinci: add missing break statement
  ASoC: davinci: fix codec start and stop functions
  ASoC: dapm - add DAPM macro for external enum widgets
  ASoC: Acknowledge WM8962 interrupts before acting on them
  ASoC: sgtl5000: guide user when regulator support is needed
  ASoC: sgtl5000: refactor registering internal ldo
  ...

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (53 commits)
  Input: synaptics - fix reporting of min coordinates
  Input: tegra-kbc - enable key autorepeat
  Input: kxtj9 - fix locking typo in kxtj9_set_poll()
  Input: kxtj9 - fix bug in probe()
  Input: intel-mid-touch - remove pointless checking for variable 'found'
  Input: hp_sdc - staticize hp_sdc_kicker()
  Input: pmic8xxx-keypad - fix a leak of the IRQ during init failure
  Input: cy8ctmg110_ts - set reset_pin and irq_pin from platform data
  Input: cy8ctmg110_ts - constify i2c_device_id table
  Input: cy8ctmg110_ts - fix checking return value of i2c_master_send
  Input: lifebook - make dmi callback functions return 1
  Input: atkbd - make dmi callback functions return 1
  Input: gpio_keys - switch to using SIMPLE_DEV_PM_OPS
  Input: gpio_keys - add support for device-tree platform data
  Input: aiptek - remove double define
  Input: synaptics - set minimum coordinates as reported by firmware
  Input: synaptics - process button bits in AGM packets
  Input: synaptics - rename set_slot to be more descriptive
  Input: synaptics - fuzz position for touchpad with reduced filtering
  Input: synaptics - set resolution for MT_POSITION_X/Y axes
  ...

Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze

* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Do not show error message for 32 interrupt lines
  Revert "microblaze: PCI fix typo fault in of_node pointer moving into pci_bus"
  microblaze: PCI fix typo fault in of_node pointer moving into pci_bus
  microblaze: Add support for early console on mdm
  microblaze: Simplify early console binding from DT
  microblaze: Get early printk console earlier
  microblaze: Standardise cpuinfo output for cache policy
  microblaze: Unprivileged stream instruction awareness
  microblaze: trivial: Fix typo fault
  microblaze: exec: Remove redundant set_fs(USER_DS)
  microblaze: Remove duplicated prototype of start_thread()
  microblaze: Fix unaligned value saving to the stack for system with MMU
  microblaze/irqs: Do not trace arch_local_{*,irq_*} functions

staging: brcm80211: Fix double include introduced by bad merge

A merge with Linus' tree added a double include of linux/interrupt.h.
Fix by removing one of the includes.

Signed-off-by: Daniel Morsing <daniel.morsing@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ARM: tegra: only select MACH_HAS_SND_SOC_TEGRA_WM8903 if SND_SOC

This fixes:
warning: (MACH_HARMONY && MACH_KAEN && MACH_SEABOARD) selects MACH_HAS_SND_SOC_TEGRA_WM8903 which has unmet direct dependencies (SOUND && !M68K && SND && SND_SOC)

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Stephen Warren <swarren@nvidia.com>
Acked-By: Colin Cross <ccross@android.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ALSA: hda - Fix duplicated DAC assignments for Realtek

Copying hp_pins and speaker_pins from line_out_pins may confuse the
parser, and it can lead to duplicated initializations for the same pin
with a wrong DAC assignment. The problem appears in 3.0 kernel code.

Cc: <stable@kernel.org> (for 3.0)
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: asihpi - off by one in asihpi_hpi_ioctl()

"adapter" is used as an array index in the adapters[] array so
the off by one would make us read past the end.

1c073b67979 "ALSA: asihpi - Remove spurious adapter index check"
reverted Dan Rosenberg's check that would have prevented the
overflow here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: hda - Fix Oops with Realtek quirks with NULL adc_nids

Somce quirk models don't set adc_nids but let the parser filling it.
But the recent code has unnecessary NULL-checks of spec->input_mux,
and it resulted in NULL dereferences.
This patch fixes that regression.

Reported-and-tested-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

gro: Only reset frag0 when skb can be pulled

Currently skb_gro_header_slow unconditionally resets frag0 and
frag0_len. However, when we can't pull on the skb this leaves
the GRO fields in an inconsistent state.

This patch fixes this by only resetting those fields after the
pskb_may_pull test.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

microblaze: Do not show error message for 32 interrupt lines

When interrupt controller uses 32 interrupts lines the kernel
show error message about mismatch in kind-of-intr parameter
because it exceeds u32. Recast fixs this issue.

Signed-off-by: Michal Simek <monstr@monstr.eu>

ALSA: asihpi - bug fix pa use before init.

Fixes bug introduced by 1c073b67.
Also declare pa local to block in which it is used.

Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'next' into for-linus

ALSA: hda - Add support for vref-out based mute LED control on IDT codecs

This patch also registers all necessary callbacks to support mute LED
only when such control is enabled. And it keeps codec AFG in D0 or D1
state all the time when aggressive power managemnt is enabled for vref-out
control (and mute LED) work correctly.

Signed-off-by: Vitaliy Kulikov <Vitaliy.Kulikov@idt.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

xfs: optimize the negative xattr caching

Since the addition of file capabilities every write needs to read xattrs to
check if we have any capabilities to clear. In Linux 3.0 Andi Kleen added
a flag to cache the fact that we do not have any attributes on an inode.
Make sure to already mark a file as not having any attributes when reading
it from disk in case it doesn't even have an attribute fork. Based on an
earlier patch from Andi Kleen.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: prevent against ioend livelocks in xfs_file_fsync

We need to take some locks to prevent new ioends from coming in when we wait
for all existing ones to go away. Up to Linux 3.0 that was done using the
i_mutex held by the VFS fsync code, but now that we are called without
it we need to take care of it ourselves. Use the I/O lock instead of
i_mutex just like we do in other places.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: flag all buffers as metadata

Now that REQ_META bios aren't treated specially in the CFQ I/O schedule
anymore, we can tag all buffers as metadata to make blktrace traces more
meaningful. Note that we use buffers also to zero out partial blocks
in the preallocation / hole punching code, and while they operate on
data blocks the zeros written certainly aren't data. I think this case
is borderline metadata enough to not bother special casing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

xfs: encapsulate a block of debug code

Pull into a helper function some debug-only code that validates a
xfs_da_blkinfo structure that's been read from disk.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>

dmaengine: imx-sdma: add device tree probe support

It adds device tree probe support for imx-sdma driver.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Vinod Koul <vinod.koul@intel.com>

dmaengine: imx-sdma: sdma_get_firmware does not need to copy fw_name

It does not need to allocate space and copy fw_name in function
sdma_get_firmware().

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Vinod Koul <vinod.koul@intel.com>

dmaengine: imx-sdma: use platform_device_id to identify sdma version

It might be not good to use software defined version to identify sdma
device type, when hardware does not define such version. Instead,
soc name is stable enough to define the device type.

The patch uses platform_device_id rather than version number passed
by platform data to identify sdma device type/version.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Vinod Koul <vinod.koul@intel.com>

mmc: sdhci-esdhc-imx: add device tree probe support

The patch adds device tree probe support for sdhci-esdhc-imx driver.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Chris Ball <cjb@laptop.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-pltfm: dt device does not pass parent to sdhci_alloc_host

Neither platform based nor dt based device needs to pass the parent
to sdhci_alloc_host. There is no difference between platform and dt
on this point.

The patch makes the change to pass device itself than its parent to
sdhci_alloc_host for dt case too. Otherwise the probe function of
sdhci based drivers which is shared between platform and dt will
fail on dt case.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Chris Ball <cjb@laptop.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: get rid of the uses of cpu_is_mx()

The patch removes all the uses of cpu_is_mx(). Instead, it utilizes
platform_device_id to distinguish the esdhc differences among SoCs.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Chris Ball <cjb@laptop.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: do not reference platform data after probe

The patch copies platform data into pltfm_imx_data and reference
the data there than platform data after probe.

This work is inspired by Grant Likely and Troy Kisky.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Troy Kisky <troy.kisky@boundarydevices.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: Chris Ball <cjb@laptop.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc-imx: extend card_detect and write_protect support for mx5

The patch extends card_detect and write_protect support to get mx5
family and more scenarios supported.  The changes include:

* Turn platform_data from optional to mandatory
* Add cd_types and wp_types into platform_data to cover more use
   cases
* Remove the use of flag ESDHC_FLAG_GPIO_FOR_CD
* Adjust some machine codes to adopt the platform_data changes
* Work around the issue that software reset will get card detection
   circuit stop working

With this patch, card_detect and write_protect gets supported on
mx5 based platforms.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Chris Ball <cjb@laptop.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Acked-by: Chris Ball <cjb@laptop.org>

net/fec: add device tree probe support

It adds device tree probe support for fec driver.

Signed-off-by: Jason Liu <jason.hui@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>

net: ibm_newemac: convert it to use of_get_phy_mode

The patch extends 'enum phy_interface_t' and of_get_phy_mode a little
bit with PHY_INTERFACE_MODE_NA and PHY_INTERFACE_MODE_SMII added,
and then converts ibm_newemac net driver to use of_get_phy_mode
getting phy mode from device tree.

It also resolves the namespace conflict on phy_read/write between
common mdiobus interface and ibm_newemac private one.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David Miller <davem@davemloft.net>

dt/net: add helper function of_get_phy_mode

It adds the helper function of_get_phy_mode getting phy interface
from device tree.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David Miller <davem@davemloft.net>

net/fec: gasket needs to be enabled for some i.mx

On the recent i.mx (mx25/50/53), there is a gasket inside fec
controller which needs to be enabled no matter phy works in MII
or RMII mode.

The current code enables the gasket only when phy interface is RMII.
It's broken when the driver works with a MII phy. The patch uses
platform_device_id to distinguish the SoCs that have the gasket and
enables it on these SoCs for both MII and RMII mode.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Reported-by: Troy Kisky <troy.kisky@boundarydevices.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>

serial/imx: add device tree probe support

It adds device tree probe support for imx tty/serial driver.

Signed-off-by: Jeremy Kerr <jeremy.kerr@canonical.com>
Signed-off-by: Jason Liu <jason.hui@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Grant Likely <grant.likely@secretlab.ca>

serial/imx: get rid of the uses of cpu_is_mx1()

The patch removes all the uses of cpu_is_mx1(). Instead, it uses
the .id_table of platform_driver to distinguish the uart device type,
IMX1_UART and IMX21_UART. The IMX21_UART type runs on all i.mx
except i.mx1.

A couple of !cpu_is_mx1 logic gets turned into is_imx21_uart,
as the codes wrapped there are really IMX21 type uart specific.

It also removes macro MX1_UCR3_REF25 and MX1_UCR3_REF30 which are
not used anywhere.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  merge fchmod() and fchmodat() guts, kill ancient broken kludge
  xfs: fix misspelled S_IS...()
  xfs: get rid of open-coded S_ISREG(), etc.
  vfs: document locking requirements for d_move, __d_move and d_materialise_unique
  omfs: fix (mode & S_IFDIR) abuse
  btrfs: S_ISREG(mode) is not mode & S_IFREG...
  ima: fmode_t misspelled as mode_t...
  pci-label.c: size_t misspelled as mode_t
  jffs2: S_ISLNK(mode & S_IFMT) is pointless
  snd_msnd ->mode is fmode_t, not mode_t
  v9fs_iop_get_acl: get rid of unused variable
  vfs: dont chain pipe/anon/socket on superblock s_inodes list
  Documentation: Exporting: update description of d_splice_alias
  fs: add missing unlock in default_llseek()

MD: generate an event when array sync is complete

This patch causes MD to generate an event (for device-mapper) when the
synchronization thread is reaped. This is expected behavior for device-mapper.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD bitmap: Revert DM dirty log hooks

Revert most of commit e384e58549a2e9a83071ad80280c1a9053cfd84c
md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log.

MD should not need to use DM's dirty log - we decided to use md's
bitmaps instead.

Keeping the DIV_ROUND_UP clean-ups that were part of commit
e384e58549a2e9a83071ad80280c1a9053cfd84c, however.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

MD: raid1 s/sysfs_notify_dirent/sysfs_notify_dirent_safe

If device-mapper creates a RAID1 array that includes devices to
be rebuilt, it will deref a NULL pointer when finished because
sysfs is not used by device-mapper instantiated RAID devices.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: Avoid BUG caused by multiple failures.

While preparing to write a stripe we keep the parity block or blocks
locked (R5_LOCKED) - towards the end of schedule_reconstruction.

If the array is discovered to have failed before this write completes
we can leave those blocks LOCKED, and init_stripe will notice that a
free stripe still has a locked block and will complain.

So clear the R5_LOCKED flag in handle_failed_stripe, and demote the
'BUG' to a 'WARN_ON'.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: move rdev->corrected_errors counting

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: move rdev->corrected_errors counting

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: move rdev->corrected_errors counting

Read errors are considered to corrected if write-back and re-read
cycle is finished without further problems. Thus moving the rdev->
corrected_errors counting after the re-reading looks more reasonable
IMHO. Also included a couple of whitespace fixes on sync_page_io().

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: get rid of unnecessary casts on page_address()

page_address() returns void pointer, so the casts can be removed.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Improve decision on whether to fail a device with a read error.

Normally we would fail a device with a READ error. However if doing
so causes the array to fail, it is better to leave the device
in place and just return the read error to the caller.

The current test for decide if the array will fail is overly
simplistic.
We have a function 'enough' which can tell if the array is failed or
not, so use it to guide the decision.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: Make use of new recovery_disabled handling

When we get a read error during recovery, RAID10 previously
arranged for the recovering device to appear to fail so that
the recovery stops and doesn't restart. This is misleading and wrong.

Instead, make use of the new recovery_disabled handling and mark
the target device and having recovery disabled.

Add appropriate checks in add_disk and remove_disk so that devices
are removed and not re-added when recovery is disabled.

Signed-off-by: NeilBrown <neilb@suse.de>

md: change managed of recovery_disabled.

If we hit a read error while recovering a mirror, we want to abort the
recovery without necessarily failing the disk - as having a disk this
a read error is better than not having an array at all.

Currently this is managed with a per-array flag "recovery_disabled"
and is only implemented for RAID1. For RAID10 we will need finer
grained control as we might want to disable recovery for individual
devices separately.

So push more of the decision making into the personality.
'recovery_disabled' is now a 'cookie' which is copied when the
personality want to disable recovery and is changed when a device is
added to the array as this is used as a trigger to 'try recovery
again'.

This will allow RAID10 to get the control that it needs.

Signed-off-by: NeilBrown <neilb@suse.de>

md: remove ro check in md_check_recovery()

Commit c89a8eee6154 ("Allow faulty devices to be removed from a
readonly array.") added some work on ro array in the function,
but it couldn't be done since we didn't allow the ro array to be
handled from the beginning. Fix it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: introduce link/unlink_rdev() helpers

There are places where sysfs links to rdev are handled
in a same way. Add the helper functions to consolidate
them.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid: use printk_ratelimited instead of printk_ratelimit

As per printk_ratelimit comment, it should not be used.

Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de>
Signed-off-by: NeilBrown <neilb@suse.de>

md: use proper little-endian bitops

Using __test_and_{set,clear}_bit_le() with ignoring its return value
can be replaced with __{set,clear}_bit_le().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: finalise new merged handle_stripe.

handle_stripe5() and handle_stripe6() are now virtually identical.
So discard one and rename the other to 'analyse_stripe()'.

It always returns 0, so change it to 'void' and remove the 'done'
variable in handle_stripe().

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move some more common code into handle_stripe

The RAID6 version of this code is usable for RAID5 providing:
  - we test "conf->max_degraded" rather than "2" as appropriate
  - we make sure s->failed_num[1] is meaningful (and not '-1')
    when s->failed > 1

The 'return 1' must become 'goto finish' in the new location.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move more common code into handle_stripe

Apart from 'prexor' which can only be set for RAID5, and
'qd_idx' which can only be meaningful for RAID6, these two
chunks of code are nearly the same.

So combine them into one adding a test to call either
handle_parity_checks5 or handle_parity_checks6 as appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: unite handle_stripe_dirtying5 and handle_stripe_dirtying6

RAID6 is only allowed to choose 'reconstruct-write' while RAID5 is
also allow 'read-modify-write'
Apart from this difference, handle_stripe_dirtying[56] are nearly
identical. So resolve these differences and create just one function.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: unite fetch_block5 and fetch_block6

Provided that ->failed_num[1] is not a valid device number (which is
easily achieved) fetch_block6 provides all the functionality of
fetch_block5.

So remove the latter and rename the former to simply "fetch_block".

Then handle_stripe_fill5 and handle_stripe_fill6 become the same and
can similarly be united.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: rearrange a test in fetch_block6.

Next patch will unite fetch_block5 and fetch_block6.
First I want to make the differences a little more clear.

For RAID6 if we are writing at all and there is a failed device, then
we need to load or compute every block so we can do a
reconstruct-write.
This case isn't needed for RAID5 - we will do a read-modify-write in
that case.
So make that test a separate test in fetch_block6 rather than merged
with two other tests.

Make a similar change in fetch_block5 so the one bit that is not
needed for RAID6 is clearly separate.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: move more code into common handle_stripe

The difference between the RAID5 and RAID6 code here is easily
resolved using conf->max_degraded.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: Move code for finishing a reconstruction into handle_stripe.

Prior to commit ab69ae12ceef7 the code in handle_stripe5 and
handle_stripe6 to "Finish reconstruct operations initiated by the
expansion process" was identical.
That commit added an identical stanza of code to each function, but in
different places. That was careless.

The raid5 code was correct, so move that out into handle_stripe and
remove raid6 version.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

md/raid5: Remove stripe_head_state arg from handle_stripe_expansion.

This arg is only used to differentiate between RAID5 and RAID6 but
that is not needed. For RAID5, raid5_compute_sector will set qd_idx
to "~0" so j with certainly not equals qd_idx, so there is no need
for a guard on that condition.

So remove the guard and remove the arg from the declaration and
callers of handle_stripe_expansion.

Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>

Merge branch 'next/devel2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc

* 'next/devel2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: (47 commits)
  OMAP: Add debugfs node to show the summary of all clocks
  OMAP2+: hwmod: Follow the recommended PRCM module enable sequence
  OMAP2+: clock: allow per-SoC clock init code to prevent clockdomain calls from clock code
  OMAP2+: clockdomain: Add per clkdm lock to prevent concurrent state programming
  OMAP2+: PM: idle clkdms only if already in idle
  OMAP2+: clockdomain: add clkdm_in_hwsup()
  OMAP2+: clockdomain: Add 2 APIs to control clockdomain from hwmod framework
  OMAP: clockdomain: Remove redundant call to pwrdm_wait_transition()
  OMAP4: hwmod: Introduce the module control in hwmod control
  OMAP4: cm: Add two new APIs for modulemode control
  OMAP4: hwmod data: Add modulemode entry in omap_hwmod structure
  OMAP4: hwmod data: Add PRM context register offset
  OMAP4: prm: Remove deprecated functions
  OMAP4: prm: Replace warm reset API with the offset based version
  OMAP4: hwmod: Replace RSTCTRL absolute address with offset macros
  OMAP: hwmod: Wait the idle status to be disabled
  OMAP4: hwmod: Replace CLKCTRL absolute address with offset macros
  OMAP2+: hwmod: Init clkdm field at boot time
  OMAP4: hwmod data: Add clock domain attribute
  OMAP4: clock data: Add missing divider selection for auxclks
  ...

Merge branch 'next/devel' of ssh://master.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc

* 'next/devel' of ssh://master.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc: (128 commits)
  ARM: S5P64X0: External Interrupt Support
  ARM: EXYNOS4: Enable MFC on Samsung NURI
  ARM: EXYNOS4: Enable MFC on universal_c210
  ARM: S5PV210: Enable MFC on Goni
  ARM: S5P: Add support for MFC device
  ARM: EXYNOS4: Add support FIMD on SMDKC210
  ARM: EXYNOS4: Add platform device and helper functions for FIMD
  ARM: EXYNOS4: Add resource definition for FIMD
  ARM: EXYNOS4: Change devname for FIMD clkdev
  ARM: SAMSUNG: Add IRQ_I2S0 definition
  ARM: SAMSUNG: Add platform device for idma
  ARM: EXYNOS4: Add more registers to be saved and restored for PM
  ARM: EXYNOS4: Add more register addresses of CMU
  ARM: EXYNOS4: Add platform device for dwmci driver
  ARM: EXYNOS4: configure rtc-s3c on NURI
  ARM: EXYNOS4: configure MAX8903 secondary charger on NURI
  ARM: EXYNOS4: configure ADC on NURI
  ARM: EXYNOS4: configure MAX17042 fuel gauge on NURI
  ARM: EXYNOS4: configure regulators and PMIC(MAX8997) on NURI
  ARM: EXYNOS4: Increase NR_IRQS for devices with more IRQs
  ...

Fix up tons of silly conflicts:
- arch/arm/mach-davinci/include/mach/psc.h
- arch/arm/mach-exynos4/Kconfig
- arch/arm/mach-exynos4/mach-smdkc210.c
- arch/arm/mach-exynos4/pm.c
- arch/arm/mach-imx/mm-imx1.c
- arch/arm/mach-imx/mm-imx21.c
- arch/arm/mach-imx/mm-imx25.c
- arch/arm/mach-imx/mm-imx27.c
- arch/arm/mach-imx/mm-imx31.c
- arch/arm/mach-imx/mm-imx35.c
- arch/arm/mach-mx5/mm.c
- arch/arm/mach-s5pv210/mach-goni.c
- arch/arm/mm/Kconfig

Merge branch 'next/board' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc

* 'next/board' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
  ARM: S3C64XX: Configure backup battery charger on Cragganmore
  ARM: S3C64XX: Fix WM8915 IRQ polarity on Cragganmore
  ARM: S3C64XX: Configure supplies for all Cragganmore regulators
  ARM: S3C64XX: Refresh Cragganmore support
  ARM: S3C64XX: Initial support for Wolfson/Simtec Cragganmore/Banff
  OMAP4: Keyboard: Mux changes in the board file
  omap: blaze: add mmc5/wl1283 device support
  omap: 4430SDP: Register the card detect GPIO properly
  arm: omap3: cm-t35: add support for cm-t3730
  OMAP3: beagle: add support for beagleboard xM revision C
  OMAP3: rx-51: Add full regulator definitions
  omap: rx51: Platform support for lp5523 led chip

Merge branch 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc

* 'next/cross-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/linux-arm-soc:
  ARM: Consolidate the clkdev header files
  ARM: set vga memory base at run-time
  ARM: convert PCI defines to variables
  ARM: pci: make pcibios_assign_all_busses use pci_has_flag
  ARM: remove unnecessary mach/hardware.h includes
  pci: move microblaze and powerpc pci flag functions into asm-generic
  powerpc: rename ppc_pci_*_flags to pci_*_flags

Fix up conflicts in arch/microblaze/include/asm/pci-bridge.h