NeilBrown [Thu, 22 Dec 2011 23:17:56 +0000 (10:17 +1100)]
md/raid1: Allocate spare to store replacement devices and their bios.
In RAID1, a replacement is much like a normal device, so we just
double the size of the relevant arrays and look at all possible
devices for reads and writes.
This means that the array looks like it is now double the size in some
way - we need to be careful about that.
In particular, we checking if the array is still degraded while
creating a recovery request we need to only consider the first 'half'
- i.e. the real (non-replacement) devices.
NeilBrown [Thu, 22 Dec 2011 23:17:56 +0000 (10:17 +1100)]
md/raid1: Replace use of mddev->raid_disks with conf->raid_disks.
In general mddev->raid_disks can change unexpectedly while
conf->raid_disks will only change in a very controlled way. So change
some uses of one to the other.
The use of mddev->raid_disks will not cause actually problems but
this way is more consistent and safer in the long term.
NeilBrown [Thu, 22 Dec 2011 23:17:55 +0000 (10:17 +1100)]
md/raid10: Allow replacement device to be replace old drive.
When recovery finish and spare_active is called, check for a
replace that might have just become fully synced and mark it
as such, marking the original as failed.
Then when the original is removed, move the replacement into
its position.
This means that 'replacement' and spontaneously become NULL in some
situations. Make sure we check for those.
It also means that 'rdev' and 'replacement' could appear to be
identical - check for that too.
NeilBrown [Thu, 22 Dec 2011 23:17:55 +0000 (10:17 +1100)]
md/raid10: Handle replacement devices during resync.
If we need to resync an array which has replacement devices,
we always write any block checked to every replacement.
If the resync was bitmap-based resync we will then complete the
replacement normally.
If it was a full resync, we mark the replacements as fully recovered
when the resync finishes so no further recovery is needed.
NeilBrown [Thu, 22 Dec 2011 23:17:54 +0000 (10:17 +1100)]
md/raid10: preferentially read from replacement device if possible.
When reading (for array reads, not for recovery etc) we read from the
replacement device if it has recovered far enough.
This requires storing the chosen rdev in the 'r10_bio' so we can make
sure to drop the ref on the right device when the read finishes.
NeilBrown [Thu, 22 Dec 2011 23:17:54 +0000 (10:17 +1100)]
md/raid10: change read_balance to return an rdev
It makes more sense to return an rdev than just an index as
read_balance() gets a reference to the rdev and so returning
the pointer make this more idiomatic.
This will be needed in a future patch when we might return
a 'replacement' rdev instead of the main rdev.
NeilBrown [Thu, 22 Dec 2011 23:17:54 +0000 (10:17 +1100)]
md/raid5: Mark device want_replacement when we see a write error.
Now that WantReplacement drives are replaced cleanly, mark a drive
as WantReplacement when we see a write error. It might get failed soon so
the WantReplacement flag is irrelevant, but if the write error is recorded
in the bad block log, we still want to activate any spare that might
be available.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:53 +0000 (10:17 +1100)]
md/raid5: handle activation of replacement device when recovery completes.
When recovery completes - as reported by a call to ->spare_active,
we clear In_sync on the original and set it on the replacement.
Then when the original gets removed we move the replacement from
'replacement' to 'rdev'.
This could race with other code that is looking at these pointers,
so we use memory barriers and careful ordering to ensure that
a reader might see one device twice, but never no devices.
Then the readers guard against using both devices, which could
only happen when writing.
NeilBrown [Thu, 22 Dec 2011 23:17:53 +0000 (10:17 +1100)]
md/raid5: detect and handle replacements during recovery.
During recovery we want to write to the replacement but not
the original. So we have two new flags
- R5_NeedReplace if this stripe has a replacement that needs to
be written at some stage
- R5_WantReplace if NeedReplace, and the data is available, and
a 'sync' has been requested on this stripe.
We also distinguish between 'sync and replace' which need to read
all other devices, and 'replace' which only needs to read the
devices being replaced.
Note that during resync we always write to any replacement device.
It might not need to be written to, but as we don't read to compare,
we have to write to be sure.
NeilBrown [Thu, 22 Dec 2011 23:17:52 +0000 (10:17 +1100)]
md/raid5: preferentially read from replacement device if possible.
If a replacement device is present and has been recovered far enough,
then use it for reading into the stripe cache.
If we get an error we don't try to repair it, we just fail the device.
A replacement device that gives errors does not sound sensible.
This requires removing the setting of R5_ReadError when we get
a read error during a read that bypasses the cache. It was probably
a bad idea anyway as we don't know that every block in the read
caused an error, and it could cause ReadError to be set for the
replacement device, which is bad.
NeilBrown [Thu, 22 Dec 2011 23:17:52 +0000 (10:17 +1100)]
md/raid5: remove redundant bio initialisations.
We current initialise some fields of a bio when preparing a
stripe_head, and again just before submitting the request.
Remove the duplication by only setting the fields that lower level
devices don't touch in raid5_build_block, and only set the changeable
fields in ops_run_io.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:52 +0000 (10:17 +1100)]
md/raid5: allow each slot to have an extra replacement device
Just enhance data structures to record a second device per slot to be
used as a 'replacement' device, replacing the original.
We also have a second bio in each slot in each stripe_head. This will
only be used when writing to the array - we need to write to both the
original and the replacement at the same time, so will need two bios.
For now, only try using the replacement drive for aligned-reads.
In this case, we prefer the replacement if it has been recovered far
enough, otherwise use the original.
This includes a small enhancement. Previously we would only do
aligned reads if the target device was fully recovered. Now we also
do them if it has recovered far enough.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:51 +0000 (10:17 +1100)]
md: create externally visible flags for supporting hot-replace.
hot-replace is a feature being added to md which will allow a
device to be replaced without removing it from the array first.
With hot-replace a spare can be activated and recovery can start while
the original device is still in place, thus allowing a transition from
an unreliable device to a reliable device without leaving the array
degraded during the transition. It can also be use when the original
device is still reliable but it not wanted for some reason.
This will eventually be supported in RAID4/5/6 and RAID10.
This patch adds a super-block flag to distinguish the replacement
device. If an old kernel sees this flag it will reject the device.
It also adds two per-device flags which are viewable and settable via
sysfs.
"want_replacement" can be set to request that a device be replaced.
"replacement" is set to show that this device is replacing another
device.
The "rd%d" links in /sys/block/mdXx/md only apply to the original
device, not the replacement. We currently don't make links for the
replacement - there doesn't seem to be a need.
NeilBrown [Thu, 22 Dec 2011 23:17:51 +0000 (10:17 +1100)]
md: change hot_remove_disk to take an rdev rather than a number.
Soon an array will be able to have multiple devices with the
same raid_disk number (an original and a replacement). So removing
a device based on the number won't work. So pass the actual device
handle instead.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:51 +0000 (10:17 +1100)]
md: remove test for duplicate device when setting slot number.
When setting the slot number on a device in an active array we
currently check that the number is not already in use.
We then call into the personality's hot_add_disk function
which performs the same test and returns the same error.
Thus the common test is not needed.
As we will shortly be changing some personalities to allow duplicates
in some cases (to support hot-replace), the common test will become
inconvenient.
So remove the common test.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:51 +0000 (10:17 +1100)]
md/bitmap: be more consistent when setting new bits in memory bitmap.
For each active region corresponding to a bit in the bitmap with have
a 14bit counter (and some flags).
This counts
number of active writes + bit in the on-disk bitmap + delay-needed.
The "delay-needed" is because we always want a delay before clearing a
bit. So the number here is normally number of active writes plus 2.
If there have been no writes for a while, we drop to 1.
If still no writes we clear the bit and drop to 0.
So for consistency, when setting bit from the on-disk bitmap or by
request from user-space it is best to set the counter to '2' to start
with.
In particular we might also set the NEEDED_MASK flag at this time, and
in all other cases NEEDED_MASK is only set when the counter is 2 or
more.
Steven Rostedt [Thu, 22 Dec 2011 23:17:51 +0000 (10:17 +1100)]
md: Fix userspace free_pages() macro
While using etags to find free_pages(), I stumbled across this debug
definition of free_pages() that is to be used while debugging some raid
code in userspace. The __get_free_pages() allocates the correct size,
but the free_pages() does not match. free_pages(), like
__get_free_pages(), takes an order and not a size.
Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 23:17:50 +0000 (10:17 +1100)]
md/raid5: be more thorough in calculating 'degraded' value.
When an array is being reshaped to change the number of devices,
the two halves can be differently degraded. e.g. one could be
missing a device and the other not.
So we need to be more careful about calculating the 'degraded'
attribute.
Instead of just inc/dec at appropriate times, perform a full
re-calculation examining both possible cases. This doesn't happen
often so it not a big cost, and we already have most of the code to
do it.
NeilBrown [Thu, 22 Dec 2011 23:17:50 +0000 (10:17 +1100)]
md/bitmap: daemon_work cleanup.
We have a variable 'mddev' in this function, but repeatedly get the
same value by dereferencing bitmap->mddev.
There is room for simplification here...
NeilBrown [Thu, 22 Dec 2011 23:17:26 +0000 (10:17 +1100)]
md: allow non-privileged uses to GET_*_INFO about raid arrays.
The info is already available in /proc/mdstat and /sys/block in
an accessible form so there is no point in putting a road-block in
the ioctl for information gathering.
When writing to an array that is undergoing recovery (a spare
in being integrated into the array), writing to the array will
set bits in the bitmap, but they will not be cleared when the
write completes.
For bits covering areas that have not been recovered yet this is not a
problem as the recovery will clear the bits. However bits set in
already-recovered region will stay set and never be cleared.
This doesn't risk data integrity. The only negatives are:
- next time there is a crash, more resyncing than necessary will
be done.
- the bitmap doesn't look clean, which is confusing.
While an array is recovering we don't want to update the
'events_cleared' setting in the bitmap but we do still want to clear
bits that have very recently been set - providing they were written to
the recovering device.
So split those two needs - which previously both depended on 'success'
and always clear the bit of the write went to all devices.
NeilBrown [Thu, 22 Dec 2011 22:57:19 +0000 (09:57 +1100)]
md: don't give up looking for spares on first failure-to-add
Before performing a recovery we try to remove any spares that
might not be working, then add any that might have become relevant.
Currently we abort on the first spare that cannot be added.
This is a false optimisation.
It is conceivable that - depending on rules in the personality - a
subsequent spare might be accepted.
Also the loop does other things like count the available spares and
reset the 'recovery_offset' value.
If we abort early these might not happen properly.
So remove the early abort.
In particular if you have an array what is undergoing recovery and
which has extra spares, then the recovery may not restart after as
reboot as the could of 'spares' might end up as zero.
NeilBrown [Thu, 22 Dec 2011 22:57:00 +0000 (09:57 +1100)]
md/raid5: ensure correct assessment of drives during degraded reshape.
While reshaping a degraded array (as when reshaping a RAID0 by first
converting it to a degraded RAID4) we currently get confused about
which devices are in_sync. In most cases we get it right, but in the
region that is being reshaped we need to treat non-failed devices as
in-sync when we have the data but haven't actually written it out yet.
Reported-by: Adam Kwolek <adam.kwolek@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Thu, 22 Dec 2011 22:56:55 +0000 (09:56 +1100)]
md/linear: fix hot-add of devices to linear arrays.
commit d70ed2e4fafdbef0800e73942482bb075c21578b
broke hot-add to a linear array.
After that commit, metadata if not written to devices until they
have been fully integrated into the array as determined by
saved_raid_disk. That patch arranged to clear that field after
a recovery completed.
However for linear arrays, there is no recovery - the integration is
instantaneous. So we need to explicitly clear the saved_raid_disk
field.
NeilBrown [Thu, 8 Dec 2011 05:26:08 +0000 (16:26 +1100)]
md: ensure new badblocks are handled promptly.
When we mark blocks as bad we need them to be acknowledged by the
metadata handler promptly.
For an in-kernel metadata handler that was already being done. But
for an external metadata handler we need to alert it of the change by
sending a notification through the sysfs file. This adds that
notification.
NeilBrown [Thu, 8 Dec 2011 05:22:48 +0000 (16:22 +1100)]
md: bad blocks shouldn't cause a Blocked status on a Faulty device.
Once a device is marked Faulty the badblocks - whether acknowledged or
not - become irrelevant. So they shouldn't cause the device to be
marked as Blocked.
Without this patch, a process might write "-blocked" to clear the
Blocked status, but while that will correctly fail the device, it
won't remove the apparent 'blocked' status.
NeilBrown [Thu, 8 Dec 2011 04:49:46 +0000 (15:49 +1100)]
md: take a reference to mddev during sysfs access.
When we are accessing an mddev via sysfs we know that the
mddev cannot disappear because it has an embedded kobj which
is refcounted by sysfs.
And we also take the mddev_lock.
However this is not enough.
The final mddev_put could have been called and the
mddev_delayed_delete is waiting for sysfs to let go so it can destroy
the kobj and mddev.
In this state there are a lot of changes that should not be attempted.
To to guard against this we:
- initialise mddev->all_mddevs in on last put so the state can be
easily detected.
- in md_attr_show and md_attr_store, check ->all_mddevs under
all_mddevs_lock and mddev_get the mddev if it still appears to
be active.
This means that if we get to sysfs as the mddev is being deleted we
will get -EBUSY.
rdev_attr_store and rdev_attr_show are similar but already have
sufficient protection. They check that rdev->mddev still points to
mddev after taking mddev_lock. As this is cleared before delayed
removal which can only be requested under the mddev_lock, this
ensure the rdev and mddev are still alive.
NeilBrown [Thu, 8 Dec 2011 04:49:12 +0000 (15:49 +1100)]
md: refine interpretation of "hold_active == UNTIL_IOCTL".
We like md devices to disappear when they really are not needed.
However it is not possible to tell from the current state whether it
is needed or not. We can only tell from recent history of changes.
In particular immediately after we create an md device it looks very
similar to immediately after we have finished with it.
So we always preserve a newly created md device until something
significant happens. This state is stored in 'hold_active'.
The normal case is to keep it until an ioctl happens, as that will
normally either activate it, or explicitly de-activate it. If it
doesn't then it was probably created by mistake and it is now time to
get rid of it.
We can also modify an array via sysfs (instead of via ioctl) and we
currently treat any change via sysfs like an ioctl as a sign that if
it now isn't more active, it should be destroyed.
However this is not appropriate as changes made via sysfs are more
gradual so we should look for a more definitive change.
So this patch only clears 'hold_active' from UNTIL_IOCTL to clear when
the array_state is changed via sysfs. Other changes via sysfs
are ignored.
NeilBrown [Tue, 22 Nov 2011 23:18:52 +0000 (10:18 +1100)]
md/lock: ensure updates to page_attrs are properly locked.
Page attributes are set using __set_bit rather than set_bit as
it normally called under a spinlock so the extra atomicity is not
needed.
However there are two places where we might set or clear page
attributes without holding the spinlock.
So add the spinlock in those cases.
This might be the cause of occasional reports that bits a aren't
getting clear properly - theory is that BITMAP_PAGE_PENDING gets lost
when BITMAP_PAGE_NEEDWRITE is set or cleared. This is an
inconvenience, not a threat to data safety.
Dan Williams [Tue, 8 Nov 2011 05:22:06 +0000 (16:22 +1100)]
md/raid5: STRIPE_ACTIVE has lock semantics, add barriers
All updates that occur under STRIPE_ACTIVE should be globally visible
when STRIPE_ACTIVE clears. test_and_set_bit() implies a barrier, but
clear_bit() does not.
This is suitable for 3.1-stable.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
NeilBrown [Tue, 8 Nov 2011 05:22:01 +0000 (16:22 +1100)]
md/raid5: abort any pending parity operations when array fails.
When the number of failed devices exceeds the allowed number
we must abort any active parity operations (checks or updates) as they
are no longer meaningful, and can lead to a BUG_ON in
handle_parity_checks6.
Linus Torvalds [Tue, 8 Nov 2011 00:16:02 +0000 (16:16 -0800)]
Linux 3.2-rc1
.. with new name. Because nothing says "really solid kernel release"
like naming it after an extinct animal that just happened to be in the
news lately.
Al Viro [Mon, 7 Nov 2011 21:21:26 +0000 (21:21 +0000)]
VFS: we need to set LOOKUP_JUMPED on mountpoint crossing
Mountpoint crossing is similar to following procfs symlinks - we do
not get ->d_revalidate() called for dentry we have arrived at, with
unpleasant consequences for NFS4.
Linus Torvalds [Mon, 7 Nov 2011 20:38:11 +0000 (12:38 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf top: Fix live annotation in the --stdio interface
perf top tui: Don't recalc column widths considering just the first page
perf report: Add progress bar when processing time ordered events
perf hists browser: Warn about lost events
perf tools: Fix a typo of command name as trace-cmd
perf hists: Fix recalculation of total_period when sorting entries
perf header: Fix build on old systems
perf ui browser: Handle K_RESIZE in dialog windows
perf ui browser: No need to switch char sets that often
perf hists browser: Use K_TIMER
perf ui: Rename ui__warning_paranoid to ui__error_paranoid
perf ui: Reimplement the popup windows using libslang
perf ui: Reimplement ui__popup_menu using ui__browser
perf ui: Reimplement ui_helpline using libslang
perf ui: Improve handling sigwinch a bit
perf ui progress: Reimplement using slang
perf evlist: Fix grouping of multiple events
Tony Lindgren [Mon, 7 Nov 2011 20:27:10 +0000 (12:27 -0800)]
ARM: OMAP: Fix export.h or module.h includes
Commit 32aaeffbd4a7457bf2f7448b33b5946ff2a960eb (Merge branch
'modsplit-Oct31_2011'...) caused some build errors. Fix these
and make sure we always have export.h or module.h included
for MODULE_ and EXPORT_SYMBOL users:
Axel Lin [Mon, 7 Nov 2011 20:27:10 +0000 (12:27 -0800)]
ARM: OMAP: omap_device: Include linux/export.h
Include linux/export.h to fix below build warning:
CC arch/arm/plat-omap/omap_device.o
arch/arm/plat-omap/omap_device.c:1055: warning: data definition has no type or storage class
arch/arm/plat-omap/omap_device.c:1055: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL'
arch/arm/plat-omap/omap_device.c:1055: warning: parameter names (without types) in function declaration
Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
forcedeth: fix a few sparse warnings (variable shadowing)
forcedeth: Improve stats counters
forcedeth: remove unneeded stats updates
forcedeth: Acknowledge only interrupts that are being processed
forcedeth: fix race when unloading module
MAINTAINERS/rds: update maintainer
wanrouter: Remove kernel_lock annotations
usbnet: fix oops in usbnet_start_xmit
ixgbe: Fix compile for kernel without CONFIG_PCI_IOV defined
etherh: Add MAINTAINERS entry for etherh
bonding: comparing a u8 with -1 is always false
sky2: fix regression on Yukon Optima
netlink: clarify attribute length check documentation
netlink: validate NLA_MSECS length
i825xx:xscale:8390:freescale: Fix Kconfig dependancies
macvlan: receive multicast with local address
tg3: Update version to 3.121
tg3: Eliminate timer race with reset_task
tg3: Schedule at most one tg3_reset_task run
tg3: Obtain PCI function number from device
...
david decotigny [Sat, 5 Nov 2011 14:38:24 +0000 (14:38 +0000)]
forcedeth: fix a few sparse warnings (variable shadowing)
This fixes the following sparse warnings:
drivers/net/ethernet/nvidia/forcedeth.c:2113:7: warning: symbol 'size' shadows an earlier one
drivers/net/ethernet/nvidia/forcedeth.c:2102:6: originally declared here
drivers/net/ethernet/nvidia/forcedeth.c:2155:7: warning: symbol 'size' shadows an earlier one
drivers/net/ethernet/nvidia/forcedeth.c:2102:6: originally declared here
drivers/net/ethernet/nvidia/forcedeth.c:2227:7: warning: symbol 'size' shadows an earlier one
drivers/net/ethernet/nvidia/forcedeth.c:2215:6: originally declared here
drivers/net/ethernet/nvidia/forcedeth.c:2271:7: warning: symbol 'size' shadows an earlier one
drivers/net/ethernet/nvidia/forcedeth.c:2215:6: originally declared here
drivers/net/ethernet/nvidia/forcedeth.c:2986:20: warning: symbol 'addr' shadows an earlier one
drivers/net/ethernet/nvidia/forcedeth.c:2963:6: originally declared here
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mandeep Baines [Sat, 5 Nov 2011 14:38:23 +0000 (14:38 +0000)]
forcedeth: Improve stats counters
Rx byte count was off; instead use the hardware's count. Tx packet
count was counting pre-TSO packets; instead count on-the-wire packets.
Report hardware dropped frame count as rx_fifo_errors.
- The count of transmitted packets reported by the forcedeth driver
reports pre-TSO (TCP Segmentation Offload) packet counts and not the
count of the number of packets sent on the wire. This change fixes
the forcedeth driver to report the correct count. Fixed the code by
copying the count stored in the NIC H/W to the value reported by the
driver.
- Count rx_drop_frame errors as rx_fifo_errors:
We see a lot of rx_drop_frame errors if we disable the rx bottom-halves
for too long. Normally, rx_fifo_errors would be counted in this case.
The rx_drop_frame error count is private to forcedeth and is not
reported by ifconfig or sysfs. The rx_fifo_errors count is currently
unused in the forcedeth driver. It is reported by ifconfig as overruns.
This change reports rx_drop_frame errors as rx_fifo_errors.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Sat, 5 Nov 2011 14:38:22 +0000 (14:38 +0000)]
forcedeth: remove unneeded stats updates
Function ndo_get_stats() updates most of the stats from hardware
registers, making the manual updates un-needed. This change removes
these manual updates. Main exception is rx_missed_errors which needs
manual update.
Another exception is rx_packets, still updated manually in this commit
to make sure this patch doesn't change behavior of driver. This will
be addressed by a future patch.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Ditto [Sat, 5 Nov 2011 14:38:21 +0000 (14:38 +0000)]
forcedeth: Acknowledge only interrupts that are being processed
This is to avoid a race, accidentally acknowledging an interrupt that
we didn't notice and won't immediately process. This is based solely
on code inspection; it is not known if there was an actual bug here.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Sat, 5 Nov 2011 14:38:20 +0000 (14:38 +0000)]
forcedeth: fix race when unloading module
When forcedeth module is unloaded, there exists a path that can lead
to mod_timer() after del_timer_sync(), causing an oops. This patch
short-circuits this unneeded path, which originates in
nv_get_ethtool_stats().
Tested:
x86_64 16-way + 3 ethtool -S infinite loops + 100Mbps incoming traffic
+ rmmod/modprobe/ifconfig in a loop
Paul Gortmaker [Wed, 28 Sep 2011 22:29:32 +0000 (18:29 -0400)]
drivers/md: change module.h -> export.h in persistent-data/dm-*
For the files which are not themselves modular, we can change
them to include only the smaller export.h since all they are
doing is looking for EXPORT_SYMBOL.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Gortmaker [Sun, 9 Oct 2011 03:24:48 +0000 (23:24 -0400)]
arm: Add export.h to recently added files for EXPORT_SYMBOL
These files didn't exist at the time of the module.h split, and
so were not fixed by the commits on that baseline. Since they use
the EXPORT_SYMBOL and/or THIS_MODULE macros, they will need the
new export.h file included that provides them.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the bug added in commit v3.1-rc7-1055-gf9b491e
SKB can be NULL at this point, at least for cdc-ncm.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 7 Nov 2011 18:13:52 +0000 (10:13 -0800)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
cpuidle: Single/Global registration of idle states
cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
ACPI: Export FADT pm_profile integer value to userspace
thermal: Prevent polling from happening during system suspend
ACPI: Drop ACPI_NO_HARDWARE_INIT
ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
PNPACPI: Simplify disabled resource registration
ACPI: Fix possible recursive locking in hwregs.c
ACPI: use kstrdup()
mrst pmu: update comment
tools/power turbostat: less verbose debugging
Linus Torvalds [Mon, 7 Nov 2011 18:01:56 +0000 (10:01 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (40 commits)
vmwgfx: Snoop DMA transfers with non-covering sizes
vmwgfx: Move the prefered mode first in the list
vmwgfx: Unreference surface on cursor error path
vmwgfx: Free prefered mode on error path
vmwgfx: Use pointer return error codes
vmwgfx: Fix hw cursor position
vmwgfx: Infrastructure for explicit placement
vmwgfx: Make the preferred autofit mode have a 60Hz vrefresh
vmwgfx: Remove screen object active list
vmwgfx: Screen object cleanups
drm/radeon/kms: consolidate GART code, fix segfault after GPU lockup V2
drm/radeon/kms: don't poll forever if MC GDDR link training fails
drm/radeon/kms: fix DP setup on TRAVIS bridges
drm/radeon/kms: set HPD polarity in hpd_init()
drm/radeon/kms: add MSI module parameter
drm/radeon/kms: Add MSI quirk for Dell RS690
drm/radeon/kms: Add MSI quirk for HP RS690
drm/radeon/kms: split MSI check into a separate function
vmwgfx: Reinstate the update_layout ioctl
drm/radeon/kms: always do extended edid probe
...
Linus Torvalds [Mon, 7 Nov 2011 17:11:16 +0000 (09:11 -0800)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (226 commits)
mtd: tests: annotate as DANGEROUS in Kconfig
mtd: tests: don't use mtd0 as a default
mtd: clean up usage of MTD_DOCPROBE_ADDRESS
jffs2: add compr=lzo and compr=zlib options
jffs2: implement mount option parsing and compression overriding
mtd: nand: initialize ops.mode
mtd: provide an alias for the redboot module name
mtd: m25p80: don't probe device which has status of 'disabled'
mtd: nand_h1900 never worked
mtd: Add DiskOnChip G3 support
mtd: m25p80: add EON flash EN25Q32B into spi flash id table
mtd: mark block device queue as non-rotational
mtd: r852: make r852_pm_ops static
mtd: m25p80: add support for at25df321a spi data flash
mtd: mxc_nand: preset_v1_v2: unlock all NAND flash blocks
mtd: nand: switch `check_pattern()' to standard `memcmp()'
mtd: nand: invalidate cache on unaligned reads
mtd: nand: do not scan bad blocks with NAND_BBT_NO_OOB set
mtd: nand: wait to set BBT version
mtd: nand: scrub BBT on ECC errors
...
Fix up trivial conflicts:
- arch/arm/mach-at91/board-usb-a9260.c
Merged into board-usb-a926x.c
- drivers/mtd/maps/lantiq-flash.c
add_mtd_partitions -> mtd_device_register vs changed to use
mtd_device_parse_register.
vmwgfx: Snoop DMA transfers with non-covering sizes
Enough to get cursors working under Wayland.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Make it possible to use explicit placement
(although not hooked up with a user-space interface yet)
and relax the single framebuffer limit to only apply to implicit placement.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
vmwgfx: Make the preferred autofit mode have a 60Hz vrefresh
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Michael Neuling [Mon, 7 Nov 2011 06:12:28 +0000 (17:12 +1100)]
powerpc: fix building hvc_opal.c
Fix building following build error:
drivers/tty/hvc/hvc_opal.c:244:12: error: 'THIS_MODULE' undeclared here (not in a function)
Signed-off-by: Michael Neuling <mikey@neuling.org>
[ New file from powerpc tree not following the new rules from the
module.h split, both of which were merged today. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 7 Nov 2011 04:20:46 +0000 (20:20 -0800)]
Merge branch 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/jump-label-noearly' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
jump-label: initialize jump-label subsystem much earlier
x86/jump_label: add arch_jump_label_transform_static()
s390/jump-label: add arch_jump_label_transform_static()
jump_label: add arch_jump_label_transform_static() to optimise non-live code updates
sparc/jump_label: drop arch_jump_label_text_poke_early()
x86/jump_label: drop arch_jump_label_text_poke_early()
jump_label: if a key has already been initialized, don't nop it out
stop_machine: make stop_machine safe and efficient to call early
jump_label: use proper atomic_t initializer
Conflicts:
- arch/x86/kernel/jump_label.c
Added __init_or_module to arch_jump_label_text_poke_early vs
removal of that function entirely
- kernel/stop_machine.c
same patch ("stop_machine: make stop_machine safe and efficient
to call early") merged twice, with whitespace fix in one version
Linus Torvalds [Mon, 7 Nov 2011 04:15:05 +0000 (20:15 -0800)]
Merge branch 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xen-settime' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
xen/dom0: set wallclock time in Xen
xen: add dom0_op hypercall
xen/acpi: Domain0 acpi parser related platform hypercall
Linus Torvalds [Mon, 7 Nov 2011 04:03:41 +0000 (20:03 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (114 commits)
Btrfs: check for a null fs root when writing to the backup root log
Btrfs: fix race during transaction joins
Btrfs: fix a potential btrfs_bio leak on scrub fixups
Btrfs: rename btrfs_bio multi -> bbio for consistency
Btrfs: stop leaking btrfs_bios on readahead
Btrfs: stop the readahead threads on failed mount
Btrfs: fix extent_buffer leak in the metadata IO error handling
Btrfs: fix the new inspection ioctls for 32 bit compat
Btrfs: fix delayed insertion reservation
Btrfs: ClearPageError during writepage and clean_tree_block
Btrfs: be smarter about committing the transaction in reserve_metadata_bytes
Btrfs: make a delayed_block_rsv for the delayed item insertion
Btrfs: add a log of past tree roots
btrfs: separate superblock items out of fs_info
Btrfs: use the global reserve when truncating the free space cache inode
Btrfs: release metadata from global reserve if we have to fallback for unlink
Btrfs: make sure to flush queued bios if write_cache_pages waits
Btrfs: fix extent pinning bugs in the tree log
Btrfs: make sure btrfs_remove_free_space doesn't leak EAGAIN
Btrfs: don't wait as long for more batches during SSD log commit
...
Linus Torvalds [Mon, 7 Nov 2011 03:44:47 +0000 (19:44 -0800)]
Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
Linus Torvalds [Mon, 7 Nov 2011 03:02:23 +0000 (19:02 -0800)]
Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Add a 'reason' to wb_writeback_work
writeback: send work item to queue_io, move_expired_inodes
writeback: trace event balance_dirty_pages
writeback: trace event bdi_dirty_ratelimit
writeback: fix ppc compile warnings on do_div(long long, unsigned long)
writeback: per-bdi background threshold
writeback: dirty position control - bdi reserve area
writeback: control dirty pause time
writeback: limit max dirty pause time
writeback: IO-less balance_dirty_pages()
writeback: per task dirty rate limit
writeback: stabilize bdi->dirty_ratelimit
writeback: dirty rate control
writeback: add bg_threshold parameter to __bdi_update_bandwidth()
writeback: dirty position control
writeback: account per-bdi accumulated dirtied pages
Linus Torvalds [Mon, 7 Nov 2011 02:54:53 +0000 (18:54 -0800)]
Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scsi: drop unused Kconfig symbol
pci: drop unused Kconfig symbol
stmmac: drop unused Kconfig symbol
x86: drop unused Kconfig symbol
powerpc: drop unused Kconfig symbols
powerpc: 40x: drop unused Kconfig symbol
mips: drop unused Kconfig symbols
openrisc: drop unused Kconfig symbols
arm: at91: drop unused Kconfig symbol
samples: drop unused Kconfig symbol
m32r: drop unused Kconfig symbol
score: drop unused Kconfig symbols
sh: drop unused Kconfig symbol
um: drop unused Kconfig symbol
sparc: drop unused Kconfig symbol
alpha: drop unused Kconfig symbol
Fix up trivial conflict in drivers/net/ethernet/stmicro/stmmac/Kconfig
as per Michal: the STMMAC_DUAL_MAC config variable is still unused and
should be deleted.
Linus Torvalds [Mon, 7 Nov 2011 02:53:33 +0000 (18:53 -0800)]
Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
script/checkpatch.pl: warn about deprecated use of EXTRA_{A,C,CPP,LD}FLAGS
tags, powerpc: Update tags.sh to support _GLOBAL symbols
scripts: add extract-vmlinux
Linus Torvalds [Mon, 7 Nov 2011 02:52:52 +0000 (18:52 -0800)]
Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/kconfig/nconf: add KEY_HOME / KEY_END for dialog_inputbox
scripts/kconfig/nconf: fix editing long strings
scripts/kconfig/nconf: dynamically alloc dialog_input_result
scripts/kconfig/nconf: fix memmove's length arg
scripts/kconfig/nconf: fix typo: unknow => unknown
kconfig: fix set but not used variables
kconfig: handle SIGINT in menuconfig
kconfig: fix __enabled_ macros definition for invisible and un-selected symbols
kconfig: factor code in menu_get_ext_help()
kbuild: Fix help text not displayed in choice option.
kconfig/nconf: nuke unreferenced `nohelp_text'
kconfig/streamline_config.pl: merge local{mod,yes}config
kconfig/streamline_config.pl: use options to determine operating mode
kconfig/streamline_config.pl: directly access LSMOD from the environment
Linus Torvalds [Mon, 7 Nov 2011 02:41:27 +0000 (18:41 -0800)]
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Kbuild: append missing-syscalls to the default target list
genksyms: Regenerate lexer and parser
genksyms: Do not expand internal types
genksyms: Minor parser cleanup
Makefile: remove a duplicated line
fixdep: fix extraneous dependencies
scripts/Makefile.build: do not reference EXTRA_CFLAGS as CFLAGS replacement
kbuild: prevent make from deleting _shipped files
kbuild: Do not delete empty files in make distclean
Linus Torvalds [Mon, 7 Nov 2011 02:34:03 +0000 (18:34 -0800)]
hid/apple: modern macbook airs use the standard apple function key translations
This removes the use of the special "macbookair_fn_keys" keyboard
translation table for the MacBookAir4,x models (ie the 2011 refresh).
They use the standard apple_fn_keys[] translation. Apparently only the
old MacBook Air's need a different translation table.
This mirrors the change that commit da617c7cb915 ("HID: consolidate
MacbookAir 4,1 mappings") did for the WELLSPRING6A ones, but does it for
the WELLSPRING6 model used on the MacBookAir4,2.
Linus Torvalds [Mon, 7 Nov 2011 02:31:36 +0000 (18:31 -0800)]
Merge branch 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/vmalloc-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
net: xen-netback: use API provided by xenbus module to map rings
block: xen-blkback: use API provided by xenbus module to map rings
xen: use generic functions instead of xen_{alloc, free}_vm_area()