Balbir Singh [Wed, 17 Jun 2009 23:26:34 +0000 (16:26 -0700)]
memcg: add file-based RSS accounting
Add file RSS tracking per memory cgroup
We currently don't track file RSS; the RSS we report is actually anon RSS.
All file-mapped pages come in through the page cache and get
accounted there. This patch adds support for accounting file RSS pages.
It should:
1. Help improve the metrics reported by the memory resource controller
2. Form the basis for a future shared memory accounting heuristic
that has been proposed by Kamezawa.
Unfortunately, we cannot rename the existing "rss" keyword used in
memory.stat to "anon_rss". We do, however, add "mapped_file" data and hope
to educate the end user through documentation.
Li Zefan [Wed, 17 Jun 2009 23:26:33 +0000 (16:26 -0700)]
cgroups: forbid noprefix if mounting more than just cpuset subsystem
The 'noprefix' option was introduced for backwards-compatibility of
cpuset, but actually it can be used when mounting other subsystems.
This opens the possibility of a name collision, and the collision can now
really happen, because we have a 'stat' file in both the memory and cpuacct
subsystems:
# mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt
Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory
subsys can be seen.
We don't want users to use noprefix, and we also want to avoid name
collisions, so noprefix is now allowed only when mounting just the cpuset
subsystem.
[akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
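The rule described above can be sketched as a bitmask test. This is a user-space model, not the kernel code: the subsystem IDs and the function name check_noprefix() are assumptions made for illustration. The constant 1UL is used so the shift is done in unsigned long rather than int, which is presumably what the shift fix noted in the trailers is about.

```c
#include <errno.h>

/* Hypothetical subsystem IDs; in the kernel they are generated at build time. */
enum { CPUSET_SUBSYS_ID = 0, MEMORY_SUBSYS_ID = 1, CPUACCT_SUBSYS_ID = 2 };

/*
 * subsys_bits has one bit set per subsystem requested at mount time.
 * Returns 0 if the option combination is acceptable, -EINVAL otherwise.
 */
int check_noprefix(unsigned long subsys_bits, int noprefix)
{
    /* Every bit except the cpuset one; 1UL keeps the shift in unsigned long. */
    unsigned long others = ~(1UL << CPUSET_SUBSYS_ID);

    /* noprefix is only tolerated when nothing but cpuset is being mounted. */
    if (noprefix && (subsys_bits & others))
        return -EINVAL;
    return 0;
}
```

With this check, the "noprefix,memory,cpuacct" mount shown above would be rejected, while "noprefix,cpuset" would still be accepted.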
Randy Dunlap [Wed, 17 Jun 2009 23:26:32 +0000 (16:26 -0700)]
cgroups: make messages more readable
Fix some cgroup messages to read better.
Update MAINTAINERS to include mm/*cgroup* files.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Paul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/Changes: perl is needed to build the kernel
Perl is used by the kernel Makefile to generate documentation, firmware
in C source form, sources, graphs, and some headers, and this fact is
undocumented.
[akpm@linux-foundation.org: 80-columns, please] Signed-off-by: Jose Luis Perez Diez <jluis@escomposlinux.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Wed, 17 Jun 2009 23:26:29 +0000 (16:26 -0700)]
reiserfs: fix warnings with gcc 4.4
Several code paths in reiserfs have a construct like:
if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ...
which, in addition to being ugly, ends up causing compiler warnings with
gcc 4.4.0. Previous compilers didn't issue a warning.
fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined
fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefined
I believe this is due to the ih being passed to macros which evaluate the
argument more than once. This is old code and we haven't seen any
problems with it, but this patch eliminates the warnings.
It converts the multiple evaluation macros to static inlines and does a
preassignment for the cases that were causing the warnings because that
code is just ugly.
Reported-by: Chris Mason <mason@oracle.com> Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
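The multiple-evaluation problem can be shown with a small user-space model. Everything here (struct item_head's layout, fetch_head(), the macro and function names) is a hypothetical stand-in for the reiserfs code, and a counting function call replaces the embedded assignment so that the example stays well defined.

```c
#include <stddef.h>

struct item_head { int is_direntry; };  /* hypothetical stand-in */

int fetch_count;                         /* counts argument evaluations */

struct item_head *fetch_head(struct item_head *table, int n)
{
    fetch_count++;
    return &table[n];
}

/* Macro form: the argument expression appears, and is evaluated, twice. */
#define IS_DIRENTRY_MACRO(ih) ((ih) != NULL && (ih)->is_direntry)

/* static inline form: the argument is evaluated exactly once by the caller. */
static inline int is_direntry(const struct item_head *ih)
{
    return ih != NULL && ih->is_direntry;
}

int check_item_macro(struct item_head *table, int n)
{
    return IS_DIRENTRY_MACRO(fetch_head(table, n));   /* two fetches */
}

int check_item(struct item_head *table, int n)
{
    struct item_head *ih = fetch_head(table, n);      /* preassignment */

    return is_direntry(ih);                           /* one fetch */
}
```

When the macro argument is an assignment such as "ih = ..." instead of a function call, the duplicated expression is what gcc 4.4 warns about.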
Jan Kara [Wed, 17 Jun 2009 23:26:27 +0000 (16:26 -0700)]
isofs: cleanup mount option processing
Remove unused variables from isofs_sb_info (used to be some mount
options), unify variables for option to use 0/1 (some options used
'y'/'n'), use bit fields for option flags in superblock.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 17 Jun 2009 23:26:27 +0000 (16:26 -0700)]
isofs: fix setting of uid and gid to 0
isofs allows setting of default uid and gid of files but value 0 was used
to indicate that user did not specify any uid/gid mount option. Since
this option also overrides uid/gid set in Rock Ridge extension, it makes
sense to allow forcing uid/gid 0. Fix option processing to allow this.
Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
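One way to stop treating 0 as "not specified" is to carry a separate flag next to the value. The sketch below assumes that approach; the struct, its field names and effective_uid() are illustrative and not taken from the isofs sources.

```c
/* Hypothetical, trimmed-down mount option block. */
struct isofs_opts_model {
    unsigned int uid;          /* value given with uid=, if any */
    unsigned int uid_set:1;    /* 1 only when the user passed uid= */
};

/* Pick the uid to report for a file whose Rock Ridge entry carries rr_uid. */
unsigned int effective_uid(const struct isofs_opts_model *o, unsigned int rr_uid)
{
    /* The flag, not the value, decides: uid=0 is now a valid override. */
    return o->uid_set ? o->uid : rr_uid;
}
```

With the flag in place, "uid=0" forces root ownership, and omitting the option leaves the Rock Ridge value untouched.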
Jan Kara [Wed, 17 Jun 2009 23:26:25 +0000 (16:26 -0700)]
isofs: let mode and dmode mount options override rock ridge mode setting
So far, permissions set via 'mode' and/or 'dmode' mount options were
effective only if the medium had no rock ridge extensions (or was mounted
without them). Add 'overriderockmode' mount option to indicate that these
options should override permissions set in rock ridge extensions. Maybe
this should be default but the current behavior is there since mount
options were created so I think we should not change how they behave.
Cc: <Hans-Joachim.Baader@cjt.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 17 Jun 2009 23:26:24 +0000 (16:26 -0700)]
ext3: make sure inode is deleted from orphan list after truncate
As Ted pointed out, it can happen that ext3_truncate() returns without
removing inode from orphan list. This way we could in some rare cases
(like when we get ENOMEM from an allocation in ext3_truncate called
because of failed ext3_write_begin) leave the inode on orphan list and
that triggers assertion failure on umount.
So make ext3_truncate() always remove inode from in-memory orphan list.
Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jbd: fix race between free buffer and commit transaction
This patch is no longer needed because, if a race between freeing a buffer
and committing a transaction occurs and dio gets an error, dio currently
falls back to buffered IO by the following patch.
Jan Kara [Wed, 17 Jun 2009 23:26:23 +0000 (16:26 -0700)]
ext3: fix chain verification in ext3_get_blocks()
Chain verification in ext3_get_blocks() has been hosed since it called
verify_chain(chain, NULL) which always returns success. As a result
readers could in theory race with truncate. On the other hand the race
probably cannot happen with the current locking scheme, since by the
time ext3_truncate() is called all the pages are already removed and
hence get_block() shouldn't be called on such pages...
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 17 Jun 2009 23:26:20 +0000 (16:26 -0700)]
ext2: Do not update mtime of a moved directory
One of our users is complaining that his backup tool is upset on ext2
(while it's happy on ext3, xfs, ...) because of the mtime change.
The problem is:
mkdir foo
mkdir bar
mkdir foo/a
Now under ext2:
mv foo/a foo/b
changes mtime of 'foo/a' (foo/b after the move). That does not really
make sense and it does not happen under any other filesystem I've seen.
More complicated is:
mv foo/a bar/a
This changes mtime of foo/a (bar/a after the move) and it makes some
sense since we had to update the parent directory pointer of foo/a. But
again, no other filesystem does this. So after some thought I'd vote
for consistency and change ext2 to behave the same as other filesystems.
Do not update mtime of a moved directory. Specs don't say anything
about it (neither that it should, nor that it should not be updated) and
other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
it. So let's become more consistent.
Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.
Reported-by: <ronny.pretzsch@dfs.de> Cc: <hare@suse.de> Cc: Jörn Engel <joern@logfs.org> Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nate Case [Wed, 17 Jun 2009 23:26:17 +0000 (16:26 -0700)]
gpio: pca953x: Get platform_data from OpenFirmware
On OpenFirmware platforms, it makes the most sense to get platform_data
from the device tree. Make an attempt to translate OF node properties
into platform_data struct before bailing out.
Note that the implementation approach taken differs from other device
drivers that make use of device tree information. This is because I2C
chips are already registered automatically by of_i2c, so we can get by
with a small translator function in the driver.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: kfree(NULL) is legal] Signed-off-by: Nate Case <ncase@xes-inc.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Wed, 17 Jun 2009 23:26:16 +0000 (16:26 -0700)]
gpio: max7301: add missing __devexit marking
The remove member of the spi_driver max7301_driver uses __devexit_p(), so
the remove function itself should be marked with __devexit. Even more so
considering the probe function is marked with __devinit.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Juergen Beisert <j.beisert@pengutronix.de> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support to the PCA953x driver to use the GPIOLIB naming facility for
GPIOs.
Signed-off-by: Daniel Silverstone <dsilvers@simtec.co.uk> Cc: Ben Gardner <bgardner@wabtec.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Wed, 17 Jun 2009 23:26:13 +0000 (16:26 -0700)]
rtc-ds1553: drop IRQF_SHARED
IRQF_SHARED should not be used with IRQF_DISABLED. There is no in-tree
user of this driver and the only out-of-tree user I know of uses a
dedicated irq line for this RTC.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Atsushi Nemoto [Wed, 17 Jun 2009 23:26:12 +0000 (16:26 -0700)]
rtc-tx4939: drop IRQF_SHARED
IRQF_SHARED should not be used with IRQF_DISABLED. This RTC has a
dedicated irq line to the SoC's internal interrupt controller so there is
no reason to use IRQF_SHARED.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for the Epson RX-8025SA/NB RTC chips. It includes support for
alarms, periodic interrupts (1 Hz) and clock precision adjustment.
For clock precision adjustment, the sysfs file "clock_adjust_ppb" is
created in "/sys/class/rtc/rtcX/device". It allows setting and reading the
clock adjustment in ppb (parts per billion), e.g.:
This makes it possible to compensate for temperature-dependent clock drift. According
to the RX8025 SA/NB application manual the frequency and temperature
characteristics can be approximated using the following equation:
df = a * (ut - t)**2
df: Frequency deviation at any temperature
a : Coefficient = (-35 +-5) * 10**-9
ut: Ultimate temperature in degrees = +25 +-5 degrees
t : Any temperature in degrees
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: Sergei Poselenov <sposelenov@emcraft.com> Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Dmitry Rakhchev <rda@emcraft.com> Signed-off-by: Matthias Fuchs <matthias.fuchs@esd.eu> Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
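As a worked example of the equation above, the following sketch evaluates df with the nominal values a = -35 * 10**-9 and ut = +25 degrees, ignoring the stated tolerances. The function name is hypothetical, and how the result maps onto a value written to clock_adjust_ppb is not specified here.

```c
/*
 * Frequency deviation in ppb at temperature t_celsius, using the nominal
 * coefficient and turnover temperature from the equation above.
 * The tolerances (+-5 on both values) are deliberately ignored.
 */
double rx8025_drift_ppb(double t_celsius)
{
    const double a_ppb = -35.0;  /* -35 * 10**-9, expressed in ppb */
    const double ut = 25.0;      /* nominal "ultimate" temperature */
    double d = ut - t_celsius;

    return a_ppb * d * d;        /* df = a * (ut - t)**2 */
}
```

At 45 degrees this gives -35 * 20**2 = -14000 ppb, i.e. the clock runs about 14 ppm slow; the same deviation results at 5 degrees because the curve is symmetric around ut.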
Wolfram Sang [Wed, 17 Jun 2009 23:26:10 +0000 (16:26 -0700)]
rtc: rtc-ds1307 add ds3231
Add ds3231 variant. For that, the BBSQI bit position was changed from a
simple define into a lookup-array as it differs. This also removes
writing to an unused bit in case of the ds1337.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The RTC driver for ds1742 / ds1743 uses a static nvram attribute. This
patch replaces this static attribute with one nvram attribute for each
ds174x registered.
The nvram size is not the same for all types of ds174x. The nvram size is
accessible as the file size of the nvram attribute in sysfs. With only a
single nvram attribute, this file size will be incorrect if more than one
type of ds174x is present on a system. See the comment in the removed
code below.
This patch has been tested with linux-2.6.28 and linux-2.6.29-rc5/6 on a
custom board with one ds1743.
Daniel Ribeiro [Wed, 17 Jun 2009 23:26:06 +0000 (16:26 -0700)]
pxa2xx_spi: fix for SPI_CS_HIGH
Commit a7bb3909b3293d503211d7f6af8ed62c1644b686 ("spi: pxa2xx_spi:
introduce chipselect GPIO to simplify the common cases") introduces
chipselect GPIO, and configures the CS polarity using SPI_CS_HIGH
spi->mode flag. Add SPI_CS_HIGH to the allowed modes.
Signed-off-by: Daniel Ribeiro <drwyrm@gmail.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Vorontsov [Wed, 17 Jun 2009 23:26:05 +0000 (16:26 -0700)]
mpc52xx_psc_spi: convert to cs_control callback
mpc52xx_psc_spi driver is the last user of the legacy activate_cs and
deactivate_cs callbacks, so convert the driver to the cs_control hook and
remove the legacy callbacks from fsl_spi_platform_data struct.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Wed, 17 Jun 2009 23:26:04 +0000 (16:26 -0700)]
spi: move more spi_setup() functionality into core
Move some common spi_setup() error checks into the SPI framework from the
spi_master controller drivers:
- Add a new "mode_bits" field to spi_master
- Use that in spi_setup to validate the spi->mode value being
requested. Setting this new field is now mandatory for any
controller supporting more than vanilla SPI_MODE_0.
- Update all spi_master drivers to:
* Initialize that field
* Remove current spi_setup() checks using that value.
This is a net minor code shrink.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
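The mode validation described above amounts to rejecting any requested mode bit that the controller did not advertise. The following is a user-space model under that assumption; the *_model structs and spi_setup_check_mode() are illustrative names, not the kernel's definitions.

```c
#include <errno.h>

/* Mode flag values as used by the Linux SPI headers. */
#define SPI_CPHA    0x01
#define SPI_CPOL    0x02
#define SPI_CS_HIGH 0x04

/* Hypothetical reduced versions of spi_master and spi_device. */
struct spi_master_model { unsigned int mode_bits; };
struct spi_device_model {
    unsigned int mode;
    const struct spi_master_model *master;
};

/* Returns 0 if every requested mode bit is supported, -EINVAL otherwise. */
int spi_setup_check_mode(const struct spi_device_model *spi)
{
    /* Bits requested by the device but missing from the controller mask. */
    unsigned int bad_bits = spi->mode & ~spi->master->mode_bits;

    return bad_bits ? -EINVAL : 0;
}
```

A controller that leaves mode_bits at zero therefore accepts only SPI_MODE_0, which is why setting the field becomes mandatory for anything beyond that.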
David Brownell [Wed, 17 Jun 2009 23:26:03 +0000 (16:26 -0700)]
spi: move common spi_setup() functionality into core
Start moving some spi_setup() functionality into the SPI core from the
various spi_master controller drivers:
- Make that function stop being an inline;
- Move two common idioms from drivers into that new function:
* Default bits_per_word to 8 if that field isn't set
* Issue a standardized dev_dbg() message
This is a net minor source code shrink, and supports enhancements found in
some follow-up patches.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
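The bits_per_word idiom mentioned above is small enough to show in full. This is a sketch with a hypothetical reduced struct, not the kernel's spi_device.

```c
/* Hypothetical reduced spi_device: only the field this idiom touches. */
struct spi_device_bpw_model { unsigned char bits_per_word; };

/* Apply the common default once, in the core, instead of in each driver. */
void spi_setup_defaults(struct spi_device_bpw_model *spi)
{
    if (!spi->bits_per_word)
        spi->bits_per_word = 8;   /* unset means 8-bit words */
}
```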
Michal Simek [Wed, 17 Jun 2009 23:25:59 +0000 (16:25 -0700)]
procfs: remove sparse errors in proc_devtree.c
CHECK fs/proc/proc_devtree.c
fs/proc/proc_devtree.c:197:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:203:34: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:210:14: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:223:26: warning: Using plain integer as NULL pointer
fs/proc/proc_devtree.c:226:14: warning: Using plain integer as NULL pointer
Signed-off-by: Michal Simek <monstr@monstr.eu> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Libenzi [Wed, 17 Jun 2009 23:25:58 +0000 (16:25 -0700)]
epoll: fix nested calls support
This fixes a regression in 2.6.30.
I unfortunately accepted a patch some time ago to drop the "current" usage
from possible IRQ context, without giving it proper thought. The patch
switched to using the CPU id by bounding the nested call callback with a
get_cpu()/put_cpu().
Unfortunately the ep_call_nested() function can be called with a callback
that grabs sleepy locks (from its own f_op->poll()), which results in epic
fails. The following patch uses the proper "context" depending on the
path where it is called, and on the kind of callback.
This was reported by Stefan Richter, who has also verified the patch
in his previously failing environment.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Jun 2009 23:25:56 +0000 (16:25 -0700)]
MAINTAINERS: fbdev is orphaned
Tony hasn't been heard from in 18 months and people keep sending him
things.
Cc: Joe Perches <joe@perches.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keika Kobayashi [Wed, 17 Jun 2009 23:25:55 +0000 (16:25 -0700)]
proc: export statistics for softirq to /proc
Export statistics for softirq in /proc/softirqs and /proc/stat.
1. /proc/softirqs
Implement /proc/softirqs which shows the number of softirq
for each CPU like /proc/interrupts.
2. /proc/stat
Add the "softirq" line to /proc/stat.
This line shows the number of softirqs for all CPUs.
The first column is the total of all softirqs and
each subsequent column is the total for a particular softirq.
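The column layout described above can be checked with a small parser. This is a sketch: the function name and the sample numbers used in the note below are made up, and it assumes all values on the line come from the same snapshot.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if the first number on a "softirq" line equals the sum of the
 * remaining numbers, 0 if it does not, and -1 if the line is not a
 * well-formed "softirq" line.
 */
int softirq_line_consistent(const char *line)
{
    unsigned long long total, sum = 0;
    const char *p;
    char *end;

    if (strncmp(line, "softirq ", 8) != 0)
        return -1;

    p = line + 8;
    total = strtoull(p, &end, 10);      /* first column: grand total */
    if (end == p)
        return -1;

    for (;;) {
        unsigned long long v;

        p = end;
        v = strtoull(p, &end, 10);      /* one column per softirq type */
        if (end == p)                   /* no more numbers on the line */
            break;
        sum += v;
    }
    return total == sum;
}
```

For a made-up line such as "softirq 10 1 2 3 4" the function returns 1, since 1 + 2 + 3 + 4 equals the leading total.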
Robin Getz [Wed, 17 Jun 2009 23:25:54 +0000 (16:25 -0700)]
irqs: add IRQF_SAMPLE_RANDOM to the feature-removal-schedule.txt (deprecated) list
This adds IRQF_SAMPLE_RANDOM to the feature-removal (deprecated) list
since most of the IRQF_SAMPLE_RANDOM users are technically bogus as
entropy sources in the kernel's current entropy model.
This was discussed on the lkml the past few days, which started here:
http://lkml.org/lkml/2009/4/6/283
Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Matt Mackall <mpm@selenic.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keika Kobayashi [Wed, 17 Jun 2009 23:25:52 +0000 (16:25 -0700)]
softirq: introduce statistics for softirq
Statistics for softirqs don't exist yet.
They would be helpful, just as statistics for interrupts are.
This patch introduces counting of the number of softirqs,
which will be exported in /proc/softirqs.
When a softirq handler consumes a lot of CPU time,
/proc/stat looks like the following.
$ while :; do cat /proc/stat | head -n1 ; sleep 10 ; done
cpu 88 0 408 739665 583 28 2 0 0
cpu 450 0 1090 740970 594 28 1294 0 0
                             ^^^^
                             softirq
In such a situation,
/proc/softirqs shows us which softirq handler is invoked.
We can see the rate of increase of each softirq.
When the CPU time spent in softirq is high,
the rates of increase are as follows.
TIMER : 220/sec : CPU1-3
NET_TX : 5/sec : CPU0
NET_RX : 120/sec : CPU0
SCHED : 40-200/sec : all CPU
RCU : 45-58/sec : all CPU
The rates of increase when idle are as follows.
TIMER : 250/sec
SCHED : 250/sec
RCU : 2/sec
It seems that many softirqs for receiving packets and for RCU are invoked.
This helps when checking the system.
Matt Fleming [Thu, 18 Jun 2009 09:03:33 +0000 (10:03 +0100)]
sh: Fix declaration of __kernel_sigreturn and __kernel_rt_sigreturn
GCC 4.5.0 complains about the declaration of variables
__kernel_sigreturn and __kernel_rt_sigreturn because they have type
void. Correctly declare these symbols as functions to fix the
following error,
arch/sh/kernel/signal_32.c: In function 'setup_frame':
arch/sh/kernel/signal_32.c:368:14: error: taking address of expression of type 'void'
arch/sh/kernel/signal_32.c: In function 'setup_rt_frame':
arch/sh/kernel/signal_32.c:452:14: error: taking address of expression of type 'void'
make[1]: *** [arch/sh/kernel/signal_32.o] Error 1
make: *** [arch/sh/kernel] Error 2
Signed-off-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Dhananjay Phadke [Wed, 17 Jun 2009 17:27:25 +0000 (17:27 +0000)]
netxen: fix tx ring accounting
This forces every update of tx ring producer to check for
availability of space for next full TSO command. Earlier
firmware control commands didn't care to pause tx queue.
Stop the tx queue if there's not enough space to transmit one full
LSO command left on the tx ring after current transmit. This avoids
returning NETDEV_TX_BUSY after checking distance between producer
and consumer on every cpu.
Restart the tx queue only if we have cleaned up enough tx
descriptors.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
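The stop condition described above can be modelled with simple ring arithmetic. The sketch below is an assumption about the general shape of such accounting, not the netxen code: the function names, the empty-ring convention and the threshold parameter are all hypothetical.

```c
/*
 * Free descriptors on a tx ring of num_desc slots.
 * Convention assumed here: producer == consumer means the ring is empty.
 */
unsigned int tx_avail(unsigned int producer, unsigned int consumer,
                      unsigned int num_desc)
{
    return producer < consumer ? consumer - producer
                               : consumer + num_desc - producer;
}

/*
 * After queuing a command, decide whether the tx queue must be stopped:
 * stop when there is no longer room for one full LSO command, which is
 * assumed to need max_cmd_desc descriptors.
 */
int must_stop_queue(unsigned int producer, unsigned int consumer,
                    unsigned int num_desc, unsigned int max_cmd_desc)
{
    return tx_avail(producer, consumer, num_desc) <= max_cmd_desc;
}
```

Stopping the queue ahead of time in this way is what lets the transmit path avoid returning NETDEV_TX_BUSY; the restart side would apply a matching check once enough descriptors have been cleaned.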
Bugzilla: 9868 & 10195.
There seems to be a bug in the SMM code that handles TCO Timeout SMI.
Andriy Gapon found that the code on his DG33TL system does the following:
> The handler is quite simple - it tests value in TCO1_CNT against 0x800, i.e.
> checks TCO_TMR_HLT. If the bit is set the handler goes into an infinite loop,
> apparently to allow the second timeout and reboot. Otherwise it simply clears
> TIMEOUT bit in TCO1_STS and that's it.
> So the logic seems to be reversed, because it is hard to see how TIMEOUT can
> get set to 1 and SMI generated when TCO_TMR_HLT is set (other than a
> transitional effect).
The only trick we have is to bypass the SMM code by turning off the generation
of the SMI#. The trick can only be enabled by setting the vendorsupport module
parameter to 911. This trick doesn't work well on laptops.
Note: this is a dirty hack. Please handle with care. The only real fix is that
the bug in the SMM BIOS code gets fixed.
[WATCHDOG] move platform probe and remove function to devinit and devexit
A pointer to probe and remove functions is passed to the core via
platform_driver_register and so the function must not disappear when the
.init sections are discarded. Otherwise (if also having HOTPLUG=y)
unbinding and binding a device to the driver via sysfs will result in an
oops as does a device being registered late.
Net / e100: Fix suspend of devices that cannot be power managed
If the adapter is not power-manageable using either ACPI, or the
native PCI PM interface, __e100_power_off() returns error code, which
causes every attempt to suspend to fail, although it should return 0
in such a case. Fix this problem by ignoring the return value of
pci_set_power_state() in __e100_power_off().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Andreas Mohr <andi@lisas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 17 Jun 2009 01:12:19 +0000 (01:12 +0000)]
net: group address list and its count
This patch is inspired by a patch recently posted by Johannes Berg. Basically what
my patch does is group the list and the count of addresses into a newly introduced
structure, netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becomes a bit nicer.
2) in the future it will be possible to operate on lists independently
of netdevices (by exporting the right functions).
I wanted to introduce this patch before posting a multicast list conversion.
Jarek Poplawski [Thu, 18 Jun 2009 07:28:51 +0000 (00:28 -0700)]
ipv4: Fix fib_trie rebalancing, part 2
My previous patch, which explicitly delays freeing of tnodes by adding
them to a list and flushing them after the update is finished, isn't
strict enough. It makes an exception for tnodes without a parent, assuming
they are newly created and therefore still "invisible" to the read side.
But the top tnode has no parent either, so we have to drop
all exceptions (at least until a better way is found). Additionally we
need to move the rcu assignment of this node before flushing, so the
return type of the trie_rebalance() function is changed.
Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
arch/sh has a couple of stray markers without any users introduced
in commit 3d58695edbfac785161bf282dc11fd42a483d6c9. Remove them in
preparation of removing the markers in favour of the TRACE_EVENT
macro (and also because we don't keep dead code around).
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Jarek Poplawski [Tue, 16 Jun 2009 08:33:55 +0000 (08:33 +0000)]
pkt_sched: Update drops stats in act_police
Action police statistics could be misleading because drops are not
shown when expected.
With feedback from: Jamal Hadi Salim <hadi@cyberus.ca>
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the size of the driver transmit ring to reduce latency
and allow qdisc to do better rate control. Also make it
obvious what the minimum transmit ring allowed is and why.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The logic in sky2_down was incorrect. Receiver could report status
after rx_stop was called.
The steps need to be:
* stop new frames from being transmitted
* shut off transmit/receive logic
* synchronize with NAPI to process status info about transmitter
and receiver
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Wed, 17 Jun 2009 11:41:45 +0000 (11:41 +0000)]
r8169: do not bring device down when suspending
Stopping all activity through ChipCmd and blindly acking the irqs
is neither nice nor completely needed: the transition to low-power
mode does enough work and it apparently keeps the device in a sane
state.
Patch suggested by a fix for http://bugzilla.kernel.org/show_bug.cgi?id=9512
The rtl_shutdown path is kept unchanged so far.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Tested-by: Anders Eriksson <aeriksson@fastmail.fm> Cc: Edward Hsu <edward_hsu@realtek.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Wed, 17 Jun 2009 11:43:11 +0000 (11:43 +0000)]
sis190: use an adequate phy list entry as a fallback
When sis190 driver is trying to get default phy, if it doesn't find home
or lan phy, it falls back to the first phy in the phy list but list_entry()
points to a bogus entry. list_first_entry() should be used instead.
Signed-off-by: Arnaud Patard <apatard@mandriva.com> Acked-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
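The difference between the two macros can be seen in a user-space model of the kernel's circular list. The macros below follow the well-known list.h definitions in simplified form; struct phy_model and the helper functions are hypothetical.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified forms of the kernel's list helpers. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

struct phy_model {             /* hypothetical stand-in for a phy entry */
    int id;
    struct list_head list;
};

void phy_list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

void phy_list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/*
 * Fallback lookup: the list head itself is not embedded in any phy, so
 * list_entry(head, ...) would yield a bogus pointer. The first real entry
 * is the one that head->next is embedded in.
 */
struct phy_model *first_phy(struct list_head *head)
{
    return list_first_entry(head, struct phy_model, list);
}
```

As in the kernel, first_phy() is only meaningful on a non-empty list; on an empty list head->next points back at the head and the same bogus result appears.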
Signed-off-by: Haiying Wang <Haiying.Wang@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch meshes badly with "net: Rework ucc_geth driver to use
of_mdio infrastructure" (0b9da337dca972e7a4144e298ec3adb8f244d4a4).
Since most of the patch needs to be reworked, it is clearer to revert
the patch and then apply the corrected version.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
The skb mac_header field is sometimes NULL (or ~0u) as a sentinel
value. The places where skb is expanded add an offset which would
change this flag into an invalid pointer (or offset).
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
skbuff: skb_mac_header_was_set is always true on >32 bit
Looking at the crash in log_martians(), one suspect is that the check for
mac header being set is not correct. The value of mac_header defaults to
0 on allocation, therefore skb_mac_header_was_set will always be true on
platforms using NET_SKBUFF_USES_OFFSET.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
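The sentinel problem can be sketched in user space for the offset-based case. This is a model under stated assumptions: struct skb_model and all function names are hypothetical, and initializing the field to ~0U at allocation is one way to make the check meaningful, assumed here rather than taken from the patch.

```c
#include <string.h>

#define MAC_HEADER_UNSET (~0U)   /* sentinel for "no mac header yet" */

/* Hypothetical reduced skb that stores the mac header as an offset. */
struct skb_model { unsigned int mac_header; };

void skb_model_init(struct skb_model *skb)
{
    memset(skb, 0, sizeof(*skb));
    /* Without this line the zeroed field would look like a valid offset. */
    skb->mac_header = MAC_HEADER_UNSET;
}

int mac_header_was_set(const struct skb_model *skb)
{
    return skb->mac_header != MAC_HEADER_UNSET;
}

/* Adjust the offset after the buffer head grew by off bytes. */
void skb_model_shift(struct skb_model *skb, unsigned int off)
{
    /* Adding off to the sentinel would turn it into a bogus offset. */
    if (mac_header_was_set(skb))
        skb->mac_header += off;
}
```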
Hisashi Hifumi [Thu, 18 Jun 2009 00:08:51 +0000 (20:08 -0400)]
jbd2: clean up jbd2_journal_try_to_free_buffers()
This patch reverts 3f31fddf, which is no longer needed because, if a
race between freeing a buffer and committing a transaction occurs and
dio gets an error, dio now falls back to buffered IO thanks to
commit 6ccfa806.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
NeilBrown [Wed, 17 Jun 2009 23:14:12 +0000 (09:14 +1000)]
md/raid5: correctly update sync_completed when we reach max_resync
At the end of reshape_request we update curr_resync_completed
if we are about to pause due to reaching resync_max.
However, we update it to the wrong value. We need to add the
"reshape_sectors" that have just been reshaped.
Dan Williams [Tue, 16 Jun 2009 23:00:33 +0000 (16:00 -0700)]
md/raid5: add missing call to schedule() after prepare_to_wait()
In the unlikely event that reshape progresses past the current request
while it is waiting for a stripe we need to schedule() before retrying
for 2 reasons:
1/ Prevent list corruption from duplicated list_add() calls without
intervening list_del().
2/ Give the reshape code a chance to make some progress to resolve the
conflict.
Cc: <stable@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Wed, 17 Jun 2009 22:49:42 +0000 (08:49 +1000)]
md/linear: use call_rcu to free obsolete 'conf' structures.
Currently, when we update the 'conf' structure while adding a
drive to a linear array, we keep the old version around until
the array is finally stopped, as it is not safe to free it
immediately.
Now that we have rcu protection on all accesses to 'conf',
we can use call_rcu to free it more promptly.
SandeepKsinha [Wed, 17 Jun 2009 22:49:35 +0000 (08:49 +1000)]
md linear: Protecting mddev with rcu locks to avoid races
Due to the lack of memory ordering guarantees, we may have races around
mddev->conf.
In particular, the correct contents of the structure we get from
dereferencing ->private might not be visible to this CPU yet, and
they might not be correct w.r.t. mddev->raid_disks.
This patch addresses the problem using rcu protection to avoid
such race conditions.
Andre Noll [Wed, 17 Jun 2009 22:49:23 +0000 (08:49 +1000)]
md: Move check for bitmap presence to personality code.
If the superblock of a component device indicates the presence of a
bitmap but the corresponding raid personality does not support bitmaps
(raid0, linear, multipath, faulty), then something is seriously wrong
and we'd better refuse to run such an array.
Currently, this check is performed while the superblocks are examined,
i.e. before entering personality code. Therefore the generic md layer
must know which raid levels support bitmaps and which do not.
This patch avoids this layer violation without adding identical code
to various personalities. This is accomplished by introducing a new
public function to md.c, md_check_no_bitmap(), which replaces the
hard-coded checks in the superblock loading functions.
A call to md_check_no_bitmap() is added to the ->run method of each
personality which does not support bitmaps and assembly is aborted
if at least one component device contains a bitmap.
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
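The flow described above can be modelled in user space roughly as follows.
This is only a sketch: struct model_mddev, its fields, the return values
and model_raid0_run() are hypothetical stand-ins, not the kernel's
definitions. It shows a shared helper that reports a bitmap, and a
personality "run" method that aborts assembly when the helper fires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the kernel's mddev structure. */
struct model_mddev {
    const char *name;        /* array name, e.g. "md0" */
    const char *level_name;  /* personality name, e.g. "raid0" */
    long bitmap_offset;      /* assumed: non-zero means an internal bitmap */
    const char *bitmap_file; /* assumed: non-NULL means an external bitmap */
};

/* Shared helper: returns 0 if no bitmap is configured, 1 otherwise. */
static int model_check_no_bitmap(const struct model_mddev *mddev)
{
    assert(mddev != NULL);
    if (mddev->bitmap_file == NULL && mddev->bitmap_offset == 0)
        return 0;
    fprintf(stderr, "%s: bitmaps are not supported for %s\n",
            mddev->name, mddev->level_name);
    return 1;
}

/* Model of a ->run method for a personality without bitmap support. */
static int model_raid0_run(const struct model_mddev *mddev)
{
    if (model_check_no_bitmap(mddev))
        return -1; /* refuse to assemble the array */
    /* ... personality-specific setup would follow here ... */
    return 0;
}
```

With this shape the generic layer needs no table of raid levels: each
personality that cannot handle bitmaps makes one call at the top of its
run method.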
NeilBrown [Wed, 17 Jun 2009 22:48:58 +0000 (08:48 +1000)]
md: remove chunksize rounding from common code.
It is easiest to round sizes to multiples of chunk size in
the personality code for those personalities which care.
Those personalities now do the rounding, so we can
remove that function from common code.
Also remove the upper bound on the size of a chunk, and the lower
bound on the size of a device (1 chunk), neither of which really buys
us anything.
NeilBrown [Wed, 17 Jun 2009 22:48:55 +0000 (08:48 +1000)]
md: raid0/linear: ensure device sizes are rounded to chunk size.
This is currently ensured by common code, but it is more reliable to
ensure it where it is needed in personality code.
All the other personalities that care already round the size to
the chunk_size. raid0 and linear are the only hold-outs.
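As an illustration of the rounding involved, a personality can truncate
each device size to a whole number of chunks. The sketch below is a
user-space model: model_sector_t and round_down_to_chunk() are
hypothetical names, and it assumes sizes and chunk size are both counted
in sectors. It uses a modulo rather than a mask so it does not assume the
chunk size is a power of two.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t model_sector_t; /* stand-in for the kernel's sector type */

/* Round a device size (in sectors) down to a multiple of the chunk size. */
static model_sector_t round_down_to_chunk(model_sector_t dev_sectors,
                                          uint32_t chunk_sectors)
{
    assert(chunk_sectors > 0); /* a zero chunk size has no meaning here */
    return dev_sectors - (dev_sectors % chunk_sectors);
}
```

For example, a 1000-sector device with 128-sector chunks is treated as
896 sectors (7 whole chunks); the trailing 104 sectors are left unused.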
NeilBrown [Wed, 17 Jun 2009 22:48:19 +0000 (08:48 +1000)]
md: move assignment of ->utime so that it never gets skipped.
Currently the assignment to utime gets skipped for 'external'
metadata. So move it to the top of the function so that it
always gets effected.
This is of largely cosmetic interest. Nothing actually depends
on ->utime being right for external arrays.
"mdadm --monitor" does use it for 0.90 and 1.x arrays, but with
mdadm-3.0, this is not important for external metadata.
Andre Noll [Wed, 17 Jun 2009 22:48:06 +0000 (08:48 +1000)]
md: Push down reconstruction log message to personality code.
Currently, the md layer checks in analyze_sbs() if the raid level
supports reconstruction (mddev->level >= 1) and if reconstruction is
in progress (mddev->recovery_cp != MaxSector).
Move that printk into the personality code of those raid levels that
care (levels 1, 4, 5, 6, 10).
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
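Once the check lives in the personality, the level test disappears and
only the recovery_cp comparison remains. A minimal user-space sketch,
assuming MaxSector is the all-ones sector value; the function name, the
type name and the message wording are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t model_sector_t;
/* Assumed equivalent of MaxSector: all bits set. */
#define MODEL_MAX_SECTOR (~(model_sector_t)0)

/*
 * Called from a personality that supports reconstruction.
 * Returns 1 and logs a message if the array is not clean, 0 otherwise.
 */
static int model_warn_if_not_clean(const char *level_name,
                                   const char *array_name,
                                   model_sector_t recovery_cp)
{
    assert(level_name != NULL && array_name != NULL);
    if (recovery_cp == MODEL_MAX_SECTOR)
        return 0; /* array is clean, nothing to report */
    printf("%s: %s is not clean, starting background reconstruction\n",
           level_name, array_name);
    return 1;
}
```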
NeilBrown [Wed, 17 Jun 2009 22:47:55 +0000 (08:47 +1000)]
md: merge reconfig and check_reshape methods.
The difference between these two methods is artificial.
Both check that a pending reshape is valid, and perform any
aspect of it that can be done immediately.
'reconfig' handles chunk size and layout.
'check_reshape' handles raid_disks.
Andre Noll [Wed, 17 Jun 2009 22:45:27 +0000 (08:45 +1000)]
md: Convert mddev->new_chunk to sectors.
A straightforward conversion which gets rid of some
multiplications/divisions/shifts. The patch also introduces a couple
of new ones, most of which are due to conf->chunk_size still being
represented in bytes. This will be cleaned up in subsequent patches.
Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
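The kind of conversion that remains while some fields are still kept in
bytes can be sketched as below. This is a user-space illustration with
hypothetical helper names, assuming 512-byte sectors (a shift of 9).

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9 /* assumed: 512-byte sectors */

/* Convert a chunk size in bytes to sectors. */
static uint32_t chunk_bytes_to_sectors(uint32_t bytes)
{
    /* the byte count is expected to be a whole number of sectors */
    assert((bytes & ((1u << MODEL_SECTOR_SHIFT) - 1)) == 0);
    return bytes >> MODEL_SECTOR_SHIFT;
}

/* Convert a chunk size in sectors back to bytes. */
static uint32_t chunk_sectors_to_bytes(uint32_t sectors)
{
    /* the result must still fit in 32 bits */
    assert(sectors <= (UINT32_MAX >> MODEL_SECTOR_SHIFT));
    return sectors << MODEL_SECTOR_SHIFT;
}
```

A 64 KiB chunk is 128 sectors. Storing the value in sectors removes these
shifts wherever the other operand is already a sector count.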
Matthew Wilcox [Wed, 17 Jun 2009 20:33:36 +0000 (16:33 -0400)]
ia64: Fix resource assignment for root busses
ia64 was assigning resources to root busses after allocations had
been made for child busses. Calling pcibios_setup_root_windows() from
pcibios_fixup_bus() solves this problem by assigning the resources to
the root bus before child busses are scanned.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>