coredump: move dump_write() and dump_seek() into a header file
My next patch will replace the ELF_CORE_EXTRA_* macros with functions and put
them into newly created *.c files. Each of those files would then contain its
own dump_write(), which should be the same in each pair of binfmt_*.c and
elfcore.c. So, this patch moves dump_write() into a header file, together
with dump_seek(). The patch also deletes the confusing DUMP_WRITE macros from
each file.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
coredump: unify dump_seek() implementations for each binfmt_*.c
The current ELF dumper can produce broken corefiles if the number of program
headers exceeds 65535. In particular, programs in 64-bit environments often
need more than 65535 mmaps. If you google max_map_count, you can find many
users facing this problem.
Solaris has already dealt with this issue, and other OSes have also
adopted the same method as Solaris. Currently, Sun's documentation and the
AMD64 ABI both describe the extension, which they call Extended Numbering.
See References for further information.
I believe that the Linux kernel should adopt the same approach, so I've
written this patch.
I am also preparing patches for GDB and binutils.
How to fix
==========
In the new dumping process, there are two cases, depending on whether or
not the number of program headers is equal to or more than 65535.
- if less than 65535, the produced corefile format is exactly the same
as the ordinary one.
- if equal to or more than 65535, then e_phnum field is set to newly
introduced constant PN_XNUM(0xffff) and the actual number of program
headers is set to sh_info field of the section header at index 0.
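The two cases above can be sketched as a pair of helpers. This is only a minimal illustration, not the code of this patch: the struct and function names are hypothetical, and only the value 0xffff for PN_XNUM and the roles of e_phnum and sh_info are taken from the description.

```c
#include <stdint.h>

#define SKETCH_PN_XNUM 0xffffu	/* value of PN_XNUM given in the text */

/* Hypothetical holder for the two fields involved in extended numbering. */
struct sketch_phnum_fields {
	uint16_t e_phnum;	/* ELF header field, only 16 bits wide */
	uint32_t sh_info0;	/* sh_info of the section header at index 0 */
};

/* Writer side: decide how the number of program headers is recorded. */
static struct sketch_phnum_fields sketch_encode_phnum(uint32_t nr_phdrs)
{
	struct sketch_phnum_fields f = { 0, 0 };

	if (nr_phdrs < SKETCH_PN_XNUM) {
		f.e_phnum = (uint16_t)nr_phdrs;	/* ordinary format */
	} else {
		f.e_phnum = SKETCH_PN_XNUM;	/* marker value */
		f.sh_info0 = nr_phdrs;		/* the real count lives here */
	}
	return f;
}

/* Reader side: recover the real number of program headers. */
static uint32_t sketch_decode_phnum(struct sketch_phnum_fields f)
{
	return f.e_phnum == SKETCH_PN_XNUM ? f.sh_info0 : f.e_phnum;
}
```

A tool that knows the extension would use something like sketch_decode_phnum(); a tool that does not will simply read e_phnum.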
Compatibility Concern
=====================
* As already mentioned in the Summary, Sun and AMD64 have already adopted
this. See References.
* There are four combinations according to whether the kernel and the
userland tools are each modified or not. The following table briefly
summarizes each combination.
                 ---------------------------------------------------
                      Original Kernel     |     Modified Kernel
                 ---------------------------------------------------
                   < 65535   |  >= 65535  |  < 65535   |  >= 65535
--------------------------------------------------------------------
 Original Tools |     OK     |   broken   |     OK     | broken (#)
--------------------------------------------------------------------
 Modified Tools |     OK     |   broken   |     OK     |     OK
--------------------------------------------------------------------
Note that no case changes from `OK' to `broken'.
(#) Although this case remains broken, O-M (original tools, modified
kernel) behaves better than O-O (original tools, original kernel). In
the O-O case the e_phnum field would be extremely small due to integer
overflow, whereas in the O-M case it is guaranteed to be at least
65535 by being set to PN_XNUM(0xFFFF), much closer to the
actual correct value than the O-O case.
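The overflow in the O-O case is plain 16-bit truncation. A minimal sketch, with a hypothetical helper name, of what an unmodified dumper effectively stores in e_phnum:

```c
#include <stdint.h>

/* The count is cut down to the 16 bits that e_phnum can hold. */
static uint16_t sketch_truncated_phnum(uint32_t nr_phdrs)
{
	return (uint16_t)(nr_phdrs & 0xffffu);
}
```

With 70000 program headers this yields 70000 - 65536 = 4464, and with 65539 it yields 3, whereas PN_XNUM always reads as 65535.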
Test Program
============
Here is a test program, mkmmaps.c, that is useful for producing a
corefile with many mmaps. To use it, please take the following
steps:
$ ulimit -c unlimited
$ sysctl vm.max_map_count=70000 # default 65530 is too small
$ sysctl fs.file-max=70000
$ mkmmaps 65535
Then, the program will abort and a corefile will be generated.
If it fails, there are two cases according to the error message
displayed.
* ``out of memory'' means vm.max_map_count is still too small
* ``too many open files'' means fs.file-max is still too small
So, please change the corresponding value to a larger one, and then retry.
mkmmaps.c
==
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char **argv)
{
	int maps_num;
	if (argc < 2) {
		fprintf(stderr, "mkmmaps [number of maps to be created]\n");
		exit(1);
	}
	/* sscanf() returns the number of converted items; anything but 1 is a failure */
	if (sscanf(argv[1], "%d", &maps_num) != 1) {
		fprintf(stderr, "%s is not a number\n", argv[1]);
		exit(2);
	}
	if (maps_num < 0) {
		fprintf(stderr, "%d is invalid\n", maps_num);
		exit(3);
	}
	/* create maps_num separate one-byte shared anonymous mappings */
	for (; maps_num > 0; --maps_num) {
		if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ,
				       MAP_SHARED | MAP_ANONYMOUS, (int) -1,
				       (off_t) 0)) {
			perror("mmap");
			exit(4);
		}
	}
	/* crash on purpose so that the kernel writes a corefile */
	abort();
	/* not reached: abort() does not return */
	{
		char buffer[128];
		sprintf(buffer, "wc -l /proc/%u/maps", (unsigned int) getpid());
		system(buffer);
	}
	return 0;
}
Tested on i386, ia64 and um/sys-i386.
Built on sh4 (which covers fs/binfmt_elf_fdpic.c)
References
==========
- Sun microsystems: Linker and Libraries.
Part No: 817-1984-17, September 2008.
URL: http://docs.sun.com/app/docs/doc/817-1984
- System V ABI AMD64 Architecture Processor Supplement
Draft Version 0.99., May 11, 2009.
URL: http://www.x86-64.org/
This patch:
There are three different definitions for dump_seek() functions in
binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The
only for binfmt_elf.c.
My next patch will move dump_seek() into a header file in order to share
the same implementations of dump_write() and dump_seek(). As the first
step, this patch unifies these three definitions of dump_seek() by applying
the past commits that have been applied only to binfmt_elf.c.
Specifically, the modification made here is part of the following commits:
Alexey Dobriyan [Fri, 5 Mar 2010 21:44:00 +0000 (13:44 -0800)]
proc: warn on non-existing proc entries
* warn if creation goes on to non-existent directory
* warn if removal goes on from non-existing directory
* warn if non-existing proc entry is removed
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/hwmon/adcxx.c: fix for single-channel ADCs
While testing an ADC121S021 in an embedded board with a S3C2142 SoC (ARM
core), I have found that the 'adcxx' driver does not handle correctly
single channel ADCs from this chip family. For single channel chips you
must only issue one read transfer for correct measurement.
Signed-off-by: Jose Miguel Goncalves <jose.goncalves@inov.pt> Cc: Marc Pignat <marc.pignat@hevs.ch> Cc: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 5 Mar 2010 21:43:56 +0000 (13:43 -0800)]
drivers/hwmon/vt8231.c: fix continuation line formats
String constants that are continued on subsequent lines with \ will cause
spurious whitespace in the resulting output.
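A minimal sketch of the effect, using made-up strings rather than the ones in vt8231.c: with a backslash continuation the indentation of the next source line ends up inside the string, while adjacent string literals are joined with nothing in between.

```c
#include <string.h>

/* Backslash continuation: the leading whitespace of the next source
 * line becomes part of the string literal. */
static const char sketch_continued[] = "temperature \
        sensor";

/* Adjacent literals: the compiler concatenates them as written. */
static const char sketch_concatenated[] = "temperature "
        "sensor";

/* Number of unwanted characters introduced by the continuation. */
static size_t sketch_spurious_chars(void)
{
	return strlen(sketch_continued) - strlen(sketch_concatenated);
}
```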
Signed-off-by: Joe Perches <joe@perches.com> Cc: Roger Lucas <vt8231@hiddenengine.co.uk> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Emese Revfy [Fri, 5 Mar 2010 21:43:53 +0000 (13:43 -0800)]
checkpatch.pl: extend list of expected-to-be-const structures
Based on Arjan's suggestion, extend the list of ops structures that should
be const.
Signed-off-by: Emese Revfy <re.emese@gmail.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This construct is legal and safe, so checkpatch.pl should accept this. It
should be also true for struct defined in a macro.
Add the `struct' and `union' keywords to the exceptions list of the
checkpatch.pl script, to prevent error message "Macros with multiple
statements should be enclosed in a do - while loop". Otherwise it is not
possible to build a struct or union with a macro.
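For illustration, here is the kind of construct in question. The macro and struct names are hypothetical; the point is only that a macro expanding to a struct definition cannot be wrapped in a do - while loop, because it has to be usable at file scope.

```c
#include <stddef.h>

/* Hypothetical macro that expands to a complete struct definition. */
#define SKETCH_DECLARE_BUF(name, size)		\
	struct name {				\
		size_t len;			\
		unsigned char data[size];	\
	}

/* Used at file scope, where a do { } while (0) wrapper would not compile. */
SKETCH_DECLARE_BUF(sketch_buf16, 16);
```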
Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This card reader doesn't advertise DMA support; however, DMA works well.
Probably the Windows SDHCI driver assumes that all readers support DMA, and
thus we see that bug.
Nicolas Ferre [Fri, 5 Mar 2010 21:43:45 +0000 (13:43 -0800)]
mmc: at91_mci: correct kunmap_atomic()
kunmap_atomic() accepts a pointer to any location in the page so we do not
need the subtraction and cast.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Wolfgang Muees <wolfgang.mues@auerswald.de> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We used to manage features and differences on a per-cpu basis. As several
cpus share the same mci revision, this patch aggregates cpus that have the
same IP revision in one defined constant. We use the
at91mci_is_mci1rev2xx() function name not to mess with newer Atmel sd/mmc
IP called "MCI2". _rev2 naming could have been confusing...
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Wolfgang Muees <wolfgang.mues@auerswald.de> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Ferre [Fri, 5 Mar 2010 21:43:43 +0000 (13:43 -0800)]
mmc: at91_mci: Enable MMC_CAP_SDIO_IRQ only when it actually works.
According to the datasheets AT91SAM9261 does not support SDIO interrupts,
and AT91SAM9260/9263 have an erratum requiring 4bit mode while using slot
B for the interrupt to work.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Wolfgang Muees <wolfgang.mues@auerswald.de> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Muees [Fri, 5 Mar 2010 21:43:42 +0000 (13:43 -0800)]
mmc: at91_mci: enable large data blocks
This patch is setting some max_ variables for the IO elevator, so the
elevator will put requests for large data blocks to the driver. This is
critical for
a) speed
and
b) wear leveling of the flash chip controller: Otherwise the controller
will treat the SD card badly with millions of single 4 KByte write
commands. This will lead to a shorter life time for the SD cards.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Muees [Fri, 5 Mar 2010 21:43:41 +0000 (13:43 -0800)]
mmc: at91_mci: use DMA buffer for read
Convert the read to use the DMA buffer as well. The old code was doing
double-buffering DMA with the PDC; no way to make it work. Replace it
with a single-PDC approach. It also simplifies things by removing the need for
a pre_dma_read() function.
[nicolas.ferre@atmel.com coding style modifications] Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Muees [Fri, 5 Mar 2010 21:43:40 +0000 (13:43 -0800)]
mmc: at91_mci: use one coherent DMA buffer
The TX DMA buffer is allocated only once, because the
allocation/deallocation of the buffer for EACH chunk of data is
time-consuming and prone to memory fragmentation.
Using a coherent DMA buffer avoids extra data cache calls.
[nicolas.ferre@atmel.com: coding style modifications] Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Muees [Fri, 5 Mar 2010 21:43:39 +0000 (13:43 -0800)]
mmc: at91_mci: fix timeout errors
Fix two timeout errors, one for slow SDHC cards and one for slow users
while inserting SD cards.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Muees [Fri, 5 Mar 2010 21:43:38 +0000 (13:43 -0800)]
mmc: at91_mci: fix pointer errors
Fixes two pointer errors, one which leads to memory overwrites if used
with large chunks of data.
Signed-off-by: Wolfgang Muees <wolfgang.mues@auerswald.de> Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Andrew Victor <avictor.za@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s3cmci: initialize default platform data no_wprotect and no_detect with 1
If no platform_data was given to the device, it is going to use its default
platform data struct, which has all fields initialized to zero. As a
result the driver is going to try to request gpio0 both as write protect
and card detect pin, which of course will fail and make the driver
unusable.
Prior to the introduction of no_wprotect and no_detect, the behavior
was to assume that if no platform data was given there is no write protect
or card detect pin. This patch restores that behavior.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Cc: Ben Dooks <ben-linux@fluff.org> Cc: <linux-mmc@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Pitre [Fri, 5 Mar 2010 21:43:34 +0000 (13:43 -0800)]
sdio: kick the interrupt thread upon a resume
Some SDIO cards may suspend while keeping function interrupts active
especially in the powered suspend case. Upon resume we need to kick the
SDIO interrupt thread to check for pending interrupts and to restart card
IRQ detection at the host controller level.
Signed-off-by: Nicolas Pitre <nico@marvell.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Ball [Fri, 5 Mar 2010 21:43:33 +0000 (13:43 -0800)]
sdio: don't use CMD[357] as part of a powered SDIO resume
Seen on a Marvell 8686 SDIO card and Via VX855 controller: we must avoid
sending CMD3/5/7 on a resume where power has been maintained, because the
8686 will refuse to respond to them and the MMC stack will give up on the
card.
Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Nicolas Pitre <nico@marvell.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Pitre [Fri, 5 Mar 2010 21:43:31 +0000 (13:43 -0800)]
sdio: introduce API for special power management features
This patch series provides the core changes needed to allow SDIO cards to
remain powered and active while the host system is suspended, and let them
wake up the host system when needed. This is used to implement
wake-on-lan with SDIO wireless cards at the moment. Patches to add that
support to the libertas driver will be posted separately.
This patch:
Some SDIO cards have the ability to keep on running autonomously when the
host system is suspended, and wake it up when needed. This however
requires that the host controller preserve power to the card, and
configure itself appropriately for wake-up.
There are, however, 4 layers of abstraction involved: the host controller
driver, the MMC core code, the SDIO card management code, and the actual
SDIO function driver. To make things simple and manageable, host drivers
must advertise their PM capabilities with a feature bitmask, then function
drivers can query and set those features from their suspend method. Then
each layer in the suspend call chain is expected to act upon those bits
accordingly.
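The capability/flag handshake described above might be sketched like this. All names are hypothetical stand-ins, not the API added by this patch, and rejecting a request that the host did not advertise is an assumption of the sketch.

```c
#include <errno.h>

/* Hypothetical PM feature bits. */
#define SKETCH_PM_KEEP_POWER	(1u << 0)	/* keep the card powered */
#define SKETCH_PM_WAKE_ON_IRQ	(1u << 1)	/* card IRQ wakes the host */

struct sketch_host {
	unsigned int pm_caps;	/* advertised by the host controller driver */
	unsigned int pm_flags;	/* requested by the function driver */
};

/* Function driver side: query what the host can do. */
static unsigned int sketch_get_pm_caps(const struct sketch_host *host)
{
	return host->pm_caps;
}

/* Function driver side: request features from its suspend method. */
static int sketch_set_pm_flags(struct sketch_host *host, unsigned int flags)
{
	if (flags & ~host->pm_caps)
		return -EINVAL;	/* assumed: unsupported requests are refused */
	host->pm_flags |= flags;
	return 0;
}
```

Each later layer in the suspend call chain would then look at pm_flags and act on the bits that concern it.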
[akpm@linux-foundation.org: fix typo in comment] Signed-off-by: Nicolas Pitre <nico@marvell.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Fri, 5 Mar 2010 21:43:29 +0000 (13:43 -0800)]
sdhci: improve sdhci sdhci_set_adma_desc() code
sdhci_set_adma_desc() is using byte-writes to write data in a specified
order into memory. Change to using __le16 for the two byte and __le32 for
the four byte cases and use the cpu_to_{le16,le32} to do the conversion
before writing.
This will reduce the size of the code and the number of writes as we no
longer need to chop the data up before writing.
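A userspace sketch of the two styles follows. It is not the driver code: the descriptor layout (16-bit command, 16-bit length, 32-bit address) is an assumption for illustration, all names are hypothetical, and the two conversion helpers merely stand in for the kernel's cpu_to_le16() and cpu_to_le32(), which cost nothing on a little-endian CPU.

```c
#include <stdint.h>
#include <string.h>

/* Old style: chop every value into bytes and store them one at a time. */
static void sketch_desc_bytewise(uint8_t *desc, uint32_t addr,
				 uint16_t len, uint16_t cmd)
{
	desc[0] = cmd & 0xff;
	desc[1] = (cmd >> 8) & 0xff;
	desc[2] = len & 0xff;
	desc[3] = (len >> 8) & 0xff;
	desc[4] = addr & 0xff;
	desc[5] = (addr >> 8) & 0xff;
	desc[6] = (addr >> 16) & 0xff;
	desc[7] = (addr >> 24) & 0xff;
}

/* Portable stand-ins for cpu_to_le16() and cpu_to_le32(). */
static uint16_t sketch_cpu_to_le16(uint16_t v)
{
	const uint8_t b[2] = { (uint8_t)(v & 0xff), (uint8_t)(v >> 8) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

static uint32_t sketch_cpu_to_le32(uint32_t v)
{
	const uint8_t b[4] = { (uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
			       (uint8_t)((v >> 16) & 0xff), (uint8_t)(v >> 24) };
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Assumed descriptor layout with fixed-width little-endian fields. */
struct sketch_adma_desc {
	uint16_t cmd;
	uint16_t len;
	uint32_t addr;
};

/* New style: one conversion and one store per field. */
static void sketch_desc_wide(struct sketch_adma_desc *desc, uint32_t addr,
			     uint16_t len, uint16_t cmd)
{
	desc->cmd = sketch_cpu_to_le16(cmd);
	desc->len = sketch_cpu_to_le16(len);
	desc->addr = sketch_cpu_to_le32(addr);
}
```

Both functions leave the same eight bytes in memory; the second one needs three stores instead of eight.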
As an example on ARM S3C64XX SoC, in little-endian configuration:
Ben Dooks [Fri, 5 Mar 2010 21:43:26 +0000 (13:43 -0800)]
sdhci: add adma descriptor set call
The code to write the ADMA descriptor into memory is repeated several
times throughout sdhci_adma_table_pre, and thus should be moved into a
common function. This will also be useful if the patch to make the write
more efficient is accepted.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Cc: Pierre Ossman <pierre@ossman.eu> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maxim Levitsky [Fri, 5 Mar 2010 21:43:20 +0000 (13:43 -0800)]
ricoh_mmc: port from driver to pci quirk
This patch solves a nasty problem the original driver has.
The original goal of ricoh_mmc was to disable this device, because then
mmc cards can be read using the standard SDHCI controller, thus avoiding
the writing of yet another driver.
However, the act of disabling it makes the other pci functions that belong to
this controller (xD and memstick) shift up one level, so the pci core
now has the wrong idea about these devices.
To fix this issue, this patch moves the driver into the pci quirk section,
so it is executed before the pci bus is enumerated, which solves
that issue. The same sequence of commands is also performed on resume for the
same reasons.
Also, regardless of the above, this way is cleaner. You still need to set
CONFIG_MMC_RICOH_MMC to enable this quirk.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Philip Langdale <philipl@overt.org> Acked-by: Wolfram Sang <w.sang@pengutronix.de> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Fri, 5 Mar 2010 21:43:19 +0000 (13:43 -0800)]
fs/compat_ioctl.c: suppress two warnings
fs/compat_ioctl.c: In function 'do_ioctl_trans':
fs/compat_ioctl.c:534: warning: 'karg' may be used uninitialized in this function
fs/compat_ioctl.c:533: warning: 'kcmd' may be used uninitialized in this function
fs/compat_ioctl.c:656: warning: 'ret' may be used uninitialized in this function
Reduces text size by 44 bytes.
If someone calls one of these functions with an unexpected argument, the
code's buggy as-is.
Amerigo Wang <amwang@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don Mullis [Fri, 5 Mar 2010 21:43:15 +0000 (13:43 -0800)]
lib: more scalable list_sort()
XFS and UBIFS can pass long lists to list_sort(); this alternative
implementation scales better, reaching ~3x performance gain when list
length exceeds the L2 cache size.
Stand-alone program timings were run on a Core 2 duo L1=32KB L2=4MB,
gcc-4.4, with flags extracted from an Ubuntu kernel build. Object size is
581 bytes compared to 455 for Mark J. Roberts' code.
Worst case for either implementation is a list length just over a power of
two, and to roughly the same degree, so here are timing results for a
range of 2^N+1 lengths. List elements were 16 bytes each including malloc
overhead; initial order was random.
Simon's algorithm performs O(log N) passes over the entire input list,
doing merges of sublists that double in size on each pass. The generic
algorithm instead merges pairs of equal length lists as early as possible,
in recursive order. For either algorithm, the elements that extend the
list beyond power-of-two length are a special case, handled as nearly as
possible as a "rounding-up" to a full POT.
Some intuition for the locality of reference implications of merge order
may be gotten by watching this animation:
http://www.sorting-algorithms.com/merge-sort
Simon's algorithm requires only O(1) extra space rather than the generic
algorithm's O(log N), but in my non-recursive implementation the actual
O(log N) data is merely a vector of ~20 pointers, which I've put on the
stack.
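The idea of merging equal-length sublists as early as possible, with a small on-stack vector of pending lists, can be sketched as follows. This is a simplified illustration and not the code of this patch: it uses a singly linked list with an integer key instead of struct list_head and a cmp() callback, and every name is hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_LEVEL 20	/* about 20 pending-list slots, as in the text */

struct sketch_node {
	int key;
	int id;			/* only used to observe stability */
	struct sketch_node *next;
};

/* Stable merge of two sorted lists: on equal keys, take from a first. */
static struct sketch_node *sketch_merge(struct sketch_node *a,
					struct sketch_node *b)
{
	struct sketch_node head, *tail = &head;

	head.next = NULL;
	while (a && b) {
		if (b->key < a->key) {
			tail->next = b;
			b = b->next;
		} else {
			tail->next = a;
			a = a->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

static struct sketch_node *sketch_list_sort(struct sketch_node *list)
{
	/* part[lev] is either empty or a sorted list of 2^lev elements */
	struct sketch_node *part[SKETCH_MAX_LEVEL + 1] = { NULL };
	struct sketch_node *result = NULL;
	int lev;

	while (list) {
		struct sketch_node *cur = list;

		list = list->next;
		cur->next = NULL;
		/* merge with pending lists of equal length, carrying upwards */
		for (lev = 0; lev < SKETCH_MAX_LEVEL && part[lev]; lev++) {
			cur = sketch_merge(part[lev], cur);
			part[lev] = NULL;
		}
		/* the top slot absorbs everything if the list is very long */
		part[lev] = part[lev] ? sketch_merge(part[lev], cur) : cur;
	}
	/* final pass: fold the leftover lists, earlier elements first */
	for (lev = 0; lev <= SKETCH_MAX_LEVEL; lev++)
		if (part[lev])
			result = sketch_merge(part[lev], result);
	return result;
}
```

The part[] vector is the O(log N) state mentioned above; it behaves like a binary counter of the elements consumed so far.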
Long-running list_sort() calls: If the list passed in may be long, or the
client's cmp() callback function is slow, the client's cmp() may
periodically invoke cond_resched() to voluntarily yield the CPU. All
inner loops of list_sort() call back to cmp().
Stability of the sort: distinct elements that compare equal emerge from
the sort in the same order as with Mark's code, for simple test cases. A
boot-time test is provided to verify this and other correctness
requirements.
A kernel that uses drm.ko appears to run normally with this change; I have
no suitable hardware to similarly test the use by UBIFS.
[akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init] Signed-off-by: Don Mullis <don.mullis@gmail.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
perlcritic is a standard checker for Perl Best Practices. This patch
fixes most of the warnings in the get_maintainer script. If kernel
programmers are going to have checkpatch they should write clean scripts
as well...
Bareword file handle opened at line 176, column 1. See pages 202,204 of PBP. (Severity: 5)
Two-argument "open" used at line 176, column 1. See page 207 of PBP. (Severity: 5)
Bareword file handle opened at line 207, column 5. See pages 202,204 of PBP. (Severity: 5)
Two-argument "open" used at line 207, column 5. See page 207 of PBP. (Severity: 5)
Bareword file handle opened at line 246, column 6. See pages 202,204 of PBP. (Severity: 5)
Two-argument "open" used at line 246, column 6. See page 207 of PBP. (Severity: 5)
Bareword file handle opened at line 258, column 2. See pages 202,204 of PBP. (Severity: 5)
Two-argument "open" used at line 258, column 2. See page 207 of PBP. (Severity: 5)
Expression form of "eval" at line 983, column 17. See page 161 of PBP. (Severity: 5)
Expression form of "eval" at line 985, column 17. See page 161 of PBP. (Severity: 5)
Subroutine prototypes used at line 1186, column 1. See page 194 of PBP. (Severity: 5)
Subroutine prototypes used at line 1206, column 1. See page 194 of PBP. (Severity: 5)
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 5 Mar 2010 21:43:04 +0000 (13:43 -0800)]
scripts/get_maintainer.pl: add ability to read from STDIN
Doesn't need or accept '-' as a trailing option to read stdin. Doesn't
print usage() after bad options. Adds --usage as command line equivalent
of --help
Suggested-by: Borislav Petkov <petkovbb@googlemail.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add an imperfect option to search a source file for email addresses.
New option: --file-emails or --fe
email addresses in files are freeform text and are nearly impossible to
parse. Still, might as well try to do a somewhat acceptable job of
finding them. This code should find all addresses that are in the form
addr@domain.tld
The code assumes that up to 3 alphabetic words along with dashes, commas,
and periods that precede the email address are a name.
If 3 words are found for the name, and one of the first two words is a
single letter and period, or just a single letter, then the 3 words are used
as the name; otherwise the last 2 words are used.
Some variants that are shown correctly:
John Smith <jksmith@domain.org>
Random J. Developer <rjd@tld.com>
Random J. Developer (rjd@tld.com)
J. Random Developer rjd@tld.com
Variants that are shown nominally correctly:
Written by First Last (funny-addr@somecompany.com)
is shown as:
First Last <funny-addr@somecompany.com>
Variants that are shown incorrectly:
Some Really Long Name <srln@foo.bar>
MontaVista Software, Inc. <source@mvista.com>
are returned as:
Long Name <srln@foo.bar>
"Software, Inc" <source@mvista.com>
--roles and --rolestats show "(in file)" for matches.
For instance:
Without -file-emails:
$ ./scripts/get_maintainer.pl -f -nogit -roles net/core/netpoll.c
David S. Miller <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
linux-kernel@vger.kernel.org (open list)
With -fe:
$ ./scripts/get_maintainer.pl -f -fe -nogit -roles net/core/netpoll.c
David S. Miller <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
Matt Mackall <mpm@selenic.com> (in file)
Ingo Molnar <mingo@redhat.com> (in file)
linux-kernel@vger.kernel.org (open list)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
The number of email addresses in the file is not limited. Neither is the
number of returned email addresses.
Signed-off-by: Joe Perches <joe@perches.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tetsuo Handa [Fri, 5 Mar 2010 21:42:56 +0000 (13:42 -0800)]
kernel/pid.c: update comment on find_task_by_pid_ns
tasklist_lock does protect the task and its pid; it can't go away. The
problem is that find_pid_ns() itself is unsafe without the rcu lock: it can
race with copy_process()->free_pid(any_pid).
Protecting copy_process()->free_pid(any_pid) with tasklist_lock would make
it possible to call find_task_by_pid_ns() under tasklist safely, but we
don't do so because we are trying to get rid of the read_lock sites of
tasklist_lock.
Anton Blanchard [Fri, 5 Mar 2010 21:42:55 +0000 (13:42 -0800)]
panic: fix panic_timeout accuracy when running on a hypervisor
I've had some complaints about panic_timeout being wildly inaccurate on
shared processor PowerPC partitions (a 3 minute panic_timeout taking 30
minutes).
The problem is that we loop on mdelay(1), and with a 1ms in 10ms hypervisor
timeslice each of these will take 10ms (i.e. 10x longer). I expect other
platforms with shared processor hypervisors will see the same issue.
This patch keeps the old behaviour if we have a panic_blink (only keyboard
LEDs right now) and does 1 second mdelays if we don't.
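The arithmetic behind the 3 minute versus 30 minute figure can be sketched with a deliberately crude model. The assumption, which is mine and not stated by the patch, is that every delay call costs at least one full hypervisor timeslice; the function name is hypothetical.

```c
/* Estimated wall-clock time, in ms, for a timeout that is waited out in
 * steps of step_ms when each delay call costs at least slice_ms. */
static unsigned long sketch_wall_ms(unsigned long timeout_ms,
				    unsigned long step_ms,
				    unsigned long slice_ms)
{
	unsigned long calls = timeout_ms / step_ms;
	unsigned long cost = step_ms < slice_ms ? slice_ms : step_ms;

	return calls * cost;
}
```

Under this model a 180000 ms timeout waited out in 1 ms steps with a 10 ms slice takes 1800000 ms (30 minutes), while the same timeout in 1000 ms steps takes 180000 ms (3 minutes).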
Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon Kagstrom [Fri, 5 Mar 2010 21:42:49 +0000 (13:42 -0800)]
lkdtm: add debugfs access and loosen KPROBE ties
Adds a debugfs interface and additional failure modes to LKDTM to
provide similar functionality to the provoke-crash driver submitted here:
http://lwn.net/Articles/371208/
Crashes can now be induced either through module parameters (as before)
or through the debugfs interface as in provoke-crash.
The patch also provides a new "direct" interface, where KPROBES are not
used, i.e., the crash is invoked directly upon write to the debugfs
file. When built without KPROBES configured, only this mode is available.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net> Cc: M. Mohan Kumar <mohan@in.ibm.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only in tree external users of the symbol setup_max_cpus are in
arch/x86/. The files ./kernel/alternative.c, ./kernel/visws_quirks.c, and
./mm/kmemcheck/kmemcheck.c are all guarded by CONFIG_SMP being defined.
For this case the symbol is an unsigned int and declared as an extern in
include/linux/smp.h.
When CONFIG_SMP is not defined the symbol setup_max_cpus is
a constant value that is only used in init/main.c. Make the symbol
static for this case.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'. Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map) just use the first_online_node macro and
remove the any_online_node macro since there are no users.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Milton Miller <miltonm@bga.com> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Geoff Levand <geoffrey.levand@am.sony.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
init/initramfs.c: fix "symbol shadows an earlier one" noise
The symbol 'count' is a file-local global variable in this file. The function
clean_rootfs() should use a different symbol name to prevent "symbol
shadows an earlier one" noise.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 5 Mar 2010 21:42:35 +0000 (13:42 -0800)]
MFGPT: move clocksource menu
Move the CS5535 MFGPT hrtimer kconfig option to be with the other MFGPT
options. This makes it easier to find and also removes it from the main
"Device Drivers" menu, where it should not have been.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Andres Salomon <dilinger@collabora.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[akpm@linux-foundation.org: simplification] Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Fri, 5 Mar 2010 21:42:30 +0000 (13:42 -0800)]
cpuidle menu: remove 8 bytes of padding on 64 bit builds
Reorder struct menu_device to remove 8 bytes of padding on 64 bit builds.
Size drops from 136 to 128 bytes, so it possibly needs one fewer cache
line.
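The effect of member ordering can be sketched with a hypothetical struct (not the real struct menu_device), assuming a typical 64-bit ABI where uint64_t is 8-byte aligned and assuming 64-byte cache lines:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each 4-byte member is followed by 4 bytes of padding. */
struct padded {
	uint32_t flag_a;	/* 4 bytes, then 4 bytes of padding */
	uint64_t value_a;
	uint32_t flag_b;	/* 4 bytes, then 4 bytes of padding */
	uint64_t value_b;
};

/* Same members, with the two 4-byte fields placed next to each other. */
struct reordered {
	uint32_t flag_a;
	uint32_t flag_b;	/* fills what used to be padding */
	uint64_t value_a;
	uint64_t value_b;
};

/* Bytes saved by reordering (8 under the assumptions above). */
static size_t bytes_saved(void)
{
	return sizeof(struct padded) - sizeof(struct reordered);
}

/* Cache lines touched by an object of 'size' bytes that starts on a line boundary. */
static size_t cache_lines_needed(size_t size)
{
	const size_t line = 64;	/* assumed cache line size */

	return (size + line - 1) / line;
}
```

With these assumptions, cache_lines_needed() gives 3 lines for 136 bytes and 2 lines for 128 bytes, provided the object starts on a cache line boundary.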
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Fri, 5 Mar 2010 21:42:28 +0000 (13:42 -0800)]
alpha: PTR_ERR overwrites -EINVAL in syscall osf_mount
The initial -EINVAL value is overwritten by `retval = PTR_ERR(name)'. If
this isn't an error pointer and typenr is not 1, 6 or 9, then this retval,
a pointer cast to a long, is returned.
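The pattern can be sketched in user space as follows. The helpers is_err() and ptr_err() are simplified stand-ins for the kernel's IS_ERR() and PTR_ERR(), mount_buggy() and mount_fixed() are hypothetical reductions of the syscall, and the fixed variant shows only one possible shape of the correction:

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified stand-ins for the kernel's error-pointer helpers. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Buggy shape: the default -EINVAL is clobbered before the switch. */
static long mount_buggy(const char *name, int typenr)
{
	long retval = -EINVAL;

	retval = ptr_err(name);	/* overwrites -EINVAL unconditionally */
	if (is_err(name))
		return retval;

	switch (typenr) {
	case 1:
	case 6:
	case 9:
		retval = 0;	/* stand-in for the real mount helpers */
		break;
	}
	return retval;		/* unknown typenr: returns the pointer value */
}

/* Possible fix: only take the error code when there really is an error. */
static long mount_fixed(const char *name, int typenr)
{
	long retval = -EINVAL;

	if (is_err(name))
		return ptr_err(name);

	switch (typenr) {
	case 1:
	case 6:
	case 9:
		retval = 0;	/* stand-in for the real mount helpers */
		break;
	}
	return retval;		/* unknown typenr: still -EINVAL */
}
```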
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Fri, 5 Mar 2010 21:42:26 +0000 (13:42 -0800)]
frv: remove pci_dma_sync_single() and pci_dma_sync_sg()
No architecture except for frv has pci_dma_sync_single() and
pci_dma_sync_sg(). The APIs are deprecated.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 5 Mar 2010 21:42:25 +0000 (13:42 -0800)]
mm: add comment on swap_duplicate's error code
swap_duplicate()'s loop appears to miss out on returning the error code
from __swap_duplicate(), except when that's -ENOMEM. In fact this is
intentional: prior to -ENOMEM for swap_count_continuation,
swap_duplicate() was void (and the case only occurs when copy_one_pte()
hits a corrupt pte). But that's surprising behaviour, which certainly
deserves a comment.
nommu: get_user_pages(): pin last page on non-page-aligned start
The noMMU version of get_user_pages() fails to pin the last page when the
start address isn't page-aligned. The patch fixes this in a way that
makes find_extend_vma() congruent to its MMU cousin.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 5 Mar 2010 21:42:22 +0000 (13:42 -0800)]
vmscan: detect mapped file pages used only once
The VM currently assumes that an inactive, mapped and referenced file page
is in use and promotes it to the active list.
However, every mapped file page starts out like this and thus a problem
arises when workloads create a stream of such pages that are used only for
a short time. By flooding the active list with those pages, the VM
quickly gets into trouble finding eligible reclaim candidates. The result
is long allocation latencies and eviction of the wrong pages.
This patch reuses the PG_referenced page flag (used for unmapped file
pages) to implement a usage detection that scales with the speed of LRU
list cycling (i.e. memory pressure).
If the scanner encounters those pages, the flag is set and the page cycled
again on the inactive list. It is activated only if it returns with another
page table reference. Otherwise it is reclaimed as 'not recently
used cache'.
This effectively changes the minimum lifetime of a used-once mapped file
page from a full memory cycle to an inactive list cycle, which allows it
to occur in linear streams without affecting the stable working set of the
system.
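The policy described above can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel implementation: struct model_page, scan_page() and the enum values are hypothetical names, and 'flag_referenced' merely stands in for PG_referenced.

```c
#include <stdbool.h>

enum page_decision {
	PAGE_KEEP_INACTIVE,	/* cycle once more on the inactive list */
	PAGE_ACTIVATE,		/* promote to the active list */
	PAGE_RECLAIM,		/* treat as 'not recently used cache' */
};

/* Hypothetical model of a mapped file page; not the kernel's struct page. */
struct model_page {
	bool pte_referenced;	/* a page table entry has its young bit set */
	bool flag_referenced;	/* stands in for PG_referenced */
};

/* One encounter of the page by the inactive list scanner. */
static enum page_decision scan_page(struct model_page *page)
{
	bool pte_ref = page->pte_referenced;
	bool seen_before = page->flag_referenced;

	/* Checking the reference consumes the young bit. */
	page->pte_referenced = false;

	if (pte_ref) {
		/* First sighting: remember it. Second sighting: activate. */
		page->flag_referenced = true;
		return seen_before ? PAGE_ACTIVATE : PAGE_KEEP_INACTIVE;
	}

	/* No new page table reference since the last scan. */
	page->flag_referenced = false;
	return PAGE_RECLAIM;
}
```

A page that is touched once and never again is kept for one inactive list cycle and then reclaimed. A page touched again between two scans is activated on the second encounter.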
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: OSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 5 Mar 2010 21:42:21 +0000 (13:42 -0800)]
vmscan: drop page_mapping_inuse()
page_mapping_inuse() is a historic predicate function for pages that are
about to be reclaimed or deactivated.
According to it, a page is in use when it is mapped into page tables OR
part of swap cache OR backing an mmapped file.
This function is used in combination with page_referenced(), which checks
for young bits in ptes and the page descriptor itself for the
PG_referenced bit. Thus, checking for unmapped swap cache pages is
meaningless as PG_referenced is not set for anonymous pages and unmapped
pages do not have young ptes. The test makes no difference.
Protecting file pages that are not by themselves mapped but are part of a
mapped file is also a historic leftover for short-lived things like the
exec() code in libc. However, the VM now does reference accounting and
activation of pages at unmap time and thus the special treatment on
reclaim is obsolete.
This patch drops page_mapping_inuse() and switches the two callsites to
use page_mapped() directly.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: OSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Fri, 5 Mar 2010 21:42:19 +0000 (13:42 -0800)]
vmscan: factor out page reference checks
This is the used-once mapped file page detection patchset.
It is meant to help workloads with large amounts of briefly used file
mappings, like rtorrent hashing a file or git when dealing with loose
objects (git gc on a bigger site?).
Right now, the VM activates referenced mapped file pages on first
encounter on the inactive list and it takes a full memory cycle to
reclaim them again. When those pages dominate memory, the system
no longer has a meaningful notion of 'working set' and is required
to give up the active list to make reclaim progress. Obviously,
this results in rather bad scanning latencies and the wrong pages
being reclaimed.
This patch makes the VM be more careful about activating mapped file
pages in the first place. The minimum granted lifetime without
another memory access becomes an inactive list cycle instead of the
full memory cycle, which is more natural given the mentioned loads.
This test resembles a hashing rtorrent process. Sequentially, 32MB
chunks of a file are mapped into memory, hashed (sha1) and unmapped
again. While this happens, every 5 seconds a process is launched and
its execution time taken:
I also tested kernbench with regular IO streaming in the background to
see whether the delayed activation of frequently used mapped file
pages had a negative impact on performance in the presence of pressure
on the inactive list. The patch made no significant difference in
timing, neither for kernbench nor for the streaming IO throughput.
The first patch submission raised concerns about the cost of the extra
faults for actually activated pages on machines that have no hardware
support for young page table entries.
I created an artificial worst case scenario on an ARM machine with
around 300MHz and 64MB of memory to figure out the dimensions
involved. The test would mmap a file of 20MB, then
1. touch all its pages to fault them in
2. force one full scan cycle on the inactive file LRU
-- old: mapping pages activated
-- new: mapping pages inactive
3. touch the mapping pages again
-- old and new: fault exceptions to set the young bits
4. force another full scan cycle on the inactive file LRU
5. touch the mapping pages one last time
-- new: fault exceptions to set the young bits
The test showed an overall increase of 6% in time over 100 iterations
of the above (old: ~212sec, new: ~225sec). 13 secs total overhead /
(100 * 5k pages), ignoring the execution time of the test itself,
makes for about 25us overhead for every page that gets actually
activated. Note:
1. File mapping the size of one third of main memory, _completely_
in active use across memory pressure - i.e., most pages referenced
within one LRU cycle. This should be rare to non-existent,
especially on such embedded setups.
2. Many huge activation batches. Those batches only occur when the
working set fluctuates. If it changes completely between every full
LRU cycle, you have problematic reclaim overhead anyway.
3. Access of activated pages at maximum speed: sequential loads from
every single page without doing anything in between. In reality,
the extra faults will get distributed between actual operations on
the data.
So even if a workload manages to get the VM into the situation of
activating a third of memory in one go on such a setup, it will take
2.2 seconds instead of 2.1 without the patch.
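The arithmetic behind these figures can be reproduced from the numbers quoted above. The sketch below takes "5k pages" as 5000 pages per iteration (a 20MB mapping with an assumed 4KB page size); the function names are illustrative only.

```c
/* Figures quoted in the text. */
#define OLD_TOTAL_SEC		212.0
#define NEW_TOTAL_SEC		225.0
#define ITERATIONS		100
#define PAGES_PER_ITERATION	5000	/* "5k pages", assuming 4KB pages */

/* Extra time per activated page, in microseconds. */
static double overhead_per_page_us(void)
{
	double extra_sec = NEW_TOTAL_SEC - OLD_TOTAL_SEC;	/* 13 s */

	return extra_sec * 1e6 / ((double)ITERATIONS * PAGES_PER_ITERATION);
}

/* Overall slowdown of the test, in percent. */
static double relative_increase_percent(void)
{
	return (NEW_TOTAL_SEC - OLD_TOTAL_SEC) / OLD_TOTAL_SEC * 100.0;
}

/* Time for one iteration, i.e. one activation of the whole mapping. */
static double per_iteration_sec(double total_sec)
{
	return total_sec / ITERATIONS;
}
```

This gives 26us per page (the "about 25us" above), an increase of roughly 6.1%, and 2.12 versus 2.25 seconds per iteration.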
Comparing the numbers (and my user-experience over several months),
I think this change is an overall improvement to the VM.
Patch 1 is only refactoring to break up that ugly compound conditional
in shrink_page_list() and make it easy to document and add new checks
in a readable fashion.
Patch 2 gets rid of the obsolete page_mapping_inuse(). It's not
strictly related to #3, but it was in the original submission and is a
net simplification, so I kept it.
Patch 3 implements used-once detection of mapped file pages.
This patch:
Moving the big conditional into its own predicate function makes the code
a bit easier to read and allows for better commenting on the checks
one-by-one.
This is just cleaning up, no semantics should have been changed.
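The kind of refactoring meant here can be sketched as below. The struct, its fields, COSTLY_ORDER and both functions are hypothetical and do not reproduce the real condition in shrink_page_list(); the point is only that each check gets its own statement and comment while the result stays the same.

```c
#include <stdbool.h>

#define COSTLY_ORDER 3	/* hypothetical threshold */

/* Hypothetical summary of what the scanner knows about a page. */
struct page_info {
	bool referenced;
	bool mapping_in_use;
	bool mlocked;
	int reclaim_order;
};

/* Before: one compound conditional, hard to comment piece by piece. */
static bool should_activate_inline(const struct page_info *p)
{
	return p->reclaim_order <= COSTLY_ORDER && p->referenced &&
	       p->mapping_in_use && !p->mlocked;
}

/* After: a predicate function with one documented check per step. */
static bool should_activate(const struct page_info *p)
{
	/* Large-order reclaim does not honour references. */
	if (p->reclaim_order > COSTLY_ORDER)
		return false;

	/* Locked pages are dealt with elsewhere. */
	if (p->mlocked)
		return false;

	/* Only referenced pages whose mapping is in use get activated. */
	return p->referenced && p->mapping_in_use;
}
```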
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: OSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 5 Mar 2010 21:42:16 +0000 (13:42 -0800)]
mm: document /sys/devices/system/node/nodeX
Add a bare description of what /sys/devices/system/node/nodeX is. Others
will follow in time but right now, none of that tree is documented. The
existence of this file might at least encourage people to document new
entries.
David Rientjes [Fri, 5 Mar 2010 21:42:14 +0000 (13:42 -0800)]
mm: suppress pfn range output for zones without pages
free_area_init_nodes() emits pfn ranges for all zones on the system.
There may be no pages on a higher zone, however, due to memory limitations
or the use of the mem= kernel parameter. For example:
	Zone PFN ranges:
	  DMA      0x00000001 -> 0x00001000
	  DMA32    0x00001000 -> 0x00100000
	  Normal   0x00100000 -> 0x00100000
The implementation copies the previous zone's highest pfn, if any, as the
next zone's lowest pfn. If its highest pfn is then greater than the
amount of addressable memory, the upper memory limit is used instead.
Thus, both the lowest and highest possible pfn for higher zones without
memory may be the same.
The pfn range for zones without memory is now shown as "empty" instead.
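A user-space sketch of the new output rule follows, assuming a zone counts as empty when its lowest and highest possible pfn are equal. format_zone_range() is a hypothetical helper that formats into a buffer; the kernel prints the line directly.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format one line of the "Zone PFN ranges" output into buf.
 * A zone whose lowest and highest pfn are equal spans no pages and is
 * reported as "empty" instead of a zero-length range.
 */
static int format_zone_range(char *buf, size_t len, const char *name,
			     unsigned long lowest_pfn,
			     unsigned long highest_pfn)
{
	if (lowest_pfn == highest_pfn)
		return snprintf(buf, len, "  %-8s empty", name);

	return snprintf(buf, len, "  %-8s %#010lx -> %#010lx",
			name, lowest_pfn, highest_pfn);
}
```

For the example above, the Normal zone (0x00100000 to 0x00100000) would be printed as "empty", while DMA and DMA32 keep their ranges.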
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>