Hugh Dickins [Wed, 3 Aug 2011 23:21:20 +0000 (16:21 -0700)]
tmpfs: demolish old swap vector support
The maximum size of a shmem/tmpfs file has been limited by the maximum
size of its triple-indirect swap vector. With 4kB page size, maximum
filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
that on a 64-bit kernel. (With 8kB page size, maximum filesize was just
over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
It's a shame that tmpfs should be more restrictive than ramfs, and this
limitation has now been noticed. Add another level to the swap vector?
No, it became obscure and hard to maintain, once I complicated it to
make use of highmem pages nine years ago: better choose another way.
Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
tmpfs would never have invented its own peculiar radix tree: we would
have fitted swap entries into the common radix tree instead, in much the
same way as we fit swap entries into page tables.
And why should each file have a separate radix tree for its pages and
for its swap entries? The swap entries are required precisely where and
when the pages are not. We want to put them together in a single radix
tree: which can then avoid much of the locking which was needed to
prevent them from being exchanged underneath us.
This also avoids the waste of memory devoted to swap vectors, first in
the shmem_inode itself, then at least two more pages once a file grew
beyond 16 data pages (pages accounted by df and du, but not by memcg).
Allocated upfront, to avoid allocation when under swapping pressure, but
pure waste when CONFIG_SWAP is not set - I have never spattered around
the ifdefs to prevent that, preferring this move to sharing the common
radix tree instead.
There are three downsides to sharing the radix tree. One, that it binds
tmpfs more tightly to the rest of mm, either requiring knowledge of swap
entries in radix tree there, or duplication of its code here in shmem.c.
I believe that the simplications and memory savings (and probable higher
performance, not yet measured) justify that.
Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
nodes that cannot be freed under memory pressure - whereas before it was
the less precious highmem swap vector pages that could not be freed.
I'm hoping that 64-bit has now been accessible for long enough, that the
highmem argument has grown much less persuasive.
Three, that swapoff is slower than it used to be on tmpfs files, since
it's using a simple generic mechanism not tailored to it: I find this
noticeable, and shall want to improve, but maybe nobody else will
notice.
So... now remove most of the old swap vector code from shmem.c. But,
for the moment, keep the simple i_direct vector of 16 pages, with simple
accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
to help mark where swap needs to be handled in subsequent patches.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:19 +0000 (16:21 -0700)]
mm: let swap use exceptional entries
If swap entries are to be stored along with struct page pointers in a
radix tree, they need to be distinguished as exceptional entries.
Most of the handling of swap entries in radix tree will be contained in
shmem.c, but a few functions in filemap.c's common code need to check
for their appearance: find_get_page(), find_lock_page(),
find_get_pages() and find_get_pages_contig().
So as not to slow their fast paths, tuck those checks inside the
existing checks for unlikely radix_tree_deref_slot(); except for
find_lock_page(), where it is an added test. And make it a BUG in
find_get_pages_tag(), which is not applied to tmpfs files.
A part of the reason for eliminating shmem_readpage() earlier, was to
minimize the places where common code would need to allow for swap
entries.
The swp_entry_t known to swapfile.c must be massaged into a slightly
different form when stored in the radix tree, just as it gets massaged
into a pte_t when stored in page tables.
In an i386 kernel this limits its information (type and page offset) to
30 bits: given 32 "types" of swapfile and 4kB pagesize, that's a maximum
swapfile size of 128GB. Which is less than the 512GB we previously
allowed with X86_PAE (where the swap entry can occupy the entire upper
32 bits of a pte_t), but not a new limitation on 32-bit without PAE; and
there's not a new limitation on 64-bit (where swap filesize is already
limited to 16TB by a 32-bit page offset). Thirty areas of 128GB is
probably still enough swap for a 64GB 32-bit machine.
Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
enforce filesize limit in read_swap_header(), just as for ptes.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:18 +0000 (16:21 -0700)]
radix_tree: exceptional entries and indices
A patchset to extend tmpfs to MAX_LFS_FILESIZE by abandoning its
peculiar swap vector, instead keeping a file's swap entries in the same
radix tree as its struct page pointers: thus saving memory, and
simplifying its code and locking.
This patch:
The radix_tree is used by several subsystems for different purposes. A
major use is to store the struct page pointers of a file's pagecache for
memory management. But what if mm wanted to store something other than
page pointers there too?
The low bit of a radix_tree entry is already used to denote an indirect
pointer, for internal use, and the unlikely radix_tree_deref_retry()
case.
Define the next bit as denoting an exceptional entry, and supply inline
functions radix_tree_exception() to return non-0 in either unlikely
case, and radix_tree_exceptional_entry() to return non-0 in the second
case.
If a subsystem already uses radix_tree with that bit set, no problem: it
does not affect internal workings at all, but is defined for the
convenience of those storing well-aligned pointers in the radix_tree.
The radix_tree_gang_lookups have an implicit assumption that the caller
can deduce the offset of each entry returned e.g. by the page->index of
a struct page. But that may not be feasible for some kinds of item to
be stored there.
radix_tree_gang_lookup_slot() allow for an optional indices argument,
output array in which to return those offsets. The same could be added
to other radix_tree_gang_lookups, but for now keep it to the only one
for which we need it.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 3 Aug 2011 23:21:17 +0000 (16:21 -0700)]
drivers/video/backlight/aat2870_bl.c: make it buildable as a module
i386 allmodconfig:
drivers/built-in.o: In function `aat2870_bl_remove':
aat2870_bl.c:(.text+0x414f9): undefined reference to `backlight_device_unregister'
drivers/built-in.o: In function `aat2870_bl_probe':
aat2870_bl.c:(.text+0x418fc): undefined reference to `backlight_device_register'
aat2870_bl.c:(.text+0x41a31): undefined reference to `backlight_device_unregiste
Cc: Jin Park <jinyoungp@nvidia.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Current implementation tests wrong value for setting
aat2870_bl->max_current.
- In the current implementation, we cannot differentiate between 2 cases:
a) if pdata->max_current is not set , or
b) pdata->max_current is set to AAT2870_CURRENT_0_45 (which is also 0).
Fix it by setting AAT2870_CURRENT_0_45 to be 1 and adjust the equation in
aat2870_brightness() accordingly.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Samuel Ortiz <sameo@linux.intel.com> Tested-by: Jin Park <jinyoungp@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Wed, 3 Aug 2011 23:21:16 +0000 (16:21 -0700)]
drivers/video/backlight/aat2870_bl.c: fix error checking for backlight_device_register
backlight_device_register() returns ERR_PTR() on error.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Jin Park <jinyoungp@nvidia.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Wed, 3 Aug 2011 23:21:15 +0000 (16:21 -0700)]
cris: add missing declaration of kgdb_init() and breakpoint()
Fix:
arch/cris/arch-v10/kernel/irq.c:239: error: implicit declaration of function 'kgdb_init'
arch/cris/arch-v10/kernel/irq.c:240: error: implicit declaration of function 'breakpoint'
Declare these two functions.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 3 Aug 2011 23:21:11 +0000 (16:21 -0700)]
tpm_tis: fix build when ACPI is not enabled
Fix tpm_tis.c build when CONFIG_ACPI is not enabled by providing a stub
function. Fixes many build errors/warnings:
drivers/char/tpm/tpm_tis.c:89: error: dereferencing pointer to incomplete type
drivers/char/tpm/tpm_tis.c:89: warning: type defaults to 'int' in declaration of 'type name'
drivers/char/tpm/tpm_tis.c:89: error: request for member 'list' in something not a structure or union
...
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Leendert van Doorn <leendert@watson.ibm.com> Cc: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 3 Aug 2011 23:21:09 +0000 (16:21 -0700)]
mm: page_alloc: increase __GFP_BITS_SHIFT to include __GFP_OTHER_NODE
__GFP_OTHER_NODE is used for NUMA allocations on behalf of other nodes.
It's supposed to be passed through from the page allocator to
zone_statistics(), but it never gets there as gfp_allowed_mask is not
wide enough and masks out the flag early in the allocation path.
The result is an accounting glitch where successful NUMA allocations
by-agent are not properly attributed as local.
Increase __GFP_BITS_SHIFT so that it includes __GFP_OTHER_NODE.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergiu Iordache [Wed, 3 Aug 2011 23:21:09 +0000 (16:21 -0700)]
ramoops: update module parameters
Update the module parameters when platform data is used. This means
that they can be read from /sys/module/ramoops/parameters in order to
parse the memory area.
Signed-off-by: Sergiu Iordache <sergiu@chromium.org> Cc: Marco Stornelli <marco.stornelli@gmail.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Drewry [Wed, 3 Aug 2011 23:21:08 +0000 (16:21 -0700)]
Documentation: add pointer to name_to_dev_t for root= values
Update kernel-parameters.txt to point users to the authoritative comment
for name_to_dev_t. In addition, updates other places where some
name_to_dev_t behavior was discussed. All other references to root=
appear to be for explicit sample usage or just side comments when
discussing other kernel parameters.
Signed-off-by: Will Drewry <wad@chromium.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Wed, 3 Aug 2011 23:21:05 +0000 (16:21 -0700)]
taskstats: add_del_listener() should ignore !valid listeners
When send_cpu_listeners() finds the orphaned listener it marks it as
!valid and drops listeners->sem. Before it takes this sem for writing,
s->pid can be reused and add_del_listener() can wrongly try to re-use
this entry.
Oleg Nesterov [Wed, 3 Aug 2011 23:21:04 +0000 (16:21 -0700)]
taskstats: add_del_listener() shouldn't use the wrong node
1. Commit 26c4caea9d69 "don't allow duplicate entries in listener mode"
changed add_del_listener(REGISTER) so that "next_cpu:" can reuse the
listener allocated for the previous cpu, this doesn't look exactly
right even if minor.
Change the code to kfree() in the already-registered case, this case
is unlikely anyway so the extra kmalloc_node() shouldn't hurt but
looke more correct and clean.
2. use the plain list_for_each_entry() instead of _safe() to scan
listeners->list.
3. Remove the unneeded INIT_LIST_HEAD(&s->list), we are going to
list_add(&s->list).
Daniel Glöckner [Wed, 3 Aug 2011 23:21:02 +0000 (16:21 -0700)]
rtc-omap: fix initialization of control register
As the comment explains, the intention of the code is to clear the
OMAP_RTC_CTRL_MODE_12_24 bit, but instead it only clears the
OMAP_RTC_CTRL_SPLIT and OMAP_RTC_CTRL_AUTO_COMP bits, which should be
kept. OMAP_RTC_CTRL_DISABLE, OMAP_RTC_CTRL_SET_32_COUNTER,
OMAP_RTC_CTRL_TEST, and OMAP_RTC_CTRL_ROUND_30S are also better off
being cleared.
Signed-off-by: Daniel Glöckner <dg@emlix.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Wed, 3 Aug 2011 23:21:01 +0000 (16:21 -0700)]
fault-injection: add ability to export fault_attr in arbitrary directory
init_fault_attr_dentries() is used to export fault_attr via debugfs.
But it can only export it in debugfs root directory.
Per Forlin is working on mmc_fail_request which adds support to inject
data errors after a completed host transfer in MMC subsystem.
The fault_attr for mmc_fail_request should be defined per mmc host and
export it in debugfs directory per mmc host like
/sys/kernel/debug/mmc0/mmc_fail_request.
init_fault_attr_dentries() doesn't help for mmc_fail_request. So this
introduces fault_create_debugfs_attr() which is able to create a
directory in the arbitrary directory and replace
init_fault_attr_dentries().
[akpm@linux-foundation.org: extraneous semicolon, per Randy] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Per Forlin <per.forlin@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 3 Aug 2011 07:17:39 +0000 (21:17 -1000)]
Merge branch 'tools-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'tools-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
tools/power turbostat: fit output into 80 columns on snb-ep
tools/power x86_energy_perf_policy: fix print of uninitialized string
Linus Torvalds [Wed, 3 Aug 2011 07:17:02 +0000 (21:17 -1000)]
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (28 commits)
ACPI: delete stale reference in kernel-parameters.txt
ACPI: add missing _OSI strings
ACPI: remove NID_INVAL
thermal: make THERMAL_HWMON implementation fully internal
thermal: split hwmon lookup to a separate function
thermal: hide CONFIG_THERMAL_HWMON
ACPI print OSI(Linux) warning only once
ACPI: DMI workaround for Asus A8N-SLI Premium and Asus A8N-SLI DELUX
ACPI / Battery: propagate sysfs error in acpi_battery_add()
ACPI / Battery: avoid acpi_battery_add() use-after-free
ACPI: introduce "acpi_rsdp=" parameter for kdump
ACPI: constify ops structs
ACPI: fix CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS
ACPI: fix 80 char overflow
ACPI / Battery: Resolve the race condition in the sysfs_remove_battery()
ACPI / Battery: Add the check before refresh sysfs in the battery_notify()
ACPI / Battery: Add the hibernation process in the battery_notify()
ACPI / Battery: Rename acpi_battery_quirks2 with acpi_battery_quirks
ACPI / Battery: Change 16-bit signed negative battery current into correct value
ACPI / Battery: Add the power unit macro
...
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile/mm/init.c: trivial: use BUG_ON
arch/tile: remove useless set_fixmap_nocache() macro
arch/tile: add hypervisor-based character driver for SPI flash ROM
ioctl-number.txt: add the tile hardwall ioctl range
tile: use generic-y format for one-line asm-generic headers
clocksource: tile: convert to use clocksource_register_hz
Linus Torvalds [Wed, 3 Aug 2011 07:14:05 +0000 (21:14 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (31 commits)
Btrfs: don't call writepages from within write_full_page
Btrfs: Remove unused variable 'last_index' in file.c
Btrfs: clean up for find_first_extent_bit()
Btrfs: clean up for wait_extent_bit()
Btrfs: clean up for insert_state()
Btrfs: remove unused members from struct extent_state
Btrfs: clean up code for merging extent maps
Btrfs: clean up code for extent_map lookup
Btrfs: clean up search_extent_mapping()
Btrfs: remove redundant code for dir item lookup
Btrfs: make acl functions really no-op if acl is not enabled
Btrfs: remove remaining ref-cache code
Btrfs: remove a BUG_ON() in btrfs_commit_transaction()
Btrfs: use wait_event()
Btrfs: check the nodatasum flag when writing compressed files
Btrfs: copy string correctly in INO_LOOKUP ioctl
Btrfs: don't print the leaf if we had an error
btrfs: make btrfs_set_root_node void
Btrfs: fix oops while writing data to SSD partitions
Btrfs: Protect the readonly flag of block group
...
Fix up trivial conflicts (due to acl and writeback cleanups) in
- fs/btrfs/acl.c
- fs/btrfs/ctree.h
- fs/btrfs/extent_io.c
Linus Torvalds [Wed, 3 Aug 2011 06:49:57 +0000 (20:49 -1000)]
Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
MAINTAINERS: Add keyword match for of_match_table to device tree section
of: constify property name parameters for helper functions
input: xilinx_ps2: Add missing of_address.h header
of: address: use resource_size helper
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-watchdog:
watchdog: Cleanup WATCHDOG_CORE help text
watchdog: Fix POST failure on ASUS P5N32-E SLI and similar boards
watchdog: shwdt: fix usage of mod_timer
Tony Luck [Tue, 2 Aug 2011 22:08:30 +0000 (15:08 -0700)]
efivars: fix warnings when CONFIG_PSTORE=n
drivers/firmware/efivars.c:161: warning: ‘utf16_strlen’ defined but not used
utf16_strlen() is only used inside CONFIG_PSTORE - make this "static inline"
to shut the compiler up [thanks to hpa for the suggestion].
drivers/firmware/efivars.c:602: warning: initialization from incompatible pointer type
Between v1 and v2 of this patch series we decided to make the "part" number
unsigned - but missed fixing the stub version of efi_pstore_write()
Acked-by: Matthew Garrett <mjg@redhat.com> Acked-by: Mike Waychison <mikew@google.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Shaohua Li [Thu, 28 Jul 2011 20:48:43 +0000 (13:48 -0700)]
ACPI: add missing _OSI strings
Linux supports some optional features, but it should notify the BIOS about
them via the _OSI method. Currently Linux doesn't notify any, which might
make such features not work because the BIOS doesn't know about them.
Jarosz has a system which needs this to make ACPI processor aggregator
device work.
Reported-by: "Jarosz, Sebastian" <sebastian.jarosz@intel.com> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
David Rientjes [Thu, 28 Jul 2011 20:48:43 +0000 (13:48 -0700)]
ACPI: remove NID_INVAL
b552a8c56db8 ("ACPI: remove NID_INVAL") removed the left over uses of
NID_INVAL, but didn't actually remove the definition. Remove it.
Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Jean Delvare [Thu, 28 Jul 2011 20:48:42 +0000 (13:48 -0700)]
thermal: make THERMAL_HWMON implementation fully internal
THERMAL_HWMON is implemented inside the thermal_sys driver and has no
effect on drivers implementing thermal zones, so they shouldn't see
anything related to it in <linux/thermal.h>. Making the THERMAL_HWMON
implementation fully internal has two advantages beyond the cleaner
design:
* This avoids rebuilding all thermal drivers if the THERMAL_HWMON
implementation changes, or if CONFIG_THERMAL_HWMON gets enabled or
disabled.
* This avoids breaking the thermal kABI in these cases too, which should
make distributions happy.
The only drawback I can see is slightly higher memory fragmentation, as
the number of kzalloc() calls will increase by one per thermal zone. But
I doubt it will be a problem in practice, as I've never seen a system with
more than two thermal zones.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Jean Delvare [Thu, 28 Jul 2011 20:48:41 +0000 (13:48 -0700)]
thermal: split hwmon lookup to a separate function
We'll soon need to reuse it.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Jean Delvare [Thu, 28 Jul 2011 20:48:40 +0000 (13:48 -0700)]
thermal: hide CONFIG_THERMAL_HWMON
It's about time to revert 16d752397301b9 ("thermal: Create
CONFIG_THERMAL_HWMON=n"). Anybody running a kernel >= 2.6.40 would also
be running a recent enough version of lm-sensors.
Actually having CONFIG_THERMAL_HWMON is pretty convenient so instead of
dropping it, we keep it but hide it.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Rene Herman <rene.herman@gmail.com> Acked-by: Guenter Roeck <guenter.roeck@ericsson.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>
Mark Brown [Tue, 2 Aug 2011 05:13:26 +0000 (14:13 +0900)]
MAINTAINERS: Add keyword match for of_match_table to device tree section
If a patch is working with an of_match_table it's probably adding an OF
binding to a driver in which case the binding ought to be reviewed by the
device tree folks to make sure that things like the device matches are set
up correctly.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Linus Walleij [Tue, 2 Aug 2011 09:29:24 +0000 (11:29 +0200)]
spi/pl022: remove function cannot exit
The remove function in the PL022 driver cannot abort the remove
function any way, so restructure the code so as not to make that
assumption. Remove will now proceed no matter whether it can
stop the transfer queue or not.
Reported-by: Russell King <linux@arm.linux.org.uk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Mike Snitzer [Tue, 2 Aug 2011 11:32:08 +0000 (12:32 +0100)]
dm table: set flush capability based on underlying devices
DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
regardless of whether or not a given DM device's underlying devices
also advertised a need for them.
Block's flush-merge changes from 2.6.39 have proven to be more costly
for DM devices. Performance regressions have been reported even when
DM's underlying devices do not advertise that they have a write cache.
Fix the performance regressions by configuring a DM device's flushing
capabilities based on those of the underlying devices' capabilities.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Tue, 2 Aug 2011 11:32:08 +0000 (12:32 +0100)]
dm crypt: optionally support discard requests
Add optional parameter field to dmcrypt table and support
"allow_discards" option.
Discard requests bypass crypt queue processing. Bio is simple remapped
to underlying device.
Note that discard will be never enabled by default because of security
consequences. It is up to the administrator to enable it for encrypted
devices.
(Note that userspace cryptsetup does not understand new optional
parameters yet. Support for this will come later. Until then, you
should use 'dmsetup' to enable and disable this.)
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add the ability to parse and use metadata devices to dm-raid. Although
not strictly required, without the metadata devices, many features of
RAID are unavailable. They are used to store a superblock and bitmap.
The role, or position in the array, of each device must be recorded in
its superblock. This is to help with fault handling, array reshaping,
and sanity checks. RAID 4/5/6 devices must be loaded in a specific order:
in this way, the 'array_position' field helps validate the correctness
of the mapping when it is loaded. It can be used during reshaping to
identify which devices are added/removed. Fault handling is impossible
without this field. For example, when a device fails it is recorded in
the superblock. If this is a RAID1 device and the offending device is
removed from the array, there must be a way during subsequent array
assembly to determine that the failed device was the one removed. This
is done by correlating the 'array_position' field and the bit-field
variable 'failed_devices'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:06 +0000 (12:32 +0100)]
dm ioctl: forbid multiple device specifiers
Exactly one of name, uuid or device must be specified when referencing
an existing device. This removes the ambiguity (risking the wrong
device being updated) if two conflicting parameters were specified.
Previously one parameter got used and any others were ignored silently.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:06 +0000 (12:32 +0100)]
dm ioctl: introduce __get_dev_cell
Move logic to find device based on major/minor number to a separate
function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell).
This makes the function __find_device_hash_cell more straightforward.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:06 +0000 (12:32 +0100)]
dm ioctl: fill in device parameters in more ioctls
Move parameter filling from find_device to __find_device_hash_cell.
This patch causes ioctls using __find_device_hash_cell
(DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD)
to return device parameters, bringing them into line with the other
ioctls.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 2 Aug 2011 11:32:06 +0000 (12:32 +0100)]
dm flakey: add corrupt_bio_byte feature
Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a
specified position with a specified value during intervals when the device is
"down".
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 2 Aug 2011 11:32:05 +0000 (12:32 +0100)]
dm flakey: support feature args
Add the ability to specify arbitrary feature flags when creating a
flakey target. This code uses the same target argument helpers that
the multipath target does.
Also remove the superfluous 'dm-flakey' prefixes from the error messages,
as they already contain the prefix 'flakey'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:04 +0000 (12:32 +0100)]
dm snapshot: skip reading origin when overwriting complete chunk
If we write a full chunk in the snapshot, skip reading the origin device
because the whole chunk will be overwritten anyway.
This patch changes the snapshot write logic when a full chunk is written.
In this case:
1. allocate the exception
2. dispatch the bio (but don't report the bio completion to device mapper)
3. write the exception record
4. report bio completed
Callbacks must be done through the kcopyd thread, because callbacks must not
race with each other. So we create two new functions:
dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
(This function must not be called from interrupt context.)
dm_kcopyd_do_callback: submit callback.
(This function may be called from interrupt context.)
Performance test (on snapshots with 4k chunk size):
without the patch:
non-direct-io sequential write (dd): 17.7MB/s
direct-io sequential write (dd): 20.9MB/s
non-direct-io random write (mkfs.ext2): 0.44s
with the patch:
non-direct-io sequential write (dd): 26.5MB/s
direct-io sequential write (dd): 33.2MB/s
non-direct-io random write (mkfs.ext2): 0.27s
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:04 +0000 (12:32 +0100)]
dm: ignore merge_bvec for snapshots when safe
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns. When set, use this to send large bios to snapshots
which can split them if necessary. Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.
Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function. After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.
The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped. Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function. Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.
If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:02 +0000 (12:32 +0100)]
dm kcopyd: remove nr_pages field from job structure
The nr_pages field in struct kcopyd_job is only used temporarily in
run_pages_job() to count the number of required pages.
We can use a local variable instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 2 Aug 2011 11:32:01 +0000 (12:32 +0100)]
dm table: fix discard support
Remove 'discards_supported' from the dm_table structure. The same
information can be easily discovered from the table's target(s) in
dm_table_supports_discards().
Before this fix dm_table_supports_discards() would skip checking the
individual targets' 'discards_supported' flag if any one target in the
table didn't set num_discard_requests > 0. Now the per-target
'discards_supported' flag is effective at insuring the final DM device
advertises discard support. But, to be clear, targets that don't
support discards (!num_discard_requests) will not receive discard
requests.
Also DMWARN if a target sets 'discards_supported' override but forgets
to set 'num_discard_requests'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 2 Aug 2011 11:32:01 +0000 (12:32 +0100)]
dm io: flush cpu cache with vmapped io
For normal kernel pages, CPU cache is synchronized by the dma layer.
However, this is not done for pages allocated with vmalloc. If we do I/O
to/from vmallocated pages, we must synchronize CPU cache explicitly.
Prior to doing I/O on vmallocated page we must call
flush_kernel_vmap_range to flush dirty cache on the virtual address.
After finished read we must call invalidate_kernel_vmap_range to
invalidate cache on the virtual address, so that accesses to the virtual
address return newly read data and not stale data from CPU cache.
This patch fixes metadata corruption on dm-snapshots on PA-RISC and
possibly other architectures with caches indexed by virtual address.
Mart Gerrits [Sat, 30 Jul 2011 14:59:12 +0000 (16:59 +0200)]
watchdog: Fix POST failure on ASUS P5N32-E SLI and similar boards
At present the module does not unset the NO_REBOOT bit upon shutdown, this
causes the BIOS to fail the POST once and reset. During the next boot it
displays the following error message:
***** Warning: System BOOT Fail *****
Your system last boot fail or POST interrupted.
Please enter setup to load default and reboot again.
Press F1 to continue, DEL to enter SETUP
With this patch the NO_REBOOT flag will be unset on shutdown and thus stop
this failure from occurring.
Tested on 'ASUS P5N32-E SLI with BIOS revision 1801' and
'ASUS P5N32-E SLI PLUS with BIOS revision 1502'.
Signed-off-by: Mart Gerrits <mart1987@gmail.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
David Engraf [Wed, 20 Jul 2011 13:03:39 +0000 (15:03 +0200)]
watchdog: shwdt: fix usage of mod_timer
Fix the usage of mod_timer() and make the driver usable. mod_timer() must
be called with an absolute timeout in jiffies. The old implementation
used a relative timeout thus the hardware watchdog was never triggered.
Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Wim Van sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: stable <stable@kernel.org>
oom: task->mm == NULL doesn't mean the memory was freed
exit_mm() sets ->mm == NULL then it does mmput()->exit_mmap() which
frees the memory.
However select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
so it continues to kill other tasks even if we have the oom-killed
task freeing its memory.
Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
the tasks which have already passed exit_notify() to ensure a zombie
with TIF_MEMDIE set can't block oom-killer. Alternatively we could
probably clear TIF_MEMDIE after exit_mmap().
Linus Torvalds [Tue, 2 Aug 2011 00:05:46 +0000 (14:05 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (23 commits)
regulator: Improve WM831x DVS VSEL selection algorithm
regulator: Bootstrap wm831x DVS VSEL value from ON VSEL if not already set
regulator: Set up GPIO for WM831x VSEL before enabling VSEL mode
regulator: Add EPEs to the MODULE_ALIAS() for wm831x-dcdc
regulator: Fix WM831x DCDC DVS VSEL bootstrapping
regulator: Fix WM831x regulator ID lookups for multiple WM831xs
regulator: Fix argument format type errors in error prints
regulator: Fix memory leak in set_machine_constraints() error paths
regulator: Make core more chatty about some errors
regulator: tps65910: Fix array access out of bounds bug
regulator: tps65910: Add missing breaks in switch/case
regulator: tps65910: Fix a memory leak in tps65910_probe error path
regulator: TWL: Remove entry of RES_ID for 6030 macros
ASoC: tlv320aic3x: Add correct hw registers to Line1 cross connect muxes
regulator: Add basic per consumer debugfs
regulator: Add rdev_crit() macro
regulator: Refactor supply implementation to work as regular consumers
regulator: Include the device name in the microamps_requested_ file
regulator: Increase the limit on sysfs file names
regulator: Properly register dummy regulator driver
...
Linus Torvalds [Mon, 1 Aug 2011 23:56:03 +0000 (13:56 -1000)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits)
ext4: prevent memory leaks from ext4_mb_init_backend() on error path
ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #
ext4: use ext4_msg() instead of printk in mballoc
ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info
ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()
ext4: use the correct error exit path in ext4_init_inode_table()
ext4: add missing kfree() on error return path in add_new_gdb()
ext4: change umode_t in tracepoint headers to be an explicit __u16
ext4: fix races in ext4_sync_parent()
ext4: Fix overflow caused by missing cast in ext4_fallocate()
ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole
ext4: simplify parameters of reserve_backup_gdb()
ext4: simplify parameters of add_new_gdb()
ext4: remove lock_buffer in bclean() and setup_new_group_blocks()
ext4: simplify journal handling in setup_new_group_blocks()
ext4: let setup_new_group_blocks() set multiple bits at a time
ext4: fix a typo in ext4_group_extend()
ext4: let ext4_group_add_blocks() handle 0 blocks quickly
ext4: let ext4_group_add_blocks() return an error code
ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()
...
Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify
the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO()
function for the new simplified calling convention, while commit dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to
indirect.c") moved the function to another file.
Linus Torvalds [Mon, 1 Aug 2011 23:48:31 +0000 (13:48 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
xfs: Fix build breakage in xfs_iops.c when CONFIG_FS_POSIX_ACL is not set
VFS: Reorganise shrink_dcache_for_umount_subtree() after demise of dcache_lock
VFS: Remove dentry->d_lock locking from shrink_dcache_for_umount_subtree()
VFS: Remove detached-dentry counter from shrink_dcache_for_umount_subtree()
switch posix_acl_chmod() to umode_t
switch posix_acl_from_mode() to umode_t
switch posix_acl_equiv_mode() to umode_t *
switch posix_acl_create() to umode_t *
block: initialise bd_super in bdget()
vfs: avoid call to inode_lru_list_del() if possible
vfs: avoid taking inode_hash_lock on pipes and sockets
vfs: conditionally call inode_wb_list_del()
VFS: Fix automount for negative autofs dentries
Btrfs: load the key from the dir item in readdir into a fake dentry
devtmpfs: missing initialialization in never-hit case
hppfs: missing include
Linus Torvalds [Mon, 1 Aug 2011 23:40:51 +0000 (13:40 -1000)]
Merge branch 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'pstore-efi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
efivars: Introduce PSTORE_EFI_ATTRIBUTES
efivars: Use string functions in pstore_write
efivars: introduce utf16_strncmp
efivars: String functions
efi: Add support for using efivars as a pstore backend
pstore: Allow the user to explicitly choose a backend
pstore: Make "part" unsigned
pstore: Add extra context for writes and erases
pstore: Extend API for more flexibility in new backends
Linus Torvalds [Mon, 1 Aug 2011 23:39:40 +0000 (13:39 -1000)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kdb,kgdb: Allow arbitrary kgdb magic knock sequences
kdb: Remove all references to DOING_KGDB2
kdb,kgdb: Implement switch and pass buffer from kdb -> gdb
kdb: cleanup unused variables missed in the original kdb merge
Yu Jian [Mon, 1 Aug 2011 21:41:46 +0000 (17:41 -0400)]
ext4: prevent memory leaks from ext4_mb_init_backend() on error path
In ext4_mb_init(), if the s_locality_group allocation fails it will
currently cause the allocations made in ext4_mb_init_backend() to
be leaked. Moving the ext4_mb_init_backend() allocation after the
s_locality_group allocation avoids that problem.
Josef Bacik [Mon, 1 Aug 2011 18:37:36 +0000 (14:37 -0400)]
Btrfs: don't call writepages from within write_full_page
When doing a writepage we call writepages to try and write out any other dirty
pages in the area. This could cause problems where we commit a transaction and
then have somebody else dirtying metadata in the area as we could end up writing
out a lot more than we care about, which could cause latency on anybody who is
waiting for the transaction to completely finish committing. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs: Remove unused variable 'last_index' in file.c
The variable 'last_index' is calculated in the __btrfs_buffered_write
function and passed as a parameter to the prepare_pages function,
but is not used anywhere in the prepare_pages function.
Remove instances of 'last_index' in these functions.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org> Signed-off-by: Chris Mason <chris.mason@oracle.com>