Yinghai Lu [Fri, 1 Mar 2013 22:51:27 +0000 (14:51 -0800)]
x86, ACPI, mm: Revert movablemem_map support
Tim found:
WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
Hardware name: S2600CP
sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
smpboot: Booting Node 1, Processors #1
Modules linked in:
Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
Call Trace:
set_cpu_sibling_map+0x279/0x449
start_secondary+0x11d/0x1e5
Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")
It turns out movable_map has some problems, and it breaks several things
1. numa_init is called several times, NOT just for srat. so those
nodes_clear(numa_nodes_parsed)
memset(&numa_meminfo, 0, sizeof(numa_meminfo))
can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
and make fall back path working.
2. simply split acpi_numa_init to early_parse_srat.
a. that early_parse_srat is NOT called for ia64, so you break ia64.
b. for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE)
still left in numa_init. So it will just clear result from early_parse_srat.
it should be moved before that....
c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
early before override from INITRD is settled.
3. that patch TITLE is total misleading, there is NO x86 in the title,
but it changes critical x86 code. It caused x86 guys did not
pay attention to find the problem early. Those patches really should
be routed via tip/x86/mm.
4. after that commit, following range can not use movable ram:
a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
b. initrd... it will be freed after booting, so it could be on movable...
c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
anymore.
d. init_mem_mapping: can not put page table high anymore.
e. initmem_init: vmemmap can not be high local node anymore. That is
not good.
If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.
We have workaround patch that could fix some problems, but some can not
be fixed.
So just remove that offending commit and related ones including:
f7210e6c4ac7 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
protect movablecore_map in memblock_overlaps_region().")
01a178a94e8e ("acpi, memory-hotplug: support getting hotplug info from
SRAT")
27168d38fa20 ("acpi, memory-hotplug: extend movablemem_map ranges to
the end of node")
e8d195525809 ("acpi, memory-hotplug: parse SRAT before memblock is
ready")
fb06bc8e5f42 ("page_alloc: bootmem limit with movablecore_map")
42f47e27e761 ("page_alloc: make movablemem_map have higher priority")
6981ec31146c ("page_alloc: introduce zone_movable_limit[] to keep
movable limit for nodes")
4d59a75125d5 ("x86: get pg_data_t's memory from other node")
Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.
Reported-by: Tim Gardner <tim.gardner@canonical.com> Reported-by: Don Morris <don.morris@hp.com> Bisected-by: Don Morris <don.morris@hp.com> Tested-by: Don Morris <don.morris@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 2 Mar 2013 16:34:06 +0000 (08:34 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal/compat fixes from Al Viro:
"Fixes for several regressions introduced in the last signal.git pile,
along with fixing bugs in truncate and ftruncate compat (on just about
anything biarch at least one of those two had been done wrong)."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
compat: restore timerfd settime and gettime compat syscalls
[regression] braino in "sparc: convert to ksignal"
fix compat truncate/ftruncate
switch lseek to COMPAT_SYSCALL_DEFINE
lseek() and truncate() on sparc really need sign extension
Linus Torvalds [Sat, 2 Mar 2013 16:31:39 +0000 (08:31 -0800)]
Merge tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull KGDB/KDB fixes and cleanups from Jason Wessel:
"For a change we removed more code than we added. If people aren't
using it we shouldn't be carrying it. :-)
Cleanups:
- Remove kdb ssb command - there is no in kernel disassembler to
support it
- Remove kdb ll command - Always caused a kernel oops and there were
no bug reports so no one was using this command
- Use kernel ARRAY_SIZE macro instead of array computations
Fixes:
- Stop oops in kdb if user executes kdb_defcmd with args
- kdb help command truncated text
- ppc64 support for kgdbts
- Add missing kconfig option from original kdb port for dealing with
catastrophic kernel crashes such that you can reboot automatically
on continue from kdb"
* tag 'for_linux-3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
kdb: Remove unhandled ssb command
kdb: Prevent kernel oops with kdb_defcmd
kdb: Remove the ll command
kdb_main: fix help print
kdb: Fix overlap in buffers with strcpy
Fixed dead ifdef block by adding missing Kconfig option.
kdb: Setup basic kdb state before invoking commands via kgdb
kdb: use ARRAY_SIZE where possible
kgdb/kgdbts: support ppc64
kdb: A fix for kdb command table expansion
Linus Torvalds [Sat, 2 Mar 2013 15:58:56 +0000 (07:58 -0800)]
Merge tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull new ARC architecture from Vineet Gupta:
"Initial ARC Linux port with some fixes on top for 3.9-rc1:
I would like to introduce the Linux port to ARC Processors (from
Synopsys) for 3.9-rc1. The patch-set has been discussed on the public
lists since Nov and has received a fair bit of review, specially from
Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb...
The arch bits are in arch/arc, some asm-generic changes (acked by
Arnd), a minor change to PARISC (acked by Helge).
The series is a touch bigger for a new port for 2 main reasons:
1. It enables a basic kernel in first sub-series and adds
ptrace/kgdb/.. later
2. Some of the fallout of review (DeviceTree support, multi-platform-
image support) were added on top of orig series, primarily to
record the revision history.
This updated pull request additionally contains
- fixes due to our GNU tools catching up with the new syscall/ptrace
ABI
- some (minor) cross-arch Kconfig updates."
* tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits)
ARC: split elf.h into uapi and export it for userspace
ARC: Fixup the current ABI version
ARC: gdbserver using regset interface possibly broken
ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window
ARC: make a copy of flat DT
ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed"
ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled
ARC: Fix pt_orig_r8 access
ARC: [3.9] Fallout of hlist iterator update
ARC: 64bit RTSC timestamp hardware issue
ARC: Don't fiddle with non-existent caches
ARC: Add self to MAINTAINERS
ARC: Provide a default serial.h for uart drivers needing BASE_BAUD
ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux
ARC: [Review] Multi-platform image #8: platform registers SMP callbacks
ARC: [Review] Multi-platform image #7: SMP common code to use callbacks
ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional
ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core
ARC: [Review] Multi-platform image #4: Isolate platform headers
ARC: [Review] Multi-platform image #3: switch to board callback
...
Linus Torvalds [Sat, 2 Mar 2013 15:44:16 +0000 (07:44 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
o Add basic support for the Mediatek/Ralink Wireless SoC family.
o The Qualcomm Atheros platform is extended by support for the new
QCA955X SoC series as well as a bunch of patches that get the code
ready for OF support.
o Lantiq and BCM47XX platform have a few improvements and bug fixes.
o MIPS has sent a few patches that get the kernel ready for the
upcoming microMIPS support.
o The rest of the series is made up of small bug fixes and cleanups
that relate to various parts of the MIPS code. The biggy in there is
a whitespace cleanup. After I was sent another set of whitespace
cleanup patches I decided it was the time to clean the whitespace
"issues" for once and and that touches many files below arch/mips/.
Fix up silly conflicts, mostly due to whitespace cleanups.
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (105 commits)
MIPS: Quit exporting kernel internel break codes to uapi/asm/break.h
MIPS: remove broken conditional inside vpe loader code
MIPS: SMTC: fix implicit declaration of set_vi_handler
MIPS: early_printk: drop __init annotations
MIPS: Probe for and report hardware virtualization support.
MIPS: ath79: add support for the Qualcomm Atheros AP136-010 board
MIPS: ath79: add USB controller registration code for the QCA955X SoCs
MIPS: ath79: add PCI controller registration code for the QCA955X SoCs
MIPS: ath79: add WMAC registration code for the QCA955X SoCs
MIPS: ath79: register UART for the QCA955X SoCs
MIPS: ath79: add QCA955X specific glue to ath79_device_reset_{set, clear}
MIPS: ath79: add GPIO setup code for the QCA955X SoCs
MIPS: ath79: add IRQ handling code for the QCA955X SoCs
MIPS: ath79: add clock setup code for the QCA955X SoCs
MIPS: ath79: add SoC detection code for the QCA955X SoCs
MIPS: ath79: add early printk support for the QCA955X SoCs
MIPS: ath79: fix WMAC IRQ resource assignment
mips: reserve elfcorehdr
mips: Make sure kernel memory is in iomem
MIPS: ath79: use dynamically allocated USB platform devices
...
Theodore Ts'o [Sat, 2 Mar 2013 15:27:46 +0000 (10:27 -0500)]
ext4: use percpu counter for extent cache count
Use a percpu counter rather than atomic types for shrinker accounting.
There's no need for ultimate accuracy in the shrinker, so this
should come a little more cheaply. The percpu struct is somewhat
large, but there was a big gap before the cache-aligned
s_es_lru_lock anyway, and it fits nicely in there.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jason Wessel [Mon, 4 Feb 2013 16:35:33 +0000 (10:35 -0600)]
kdb: Prevent kernel oops with kdb_defcmd
The kdb_defcmd can only be used to display the available command aliases
while using the kernel debug shell. If you try to define a new macro
while the kernel debugger is active it will oops. The debug shell
macros must use pre-allocated memory set aside at the time kdb_init()
is run, and the kdb_defcmd is restricted to only working at the time
that the kdb_init sequence is being run, which only occurs if you
actually activate the kernel debugger.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jason Wessel [Mon, 4 Feb 2013 16:35:33 +0000 (10:35 -0600)]
kdb: Remove the ll command
Recently some code inspection was done after fixing a problem with
kmalloc used while in the kernel debugger context (which is not
legal), and it turned up the fact that kdb ll command will oops the
kernel.
Given that there have been zero bug reports on the command combined
with the fact it will oops the kernel it is clearly not being used.
Instead of fixing it, it will be removed.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jason Wessel [Sun, 3 Feb 2013 15:32:28 +0000 (09:32 -0600)]
kdb: Fix overlap in buffers with strcpy
Maxime reported that strcpy(s->usage, s->usage+1) has no definitive
guarantee that it will work on all archs the same way when you have
overlapping memory. The fix is simple for the kdb code because we
still have the original string memory in the function scope, so we
just have to use that as the argument instead.
Reported-by: Maxime Villard <rustyBSD@gmx.fr> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Robert Obermeier [Sun, 16 Dec 2012 04:59:36 +0000 (05:59 +0100)]
Fixed dead ifdef block by adding missing Kconfig option.
Added missing Kconfig option KDB_CONTINUE_CATASTROPHIC which lead to a dead
ifdef block in kernel/debug/kdb/kdb_main.c:73-75.
The code using KDB_CONTINUE_CATASTROPHIC was originally introduced in
commit '5d5314d6795f3c1c0f415348ff8c51f7de042b77' by Jason Wessel.
This patchset ("kdb: core for kgdb back end (1 of 2)")
added platform independent part of kdb to the linux kernel.
The Kernel option however, even though it had the same options and
behaviour on all supported architectures, was part of the x86 and
ia64 patchset of KDB and therefore not pulled into the mainline kernel tree.
I actually took the originally written Kconfig by
Keith Owens <kaos@sgi.com> (2003-06-20 according to KDB changelog)
and changed it to reflect the correct behaviour,
as the KDUMP patchset is not part of the kernel and the expected
functionality is missing from it.
Signed-off-by: Robert Obermeier <obbi89@googlemail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Matt Klein [Wed, 2 Jan 2013 21:20:49 +0000 (13:20 -0800)]
kdb: Setup basic kdb state before invoking commands via kgdb
Although invasive kdb commands are not supported via kgdb, some useful
non-invasive commands like bt* require basic kdb state to be setup before
calling into the kdb code. Factor out some of this code and call it before
and after executing kdb commands via kgdb.
Signed-off-by: Matt Klein <mklein@twitter.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Tiejun Chen [Wed, 27 Feb 2013 03:09:27 +0000 (11:09 +0800)]
kgdb/kgdbts: support ppc64
We can't look up the address of the entry point of the function simply
via that function symbol for all architectures.
For PPC64 ABI, actually there is a function descriptors structure.
A function descriptor is a three doubleword data structure that contains
the following values:
* The first doubleword contains the address of the entry point of
the function.
* The second doubleword contains the TOC base address for
the function.
* The third doubleword contains the environment pointer for
languages such as Pascal and PL/1.
So we should call a wapperred dereference_function_descriptor() to get
the address of the entry point of the function.
Note this is also safe for other architecture after refer to
"include/asm-generic/sections.h" since:
dereference_function_descriptor(p) always is (p) if without arched definition.
Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
John Blackwood [Mon, 10 Dec 2012 21:37:22 +0000 (15:37 -0600)]
kdb: A fix for kdb command table expansion
When locally adding in some additional kdb commands, I stumbled
across an issue with the dynamic expansion of the kdb command table.
When the number of kdb commands exceeds the size of the statically
allocated kdb_base_commands[] array, additional space is allocated in
the kdb_register_repeat() routine.
The unused portion of the newly allocated array was not being initialized
to zero properly and this would result in segfaults when help '?' was
executed or when a search for a non-existing command would traverse the
command table beyond the end of valid command entries and then attempt
to use the non-zeroed area as actual command entries.
Signed-off-by: John Blackwood <john.blackwood@ccur.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Heiko Carstens [Sat, 2 Mar 2013 11:26:30 +0000 (12:26 +0100)]
compat: restore timerfd settime and gettime compat syscalls
Both compat syscalls got lost with 9d94b9e2 "switch timerfd compat syscalls
to COMPAT_SYSCALL_DEFINE" because of a typo:
COMPAT instead of CONFIG_COMPAT.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Joe Thornber [Fri, 1 Mar 2013 22:45:51 +0000 (22:45 +0000)]
dm cache: add mq policy
A cache policy that uses a multiqueue ordered by recent hit
count to select which blocks should be promoted and demoted.
This is meant to be a general purpose policy. It prioritises
reads over writes.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Fri, 1 Mar 2013 22:45:51 +0000 (22:45 +0000)]
dm: add cache target
Add a target that allows a fast device such as an SSD to be used as a
cache for a slower device such as a disk.
A plug-in architecture was chosen so that the decisions about which data
to migrate and when are delegated to interchangeable tunable policy
modules. The first general purpose module we have developed, called
"mq" (multiqueue), follows in the next patch. Other modules are
under development.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Fri, 1 Mar 2013 22:45:50 +0000 (22:45 +0000)]
dm thin: remove cells from stack
This patch takes advantage of the new bio-prison interface where the
memory is now passed in rather than using a mempool in bio-prison.
This allows the map function to avoid performing potentially-blocking
allocations that could lead to deadlocks: We want to avoid the cell
allocation that is done in bio_detain.
(The potential for mempool deadlocks still remains in other functions
that use bio_detain.)
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Fri, 1 Mar 2013 22:45:50 +0000 (22:45 +0000)]
dm bio prison: pass cell memory in
Change the dm_bio_prison interface so that instead of allocating memory
internally, dm_bio_detain is supplied with a pre-allocated cell each
time it is called.
This enables a subsequent patch to move the allocation of the struct
dm_bio_prison_cell outside the thin target's mapping function so it can
no longer block there.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
If an instance of a target sets this, it will be queried before the
target's mapping function is called on a write bio, and the response
controls the number of copies of the write bio that the target will
receive.
This provides a convenient way for a target to send the same data to
more than one device. The new cache target uses this in writethrough
mode, to send the data both to the cache and the backing device.
Mikulas Patocka [Fri, 1 Mar 2013 22:45:49 +0000 (22:45 +0000)]
dm kcopyd: introduce configurable throttling
This patch allows the administrator to reduce the rate at which kcopyd
issues I/O.
Each module that uses kcopyd acquires a throttle parameter that can be
set in /sys/module/*/parameters.
We maintain a history of kcopyd usage by each module in the variables
io_period and total_period in struct dm_kcopyd_throttle. The actual
kcopyd activity is calculated as a percentage of time equal to
"(100 * io_period / total_period)". This is compared with the user-defined
throttle percentage threshold and if it is exceeded, we sleep.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 1 Mar 2013 22:45:49 +0000 (22:45 +0000)]
dm ioctl: allow message to return data
This patch introduces enhanced message support that allows the
device-mapper core to recognise messages that are common to all devices,
and for messages to return data to userspace.
Core messages are processed by the function "message_for_md". If the
device mapper doesn't support the message, it is passed to the target
driver.
If the message returns data, the kernel sets the flag
DM_MESSAGE_OUT_FLAG.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 1 Mar 2013 22:45:49 +0000 (22:45 +0000)]
dm ioctl: optimize functions without variable params
Device-mapper ioctls receive and send data in a buffer supplied
by userspace. The buffer has two parts. The first part contains
a 'struct dm_ioctl' and has a fixed size. The second part depends
on the ioctl and has a variable size.
This patch recognises the specific ioctls that do not use the variable
part of the buffer and skips allocating memory for it.
In particular, when a device is suspended and a resume ioctl is sent,
this now avoid memory allocation completely.
The variable "struct dm_ioctl tmp" is moved from the function
copy_params to its caller ctl_ioctl and renamed to param_kernel.
It is used directly when the ioctl function doesn't need any arguments.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 1 Mar 2013 22:45:48 +0000 (22:45 +0000)]
dm ioctl: introduce ioctl_flags
This patch introduces flags for each ioctl function.
So far, one flag is defined, IOCTL_FLAGS_NO_PARAMS. It is set if the
function processing the ioctl doesn't take or produce any parameters in
the section of the data buffer that has a variable size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Jun'ichi Nomura [Fri, 1 Mar 2013 22:45:48 +0000 (22:45 +0000)]
dm: merge io_pool and tio_pool
This patch merges io_pool and tio_pool into io_pool and cleans up
related functions.
Though device-mapper used to have 2 pools of objects for each dm device,
the use of bioset frontbad for per-bio data has shrunk the number of
pools to 1 for both bio-based and request-based device types.
(See c0820cf5 "dm: introduce per_bio_data" and 94818742 "dm: Use bioset's front_pad for dm_rq_clone_bio_info")
So dm no longer has to maintain 2 different pointers.
No functional changes.
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Christie [Fri, 1 Mar 2013 22:45:48 +0000 (22:45 +0000)]
dm: fix limits initialization when there are no data devices
dm_calculate_queue_limits will first reset the provided limits to
defaults using blk_set_stacking_limits; whereby defeating the purpose of
retaining the original live table's limits -- as was intended via commit 3ae706561637331aa578e52bb89ecbba5edcb7a9 ("dm: retain table limits when
swapping to new table with no devices").
Fix this improper limits initialization (in the no data devices case) by
avoiding the call to dm_calculate_queue_limits.
[patch header revised by Mike Snitzer]
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.6+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Rename functions involved in splitting and cloning bios.
The sequence of functions is now:
(1) __split_and_process* - entry point that selects the processing strategy
(2) __send* - prepare the details for each bio needed and loop through them
(3) __clone_and_map* - creates a clone and maps it
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.
Remove the no-longer-used struct bio_set argument from clone_bio and split_bvec.
Use tio->ti in __map_bio() instead of passing in ti.
Factor out some code for setting up cloned bios.
Take target_request_nr as a parameter to alloc_tio().
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Kees Cook [Fri, 1 Mar 2013 22:45:46 +0000 (22:45 +0000)]
dm persistent data: remove CONFIG_EXPERIMENTAL
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Fri, 1 Mar 2013 22:45:45 +0000 (22:45 +0000)]
dm thin: use block_size_is_power_of_two
Use block_size_is_power_of_two() rather than checking
sectors_per_block_shift directly. Also introduce local pool variable in
get_bio_block() to eliminate redundant tc->pool dereferences.
No functional change.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Fri, 1 Mar 2013 22:45:44 +0000 (22:45 +0000)]
dm: fix truncated status strings
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.
When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.
However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.
If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.
In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
This is incorrect because retrieve_status doesn't check the error
code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.
This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.
The regression was introduced by the change c0820cf5 "dm: introduce per_bio_data", where dm started to replace
bioset during table replacement.
For bio-based dm, it is good because clone bios do not exist during the
table replacement.
For request-based dm, however, (not-yet-mapped) clone bios may stay in
request queue and survive during the table replacement.
So freeing the old bioset could cause the oops in bio_put().
Since the size of front_pad may change only with bio-based dm,
it is not necessary to replace bioset for request-based dm.
Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Randy Dunlap [Fri, 1 Mar 2013 19:11:47 +0000 (11:11 -0800)]
hsi: fix kernel-doc warnings
Fix kernel-doc warnings in hsi files:
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'
Don't document "private:" fields with kernel-doc notation.
If you want to leave them fully documented, that's OK, but
then don't mark them as "private:".
Linus Torvalds [Fri, 1 Mar 2013 20:05:13 +0000 (12:05 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
"Four cifs fixes (including for kernel bug #53221 and samba bug #9519)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: bugfix for unreclaimed writeback pages in cifs_writev_requeue()
cifs: set MAY_SIGN when sec=krb5
POSIX extensions disabled on client due to illegal O_EXCL flag sent to Samba
cifs: ensure that cifs_get_root() only traverses directories
Tim Gardner [Fri, 1 Mar 2013 11:46:48 +0000 (19:46 +0800)]
autofs4 - autofs4_catatonic_mode(): remove redundant null check on kfree()
smatch analysis:
fs/autofs4/waitq.c:46 autofs4_catatonic_mode() info: redundant null check on wq->name.name calling kfree()
Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Ian Kent <raven@themaw.net> Cc: autofs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Huewe [Fri, 1 Mar 2013 11:46:40 +0000 (19:46 +0800)]
autofs - Fix sparse warning: context imbalance in autofs4_d_automount() different lock contexts for basic block
Sparse complains:
fs/autofs4/root.c:409:9: sparse: context imbalance in 'autofs4_d_automount' - different lock contexts for basic block
This was introduced by commit f55fb0c24386 ("autofs4 - dont clear
DCACHE_NEED_AUTOMOUNT on rootless mount")
The function autofs4_d_automount can be left with the (&sbi->fs_lock)
held if sbi->version <= 4 and simple_empty(dentry) == false so the
warning seems valid.
--> Add an spin_unlock in this case before we jump to done
Unfortunately compile tested only.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Gortmaker [Thu, 14 Feb 2013 20:50:15 +0000 (13:50 -0700)]
btrfs: fixup/remove module.h usage as required
We want to avoid module.h where posible, since it in turn includes
nearly all of header space. This means removing it where it is not
required, and using export.h where we are only exporting symbols via
EXPORT_SYMBOL and friends.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Josef Bacik [Fri, 1 Mar 2013 16:47:21 +0000 (11:47 -0500)]
Btrfs: delete inline extents when we find them during logging
Apparently when we do inline extents we allow the data to overlap the last chunk
of the btrfs_file_extent_item, which means that we can possibly have a
btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
This messes with us when we try to overwrite the extent when logging new extents
since we expect for it to be the right size. To fix this just delete the item
and try to do the insert again which will give us the proper sized
btrfs_file_extent_item. This fixes a panic where map_private_extent_buffer
would blow up because we're trying to write past the end of the leaf. Thanks,
Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Steven Noonan [Fri, 1 Mar 2013 13:14:59 +0000 (05:14 -0800)]
xenbus: fix compile failure on ARM with Xen enabled
Adding an include of linux/mm.h resolves this:
drivers/xen/xenbus/xenbus_client.c: In function ‘xenbus_map_ring_valloc_hvm’:
drivers/xen/xenbus/xenbus_client.c:532:66: error: implicit declaration of function ‘page_to_section’ [-Werror=implicit-function-declaration]
CC: stable@vger.kernel.org Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
we would call the PHYSDEVOP_map_pirq 'nvec' times with the same
contents of the PCI device. Sander discovered that we would get
the same PIRQ value 'nvec' times and return said values to the
caller. That of course meant that the device was configured only
with one MSI and AHCI would fail with:
ahci 0000:00:11.0: version 3.0
xen: registering gsi 19 triggering 0 polarity 1
xen: --> pirq=19 -> irq=19 (gsi=19)
(XEN) [2013-02-27 19:43:07] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x99 -> IRQ 19 Mode:1 Active:1)
ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode
ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
ahci: probe of 0000:00:11.0 failed with error -22
That is b/c in ahci_host_activate the second call to
devm_request_threaded_irq would return -EINVAL as we passed in
(on the second run) an IRQ that was never initialized.
David Sterba [Fri, 1 Mar 2013 15:03:00 +0000 (15:03 +0000)]
btrfs: try harder to allocate raid56 stripe cache
The stripe hash table is large, starting with allocation order 4 and can go as
high as order 7 in case lock debugging is turned on and structure padding
happens.
Wang Shilong [Fri, 1 Mar 2013 11:33:01 +0000 (11:33 +0000)]
Btrfs: don't call btrfs_qgroup_free if just btrfs_qgroup_reserve fails
commit eb6b88d92c6df083dd09a8c471011e3788dfd7c6 leads into another bug.
If it is just because qgroup_reserve fails, the function btrfs_qgroup_free
should not be called, otherwise, it will cause the wrong quota accounting.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
You can also use the commands:
btrfs-debug-tree <disk> | grep QGROUP
You will find there are still items existed.The reasons why this happens
is because the original code just checks slots[0]==0 and returns.
We try to fix it by deleting the leaf one by one.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Fabio Porcedda [Thu, 14 Feb 2013 08:14:25 +0000 (09:14 +0100)]
watchdog: add timeout-sec property binding
this patchset add the timeout-sec property to the following drivers:
orion_wdt, pnx4008_wdt, s3c2410_wdt and at91sam9_wdt.
The at91sam9_wdt is tested on evk-pr3,
the other drivers are compile tested only.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Wolfram Sang <w.sang@pengutronix.de> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Andrew Victor <linux@maxim.org.za> Cc: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Fabio Porcedda [Tue, 8 Jan 2013 10:04:10 +0000 (11:04 +0100)]
watchdog: core: dt: add support for the timeout-sec dt property
Add support for watchdog drivers to initialize/set the timeout field
of the watchdog_device structure. The timeout field is initialised
either with the module timeout parameter value (if valid) or with the
timeout-sec dt property (if valid). If both are invalid the initial
value is unchanged.
Signed-off-by: Fabio Porcedda <fabio.porcedda@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Hauke Mehrtens [Sat, 12 Jan 2013 17:14:11 +0000 (18:14 +0100)]
watchdog: bcm47xx_wdt.c: add hard timer
The more recent devices have a watchdog timer which could be configured
for over 2 hours and not just 2 seconds like the first generation
devices. For those devices do not use the extra software timer, but
directly program the time into the register. This will automatically be
used if the timer supports more than a minute.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Hauke Mehrtens [Sat, 12 Jan 2013 17:14:09 +0000 (18:14 +0100)]
watchdog: bcm47xx_wdt.c: rename ops methods
Rename the methods registered to struct watchdog_ops bcm47xx_wdt_ops in
order to add an other struct watchdog_ops using different ops in the
next patch.
Also rename WDT_MAX_TIME to WDT_SOFTTIMER_MAX.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Hauke Mehrtens [Thu, 24 Jan 2013 17:13:34 +0000 (18:13 +0100)]
watchdog: bcm47xx_wdt.c: use platform device
Instead of accessing the function to set the watchdog timer directly,
register a platform driver the platform could register to use this
watchdog driver.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Wolfram Sang [Tue, 27 Sep 2011 20:35:40 +0000 (22:35 +0200)]
watchdog: add new driver for STMP3xxx and i.MX23/28
Replace the existing STMP3xxx driver because it has enough drawbacks
that a rewrite is apropriate. The new driver is designed to use the
watchdog framework which makes it a lot smaller and avoids open coding
the watchdog API again. It also uses now an explicitly exported function
from the RTC driver to set up its registers (the old driver silently
reused the hopefully(!) already remapped RTC registers). Also, this
driver is mach independent, while the old one depends on a mach replaced
by another one a year ago. Since the user interface is still the
standard watchdog API, users don't need to adapt.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Wolfram Sang [Tue, 27 Sep 2011 20:21:37 +0000 (22:21 +0200)]
rtc: stmp3xxx: add wdt-accessor function
This RTC also includes a watchdog timer. Provide an accessor function
for setting the watchdog timeout value which will be picked up by a
watchdog driver. Also register the platform_device for the watchdog here
to get the boot-time dependencies right.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Aaro Koskinen [Thu, 27 Dec 2012 20:58:29 +0000 (22:58 +0200)]
watchdog: introduce retu_wdt driver
Introduce Retu watchdog driver.
Cc: linux-watchdog@vger.kernel.org Acked-by: Felipe Balbi <balbi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Kernel symbol X86_MRST has been removed from the kernel.
INTEL_SCU_WATCHDOG driver can never be compiled due dependence of X86_MRST
which remained in the drivers/watchdog/Kconfig.
Reported-by: Alexander Shiyan <shc_work@mail.ru> Cc: Alan Cox <alan@linux.intel.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Gabor Juhos [Thu, 27 Dec 2012 14:38:26 +0000 (15:38 +0100)]
watchdog: ath79_wdt: get register base from platform device's resources
The ath79_wdt driver uses a fixed memory address
currently. Although this is working with each
currently supported SoCs, but this may change
in the future. Additionally, the driver includes
platform specific header files in order to be
able to get the memory base of the watchdog
device.
The patch adds a memory resource to the platform
device, and converts the driver to get the base
address of the watchdog device from that.
Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Takahisa Tanaka [Mon, 14 Jan 2013 02:01:58 +0000 (11:01 +0900)]
watchdog: sp5100_tco: Write back the original value to reserved bits, instead of zero
In case of SP5100 or SB7x0 chipsets, the sp5100_tco module writes zero to
reserved bits. The module, however, shouldn't depend on specific default
value, and should perform a read-merge-write operation for the reserved
bits.
This patch makes the sp5100_tco module perform a read-merge-write operation
on all the chipset (sp5100, sb7x0, sb8x0 or later).
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43176 Signed-off-by: Takahisa Tanaka <mc74hc00@gmail.com> Tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Cc: stable <stable@vger.kernel.org>
Takahisa Tanaka [Mon, 14 Jan 2013 02:01:57 +0000 (11:01 +0900)]
watchdog: sp5100_tco: Fix wrong indirect I/O access for getting value of reserved bits
In case of SB800 or later chipset and re-programming MMIO address(*),
sp5100_tco module may read incorrect value of reserved bit, because the module
reads a value from an incorrect I/O address. However, this bug doesn't cause
a problem, because when re-programming MMIO address, by chance the module
writes zero (this is BIOS's default value) to the low three bits of register.
* In most cases, PC with SB8x0 or later chipset doesn't need to re-programming
MMIO address, because such PC can enable AcpiMmio and can use 0xfed80b00 for
watchdog register base address.
This patch fixes this bug.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=43176 Signed-off-by: Takahisa Tanaka <mc74hc00@gmail.com> Tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Cc: stable <stable@vger.kernel.org>
Arnd Bergmann [Fri, 25 Jan 2013 14:14:27 +0000 (14:14 +0000)]
watchdog: at91sam9: at91_wdt_dt_ids cannot be __init
The device IDs are referenced by the driver and potentially
used beyond the init time, as kbuild correctly warns
about. Remove the __initconst annotation.
Without this patch, building at91_dt_defconfig results in:
WARNING: drivers/watchdog/built-in.o(.data+0x28): Section mismatch in reference from the variable at91wdt_driver to the (unknown reference) .init.rodata:(unknown)
The variable at91wdt_driver references
the (unknown reference) __initconst (unknown)
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Tested-by: Fabio Porcedda <fabio.porcedda@gmail.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Cc: linux-watchdog@vger.kernel.org
Randy Dunlap [Mon, 28 Jan 2013 16:29:48 +0000 (08:29 -0800)]
watchdog: da9055_wdt needs to select WATCHDOG_CORE
DA9055_WATCHDOG (introduced in v3.8) needs to select WATCHDOG_CORE so that it will
build cleanly. Fixes these build errors:
da9055_wdt.c:(.text+0xe9bc7): undefined reference to `watchdog_unregister_device'
da9055_wdt.c:(.text+0xe9f4b): undefined reference to `watchdog_register_device'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Dajun Chen <dchen@diasemi.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Cc: linux-watchdog@vger.kernel.org Cc: stable <stable@vger.kernel.org>