Ed Cashin [Fri, 5 Oct 2012 00:16:29 +0000 (17:16 -0700)]
aoe: failover remote interface based on aoe_deadsecs parameter
The aoe_deadsecs module parameter allows the user to specify a hard limit
on the number of seconds an AoE command can be retransmitted before the
AoE block device is considered to have failed.
Using aoe_deadsecs to determine the time we try using a different remote
interface helps to ensure that the hard limit is not reached before we've
tried to recover by sending to a different remote port.
As a data storage target, the AoE target is unambiguously identified by
its {major, minor} AoE address tuple, and an AoE target can have multiple
MAC addresses. However, note that "target" in the driver code and
comments means a {major, minor, MAC address} tuple, as in "somewhere to
send packets".
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Fri, 5 Oct 2012 00:16:27 +0000 (17:16 -0700)]
aoe: use packets that work with the smallest-MTU local interface
Users with several network interfaces dedicated to AoE generally do not
configure them to support different-sized AoE data payloads on purpose.
For a given AoE target, there will be a set of local network interfaces
that can reach it. Using only the payload that will fit in the
smallest-sized MTU of all those local interfaces greatly simplifies the
driver, especially in failure scenarios.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Fri, 5 Oct 2012 00:16:25 +0000 (17:16 -0700)]
aoe: use a kernel thread for transmissions
The dev_queue_xmit function needs to have interrupts enabled, so the most
simple way to get the locking right but still fulfill that requirement is
to use a process that can call dev_queue_xmit serially over queued
transmissions.
Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Fri, 5 Oct 2012 00:16:23 +0000 (17:16 -0700)]
aoe: become I/O request queue handler for increased user control
To allow users to choose an elevator algorithm for their particular
workloads, change from a make_request-style driver to an
I/O-request-queue-handler-style driver.
We have to do a couple of things that might be surprising. We manipulate
the page _count directly on the assumption that we still have no guarantee
that users of the block layer are prohibited from submitting bios
containing pages with zero reference counts.[1] If such a prohibition now
exists, I can get rid of the _count manipulation.
Just as before this patch, we still keep track of the sk_buffs that the
network layer still hasn't finished yet and cap the resources we use with
a "pool" of skbs.[2]
Now that the block layer maintains the disk stats, the aoe driver's
diskstats function can go away.
Ed Cashin [Fri, 5 Oct 2012 00:16:21 +0000 (17:16 -0700)]
aoe: kernel thread handles I/O completions for simple locking
Make the frames the aoe driver uses to track the relationship between bios
and packets more flexible and detached, so that they can be passed to an
"aoe_ktio" thread for completion of I/O.
The frames are handled much like skbs, with a capped amount of
preallocation so that real-world use cases are likely to run smoothly and
degenerate gracefully even under memory pressure.
Decoupling I/O completion from the receive path and serializing it in a
process makes it easier to think about the correctness of the locking in
the driver, especially in the case of a remote MAC address becoming
unusable.
[dan.carpenter@oracle.com: cleanup an allocation a bit] Signed-off-by: Ed Cashin <ecashin@coraid.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ed Cashin [Fri, 5 Oct 2012 00:16:20 +0000 (17:16 -0700)]
aoe: for performance support larger packet payloads
tAdd adds the ability to work with large packets composed of a number of
segments, using the scatter gather feature of the block layer (biovecs)
and the network layer (skb frag array). The motivation is the performance
gained by using a packet data payload greater than a page size and by
using the network card's scatter gather feature.
Users of the out-of-tree aoe driver already had these changes, but since
early 2011, they have complained of increased memory utilization and
higher CPU utilization during heavy writes.[1] The commit below appears
related, as it disables scatter gather on non-IP protocols inside the
harmonize_features function, even when the NIC supports sg.
net offloading: Generalize netif_get_vlan_features().
With that regression in place, transmits always linearize sg AoE packets,
but in-kernel users did not have this patch. Before 2.6.38, though, these
changes were working to allow sg to increase performance.
Paul Clements [Fri, 5 Oct 2012 00:16:18 +0000 (17:16 -0700)]
nbd: handle discard requests
Add discard support to nbd. If the nbd-server supports discard, it will
send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
for the device (QUEUE_FLAG_DISCARD).
If discard support is enabled, then when the nbd client system receives a
discard request, this will be passed along to the nbd-server. When the
discard request is received by the nbd-server, it will perform:
fallocate(.. FALLOC_FL_PUNCH_HOLE ..)
To punch a hole in the backend storage, which is no longer needed.
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Clements [Fri, 5 Oct 2012 00:16:15 +0000 (17:16 -0700)]
nbd: add set flags ioctl
Add a set-flags ioctl, allowing various option flags to be set on an nbd
device. This allows the nbd-client to set the device flags (to enable
read-only mode, or enable discard support, etc.).
Flags are typically specified by the nbd-server. During the negotiation
phase of the nbd connection, the server sends its flags to the client.
The client then uses NBD_SET_FLAGS to inform the kernel of the options.
Also included is a one-line fix to debug output for the set-timeout ioctl.
Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the single global destination ID counter with per-net allocation
mechanism to allow independent destID management for each available
RapidIO network. Using bitmap based mechanism instead of counters allows
destination ID release and reuse in systems that support hot-swap.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/rionet: rework to support multiple RIO master ports
Make RIONET driver multi-net safe/capable by introducing per-net lists of
RapidIO network peers. Rework registration of network adapters to support
all available RIO master port devices.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify mport initialization routine to run the RapidIO discovery process
asynchronously. This allows to have an arbitrary order of enumerating and
discovering ports in systems with multiple RapidIO controllers without
creating a deadlock situation if enumerator port is registered after a
discovering one.
Making netID matching to mportID ensures consistent net ID assignment in
multiport RapidIO systems with asynchronous discovery process (global
counter implementation is affected by race between threads).
[akpm@linux-foundation.org: tweak code layput] Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: use device lists handling on per-net basis
Modify handling of device lists to resolve issues caused by using single
global list of RIO devices during enumeration/discovery. The most common
sign of existing issue is incorrect contents of switch routing tables in
systems with multiple mport controllers while single-port configuration
performs as expected.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following set of patches provides modifications targeting support of
multiple RapidIO master port (mport) devices on a CPU-side of
RapidIO-capable board. While the RapidIO subsystem code has definitions
suitable for multi-controller/multi-net support, the existing
implementation cannot be considered ready for multiple mport
configurations.
=========== NOTES: =============
a) The patches below do not address RapidIO side view of multiport
processing elements defined in Part 6 of RapidIO spec Rev.2.1 (section
6.4.1). These devices have Base Device ID CSR (0x60) and Component Tag
CSR (0x6C) shared by all SRIO ports. For example, Freescale's P4080,
P3041 and P5020 have a dual-port SRIO controller implemented according
the specification. Enumeration/discovery of such devices from RapidIO
side may require device-specific fixups.
b) Devices referenced above may also require implementation specific
code to setup a host device ID for mport device. These operations are
not addressed by patches in this package.
=================================
Details about provided patches:
1. Fix blocking wait for discovery ready
While it does not happen on PowerPC based platforms, there is
possibility of stalled CPU warning dump on x86 based platforms that run
RapidIO discovery process if they wait too long for being enumerated.
Currently users can avoid it by disabling the soft-lockup detector
using "nosoftlockup" kernel parameter OR by ensuring that enumeration
is completed before soft-lockup is detected.
This patch eliminates blocking wait and keeps a scheduler running.
It also is required for patch 3 below which introduces asynchronous
discovery process.
2. Use device lists handling on per-net basis
This patch allows to correctly support multiple RapidIO nets and
resolves possible issues caused by using single global list of devices
during RapidIO system enumeration/discovery. The most common sign of
existing issue is incorrect contents of switch routing tables in
systems with multiple mport controllers while single-port configuration
performs as expected.
The patch does not eliminate the global RapidIO device list but
changes some routines in enumeration/discovery to use per-net device
lists instead. This way compatibility with upper layer RIO routines is
preserved.
3. Run discovery as an asynchronous process
This patch modifies RapidIO initialization routine to asynchronously
run the discovery process for each corresponding mport. This allows
having an arbitrary order of enumerating and discovering mports without
creating a deadlock situation if an enumerator port was registered
after a discovering one.
On boards with multiple discovering mports it also eliminates order
dependency between mports and may reduce total time of RapidIO
subsystem initialization.
Making netID matching to mportID ensures consistent netID assignment
in multiport RapidIO systems with asynchronous discovery process
(global counter implementation is affected by race between threads).
4. Rework RIONET to support multiple RIO master ports
In the current version of the driver rionet_probe() has comment "XXX
Make multi-net safe". Now it is a good time to address this comment.
This patch makes RIONET driver multi-net safe/capable by introducing
per-net lists of RapidIO network peers. It also enables to register
network adapters for all available mport devices.
5. Add destination ID allocation mechanism
The patch replaces a single global destination ID counter with
per-net allocation mechanism to allow independent destID management for
each available RapidIO network. Using bitmap based mechanism instead
of counters allows destination ID release and reuse in systems that
support hot-swap.
This patch:
Fix blocking wait loop in the RapidIO discovery routine to avoid warning
dumps about stalled CPU on x86 platforms.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: apply RX/TX enable to active switch ports only
Apply port RX/TX enable operations only to active switch ports.
RapidIO specification (Part 6: LP-Serial Physical Layer) recommends to
keep Output Port Enable (TX) and Input Port Enable (RX) control bits in
disabled state (0b0) after device reset. It also allows to have
implementation specific reset state for these bits.
This patch ensures that TX/RX enable action is applied only to active
switch's ports while preserving an initial state of inactive ones.
This patch is intended to keep inactive switch ports with inbound and
outbound packet transfers disabled to block unexpected packets during hot
insertion event. While it does not fix any visible malfunction it is
intended to prevent such events in future.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add Tsi721 routines to support RapidIO subsystem's inbound memory mapping
interface (RapidIO to system's local memory).
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add common inbound memory mapping/unmapping interface. This allows to make
local memory space accessible from the RapidIO side using hardware mapping
capabilities of RapidIO bridging devices. The new interface is intended to
enable data transfers between RapidIO devices in combination with DMA engine
support.
This patch is based on patch submitted by Li Yang <leoli@freescale.com>
(https://lists.ozlabs.org/pipermail/linuxppc-dev/2009-April/071210.html)
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: fix kerneldoc warnings after DMA support was added
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify RapidIO mport device name assignment to include device name of PCIe
side of Tsi721 bridge. The new name format is intended to provide
definitive reference between RapidIO and PCIe sides of the bridge in
systems with multiple Tsi721 bridges.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix multicast packet transmit logic to account for repetitive transmission
of single skb:
- correct check for available buffers (this bug may produce NULL pointer
crash dump in case of heavy traffic);
- update skb user count (incorrect user counter causes a warning dump from
net_tx_action routine during multicast transfers in systems with three or
more rionet participants).
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
yan [Fri, 5 Oct 2012 00:15:41 +0000 (17:15 -0700)]
proc: no need to initialize proc_inode->fd in proc_get_inode()
proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().
If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.
Signed-off-by: yan <clouds.yan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
yan [Fri, 5 Oct 2012 00:15:38 +0000 (17:15 -0700)]
proc: return -ENOMEM when inode allocation failed
If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion. proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.
Signed-off-by: yan <clouds.yan@gmail.com> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 5 Oct 2012 00:15:36 +0000 (17:15 -0700)]
coredump: extend core dump note section to contain file names of mapped files
This note has the following format:
long count -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
long start
long end
long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 5 Oct 2012 00:15:35 +0000 (17:15 -0700)]
coredump: add a new elf note with siginfo of the signal
Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.
There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.
This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 5 Oct 2012 00:15:31 +0000 (17:15 -0700)]
compat: move compat_siginfo_t definition to asm/compat.h
This is a preparatory patch for the introduction of NT_SIGINFO elf note.
Make the location of compat_siginfo_t uniform across eight architectures
which have it. Now it can be pulled in by including asm/compat.h or
linux/compat.h.
Most of the copies are verbatim. compat_uid[32]_t had to be replaced by
__compat_uid[32]_t. compat_uptr_t had to be moved up before
compat_siginfo_t in asm/compat.h on a several architectures (tile already
had it moved up). compat_sigval_t had to be relocated from linux/compat.h
to asm/compat.h.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denys Vlasenko [Fri, 5 Oct 2012 00:15:29 +0000 (17:15 -0700)]
coredump: pass siginfo_t* to do_coredump() and below, not merely signr
This is a preparatory patch for the introduction of NT_SIGINFO elf note.
With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params. It will be used by the
next patch. Most changes are simple s/signr/siginfo->si_signo/.
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Amerigo Wang <amwang@redhat.com> Cc: "Jonathan M. Foote" <jmfoote@cert.org> Cc: Roland McGrath <roland@hack.frob.com> Cc: Pedro Alves <palves@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Oleg Nesterov [Fri, 5 Oct 2012 00:15:25 +0000 (17:15 -0700)]
coredump: add support for %d=__get_dumpable() in core name
Some coredump handlers want to create a core file in a way compatible with
standard behavior. Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0. However, there was no way for coredump
handler to know that the process being dumped was suid'ed.
This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.
Alex Kelly [Fri, 5 Oct 2012 00:15:24 +0000 (17:15 -0700)]
coredump: update coredump-related headers
Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c. It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Kelly [Fri, 5 Oct 2012 00:15:23 +0000 (17:15 -0700)]
coredump: make core dump functionality optional
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump. This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.
CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.
[akpm@linux-foundation.org: fix binfmt_aout.c build] Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
device_cgroup: convert device_cgroup internally to policy + exceptions
The original model of device_cgroup is having a whitelist where all the
allowed devices are listed. The problem with this approach is that is
impossible to have the case of allowing everything but few devices.
The reason for that lies in the way the whitelist is handled internally:
since there's only a whitelist, the "all devices" entry would have to be
removed and replaced by the entire list of possible devices but the ones
that are being denied. Since dev_t is 32 bits long, representing the allowed
devices as a bitfield is not memory efficient.
This patch replaces the "whitelist" by a "exceptions" list and the default
policy is kept as "deny_all" variable in dev_cgroup structure.
The current interface determines that whenever "a" is written to devices.allow
or devices.deny, the entry masking all devices will be added or removed,
respectively. This behavior is kept and it's what will determine the default
policy:
# cat devices.list
a *:* rwm
# echo a >devices.deny
# cat devices.list
# echo a >devices.allow
# cat devices.list
a *:* rwm
The interface is also preserved. For example, if one wants to block only access
to /dev/null:
# ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jul 24 16:17 /dev/null
# echo a >devices.allow
# echo "c 1:3 rwm" >devices.deny
# cat /dev/null
cat: /dev/null: Operation not permitted
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 r" >devices.allow
# cat /dev/null
# echo >/dev/null
bash: /dev/null: Operation not permitted
mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rw" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
mknod: `/tmp/null': Operation not permitted
# echo "c 1:3 rwm" >devices.allow
# echo >/dev/null
# cat /dev/null
# mknod /tmp/null c 1 3
#
Note that I didn't rename the functions/variables in this patch, but in the
next one to make reviewing easier.
Signed-off-by: Aristeu Rozanski <aris@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Santos [Fri, 5 Oct 2012 00:15:10 +0000 (17:15 -0700)]
kernel-doc: don't mangle whitespace in Example section
A section with the name "Example" (case-insensitive) has a special meaning
to kernel-doc. These sections are output using mono-type fonts. However,
leading whitespace is stripped, thus robbing a lot of meaning from this,
as indented code examples will be mangled.
This patch preserves the leading whitespace for "Example" sections. More
accurately, it preserves it for all sections, but removes it later if the
section isn't an "Example" section.
Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Santos [Fri, 5 Oct 2012 00:15:08 +0000 (17:15 -0700)]
kernel-doc: bugfix - empty line in Example section
If you have a section named "Example" that contains an empty line,
attempting to generate htmldocs give you the error:
/path/Documentation/DocBook/kernel-api.xml:3455: parser error : Opening and ending tag mismatch: programlisting line 3449 and para
</para><para>
^
/path/Documentation/DocBook/kernel-api.xml:3473: parser error : Opening and ending tag mismatch: para line 3467 and programlisting
</programlisting></informalexample>
^
/path/Documentation/DocBook/kernel-api.xml:3678: parser error : Opening and ending tag mismatch: programlisting line 3672 and para
</para><para>
^
/path/Documentation/DocBook/kernel-api.xml:3701: parser error : Opening and ending tag mismatch: para line 3690 and programlisting
</programlisting></informalexample>
^
unable to parse
/path/Documentation/DocBook/kernel-api.xml
Essentially, the script attempts to close a <programlisting> with a
closing tag for a <para> block. This patch corrects the problem by
simply not outputting anything extra when we're dumping pre-formatted
text, since the empty line will be rendered correctly anyway.
Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the issue by appending all lines ending in a blackslash
(optionally followed by whitespace), removing the backslash and any
whitespace after it prior to appending (just like the C pre-processor
would).
This fixes a break in kerel-doc introduced by the additions to rbtree.h.
Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This does the following:
1: Splits the arguments of a function call to stop it
from exceeding 80 characters
2: Re-indents the arguments of another function call
to prevent the splitting of a quoted string.
Maintain an index of directory inodes by starting cluster, so that
fat_get_parent() can return the proper cached inode rather than inventing
one that cannot be traced back to the filesystem root.
Add a new msdos/vfat binary mount option "nfs" so that FAT filesystems
that are _not_ exported via NFS are not saddled with maintenance of an
index they will never use.
Finally, simplify NFS file handle generation and lookups. An
ext2-congruent implementation is adequate for FAT needs.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under memory pressure, the system may evict dentries from cache. When the
FAT driver receives a NFS request involving an evicted dentry, it is
unable to reconnect it to the filesystem root. This causes the request to
fail, often with ENOENT.
This is partially due to ineffectiveness of the current FAT NFS
implementation, and partially due to an unimplemented fh_to_parent method.
The latter can cause file accesses to fail on shares exported with
subtree_check.
This patch set provides the FAT driver with the ability to
reconnect dentries. NFS file handle generation and lookups are simplified
and made congruent with ext2.
Testing has involved a memory-starved virtual machine running 3.5-rc5 that
exports a ~2 GB vfat filesystem containing a kernel tree (~770 MB, ~40000
files, 9 levels). Both 'cp -r' and 'ls -lR' operations were performed
from a client, some overlapping, some consecutive. Exports with
'subtree_check' and 'no_subtree_check' have been tested.
Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.
The following should be considered for NFS clients who are sensitive to ESTALE:
* Mounting with lookupcache=none
Unfortunately this can degrade performance severely, particularly for deep
filesystems.
* Incorporating VFS patches to retry ESTALE failures on the client-side,
such as https://lkml.org/lkml/2012/6/29/381
* Handling ESTALE errors in client application code
This patch:
Move NFS-related code into its own C file. No functional changes.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c3b79770e51a ("rtc: m41t80: Workaround broken alarm
functionality") disabled m41t80's alarm functions. But since those
functions were not touched, building this driver triggers these GCC
warnings:
drivers/rtc/rtc-m41t80.c:216:12: warning: 'm41t80_rtc_alarm_irq_enable' defined but not used [-Wunused-function]
drivers/rtc/rtc-m41t80.c:238:12: warning: 'm41t80_rtc_set_alarm' defined but not used [-Wunused-function]
drivers/rtc/rtc-m41t80.c:308:12: warning: 'm41t80_rtc_read_alarm' defined but not used [-Wunused-function]
Remove these functions (and the commented out references to them) to
silence these warnings. Anyone wanting to fix the alarm irq functionality
can easily find the removed code in the git log of this file or through
some web searches.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Cc: John Stultz <john.stultz@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:29 +0000 (17:14 -0700)]
drivers/rtc/rtc-em3027.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:27 +0000 (17:14 -0700)]
drivers/rtc/rtc-isl1208.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:25 +0000 (17:14 -0700)]
drivers/rtc/rtc-pcf8563.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:22 +0000 (17:14 -0700)]
drivers/rtc/rtc-rs5c372.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future. while at it also fix a checkpatch
warn WARNING: sizeof rs5c->buf should be sizeof(rs5c->buf)
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:21 +0000 (17:14 -0700)]
drivers/rtc/rtc-s35390a.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:18 +0000 (17:14 -0700)]
drivers/rtc/rtc-x1205.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shubhrajyoti D [Fri, 5 Oct 2012 00:14:17 +0000 (17:14 -0700)]
drivers/rtc/rtc-ds1672.c: convert struct i2c_msg initialization to C99 format
Convert the struct i2c_msg initialization to C99 format. This makes
maintaining and editing the code simpler. Also helps once other fields
like transferred are added in future.
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com> Reviewed-by: Felipe Balbi <balbi@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Fries [Fri, 5 Oct 2012 00:14:12 +0000 (17:14 -0700)]
rtc_sysfs_show_hctosys(): display 0 if resume failed
Without this patch /sys/class/rtc/$CONFIG_RTC_HCTOSYS_DEVICE/hctosys
contains a 1 (meaning "This rtc was used to initialize the system
clock") even if setting the time by do_settimeofday() at bootup failed.
The RTC can also be used to set the clock on resume, if it did 1,
otherwise 0. Previously there was no indication if the RTC was used
to set the clock in resume.
This uses only CONFIG_RTC_HCTOSYS_DEVICE for conditional compilation
instead of it and CONFIG_RTC_HCTOSYS to be more consistent.
rtc_hctosys_ret was moved to class.c so class.c no longer depends on
hctosys.c.
[sfr@canb.auug.org.au: fix build] Signed-off-by: David Fries <David@Fries.net> Cc: Matthew Garrett <mjg@redhat.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Julia Lawall [Fri, 5 Oct 2012 00:14:07 +0000 (17:14 -0700)]
drivers/rtc/rtc-coh901331.c: use clk_prepare_enable() and clk_disable_unprepare()
clk_prepare_enable and clk_disable_unprepare combine clk_prepare and
clk_enable, and clk_disable and clk_unprepare. They make the code more
concise, and ensure that clk_unprepare is called when clk_enable fails.
A simplified version of the semantic patch that introduces calls to these
functions is as follows: (http://coccinelle.lip6.fr/)
Venu Byravarasu [Fri, 5 Oct 2012 00:14:04 +0000 (17:14 -0700)]
rtc: rc5t583: add ricoh rc5t583 RTC driver
Add an RTC driver for the RTC device on Ricoh MFD Rc5t583. Ricoh RTC has
3 types of alarms. The current patch adds support for the Y-Alarm of
RC5t583 RTC.
Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several comparisons of a unsigned int to less than zero int
spear RTC driver. Such a check will always be true. In all these cases a
signed int is assigned to the unsigned variable, which is checked, before.
So the right fix is to make the checked variable signed as well. In one
case the check can be dropped completely, because all it does it returns
'err' if 'err' is less than zero, otherwise it returns 0. Since in this
particular case 'err' is always either 0 or less this is the same as just
returning 'err'.
The issue has been found using the following coccinelle semantic patch:
//<smpl>
@@
type T;
unsigned T i;
@@
(
*i < 0
|
*i >= 0
)
//</smpl>
The irq field of the jz4740_irc struct is unsigned. Yet we assign the
result of platform_get_irq() to it. platform_get_irq() may return a
negative error code and the code checks for this condition by checking if
'irq' is less than zero. But since 'irq' is unsigned this test will
always be false. Fix it by making 'irq' signed.
The issue was found using the following coccinelle semantic patch:
//<smpl>
@@
type T;
unsigned T i;
@@
(
*i < 0
|
*i >= 0
)
//</smpl>
Stephen Warren [Fri, 5 Oct 2012 00:13:56 +0000 (17:13 -0700)]
rtc: add MAX8907 RTC driver
The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.
The driver is based on an original by or fixed by:
* Tom Cherry
* Prashant Gaikwad
* Joseph Yoon
During upstreaming, I (swarren):
* Converted to regmap.
* Fixed handling of RTC_HOUR register containing 12.
* Fixed handling of RTC_WEEKDAY register.
* General cleanup.
Signed-off-by: Stephen Warren <swarren@nvidia.com> Cc: Tom Cherry <tcherry@nvidia.com> Cc: Prashant Gaikwad <pgaikwad@nvidia.com> Cc: Joseph Yoon <tyoon@nvidia.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vincent Palatin [Fri, 5 Oct 2012 00:13:52 +0000 (17:13 -0700)]
rtc: recycle id when unloading a rtc driver
When calling rtc_device_unregister, we are not freeing the id used by the
driver. So when doing a unload/load cycle for a RTC driver (e.g. rmmod
rtc_cmos && modprobe rtc_cmos), its id is incremented by one. As a
consequence, we no longer have neither an rtc0 driver nor a
/proc/driver/rtc (as it only exists for the first driver).
Signed-off-by: Vincent Palatin <vpalatin@chromium.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kim, Milo [Fri, 5 Oct 2012 00:13:45 +0000 (17:13 -0700)]
rtc-proc: permit the /proc/driver/rtc device to use other devices
To get time information via /proc/driver/rtc, only the first device (rtc0)
is used. If the rtcN (eg. rtc1 or rtc2) is used for the system clock,
there is no way to get information of rtcN via /proc/driver/rtc. With
this patch, the time data can be retrieved from the system clock RTC.
If the RTC_HCTOSYS_DEVICE is not defined, then rtc0 is used by default.
Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Gardner [Fri, 5 Oct 2012 00:13:44 +0000 (17:13 -0700)]
drivers/rtc/rtc-isl1208.c: add support for the ISL1218
The ISL1218 chip is identical to the ISL1208, except that it has 6
additional user-storage registers. This patch does not enable access to
those additional registers, but only adds the chip name to the list.
Signed-off-by: Ben Gardner <gardner.ben@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Fri, 5 Oct 2012 00:13:42 +0000 (17:13 -0700)]
binfmt_elf: Uninitialized variable
load_elf_interp() has interp_map_addr carefully described as
"uninitialized_var" and marked so as to avoid a warning. However if you
trace the code it is passed into load_elf_interp and then this value is
checked against NULL.
As this return value isn't used this is actually safe but it freaks
various analysis tools that see un-initialized memory addresses being read
before their value is ever defined.
Set it to NULL as a matter of programming good taste if nothing else
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paton J. Lewis [Fri, 5 Oct 2012 00:13:39 +0000 (17:13 -0700)]
epoll: support for disabling items, and a self-test app
Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item. If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment. Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.
Signed-off-by: Paton J. Lewis <palewis@adobe.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Holland <pholland@adobe.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 5 Oct 2012 00:13:36 +0000 (17:13 -0700)]
CodingStyle: add networking specific block comment style
The block comment style in net/ and drivers/net is non-standard.
Document it.
Signed-off-by: Joe Perches <joe@perches.com> Cc: "Allan, Bruce W" <bruce.w.allan@intel.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Fri, 5 Oct 2012 00:13:35 +0000 (17:13 -0700)]
checkpatch: check networking specific block comment style
In an effort to get fewer checkpatch reviewer corrections, add a
networking specific style test for the preferred networking comment style.
/* The preferred style for block comments in
* drivers/net/... and net/... is like this
*/
These tests are only used in net/ and drivers/net/
Tested with:
$ cat drivers/net/t.c
/* foo */
/*
* foo
*/
/* foo
*/
/* foo
* bar */
$ ./scripts/checkpatch.pl -f drivers/net/t.c
WARNING: networking block comments don't use an empty /* line, use /* Comment...
#4: FILE: net/t.c:4:
+
+/*
WARNING: networking block comments put the trailing */ on a separate line
#12: FILE: net/t.c:12:
+ * bar */
total: 0 errors, 2 warnings, 12 lines checked
Signed-off-by: Joe Perches <joe@perches.com> Cc: "Allan, Bruce W" <bruce.w.allan@intel.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasi Savanainen [Fri, 5 Oct 2012 00:13:29 +0000 (17:13 -0700)]
checkpatch: check utf-8 content from a commit log when it's missing from charset
Check that a commit log doesn't contain UTF-8 when a mail header
explicitly defines a different charset, like
'Content-Type: text/plain; charset="us-ascii"'
Signed-off-by: Pasi Savanainen <pasi.savanainen@nixu.com> Cc: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Fri, 5 Oct 2012 00:13:28 +0000 (17:13 -0700)]
scatterlist: atomic sg_mapping_iter() no longer needs disabled IRQs
SG mapping iterator w/ SG_MITER_ATOMIC set required IRQ disabled because
it originally used KM_BIO_SRC_IRQ to allow use from IRQ handlers.
kmap_atomic() has long been updated to handle stacking atomic mapping
requests on per-cpu basis and only requires not sleeping while mapped.
Update sg_mapping_iter such that atomic iterators only require disabling
preemption instead of disabling IRQ.
While at it, convert wte weird @ARG@ notations to @ARG in the comment of
sg_miter_start().
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Alex Dubov <oakad@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 5 Oct 2012 00:13:24 +0000 (17:13 -0700)]
lib/vsprintf.c: improve standard conformance of sscanf()
Xen's pciback points out a couple of deficiencies with vsscanf()'s
standard conformance:
- Trailing character matching cannot be checked by the caller: With a
format string of "(%x:%x.%x) %n" absence of the closing parenthesis
cannot be checked, as input of "(00:00.0)" doesn't cause the %n to be
evaluated (because of the code not skipping white space before the
trailing %n).
- The parameter corresponding to a trailing %n could get filled even if
there was a matching error: With a format string of "(%x:%x.%x)%n",
input of "(00:00.0]" would still fill the respective variable pointed to
(and hence again make the mismatch non-detectable by the caller).
This patch aims at fixing those, but leaves other non-conforming aspects
of it untouched, among them these possibly relevant ones:
- improper handling of the assignment suppression character '*' (blindly
discarding all succeeding non-white space from the format and input
strings),
- not honoring conversion specifiers for %n, - not recognizing the C99
conversion specifier 't' (recognized by vsprintf()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/spinlock_debug: avoid livelock in do_raw_spin_lock()
The logic in do_raw_spin_lock() attempts to acquire a spinlock by invoking
arch_spin_trylock() in a loop with a delay between each attempt. Now
consider the following situation in a 2 CPU system:
1. CPU-0 continually acquires and releases a spinlock in a
tight loop; it stays in this loop until some condition X
is satisfied. X can only be satisfied by another CPU.
2. CPU-1 tries to acquire the same spinlock, in an attempt
to satisfy the aforementioned condition X. However, it
never sees the unlocked value of the lock because the
debug spinlock code uses trylock instead of just lock;
it checks at all the wrong moments - whenever CPU-0 has
locked the lock.
Now in the absence of debug spinlocks, the architecture specific spinlock
code can correctly allow CPU-1 to wait in a "queue" (e.g., ticket
spinlocks), ensuring that it acquires the lock at some point. However,
with the debug spinlock code, livelock can easily occur due to the use of
try_lock, which obviously cannot put the CPU in that "queue". This
queueing mechanism is implemented in both x86 and ARM spinlock code.
Note that the situation mentioned above is not hypothetical. A real
problem was encountered where CPU-0 was running hrtimer_cancel with
interrupts disabled, and CPU-1 was attempting to run the hrtimer that
CPU-0 was trying to cancel.
Address this by actually attempting arch_spin_lock once it is suspected
that there is a spinlock lockup. If we're in a situation that is
described above, the arch_spin_lock should succeed; otherwise other
timeout mechanisms (e.g., watchdog) should alert the system of a lockup.
Therefore, if there is a genuine system problem and the spinlock can't be
acquired, the end result (irrespective of this change being present) is
the same. If there is a livelock caused by the debug code, this change
will allow the lock to be acquired, depending on the implementation of the
lower level arch specific spinlock code.
[akpm@linux-foundation.org: tweak comment] Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
genalloc: make it possible to use a custom allocation algorithm
Premit use of another algorithm than the default first-fit one. For
example a custom algorithm could be used to manage alignment requirements.
As I can't predict all the possible requirements/needs for all allocation
uses cases, I add a "free" field 'void *data' to pass any needed
information to the allocation function. For example 'data' could be used
to handle a structure where you store the alignment, the expected memory
bank, the requester device, or any information that could influence the
allocation algorithm.
An usage example may look like this:
struct my_pool_constraints {
int align;
int bank;
...
};
unsigned long my_custom_algo(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data)
{
struct my_pool_constraints *constraints = data;
...
deal with allocation contraints
...
return the index in bitmap where perform the allocation
}
Add of best-fit algorithm function:
most of the time best-fit is slower then first-fit but memory fragmentation
is lower. The random buffer allocation/free tests don't show any arithmetic
relation between the allocation time and fragmentation but the
best-fit algorithm
is sometime able to perform the allocation when the first-fit can't.
This new algorithm help to remove static allocations on ESRAM, a small but
fast on-chip RAM of few KB, used for high-performance uses cases like DMA
linked lists, graphic accelerators, encoders/decoders. On the Ux500
(in the ARM tree) we have define 5 ESRAM banks of 128 KB each and use of
static allocations becomes unmaintainable:
cd arch/arm/mach-ux500 && grep -r ESRAM .
./include/mach/db8500-regs.h:/* Base address and bank offsets for ESRAM */
./include/mach/db8500-regs.h:#define U8500_ESRAM_BASE 0x40000000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK_SIZE 0x00020000
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK0 U8500_ESRAM_BASE
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK1 (U8500_ESRAM_BASE + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK2 (U8500_ESRAM_BANK1 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK3 (U8500_ESRAM_BANK2 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_BANK4 (U8500_ESRAM_BANK3 + U8500_ESRAM_BANK_SIZE)
./include/mach/db8500-regs.h:#define U8500_ESRAM_DMA_LCPA_OFFSET 0x10000
./include/mach/db8500-regs.h:#define U8500_DMA_LCPA_BASE
(U8500_ESRAM_BANK0 + U8500_ESRAM_DMA_LCPA_OFFSET)
./include/mach/db8500-regs.h:#define U8500_DMA_LCLA_BASE U8500_ESRAM_BANK4
I want to use genalloc to do dynamic allocations but I need to be able to
fine tune the allocation algorithm. I my case best-fit algorithm give
better results than first-fit, but it will not be true for every use case.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@stericsson.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Fri, 5 Oct 2012 00:13:17 +0000 (17:13 -0700)]
lib/Kconfig.debug: adjust hard-lockup related Kconfig options
The main option should not appear in the resulting .config when the
dependencies aren't met (i.e. use "depends on" rather than directly
setting the default from the combined dependency values).
The sub-options should depend on the main option rather than a more
generic higher level one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Elder [Fri, 5 Oct 2012 00:13:16 +0000 (17:13 -0700)]
lib/parser.c: avoid overflow in match_number()
The result of converting an integer value to another signed integer type
that's unable to represent the original value is implementation defined.
(See notes in section 6.3.1.3 of the C standard.)
In match_number(), the result of simple_strtol() (which returns type long)
is assigned to a value of type int.
Instead, handle the result of simple_strtol() in a well-defined way, and
return -ERANGE if the result won't fit in the int variable used to hold
the parsed result.
No current callers pay attention to the particular error value returned,
so this additional return code shouldn't do any harm.
[akpm@linux-foundation.org: coding-style tweaks] Signed-off-by: Alex Elder <elder@inktank.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>