Linus Torvalds [Thu, 28 Feb 2013 20:43:43 +0000 (12:43 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"The patch set is mostly driver updates (bnx2fc, ipr, lpfc, qla4) and a
few bug fixes"
Pull delayed because google hates James, and sneakily considers his pull
requests spam. Why, google, why?
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (60 commits)
[SCSI] aacraid: 1024 max outstanding command support for Series 7 and above
[SCSI] bnx2fc: adjust duplicate test
[SCSI] qla4xxx: Update driver version to 5.03.00-k4
[SCSI] qla4xxx: Fix return code for qla4xxx_session_get_param.
[SCSI] qla4xxx: wait for boot target login response during probe.
[SCSI] qla4xxx: Added support for force firmware dump
[SCSI] qla4xxx: Re-register IRQ handler while retrying initialize of adapter
[SCSI] qla4xxx: Throttle active IOCBs to firmware limits
[SCSI] qla4xxx: Remove unnecessary code from qla4xxx_init_local_data
[SCSI] qla4xxx: Quiesce driver activities while loopback
[SCSI] qla4xxx: Rename MBOX_ASTS_IDC_NOTIFY to MBOX_ASTS_IDC_REQUEST_NOTIFICATION
[SCSI] qla4xxx: Add spurious interrupt messages under debug level 2
[SCSI] cxgb4i: Remove the scsi host device when removing device
[SCSI] bfa: fix strncpy() limiter in bfad_start_ops()
[SCSI] qla4xxx: Update driver version to 5.03.00-k3
[SCSI] qla4xxx: Correct the validation to check in get_sys_info mailbox
[SCSI] qla4xxx: Pass correct function param to qla4_8xxx_rd_direct
[SCSI] lpfc 8.3.37: Update lpfc version for 8.3.37 driver release
[SCSI] lpfc 8.3.37: Fixed infinite loop in lpfc_sli4_fcf_rr_next_index_get.
[SCSI] lpfc 8.3.37: Fixed crash due to SLI Port invalid resource count
...
Linus Torvalds [Thu, 28 Feb 2013 04:58:09 +0000 (20:58 -0800)]
Merge branch 'akpm' (final batch from Andrew)
Merge third patch-bumb from Andrew Morton:
"This wraps me up for -rc1.
- Lots of misc stuff and things which were deferred/missed from
patchbombings 1 & 2.
- ocfs2 things
- lib/scatterlist
- hfsplus
- fatfs
- documentation
- signals
- procfs
- lockdep
- coredump
- seqfile core
- kexec
- Tejun's large IDR tree reworkings
- ipmi
- partitions
- nbd
- random() things
- kfifo
- tools/testing/selftests updates
- Sasha's large and pointless hlist cleanup"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (163 commits)
hlist: drop the node parameter from iterators
kcmp: make it depend on CHECKPOINT_RESTORE
selftests: add a simple doc
tools/testing/selftests/Makefile: rearrange targets
selftests/efivarfs: add create-read test
selftests/efivarfs: add empty file creation test
selftests: add tests for efivarfs
kfifo: fix kfifo_alloc() and kfifo_init()
kfifo: move kfifo.c from kernel/ to lib/
arch Kconfig: centralise CONFIG_ARCH_NO_VIRT_TO_BUS
w1: add support for DS2413 Dual Channel Addressable Switch
memstick: move the dereference below the NULL test
drivers/pps/clients/pps-gpio.c: use devm_kzalloc
Documentation/DMA-API-HOWTO.txt: fix typo
include/linux/eventfd.h: fix incorrect filename is a comment
mtd: mtd_stresstest: use prandom_bytes()
mtd: mtd_subpagetest: convert to use prandom library
mtd: mtd_speedtest: use prandom_bytes
mtd: mtd_pagetest: convert to use prandom library
mtd: mtd_oobtest: convert to use prandom library
...
Sasha Levin [Thu, 28 Feb 2013 01:06:00 +0000 (17:06 -0800)]
hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Thu, 28 Feb 2013 01:05:58 +0000 (17:05 -0800)]
kcmp: make it depend on CHECKPOINT_RESTORE
Since kcmp syscall has been implemented (initially on x86 architecture) a
number of other archs wire it up as well: xtensa, sparc, sh, s390, mips,
microblaze, m68k (not taking into account those who uses
<asm-generic/unistd.h> for syscall numbers definitions).
But the Makefile, which turns kcmp.o generation on still depends on former
config-x86. Thus get rid of this limitation and make kcmp.o depend on
CHECKPOINT_RESTORE option.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Kerr [Thu, 28 Feb 2013 01:05:57 +0000 (17:05 -0800)]
selftests: add a simple doc
This change adds a little documentation to the tests under
tools/testing/selftests/, based on akpm's explanation.
[akpm@linux-foundation.org: move from Documentation to tools/testing/selftests/README.txt] Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Kerr [Thu, 28 Feb 2013 01:05:55 +0000 (17:05 -0800)]
selftests/efivarfs: add create-read test
Test that reads from a newly-created efivarfs file (with no data written)
will return EOF.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Lingzhu Xiang <lxiang@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Kerr [Thu, 28 Feb 2013 01:05:53 +0000 (17:05 -0800)]
selftests/efivarfs: add empty file creation test
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Lingzhu Xiang <lxiang@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeremy Kerr [Thu, 28 Feb 2013 01:05:52 +0000 (17:05 -0800)]
selftests: add tests for efivarfs
This change adds a few initial efivarfs tests to the
tools/testing/selftests directory.
The open-unlink test is based on code from Lingzhu Xiang.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Lingzhu Xiang <lxiang@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stefani Seibold [Thu, 28 Feb 2013 01:05:51 +0000 (17:05 -0800)]
kfifo: fix kfifo_alloc() and kfifo_init()
Fix kfifo_alloc() and kfifo_init() to alloc at least the requested number
of elements. Since the kfifo operates on power of 2 the request size will
be rounded up to the next power of two.
Julia Lawall [Thu, 28 Feb 2013 01:05:44 +0000 (17:05 -0800)]
drivers/pps/clients/pps-gpio.c: use devm_kzalloc
devm_kzalloc allocates memory that is released when a driver detaches.
This patch uses devm_kzalloc for data that is allocated in the probe
function of a platform device and is only freed in the remove function.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Rodolfo Giometti <giometti@enneenne.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Elder [Thu, 28 Feb 2013 01:05:28 +0000 (17:05 -0800)]
nbd: fix sparse warning
I just fixed this in "drivers/block/rbd.c" and I noticed that
"drivers/block/nbd.c" has the same problem. Fix a warning issued by
sparse by adding some lockdep annotations to indicate the queue lock gets
dropped (because it's held when do_nbd_request() is called) and
re-acquired within the function.
Signed-off-by: Alex Elder <elder@inktank.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Paul Clements <paul.clements@us.sios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wouter Verhelst [Thu, 28 Feb 2013 01:05:27 +0000 (17:05 -0800)]
nbd: update documentation and link to mailinglist
Documentation/blockdev/nbd.txt contained some documentation which was
horribly outdated and probably still dates from the original patch that
added NBD support to the kernel.
This patch removes the useless and outdated bits. The tools on nbd.sf.net
are fully documented in manpages, which is where documentation for the
non-kernel bits should live.
Additionally, add a reference to the MAINTAINERS file for the nbd-general
mailinglist that is used for discussion of the userland tools and the
kernel module already.
Signed-off-by: Wouter Verhelst <w@uter.be> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Thu, 28 Feb 2013 01:05:26 +0000 (17:05 -0800)]
nbd: show read-only state in sysfs
Pass the read-only flag to set_device_ro, so that it will be visible to
the block layer and in sysfs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Clements <Paul.Clements@steeleye.com> Cc: Alex Bligh <alex@alex.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Thu, 28 Feb 2013 01:05:25 +0000 (17:05 -0800)]
nbd: fsync and kill block device on shutdown
There are two problems with shutdown in the NBD driver.
1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.
This patch adds the sync operation into __nbd_ioctl()'s
NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted
to processes that have CAP_SYS_ADMIN, and the NBD client may not
possess it (fsync of the block device does not sync the filesystem,
either).
2: Once we clear the socket we have no guarantee that later reads will
come from the same backing storage.
The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
clearing code so the page cache is cleaned, lest reads that hit on the
page cache will return stale data from the previously-accessible disk.
Example:
# qemu-nbd -r -c/dev/nbd0 /dev/sr0
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.
# qemu-nbd -d /dev/nbd0
# qemu-nbd -r -c/dev/nbd0 /dev/sda
# file -s /dev/nbd0
/dev/stdin: # UDF filesystem data (version 1.5) etc.
While /dev/sda has:
# file -s /dev/sda
/dev/sda: x86 boot sector; etc.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Paul Clements <Paul.Clements@steeleye.com> Cc: Alex Bligh <alex@alex.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Bligh [Thu, 28 Feb 2013 01:05:23 +0000 (17:05 -0800)]
nbd: support FLUSH requests
Currently, the NBD device does not accept flush requests from the Linux
block layer. If the NBD server opened the target with neither O_SYNC nor
O_DSYNC, however, the device will be effectively backed by a writeback
cache. Without issuing flushes properly, operation of the NBD device will
not be safe against power losses.
The NBD protocol has support for both a cache flush command and a FUA
command flag; the server will also pass a flag to note its support for
these features. This patch adds support for the cache flush command and
flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the
flag is active the block layer will send REQ_FLUSH requests, which we
translate to NBD_CMD_FLUSH commands.
FUA support is not included in this patch because all free software
servers implement it with a full fdatasync; thus it has no advantage over
supporting flush only. Because I [Paolo] cannot really benchmark it in a
realistic scenario, I cannot tell if it is a good idea or not. It is also
not clear if it is valid for an NBD server to support FUA but not flush.
The Linux block layer gives a warning for this combination, the NBD
protocol documentation says nothing about it.
The patch also fixes a small problem in the handling of flags: nbd->flags
must be cleared at the end of NBD_DO_IT, but the driver was not doing
that. The bug manifests itself as follows. Suppose you two different
client/server pairs to start the NBD device. Suppose also that the first
client supports NBD_SET_FLAGS, and the first server sends
NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
things. Before this patch, the second invocation of NBD_DO_IT will use a
stale value of nbd->flags, and the second server will issue an error every
time it receives an NBD_CMD_FLUSH command.
This bug is pre-existing, but it becomes much more important after this
patch; flush failures make the device pretty much unusable, unlike
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Bligh <alex@alex.org.uk> Acked-by: Paul Clements <Paul.Clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xi Wang [Thu, 28 Feb 2013 01:05:21 +0000 (17:05 -0800)]
sysctl: fix null checking in bin_dn_node_address()
The null check of `strchr() + 1' is broken, which is always non-null,
leading to OOB read. Instead, check the result of strchr().
Signed-off-by: Xi Wang <xi.wang@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ming Lei [Thu, 28 Feb 2013 01:05:19 +0000 (17:05 -0800)]
block/partitions: optimize memory allocation in check_partition()
Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so
it is easy to trigger page allocation failure by check_partition,
especially in hotplug block device situation(such as, USB mass storage,
MMC card, ...), and Felipe Balbi has observed the failure.
This patch does below optimizations on the allocation of struct
parsed_partitions to try to address the issue:
- make parsed_partitions.parts as pointer so that the pointed memory can
fit in 32KB buffer, then approximate 32KB memory can be saved
- vmalloc the buffer pointed by parsed_partitions.parts because 32KB is
still a bit big for kmalloc
- given that many devices have the partition count limit, so only
allocate disk_max_parts() partitions instead of 256 partitions always
Signed-off-by: Ming Lei <ming.lei@canonical.com> Reported-by: Felipe Balbi <balbi@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ming Lei [Thu, 28 Feb 2013 01:05:18 +0000 (17:05 -0800)]
block/partitions/mac.c: obey the state->limit constraint
It isn't necessary to read the information of partitions whose number is
equal and more than state->limit since only maximum state->limit
partitions will be added inside rescan_partitions().
That is also what other kind of partitions are doing.
Signed-off-by: Ming Lei <ming.lei@canonical.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Jones [Thu, 28 Feb 2013 01:05:17 +0000 (17:05 -0800)]
block/partitions/efi.c: ensure that the GPT header is at least the size of the structure.
UEFI 2.3.1D will include a change to the spec language mandating that a
GPT header must be greater than *or equal to* the size of the defined
structure. While verifying that this would work on Linux, I discovered
that we're not actually checking the minimum bound at all.
The result of this is that when we verify the checksum, it's possible that
on a malformed header (with header_size of 0), we won't actually verify
any data.
[akpm@linux-foundation.org: fix printk warning] Signed-off-by: Peter Jones <pjones@redhat.com> Acked-by: Matt Fleming <matt.fleming@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen Warren <swarren@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
block/partition/msdos: detect AIX formatted disks even without 55aa
AIX formatted disks do not always have the MSDOS 55aa signature.
This happens e.g. for unbootable AIX disks.
Up to now, such disks were not recognized as AIX disks, because of the
missing 55aa. Fix that by inverting the two tests. Let's first
check for the AIX magic strings, and only if that fails check for
the MSDOS magic word.
Signed-off-by: Philippe De Muyter <phdm@macqel.be> Cc: Andreas Mohr <andi@lisas.de> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Jens Axboe <axboe@kernel.dk> Cc: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Corey Minyard [Thu, 28 Feb 2013 01:05:12 +0000 (17:05 -0800)]
ipmi: add new kernel options to prevent automatic ipmi init
The configuration change building ipmi_si into the kernel precludes the
use of a custom driver that can utilize more than one KCS interface,
multiple IPMBs, and more than one BMC. This capability is important for
fault-tolerant systems.
Even if the kernel option ipmi_si.trydefaults=0 is specified, ipmi_si
discovers and claims one of the KCS interfaces on a Stratus server. The
inability to now prevent the kernel from managing this device is a
regression from previous kernels. The regression breaks a capability
fault-tolerant vendors have relied upon.
To support both ACPI opregion access and the need to avoid activation of
ipmi_si on some platforms, we've added two new kernel options,
ipmi_si.tryacpi and ipmi_si.trydmi be added to prevent ipmi_si from
initializing when these options are set to 0 on the kernel command line.
With these options at the default value of 1, ipmi_si init proceeds
according to the kernel default.
Tested-by: Jim Paradis <jparadis@redhat.com> Signed-off-by: Robert Evans <Robert.Evans@stratus.com> Signed-off-by: Jim Paradis <jparadis@redhat.com> Signed-off-by: Tony Camuso <tcamuso@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Given the obvious distinction between kernel and userspace supported
by uapi/, it seems unnecessary to comment on that.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:05:10 +0000 (17:05 -0800)]
idr: explain WARN_ON_ONCE() on negative IDs out-of-range ID
Until recently, when an negative ID is specified, idr functions used to
ignore the sign bit and proceeded with the operation with the rest of
bits, which is bizarre and error-prone. The behavior recently got changed
so that negative IDs are treated as invalid but we're triggering
WARN_ON_ONCE() on negative IDs just in case somebody was depending on the
sign bit being ignored, so that those can be detected and fixed easily.
We only need this for a while. Explain why WARN_ON_ONCE()s are there and
that they can be removed later.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:05:08 +0000 (17:05 -0800)]
idr: implement lookup hint
While idr lookup isn't a particularly heavy operation, it still is too
substantial to use in hot paths without worrying about the performance
implications. With recent changes, each idr_layer covers 256 slots
which should be enough to cover most use cases with single idr_layer
making lookup hint very attractive.
This patch adds idr->hint which points to the idr_layer which
allocated an ID most recently and the fast path lookup becomes
if (look up target's prefix matches that of the hinted layer)
return hint->ary[ID's offset in the leaf layer];
which can be inlined.
idr->hint is set to the leaf node on idr_fill_slot() and cleared from
free_layer().
[andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:05:06 +0000 (17:05 -0800)]
idr: make idr_layer larger
With recent preloading changes, idr no longer keeps full layer cache per
each idr instance (used to be ~6.5k per idr on 64bit) and the previous
patch removed restriction on the bitmap size. Both now allow us to have
larger layers.
Increase IDR_BITS to 8 regardless of BITS_PER_LONG. Each layer is
slightly larger than 2k on 64bit and 1k on 32bit and carries 256 entries.
The size isn't too large, especially compared to what we used to waste on
per-idr caches, and 256 entries should be able to serve most use cases
with single layer. The max tree depth is 4 which is much better than the
previous 6 on 64bit and 7 on 32bit.
Tejun Heo [Thu, 28 Feb 2013 01:05:05 +0000 (17:05 -0800)]
idr: remove length restriction from idr_layer->bitmap
Currently, idr->bitmap is declared as an unsigned long which restricts
the number of bits an idr_layer can contain. All bitops can handle
arbitrary positive integer bit number and there's no reason for this
restriction.
Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
unsigned long.
* idr_layer->bitmap is now an array. '&' dropped from params to
bitops.
* Replaced "== IDR_FULL" tests with bitmap_full() and removed
IDR_FULL.
* Replaced find_next_bit() on ~bitmap with find_next_zero_bit().
* Replaced "bitmap = 0" with bitmap_clear().
This patch doesn't (or at least shouldn't) introduce any behavior
changes.
Tejun Heo [Thu, 28 Feb 2013 01:05:04 +0000 (17:05 -0800)]
idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c
MAX_IDR_MASK is another weirdness in the idr interface. As idr covers
whole positive integer range, it's defined as 0x7fffffff or INT_MAX.
Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
They basically mask off the sign bit and operate on the rest, so if
the caller, by accident, passes in a negative number, the sign bit
will be masked off and the remaining part will be used as if that was
the input, which is worse than crashing.
The constant is visible in idr.h and there are several users in the
kernel.
Basically used to test if adap->nr is a negative number which isn't
-1 and returns -EINVAL if so. idr_alloc() already has negative
@start checking (w/ WARN_ON_ONCE), so this can go away.
Used to wrap cyclic @start. Can be replaced with max(next, 0).
Note that this type of cyclic allocation using idr is buggy. These
are prone to spurious -ENOSPC failure after the first wraparound.
* fs/super.c:get_anon_bdev()
The ID allocated from ida is masked off before being tested whether
it's inside valid range. ida allocated ID can never be a negative
number and the masking is unnecessary.
Update idr_*() functions to fail with -EINVAL when negative @id is
specified and update other MAX_IDR_MASK users as described above.
This leaves MAX_IDR_MASK without any user, remove it and relocate
other MAX_IDR_* constants to lib/idr.c.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Wolfram Sang <wolfram@the-dreams.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:05:02 +0000 (17:05 -0800)]
idr: fix top layer handling
Most functions in idr fail to deal with the high bits when the idr
tree grows to the maximum height.
* idr_get_empty_slot() stops growing idr tree once the depth reaches
MAX_IDR_LEVEL - 1, which is one depth shallower than necessary to
cover the whole range. The function doesn't even notice that it
didn't grow the tree enough and ends up allocating the wrong ID
given sufficiently high @starting_id.
For example, on 64 bit, if the starting id is 0x7fffff01,
idr_get_empty_slot() will grow the tree 5 layer deep, which only
covers the 30 bits and then proceed to allocate as if the bit 30
wasn't specified. It ends up allocating 0x3fffff01 without the bit
30 but still returns 0x7fffff01.
* __idr_remove_all() will not remove anything if the tree is fully
grown.
* idr_find() can't find anything if the tree is fully grown.
* idr_for_each() and idr_get_next() can't iterate anything if the tree
is fully grown.
Fix it by introducing idr_max() which returns the maximum possible ID
given the depth of tree and replacing the id limit checks in all
affected places.
As the idr_layer pointer array pa[] needs to be 1 larger than the
maximum depth, enlarge pa[] arrays by one.
While this plugs the discovered issues, the whole code base is
horrible and in desparate need of rewrite. It's fragile like hell,
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:53 +0000 (17:04 -0800)]
ipc: convert to idr_alloc()
Convert to the much saner new idr interface.
The new interface doesn't directly translate to the way idr_pre_get()
was used around ipc_addid() as preloading disables preemption. From
my cursory reading, it seems like we should be able to do all
allocation from ipc_addid(), so I moved it there. Can you please
check whether this would be okay? If this is wrong and ipc_addid()
should be allowed to be called from non-sleepable context, I'd suggest
allocating id itself in the outer functions and later install the
pointer using idr_replace().
Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:50 +0000 (17:04 -0800)]
inotify: convert to idr_alloc()
Convert to the much saner new idr interface.
Note that the adhoc cyclic id allocation is buggy. If wraparound
happens, the previous code with idr_get_new_above() may segfault and
the converted code will trigger WARN and return -EINVAL. Even if it's
fixed to wrap to zero, the code will be prone to unnecessary -ENOSPC
failures after the first wraparound. We probably need to implement
proper cyclic support in idr.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:49 +0000 (17:04 -0800)]
dlm: convert to idr_alloc()
Convert to the much saner new idr interface. Error return values from
recover_idr_add() mix -1 and -errno. The conversion doesn't change
that but it looks iffy.
Tejun Heo [Thu, 28 Feb 2013 01:04:34 +0000 (17:04 -0800)]
macvtap: convert to idr_alloc()
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:16 +0000 (17:04 -0800)]
IB/core: convert to idr_alloc()
Convert to the much saner new idr interface.
v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
which may be used from non-process context, was calling
idr_preload() unconditionally. Preload iff @gfp_mask has
__GFP_WAIT.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:15 +0000 (17:04 -0800)]
i2c: convert to idr_alloc()
Convert to the much saner new idr interface.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Wolfram Sang <wolfram@the-dreams.de> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:05 +0000 (17:04 -0800)]
firewire: convert to idr_alloc()
Convert to the much saner new idr interface.
v2: Stefan pointed out that add_client_resource() may be called from
non-process context. Preload iff @gfp_mask contains __GFP_WAIT.
Also updated to include minor upper limit check.
[tim.gardner@canonical.com: fix accidentally orphaned 'minor'[ Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Thu, 28 Feb 2013 01:04:00 +0000 (17:04 -0800)]
atm/nicstar: convert to idr_alloc()
Convert to the much saner new idr interface. The existing code looks
buggy to me - ID 0 is treated as no-ID but allocation specifies 0 as
lower limit and there's no error handling after partial success. This
conversion keeps the bugs unchanged.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>