Florian Fainelli [Sat, 23 Aug 2008 16:54:37 +0000 (18:54 +0200)]
Documentation: Document the RB532 specific kmac tag
The Routerboard 532 bootloader passes the korina ethernet
MAC adapter address to the kernel on the command line.
Document this in the kernel-parameters file.
Florian Fainelli [Fri, 22 Aug 2008 15:01:31 +0000 (17:01 +0200)]
MIPS: RB532: Remove gpio bootup state
We are no longer using gpio bootup state, so do not export
it and do not parse the kernel command line tag for it.
Instead we provide gpio-keys for the button the gpio bootup
state was checking.
Florian Fainelli [Fri, 22 Aug 2008 15:01:03 +0000 (17:01 +0200)]
MIPS: RB532: Use physical addresses for gpio and device controller registers
This patch fixes the misuse of virtual addresses for the GPIO and third
device controller which would lead to problems while accessing ioremap'd
registers.
Atsushi Nemoto [Tue, 19 Aug 2008 13:55:14 +0000 (22:55 +0900)]
MIPS: TXx9: Make spi_eeprom.c more generic
Helper routines in txx9/rbtx4938/spi_eeprom.c is not TX4938 specific.
Move it to txx9/generic/ directory and make it works with SPI bus
number other than 0.
Atsushi Nemoto [Tue, 19 Aug 2008 13:55:09 +0000 (22:55 +0900)]
MIPS: TXx9: Cache fixup
TX39/TX49 can enable/disable I/D cache at runtime. Add kernel options
to control them. This is useful to debug some cache-related issues,
such as aliasing or I/D coherency. Also enable CWF bit for TX49 SoCs.
Atsushi Nemoto [Tue, 19 Aug 2008 13:55:06 +0000 (22:55 +0900)]
MIPS: TXx9: Improve handling of built-in and command-line args
* Make prom_init_cmdline() static and be called from prom_init.
* Append built-in args if the first character was '+'.
* Drop command-line args if the first character of built-in was '-'.
* Enclose args include spaces by quotes.
* TX4938_NAND_BOOT is no longer needed.
Linus Torvalds [Fri, 10 Oct 2008 20:10:51 +0000 (13:10 -0700)]
Merge branch 'rcu-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'rcu-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
rcu: RCU-based detection of stalled CPUs for Classic RCU, fix
rcu: RCU-based detection of stalled CPUs for Classic RCU
rcu: add rcu_read_lock_sched() / rcu_read_unlock_sched()
rcu: fix sparse shadowed variable warning
doc/RCU: fix pseudocode in rcuref.txt
rcuclassic: fix compiler warning
rcu: use irq-safe locks
rcuclassic: fix compilation NG
rcu: fix locking cleanup fallout
rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c
rcu: fix classic RCU locking cleanup lockdep problem
rcu: trace fix possible mem-leak
rcu: just rename call_rcu_bh instead of making it a macro
rcu: remove list_for_each_rcu()
rcu: fixes to include/linux/rcupreempt.h
rcu: classic RCU locking and memory-barrier cleanups
rcu: prevent console flood when one CPU sees another AWOL via RCU
rcu, debug: detect stalled grace periods, cleanups
rcu, debug: detect stalled grace periods
rcu classic: new algorithm for callbacks-processing(v2)
...
Linus Torvalds [Fri, 10 Oct 2008 19:44:43 +0000 (12:44 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid()
selinux: use default proc sid on symlinks
file capabilities: uninline cap_safe_nice
Update selinux info in MAINTAINERS and Kconfig help text
SELinux: add gitignore file for mdp script
SELinux: add boundary support and thread context assignment
securityfs: do not depend on CONFIG_SECURITY
selinux: add support for installing a dummy policy (v2)
security: add/fix security kernel-doc
selinux: Unify for- and while-loop style
selinux: conditional expression type validation was off-by-one
smack: limit privilege by label
SELinux: Fix a potentially uninitialised variable in SELinux hooks
SELinux: trivial, remove unneeded local variable
SELinux: Trivial minor fixes that change C null character style
make selinux_write_opts() static
Linus Torvalds [Fri, 10 Oct 2008 19:42:31 +0000 (12:42 -0700)]
Merge branch 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits)
sched debug: add name to sched_domain sysctl entries
sched: sync wakeups vs avg_overlap
sched: remove redundant code in cpu_cgroup_create()
sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq
cpusets: scan_for_empty_cpusets(), cpuset doesn't seem to be so const
sched: minor optimizations in wake_affine and select_task_rq_fair
sched: maintain only task entities in cfs_rq->tasks list
sched: fixup buddy selection
sched: more sanity checks on the bandwidth settings
sched: add some comments to the bandwidth code
sched: fixlet for group load balance
sched: rework wakeup preemption
CFS scheduler: documentation about scheduling policies
sched: clarify ifdef tangle
sched: fix list traversal to use _rcu variant
sched: turn off WAKEUP_OVERLAP
sched: wakeup preempt when small overlap
kernel/cpu.c: create a CPU_STARTING cpu_chain notifier
kernel/cpu.c: Move the CPU_DYING notifiers
sched: fix __load_balance_iterator() for cfq with only one task
...
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: skcipher - Use RNG interface instead of get_random_bytes
crypto: rng - RNG interface and implementation
crypto: api - Add fips_enable flag
crypto: skcipher - Move IV generators into their own modules
crypto: cryptomgr - Test ciphers using ECB
crypto: api - Use test infrastructure
crypto: cryptomgr - Add test infrastructure
crypto: tcrypt - Add alg_test interface
crypto: tcrypt - Abort and only log if there is an error
crypto: crc32c - Use Intel CRC32 instruction
crypto: tcrypt - Avoid using contiguous pages
crypto: api - Display larval objects properly
crypto: api - Export crypto_alg_lookup instead of __crypto_alg_lookup
crypto: Kconfig - Replace leading spaces with tabs
Linus Torvalds [Fri, 10 Oct 2008 18:16:33 +0000 (11:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (29 commits)
RDMA/nes: Fix slab corruption
IB/mlx4: Set RLKEY bit for kernel QPs
RDMA/nes: Correct error_module bit mask
RDMA/nes: Fix routed RDMA connections
RDMA/nes: Enhanced PFT management scheme
RDMA/nes: Handle AE bounds violation
RDMA/nes: Limit critical error interrupts
RDMA/nes: Stop spurious MAC interrupts
RDMA/nes: Correct tso_wqe_length
RDMA/nes: Fill in firmware version for ethtool
RDMA/nes: Use ethtool timer value
RDMA/nes: Correct MAX TSO frags value
RDMA/nes: Enable MC/UC after changing MTU
RDMA/nes: Free NIC TX buffers when destroying NIC QP
RDMA/nes: Fix MDC setting
RDMA/nes: Add wqm_quanta module option
RDMA/nes: Module parameter permissions
RDMA/cxgb3: Set active_mtu in ib_port_attr
RDMA/nes: Add support for 4-port 1G HP blade card
RDMA/nes: Make mini_cm_connect() static
...
Currently we disable barriers as soon as we get a buffer in xlog_iodone
that has the XBF_ORDERED flag cleared. But this can be the case not only
for buffers where the barrier failed, but also the first buffer of a
split log write in case of a log wraparound. Due to the disabled
barriers we can easily get directory corruption on unclean shutdowns.
So instead of using this check add a new buffer flag for failed barrier
writes.
This is a regression vs 2.6.26 caused by patch to use the right macro
to check for the ORDERED flag, as we previously got true returned for
every buffer.
Thanks to Toei Rei for reporting the bug.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
GFS2: Support for I/O barriers
GFS2: Add UUID to GFS2 sb
GFS2: high time to take some time over atime
GFS2: The war on bloat
GFS2: GFS2 will panic if you misspell any mount options
GFS2: Direct IO write at end of file error
GFS2: Use an IS_ERR test rather than a NULL test
GFS2: Fix race relating to glock min-hold time
GFS2: Fix & clean up GFS2 rename
GFS2: rm on multiple nodes causes panic
GFS2: Fix metafs mounts
GFS2: Fix debugfs glock file iterator
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (37 commits)
[SCSI] zfcp: fix double dbf id usage
[SCSI] zfcp: wait on SCSI work to be finished before proceeding with init dev
[SCSI] zfcp: fix erp list usage without using locks
[SCSI] zfcp: prevent fc_remote_port_delete calls for unregistered rport
[SCSI] zfcp: fix deadlock caused by shared work queue tasks
[SCSI] zfcp: put threshold data in hba trace
[SCSI] zfcp: Simplify zfcp data structures
[SCSI] zfcp: Simplify get_adapter_by_busid
[SCSI] zfcp: remove all typedefs and replace them with standards
[SCSI] zfcp: attach and release SAN nameserver port on demand
[SCSI] zfcp: remove unused references, declarations and flags
[SCSI] zfcp: Update message with input from review
[SCSI] zfcp: add queue_full sysfs attribute
[SCSI] scsi_dh: suppress comparison warning
[SCSI] scsi_dh: add Dell product information into rdac device handler
[SCSI] qla2xxx: remove the unused SCSI_QLOGIC_FC_FIRMWARE option
[SCSI] qla2xxx: fix printk format warnings
[SCSI] qla2xxx: Update version number to 8.02.01-k8.
[SCSI] qla2xxx: Ignore payload reserved-bits during RSCN processing.
[SCSI] qla2xxx: Additional residual-count corrections during UNDERRUN handling.
...
Linus Torvalds [Fri, 10 Oct 2008 17:52:45 +0000 (10:52 -0700)]
Merge branch 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits)
doc/cdrom: Trvial documentation error, file not present
block_dev: fix kernel-doc in new functions
block: add some comments around the bio read-write flags
block: mark bio_split_pool static
block: Find bio sector offset given idx and offset
block: gendisk integrity wrapper
block: Switch blk_integrity_compare from bdev to gendisk
block: Fix double put in blk_integrity_unregister
block: Introduce integrity data ownership flag
block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1
bio.h: Remove unused conditional code
block: remove end_{queued|dequeued}_request()
block: change elevator to use __blk_end_request()
gdrom: change to use __blk_end_request()
memstick: change to use __blk_end_request()
virtio_blk: change to use __blk_end_request()
blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
block: add lld busy state exporting interface
block: Fix blk_start_queueing() to not kick a stopped queue
include blktrace_api.h in headers_install
...
and as Ingo says: "these are the easiest, purely independent x86 topics
with no conflicts, in one nice Octopus merge".
* 'x86-v28-for-linus-phase1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (147 commits)
x86: mtrr_cleanup: treat WRPROT as UNCACHEABLE
x86: mtrr_cleanup: first 1M may be covered in var mtrrs
x86: mtrr_cleanup: print out correct type v2
x86: trivial printk fix in efi.c
x86, debug: mtrr_cleanup print out var mtrr before change it
x86: mtrr_cleanup try gran_size to less than 1M, v3
x86: mtrr_cleanup try gran_size to less than 1M, cleanup
x86: change MTRR_SANITIZER to def_bool y
x86, debug printouts: IOMMU setup failures should not be KERN_ERR
x86: export set_memory_ro and set_memory_rw
x86: mtrr_cleanup try gran_size to less than 1M
x86: mtrr_cleanup prepare to make gran_size to less 1M
x86: mtrr_cleanup safe to get more spare regs now
x86_64: be less annoying on boot, v2
x86: mtrr_cleanup hole size should be less than half of chunk_size, v2
x86: add mtrr_cleanup_debug command line
x86: mtrr_cleanup optimization, v2
x86: don't need to go to chunksize to 4G
x86_64: be less annoying on boot
x86, olpc: fix endian bug in openfirmware workaround
...
Linus Torvalds [Fri, 10 Oct 2008 15:00:17 +0000 (08:00 -0700)]
PnP: move pnpacpi/pnpbios_init to after PCI init
We already did that a long time ago for pnp_system_init, but
pnpacpi_init and pnpbios_init remained as subsys_initcalls, and get
linked into the kernel before the arch-specific routines that finalize
the PCI resources (pci_subsys_init).
This means that the PnP routines would either register their resources
before the PCI layer could, or would be unable to check whether a PCI
resource had already been registered. Both are problematic.
I wanted to do this before 2.6.27, but every time we change something
like this, something breaks. That said, _every_ single time we trust
some firmware (like PnP tables) more than we trust the hardware itself
(like PCI probing), the problems have been worse.
Linus Torvalds [Fri, 10 Oct 2008 14:46:45 +0000 (07:46 -0700)]
Merge branch 'upstream-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-2.6.28' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix: IDE Mode SATA patch for Intel Ibex Peak DeviceIDs
libata-eh: clear UNIT ATTENTION after reset
ata_piix: add Hercules EC-900 mini-notebook to ich_laptop short cable list
libata: reorder ata_device to remove 8 bytes of padding on 64 bits
[libata] pata_bf54x: Add proper PM operation
pata_sil680: convert CONFIG_PPC_MERGE to CONFIG_PPC
libata: Implement disk shock protection support
[libata] Introduce ata_id_has_unload()
PATA: RPC now selects HAVE_PATA_PLATFORM for pata platform driver
ata_piix: drop merged SCR access and use slave_link instead
libata: implement slave_link
libata: misc updates to prepare for slave link
libata: reimplement link iterator
libata: make SCR access ops per-link
Milan Broz [Fri, 10 Oct 2008 12:37:08 +0000 (13:37 +0100)]
dm crypt: avoid unnecessary wait when splitting bio
Don't wait between submitting crypt requests for a bio unless
we are short of memory.
There are two situations when we must split an encrypted bio:
1) there are no free pages;
2) the new bio would violate underlying device restrictions
(e.g. max hw segments).
In case (2) we do not need to wait.
Add output variable to crypt_alloc_buffer() to distinguish between
these cases.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Fri, 10 Oct 2008 12:37:07 +0000 (13:37 +0100)]
dm crypt: fix async inc_pending
The pending reference count must be incremented *before* the async work is
queued to another thread, not after. Otherwise there's a race if the
work completes and decrements the reference count before it gets incremented.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Make the caller reponsible for incrementing the pending count before calling
kcryptd_crypt_write_io_submit() in the non-async case to bring it into line
with the async case.
Jonathan Brassow [Fri, 10 Oct 2008 12:36:59 +0000 (13:36 +0100)]
dm raid1: kcopyd should stop on error if errors handled
dm-raid1 is setting the 'DM_KCOPYD_IGNORE_ERROR' flag unconditionally
when assigning kcopyd work. kcopyd is responsible for copying an
assigned section of disk to one or more other disks. The
'DM_KCOPYD_IGNORE_ERROR' flag affects kcopyd in the following way:
When not set:
kcopyd will immediately stop the copy operation when an error is
encountered.
When set:
kcopyd will try to proceed regardless of errors and try to continue
copying any remaining amount.
Since dm-raid1 tracks regions of the address space that are (or
are not) in sync and it now has the ability to handle these
errors, we can safely enable this optimization. This optimization
is conditional on whether mirror error handling has been enabled.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Chien Tung [Fri, 10 Oct 2008 00:41:05 +0000 (17:41 -0700)]
RDMA/nes: Fix slab corruption
Referencing cm_node after it is freed via rem_ref_cm_node() causes a
slab corruption. There is no need to set cm_node->cm_id to NULL in
mini_cm_close().
Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Linus Torvalds [Thu, 9 Oct 2008 21:04:54 +0000 (14:04 -0700)]
Don't allow splice() to files opened with O_APPEND
This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.
It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example. So
we could make up any semantics we want, including the old ones.
But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).
So disallow O_APPEND entirely for now. I doubt anybody cares, and this
way we have one less gray area to worry about.
Matt Mackall [Wed, 8 Oct 2008 19:51:57 +0000 (14:51 -0500)]
SLOB: fix bogus ksize calculation fix
This fixes the previous fix, which was completely wrong on closer
inspection. This version has been manually tested with a user-space
test harness and generates sane values. A nearly identical patch has
been boot-tested.
The problem arose from changing how kmalloc/kfree handled alignment
padding without updating ksize to match. This brings it in sync.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hwmon: (abituguru3) Enable DMI probing feature on Abit AT8 32X
Enable driver checking of the DMI product name (when enabled) on
an Abit AT8 32X, instead of falling back to a manual probe. This
eliminates false negatives and eventually will help avoid
unnecessary bus probes on unsupported mainboards.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Tested-by: Daniel Exner <dex@dragonslave.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
hwmon: (abituguru3) Enable reading from AUX3 fan on Abit AT8 32X
The table for the Abit AT8 32X was incorrectly missing an entry
for the sixth ("AUX3") fan. Add this entry, exporting the fan
reading to userspace.
Closes lm-sensors.org ticket #2339.
Signed-off-by: Alistair John Strachan <alistair@devzero.co.uk> Tested-by: Daniel Exner <dex@dragonslave.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Thu, 9 Oct 2008 13:33:58 +0000 (15:33 +0200)]
hwmon: (it87) Prevent power-off on Shuttle SN68PT
On the Shuttle SN68PT, FAN_CTL2 is apparently not connected to a fan,
but to something else. One user has reported instant system power-off
when changing the PWM2 duty cycle, so we disable it.
I use the board name string as the trigger in case the same board is
ever used in other systems.
This closes lm-sensors ticket #2349:
pwmconfig causes a hard poweroff
http://www.lm-sensors.org/ticket/2349
Corentin Chary [Thu, 9 Oct 2008 13:33:57 +0000 (15:33 +0200)]
eeepc-laptop: Fix hwmon interface
Creates a name file in the sysfs directory, that
is needed for the libsensors library to work.
Also rename fan1_pwm to pwm1 and scale its value as needed.
This fixes bug #11520:
http://bugzilla.kernel.org/show_bug.cgi?id=11520
Signed-off-by: Corentin Chary <corentincj@iksaif.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>
Randy Dunlap [Thu, 9 Oct 2008 08:42:38 +0000 (10:42 +0200)]
block_dev: fix kernel-doc in new functions
Fix kernel-doc in new functions:
Error(mmotm-2008-1002-1617//fs/block_dev.c:895): duplicate section name 'Description'
Error(mmotm-2008-1002-1617//fs/block_dev.c:924): duplicate section name 'Description'
Warning(mmotm-2008-1002-1617//fs/block_dev.c:1282): No description found for parameter 'pathname'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Andrew Patterson <andrew.patterson@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Denis ChengRq [Thu, 9 Oct 2008 06:57:05 +0000 (08:57 +0200)]
block: mark bio_split_pool static
Since all bio_split calls refer the same single bio_split_pool, the bio_split
function can use bio_split_pool directly instead of the mempool_t parameter;
then the mempool_t parameter can be removed from bio_split param list, and
bio_split_pool is only referred in fs/bio.c file, can be marked static.
block: Switch blk_integrity_compare from bdev to gendisk
The DM and MD integrity support now depends on being able to use
gendisks instead of block_devices when comparing integrity profiles.
Change function parameters accordingly.
Also update comparison logic so that two NULL profiles are a valid
configuration.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
A filesystem might supply its own integrity metadata. Introduce a
flag that indicates whether the filesystem or the block layer owns the
integrity buffer.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Kiyoshi Ueda [Wed, 1 Oct 2008 14:14:46 +0000 (10:14 -0400)]
block: remove end_{queued|dequeued}_request()
This patch removes end_queued_request() and end_dequeued_request(),
which are no longer used.
As a results, users of __end_request() became only end_request().
So the actual code in __end_request() is moved to end_request()
and __end_request() is removed.
Kiyoshi Ueda [Wed, 1 Oct 2008 14:13:44 +0000 (10:13 -0400)]
block: change elevator to use __blk_end_request()
This patch converts elevator to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' arguments is converted to 'error'.
Kiyoshi Ueda [Wed, 1 Oct 2008 14:13:02 +0000 (10:13 -0400)]
gdrom: change to use __blk_end_request()
This patch converts gdrom to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
gd.transfer is '1' in error cases and '0' in non-error cases,
so gdrom hasn't been propagating any error code to the block layer.
We can just convert error cases to '-EIO'.
Kiyoshi Ueda [Wed, 1 Oct 2008 14:11:20 +0000 (10:11 -0400)]
virtio_blk: change to use __blk_end_request()
This patch converts virtio_blk to use __blk_end_request() directly
so that end_{queued|dequeued}_request() can be removed.
Related 'uptodate' argument is converted to 'error'.
Jens Axboe [Wed, 1 Oct 2008 14:16:25 +0000 (16:16 +0200)]
blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
Define as 32, which is is what BDEVNAME_SIZE is/was as well. This keeps
the user interface the same and gets rid of the difference between
kernel and user api here.
Kiyoshi Ueda [Wed, 1 Oct 2008 14:12:15 +0000 (16:12 +0200)]
block: add lld busy state exporting interface
This patch adds an new interface, blk_lld_busy(), to check lld's
busy state from the block layer.
blk_lld_busy() calls down into low-level drivers for the checking
if the drivers set q->lld_busy_fn() using blk_queue_lld_busy().
This resolves a performance problem on request stacking devices below.
Some drivers like scsi mid layer stop dispatching request when
they detect busy state on its low-level device like host/target/device.
It allows other requests to stay in the I/O scheduler's queue
for a chance of merging.
Request stacking drivers like request-based dm should follow
the same logic.
However, there is no generic interface for the stacked device
to check if the underlying device(s) are busy.
If the request stacking driver dispatches and submits requests to
the busy underlying device, the requests will stay in
the underlying device's queue without a chance of merging.
This causes performance problem on burst I/O load.
With this patch, busy state of the underlying device is exported
via q->lld_busy_fn(). So the request stacking driver can check it
and stop dispatching requests if busy.
The underlying device driver must return the busy state appropriately:
1: when the device driver can't process requests immediately.
0: when the device driver can process requests immediately,
including abnormal situations where the device driver needs
to kill all requests.
Elias Oltmanns [Wed, 1 Oct 2008 14:02:33 +0000 (16:02 +0200)]
block: Fix blk_start_queueing() to not kick a stopped queue
blk_start_queueing() should act like the generic queue unplugging
and kicking and ignore a stopped queue. Such a queue may not be
run until after a call to blk_start_queue().
Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Sven Schuetz [Fri, 26 Sep 2008 08:58:02 +0000 (10:58 +0200)]
include blktrace_api.h in headers_install
This header file is of interest for user space programming, i.e.
for tools that process blktrace data.
We would like to use it for a tool on-top of blktrace which processes
data provided by blktrace. For this purpose, it would be helpful
if the blktrace API would make it to /usr/include/linux.
The git tree for the blktrace tools comes with its own copy of this header
file. I didn't manage to replace that copy with the file generated
by the patch below yet. A few more cleanups would be needed.
For example, the blktrace ioctl numbers, which are currently defined in
usr/include/fs.h, might need to be moved. Should be feasible, though.
Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com> Signed-off-by: Martin Peschke <mp3@de.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
By only allowing async IO to consume 3/4 ths of the tag depth, we
always have slots free to serve sync IO. This is important to avoid
having writes fill the entire tag queue, thus starving reads.
Original patch and idea from Linus Torvalds <torvalds@linux-foundation.org>
We really need to know about the hardware tagging support as well,
since if the SSD does not do tagging then we still want to idle.
Otherwise have the same dependent sync IO vs flooding async IO
problem as on rotational media.
block: add queue flag for SSD/non-rotational devices
We don't want to idle in AS/CFQ if the device doesn't have a seek
penalty. So add a QUEUE_FLAG_NONROT to indicate a non-rotational
device, low level drivers should set this flag upon discovery of
an SSD or similar device type.