Bugfix: With the current linux-2.6.14-rc2-git6, EEH errors are
ignored because their detection requires an unused, uninitialized
flag to be set. This patch removes the unused flag.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:49:38 +0000 (18:49 -0600)]
[PATCH] ppc64: bugfix: crash on PCI hotplug
09-hotplug-bugfix.patch
In the current 2.6.14-rc2-git6 kernel, performing a Dynamic LPAR Add
of a hotplug slot will crash the system, with the following (abbreviated)
stack trace:
The root cause was that __init __alloc_bootmem() was called long after
boot had finished, resulting in a crash because this routine is undefined
after boot time. The patch below fixes this crash, and adds some docs to
clarify the code.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:49:31 +0000 (18:49 -0600)]
[PATCH] ppc64: escape hatch for spinning interrupt deadlocks
08-eeh-spin-counter.patch
Once an EEH event is triggered, all further I/O to a device is blocked (until
reset). Bad device drivers may end up spinning in their interrupt handlers,
trying to read an interrupt status register that will never change state.
This patch moves that spin counter to a per-device structure, and adds
some diagnostic prints to help locate the bad driver.
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:49:23 +0000 (18:49 -0600)]
[PATCH] ppc64: serialize reports of PCI errors
07-eeh-report-race.patch
When a PCI slot is isolated, all PCI functions under that slot are affected.
If these functions have separate device drivers, the EEH isolation event
might be reported multiple times. This patch adds a lock to prevent the
racing of such multiple reports. It also marks every device under the slot
as having experienced an EEH event, so that multiple reports may be
recognized more easily.
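The serialization described above can be sketched in a few lines. This is a user-space Python model, not the kernel code: the Slot class, the report_isolation() helper and the single global lock are assumptions made for illustration only.

```python
import threading

# Hypothetical model: one lock serializes all isolation reports.
_report_lock = threading.Lock()


class Slot:
    """A PCI slot with several functions; each maps to an 'EEH seen' mark."""

    def __init__(self, functions):
        self.marked = {name: False for name in functions}


def report_isolation(slot: Slot) -> bool:
    """Return True only for the first report of an isolated slot."""
    with _report_lock:
        if any(slot.marked.values()):
            # Another driver under this slot already reported the event.
            return False
        for name in slot.marked:
            # Mark every device under the slot, not just the reporter.
            slot.marked[name] = True
        return True
```

Two drivers sharing a slot then see one report: the first call returns True and marks every function, and the second call returns False.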
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:49:15 +0000 (18:49 -0600)]
[PATCH] ppc64: avoid PCI error reporting for empty slots
06-eeh-empty-slot-error.patch
Performing PCI config-space reads to empty PCI slots can lead to reports of
"permanent failure" from the firmware. Ignore permanent failures on empty slots.
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:48:52 +0000 (18:48 -0600)]
[PATCH] ppc64: PCI error rate statistics
04-eeh-statistics.patch
This minor patch adds some statistics-gathering counters that allow the
behaviour of the EEH subsystem to be monitored. While far from perfect,
it does provide a rudimentary device that makes understanding of the
current state of the system a bit easier.
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:48:45 +0000 (18:48 -0600)]
[PATCH] ppc64: PCI address cache minor fixes
03-eeh-addr-cache-cleanup.patch
This is a minor patch to clean up a buglet related to the PCI address cache.
(The buglet doesn't manifest itself unless there are also bugs elsewhere,
which is why it's minor.) Also:
-- Improved debug printing.
-- Declare some private routines as static
-- Adds reference counting to struct pci_dn->pcidev structure
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linas Vepstas [Fri, 4 Nov 2005 00:47:50 +0000 (18:47 -0600)]
[PATCH] ppc64: misc minor cleanup
02-eeh-minor-cleanup.patch
This patch performs some minor cleanup of the eeh.c file, including:
-- trim some trailing whitespace
-- remove extraneous #includes
-- use the macro PCI_DN uniformly, instead of the void pointer chase.
-- typos in comments
-- improved debug printk's
Signed-off-by: Linas Vepstas <linas@linas.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
[PATCH] ppc64: Don't panic when early __ioremap fails
Early calls to __ioremap() will panic if the hash insertion fails. This
patch makes them return NULL instead. It happens with some pSeries users
who enabled CONFIG_BOOTX_TEXT. The latter is getting an incorrect address
for the frame buffer and the hash insertion fails. With this patch, it
will display an error instead of crashing at boot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Olaf Hering [Wed, 9 Nov 2005 19:54:43 +0000 (20:54 +0100)]
[PATCH] ppc64 boot: fix compile warnings
Fix a few compile warnings
arch/ppc64/boot/addRamDisk.c:166: warning: int format, long unsigned int arg (arg 2)
arch/ppc64/boot/addRamDisk.c:170: warning: int format, long unsigned int arg (arg 2)
arch/ppc64/boot/addRamDisk.c:265: warning: unsigned int format, long unsigned int arg (arg 2)
arch/ppc64/boot/addRamDisk.c:302: warning: unsigned int format, long unsigned int arg (arg 3)
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
Olaf Hering [Wed, 9 Nov 2005 19:53:43 +0000 (20:53 +0100)]
[PATCH] ppc64 boot: remove sysmap from required filenames
A stripped vmlinux does not contain enough symbols to recreate the
System.map. The System.map file is only used to determine the end of
the runtime memory size. This is the same value (rounded up to
PAGE_SIZE) as ->memsiz in the ELF program header.
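The rounding mentioned above is ordinary power-of-two alignment. A minimal sketch follows; the 4 KiB PAGE_SIZE and the function name are assumptions, not values taken from the patch.

```python
# Assumed page size for illustration; the real value depends on the
# kernel configuration.
PAGE_SIZE = 4096


def round_up_to_page(memsiz: int, page_size: int = PAGE_SIZE) -> int:
    """Round memsiz up to the next multiple of page_size.

    page_size must be a power of two for the mask trick to be valid.
    """
    return (memsiz + page_size - 1) & ~(page_size - 1)
```

A size that is already page-aligned is returned unchanged; anything else moves up to the next page boundary.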
Also, the target vmlinux.initrd doesn't work in 2.6.14:
arch/ppc64/boot/addRamDisk arch/ppc64/boot/ramdisk.image.gz vmlinux.strip arch/ppc64/boot/vmlinux.initrd
Name of vmlinux output file missing.
Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Paul Mackerras <paulus@samba.org>
Marcelo Tosatti [Wed, 9 Nov 2005 13:00:16 +0000 (11:00 -0200)]
[PATCH] fs_enet build fix
Due to the recent update of the platform code, some platform device
drivers fail to compile. This fix is for fs_enet, adding #include of a
new header, to which a lot of platform stuff has been relocated.
Signed-off-by: Vitaly Bordug <vbordug@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Matt Porter [Wed, 9 Nov 2005 13:42:05 +0000 (06:42 -0700)]
[PATCH] ppc32: fix ppc44x fpu build
Fixes ppc44x fpu support that broke from a bad arch/powerpc merge.
Instead of adding KernelFP back in (which duplicates code) we use
the same kernel fpu unavailable handler as classic PPC processors.
Signed-off-by: Matt Porter <mporter@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson [Wed, 9 Nov 2005 02:38:01 +0000 (13:38 +1100)]
[PATCH] powerpc: Move various ppc64 files with no ppc32 equivalent to powerpc
This patch moves a bunch of files from arch/ppc64 and
include/asm-ppc64 which have no equivalents in ppc32 code into
arch/powerpc and include/asm-powerpc. The files affected are:
abs_addr.h
compat.h
lppaca.h
paca.h
tce.h
cpu_setup_power4.S
ioctl32.c
firmware.c
pacaData.c
The only changes apart from the move and corresponding Makefile
changes are:
- #ifndef/#define in includes updated to _ASM_POWERPC_ form
- trailing whitespace removed
- comments giving full paths removed
- pacaData.c renamed paca.c to remove studlyCaps
- Misplaced { moved in lppaca.h
Built and booted on POWER5 LPAR (ARCH=powerpc and ARCH=ppc64), built
for 32-bit powermac (ARCH=powerpc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson [Wed, 9 Nov 2005 02:04:06 +0000 (13:04 +1100)]
[PATCH] powerpc: Merge current.h
This patch merges current.h. This is a one-big-ifdef merge, but both
versions are so tiny, I think we can live with it. While we're at it,
we get rid of the fairly pointless redirection through get_current()
in the ppc64 version.
Built and booted on POWER5 LPAR (ARCH=powerpc & ARCH=ppc64). Built
for 32-bit pmac (ARCH=powerpc & ARCH=ppc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Gibson [Wed, 9 Nov 2005 00:21:07 +0000 (11:21 +1100)]
[PATCH] powerpc: Merge signal.h
Having already merged the ppc and ppc64 versions of signal.c, this
patch finishes the job by merging signal.h. The two versions were
almost identical already. Notable changes:
- We use BITS_PER_LONG to correctly size sigset_t
- Remove some unneeded #includes and struct forward
declarations. This does mean adding an include to signal_32.c which
relied on the indirect inclusion of sigcontext.h
- As in the ppc64 version, the merged signal.h has prototypes for
do_signal() and do_signal32(). Thus remove extra prototypes from
ppc_ksyms.c which had them directly.
Built and booted on POWER5 LPAR (ARCH=ppc64 and ARCH=powerpc). Built
for 32-bit powermac (ARCH=ppc and ARCH=powerpc) and Walnut (ARCH=ppc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Linus Torvalds [Wed, 9 Nov 2005 22:56:00 +0000 (14:56 -0800)]
Fix AGP compile on non-x86 architectures
AGP shouldn't use "global_flush_tlb()" to flush the AGP mappings, that is
purely an x86'ism. The proper AGP mapping flusher that should be used
is "flush_agp_mappings()", which on x86 obviously happens to do a global
TLB flush.
This makes AGP (or at least the config _I_ happen to use) compile again
on ppc64.
[NETFILTER] ctnetlink: ICMP_ID is u_int16_t not u_int8_t.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: Fix oops when no ICMP ID info in message
This patch fixes a userspace-triggered oops. If there is no ICMP_ID
info the reference to attr will be NULL.
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: Add support to identify expectations by ID's
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: propagate error instead of returning -EPERM
Propagate the error to userspace instead of returning -EPERM if the get
conntrack operation fails.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: return -EINVAL if size is wrong
Return -EINVAL if the size isn't OK instead of -EPERM.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: stop tracking ICMP error at early point
Currently connection tracking handles an ICMP error like a normal packet
if it fails to find the related connection, but that handling fails anyway.
This makes connection tracking stop tracking the ICMP error at an early point.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 9 Nov 2005 21:02:16 +0000 (13:02 -0800)]
[NETFILTER] nfnetlink: only load subsystems if CAP_NET_ADMIN is set
Without this patch, any user can cause nfnetlink subsystems to be
autoloaded. Those subsystems however could add significant processing
overhead to packet processing, and would refuse any configuration messages
from non-CAP_NET_ADMIN processes anyway.
This patch follows a suggestion from Patrick McHardy.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Philip Craig [Wed, 9 Nov 2005 21:01:53 +0000 (13:01 -0800)]
[NETFILTER] PPTP helper: fix PNS-PAC expectation call id
The reply tuple of the PNS->PAC expectation was using the wrong call id.
So we had the following situation:
- PNS behind NAT firewall
- PNS call id requires NATing
- PNS->PAC gre packet arrives first
then the PNS->PAC expectation is matched, and the other expectation
is deleted, but the PAC->PNS gre packets do not match the gre conntrack
because the call id is wrong.
We also cannot use ip_nat_follow_master().
Signed-off-by: Philip Craig <philipc@snapgear.com> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: get_conntrack can use GFP_KERNEL
ctnetlink_get_conntrack is always called from user context, so GFP_KERNEL
is enough.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Kill some useless headers included in ctnetlink. They aren't used in any
way.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: add module alias to fix autoloading
Add missing module alias. This is a must to load ctnetlink on demand. For
example, the conntrack tool will fail if the module isn't loaded.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: add marking support from userspace
This patch adds support for conntrack marking from user space.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER] ctnetlink: check if protoinfo is present
This fixes an oops triggered from userspace. If we don't pass information
about the private protocol info, the reference to attr will be NULL. This is
likely to happen in update messages.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: refcount leak of proto when ctnetlink dumping tuple
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[NETFILTER]: packet counter of conntrack is 32bits
The packet counter variable of conntrack was changed to 32bits from 64bits.
This follows that change.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 9 Nov 2005 19:33:07 +0000 (11:33 -0800)]
Fix ptrace self-attach rule
Before we did CLONE_THREAD, the way to check whether we were attaching
to ourselves was to just check "current == task", but with CLONE_THREAD
we should check that the thread group ID matches instead.
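The difference between the two checks can be modelled outside the kernel. In this sketch the Task class and is_self_attach() are illustrative names only; the pid and tgid fields mirror the per-thread ID and thread group ID discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    pid: int   # per-thread ID
    tgid: int  # thread group ID, shared by all threads of one process


def is_self_attach(current: Task, target: Task) -> bool:
    """True if target belongs to the same thread group as current.

    The older rule compared the tasks themselves, which misses sibling
    threads created with CLONE_THREAD.
    """
    return current.tgid == target.tgid
```

A sibling thread is a different task from the caller, so the old comparison would let the attach through, while the thread-group comparison rejects it.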
NeilBrown [Wed, 9 Nov 2005 05:39:45 +0000 (21:39 -0800)]
[PATCH] md: document sysfs usage of md, and make a couple of small refinements
Document in Documentation/md.txt the files that now appear in sysfs, and make
a couple of small refinements to exactly when 'level' and 'raid_disks' are
empty, to make it match the documentation.
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:44 +0000 (21:39 -0800)]
[PATCH] md: improve 'scan_mode' and rename it to 'sync_action'
The current sync_action for an array can be one of
idle - nothing happening
resync - redundancy being recalculated
recover - missing device being recovered to spare
check - user initiated check of redundancy
repair - like resync but user-initiated and ignores
bitmap optimisation.
Each of these strings can also be written to the 'sync_action' file to cause
that action to happen (if appropriate).
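As a rough illustration of the interface, the accepted strings can be treated as a small fixed vocabulary. The parser below is a sketch only; parse_sync_action() is a hypothetical helper and does not reproduce how md itself validates writes.

```python
# Action names and meanings as listed in the commit message above.
SYNC_ACTIONS = {
    "idle": "nothing happening",
    "resync": "redundancy being recalculated",
    "recover": "missing device being recovered to spare",
    "check": "user-initiated check of redundancy",
    "repair": "like resync, but user-initiated and ignoring the bitmap",
}


def parse_sync_action(text: str) -> str:
    """Validate a string as it might be written to 'sync_action'."""
    word = text.strip()  # 'echo' appends a trailing newline
    if word not in SYNC_ACTIONS:
        raise ValueError(f"unknown sync action: {word!r}")
    return word
```

Whether a valid action is actually started still depends on the array ("if appropriate"), which this sketch does not model.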
While 'sync' is not technically correct, as a recovery is *not* a 'sync', I
think it is the most serviceable word here. Also 'action' is a stronger word
than 'mode'.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:42 +0000 (21:39 -0800)]
[PATCH] md: ignore auto-readonly flag for arrays where it isn't meaningful
The 'auto-readonly' flag (which suppresses resync and superblock updates until
the first write) is not meaningful for personalities that don't support resync
or superblock writes (raid0, linear, etc).
So clear the setting early to avoid it confusing anything - e.g. appearing in
/proc/mdstat
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:41 +0000 (21:39 -0800)]
[PATCH] md: only try to print recovery/resync status for personalities that support recovery
The introduction of 'resync=PENDING' (for read-only devices) caused that
message to appear for non-syncable arrays like raid0 and linear. Simplest
thing is to not try to print any resync info unless the personality clearly
supports it.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:40 +0000 (21:39 -0800)]
[PATCH] md: split off some md attributes in sysfs to a separate group
Some, but not all, md arrays support data redundancy and hence support checking
and restoring that redundancy (resync, rebuild).
Some attributes apply specifically to functions involving this redundancy, and
so should only appear for md arrays for which they are meaningful. i.e. they
should not appear for raid0, linear, multipath, faulty.
This patch separates these into a distinct group and creates the group only if
the personality supports sync_request.
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:39 +0000 (21:39 -0800)]
[PATCH] md: fix some locking and module refcounting issues with md's use of sysfs
1/ I really should be using the __ATTR macros for defining attributes, so
that the .owner field gets set properly, otherwise modules can be removed
while sysfs files are open. This also involves some name changes of _show
routines.
2/ Always lock the mddev (against reconfiguration) for all sysfs attribute
access. This easily avoids certain races and is completely consistent with
other interfaces (ioctl and /proc/mdstat both always lock against
reconfiguration).
3/ raid5 attributes must check that the 'conf' structure actually exists
(the array could have been stopped while an attribute file was open).
4/ A missing 'kfree' from when the raid5_conf_t was converted to have a
kobject embedded, and then converted back again.
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:38 +0000 (21:39 -0800)]
[PATCH] md: make manual repair work for raid1
Raid1 currently optimises resync using the intent bitmap etc. This
optimisation is not wanted when we explicitly request a repair through sysfs,
so add appropriate checks.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:37 +0000 (21:39 -0800)]
[PATCH] md: make sure /block link in /sys/.../md/ goes to correct devices
If a block_device is a partition, then its kobject is
bdev->bd_part->kobj
otherwise (if it is a full device), the kobject is
bdev->bd_disk->kobj
As md wants back-links to the correct object (whether partition or not), we
need to respect this difference... (Thus current code shows a link to the
whole device, whether we are using a partition or not, which is wrong).
Signed-off-by: Neil Brown <neilb@suse.de> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:36 +0000 (21:39 -0800)]
[PATCH] md: allow md arrays to be started read-only (module parameter).
When an md array is started, the superblock will be written, and resync may
commence. This is not good if you want to be completely read-only as, for
example, when preparing to resume from a suspend-to-disk image.
So introduce a module parameter "start_ro" which can be set
to '1' at boot, at module load, or via
/sys/module/md_mod/parameters/start_ro
When this is set, new arrays get an 'auto-ro' mode, which disables all
internal io (superblock updates, resync, recovery) and is automatically
switched to 'rw' when the first write request arrives.
The array can be set to true 'ro' mode using 'mdadm -r' before the first
write request, or resync can be started without a write using 'mdadm -w'.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:35 +0000 (21:39 -0800)]
[PATCH] md: Remove attempt to use dynamic names in sysfs for component devices on an MD array.
With version-0.90 superblock, component devices on an md device do not have
any stable name related to the array (version-1 assigns a fixed index when
a device is added to an array, and this remains despite any hot-swap).
The initial code for making these devices appear in sysfs used dynamic
names, which would change whenever a hot-spare was swapped for a failed or
missing device. This turns out not to be practical in sysfs for a number
of reasons.
This patch changes the naming of component devices to be based on the
result of 'bdevname'. This is stable and should be unique.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:34 +0000 (21:39 -0800)]
[PATCH] md: support BIO_RW_BARRIER for md/raid1
We can only accept BARRIER requests if all slaves handle
barriers, and that can, of course, change with time....
So we keep track of whether the whole array seems safe for barriers,
and also whether each individual rdev handles barriers.
We initially assume barriers are OK.
When writing the superblock we try a barrier, and if that fails, we flag
things for no-barriers. This will usually clear the flags fairly quickly.
If writing the superblock finds that BIO_RW_BARRIER is -ENOTSUPP, we need to
resubmit, so introduce function "md_super_wait" which waits for requests to
finish, and retries ENOTSUPP requests without the barrier flag.
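The try-then-fall-back pattern described above can be sketched as follows. This is a user-space model under stated assumptions: BarrierNotSupported stands in for a write failing with -ENOTSUPP, and write_with_barrier_fallback(), the submit callback and the state dictionary are hypothetical names, not md functions.

```python
class BarrierNotSupported(Exception):
    """Hypothetical stand-in for a write that failed with -ENOTSUPP."""


def write_with_barrier_fallback(submit, data, state) -> bool:
    """Write data, preferring a barrier; return True if a barrier was used.

    state["barriers_ok"] starts out True (barriers are assumed to work)
    and is cleared the first time a barrier write is rejected.
    """
    if state["barriers_ok"]:
        try:
            submit(data, barrier=True)
            return True
        except BarrierNotSupported:
            # Remember the failure so later writes skip the attempt.
            state["barriers_ok"] = False
    # Retry (or write directly) without the barrier flag.
    submit(data, barrier=False)
    return False
```

After one rejected barrier the flag stays cleared, which corresponds to the remark that the flags are usually cleared fairly quickly.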
When writing the real raid1, write requests which were BIO_RW_BARRIER but
which aren't supported need to be retried. So raid1d is enhanced to do this,
and when any bio write completes (i.e. no retry needed) we remove it from the
r1bio, so that devices needing retry are easy to find.
We should hardly ever get -ENOTSUPP errors when writing data to the raid.
It should only happen if:
1/ the device used to support BARRIER, but now doesn't. Few devices
change like this, though raid1 can!
or
2/ the array has no persistent superblock, so there was no opportunity to
pre-test for barriers when writing the superblock.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:31 +0000 (21:39 -0800)]
[PATCH] md: improvements to raid5 handling of read errors
Two refinements to the 'attempt-overwrite-on-read-error' mechanism.
1/ If the array is read-only, don't attempt an over-write.
2/ If there are more than max_nr_stripes read errors on a device with
no success, fail the drive. This will make sure a dead
drive will be eventually kicked even when we aren't trying
to rewrite (which would normally kick a dead drive more quickly).
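The two refinements amount to a small decision rule. The sketch below is illustrative only: on_read_error() and its return strings are hypothetical, and read_errors is assumed to count errors on one device since its last successful read.

```python
def on_read_error(read_errors: int, max_nr_stripes: int, read_only: bool) -> str:
    """Decide what to do after a read error on one device.

    read_errors: errors seen on the device with no success in between.
    """
    if read_errors > max_nr_stripes:
        # Too many failures without a success: treat the drive as dead.
        return "fail_drive"
    if read_only:
        # Never write to a read-only array, even to fix a bad block.
        return "no_rewrite"
    return "rewrite"
```

The ordering means a dead drive is eventually failed even on a read-only array, where no rewrite is ever attempted.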
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:30 +0000 (21:39 -0800)]
[PATCH] md: change raid5 sysfs attribute to not create a new directory
There isn't really a need for raid5 attributes to be in a subdirectory,
so this patch moves them from
/sys/block/mdX/md/raid5/attribute
to
/sys/block/mdX/md/attribute
This suggests that all md personalities should co-operate about
namespace usage, but that shouldn't be a problem.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:26 +0000 (21:39 -0800)]
[PATCH] md: teach raid5 the difference between 'check' and 'repair'.
With this, raid5 can be asked to check parity without repairing it. It also
keeps a count of the number of incorrect parity blocks found (mismatches) and
reports them through sysfs.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:26 +0000 (21:39 -0800)]
[PATCH] md: allow a manual resync with md
You can trigger a 'check' with
echo check > /sys/block/mdX/md/scan_mode
or a check-and-repair of errors with
echo repair > /sys/block/mdX/md/scan_mode
and read the current state from the same file.
Note: personalities need to know the difference between 'check' and 'repair',
but don't yet. Until they do, 'check' will be the same as 'repair' and will
just do a normal resync pass.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:25 +0000 (21:39 -0800)]
[PATCH] md: add kobject/sysfs support to raid5
/sys/block/mdX/md/raid5/
contains raid5-related attributes.
Currently
stripe_cache_size
is number of entries in stripe cache, and is settable.
stripe_cache_active
is number of active entries, and is only readable.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:24 +0000 (21:39 -0800)]
[PATCH] md: extend md sysfs support to component devices.
Each device in an md array now has a corresponding
/sys/block/mdX/md/devNN/
directory which can contain attributes. Currently there is only 'state' which
summarises the state, and 'super' which has a copy of the superblock, and
'block' which is a symlink to the block device.
Also, /sys/block/mdX/md/rdNN represents slot 'NN' in the array, and is a
symlink to the relevant 'devNN'. Obviously spare devices do not have a slot
in the array, and so don't have such a symlink.
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Wed, 9 Nov 2005 05:39:22 +0000 (21:39 -0800)]
[PATCH] md: better handling of read errors with raid5.
This patch changes the behaviour of raid5 when it gets a read error.
Instead of just failing the device, it tries to find out what should have
been there, and writes it over the bad block. For some media-errors, this
has a reasonable chance of fixing the error. If the write succeeds, and a
subsequent read succeeds as well, raid5 decides the address is OK and
continues.
Instead of failing a drive on read-error, we attempt to re-write the block,
and then re-read. If that all works, we allow the device to remain in the
array.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] fbdev: Possible endian fix in cfbimageblit
Fix possible endian bug(?) when bit testing in slow_imageblit(). This
function is rarely called (only if (width * bpp) % 32 != 0) thus the bug is
not triggered.
However, if the console is rotated at 90 or 270 degrees, the height becomes
the width, and a variety of fonts have heights that will force a call to
slow_imageblit().
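The condition quoted above is easy to evaluate for concrete fonts. The following sketch assumes only the (width * bpp) % 32 test from the commit message; the helper names and the convention that rotate values 1 and 3 mean 90 and 270 degrees are taken from the rotation patches and used here for illustration.

```python
def effective_width(font_width: int, font_height: int, rotate: int) -> int:
    """Width of a glyph as drawn; height becomes width at 90/270 degrees."""
    return font_height if rotate in (1, 3) else font_width


def needs_slow_imageblit(width: int, bpp: int) -> bool:
    """True when one glyph row does not fill a whole number of 32-bit words."""
    return (width * bpp) % 32 != 0
```

An 8x16 font at 8 bpp stays on the fast path at every angle, whereas an 8x14 font rotated by 90 degrees has an effective width of 14 and takes the slow path.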
[PATCH] fbcon: Console Rotation - Add ability to control rotation via sysfs
Add ability to set rotation via sysfs. The attributes are located in
/sys/class/graphics/fb[n] and accept 0 - unrotated; 1 - clockwise; 2 - upside
down; 3 - counterclockwise.
The attributes are:
con_rotate (r/w) - set rotation of the active console
con_rotate_all (w) - set rotation of all consoles
rotate (r/w) - set rotation of the framebuffer, if supported.
Currently, none of the drivers support this.
This is probably temporary, since con_rotate and con_rotate_all are
console-specific and have no business being under the fb device. However,
until the console layer acquires its own sysfs class, these attributes will
temporarily reside here.
[PATCH] fbcon: Console Rotation - Add support to rotate font bitmap
Add support to rotate the font bitmap. To save on processing time, the entire
fontdata will be rotated on a console switch, then stored in a buffer private
to fbcon. To further save on processing, the fontdata will only be rotated if
the font has changed or if the angle of rotation has changed. Only a single
copy of the rotated fontdata will be kept.
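The caching policy described above (rotate once, redo only when the font or the angle changes, keep a single copy) can be modelled as below. This is a sketch, not the fbcon code: glyphs are plain nested lists rather than packed bitmaps, and RotatedFontCache, rotate_glyph() and font_id are hypothetical names.

```python
def rotate_glyph(glyph, rotate: int):
    """Rotate a glyph (list of rows) clockwise by rotate * 90 degrees."""
    for _ in range(rotate % 4):
        # Reverse the rows, then transpose: one clockwise quarter turn.
        glyph = [list(row) for row in zip(*glyph[::-1])]
    return glyph


class RotatedFontCache:
    """Keeps a single rotated copy, rebuilt only when font or angle changes."""

    def __init__(self):
        self._key = None
        self._data = None
        self.rotations = 0  # how many times the font was actually rotated

    def get(self, font_id, glyphs, rotate: int):
        key = (font_id, rotate)
        if key != self._key:
            self._data = [rotate_glyph(g, rotate) for g in glyphs]
            self._key = key
            self.rotations += 1
        return self._data
```

Repeated console switches with the same font and angle reuse the stored copy; changing either one replaces it.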
[PATCH] fbcon: Console Rotation - Add support to rotate the logo
Add support for rotating and positioning of the logo. Rotation and position
depends on 'int rotate' parameter added to fb_prepare_logo() and
fb_show_logo().
[PATCH] fbcon: Console Rotation - Prepare fbcon for console rotation
This patch series implements generic code to rotate the console at 90, 180,
and 270 degrees. The implementation is completely done in the framebuffer
console level, thus no changes to the framebuffer layer or to the drivers
are needed.
Console rotation is required by some Sharp-based devices where the natural
orientation of the display is not at 0 degrees. Also, users that have
displays that can pivot will benefit by having a console in portrait mode
if they so desire.
The choice to implement the code in the console layer rather than in the
framebuffer layer is due to the following reasons:
- it's fast
- it does not require driver changes
- it can coexist with devices that can rotate the display at the hardware level
- it complements graphics applications that can do display rotation
The changes to core fbcon are minimal-- recognition of the console
rotation angle so it can swap directions, origins and axes (xres vs yres,
xpanstep vs ypanstep, xoffset vs yoffset, etc) and storage of the rotation
angle per display. The bulk of the code that does the actual drawing to the
screen is placed in separate files. Each angle of rotation has separate
methods (bmove, clear, putcs, cursor, update_start which is derived from
update_var, and clear_margins). To minimize processing time, the fontdata
are pre-rotated at each console switch (only if the font or the angle has
changed).
The option can be compiled out (CONFIG_FRAMEBUFFER_CONSOLE_ROTATION = n) if
rotation is not needed.
Choosing the rotation angle can be done in several ways:
1. boot option fbcon=rotate:n, where
n = 0 - normal
n = 1 - 90 degrees (clockwise)
n = 2 - 180 degrees (upside down)
n = 3 - 270 degrees (counterclockwise)
2. echo n > /sys/class/graphics/fb[num]/con_rotate
where n is the same as described above. It sets the angle of rotation
of the current console
3. echo n > /sys/class/graphics/fb[num]/con_rotate_all
where n is the same as described above. Globally sets the angle of
rotation.
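The mapping from n to an angle, and the axis swap it implies, can be sketched as follows. The table comes from the list above; parse_rotate_option() and console_resolution() are hypothetical helpers and do not mirror the actual option parser.

```python
# n -> clockwise rotation in degrees, as listed above.
ROTATE_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}


def parse_rotate_option(option: str) -> int:
    """Parse the 'rotate:n' part of an fbcon= boot option."""
    name, _, value = option.partition(":")
    if name != "rotate" or value not in {"0", "1", "2", "3"}:
        raise ValueError(f"bad rotate option: {option!r}")
    return int(value)


def console_resolution(xres: int, yres: int, rotate: int):
    """Resolution as seen by the console; axes swap at 90 and 270 degrees."""
    if rotate in (1, 3):
        return yres, xres
    return xres, yres
```

For a 1024x768 mode, the console sees 768x1024 at n = 1 or n = 3 and the unchanged 1024x768 at n = 0 or n = 2.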
GOTCHAS:
The option, especially at angles of 90 and 270 degrees, will exercise
the least used code of drivers. Namely, at these angles, panning is done
in the x-axis, so it can reveal bugs in the driver if xpanstep is set
incorrectly. A workaround is to set xpanstep = 0.
Secondly, at these angles, the framebuffer memory access can be
unaligned if (fontheight * bpp) % 32 != 0, which can reveal bugs in the
driver's imageblit, fillrect and copyarea functions. (I think cfbfillrect may
have this buglet). A workaround is to use a standard 8x16 font.
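The unaligned-access condition above (font height times bits per pixel not
being a multiple of 32) can be written as a small predicate. This is only a
sketch; the function name is hypothetical and does not appear in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true when a rotated glyph at 90 or 270 degrees
 * would not cover a whole number of 32-bit words per line.
 */
static bool rotated_access_unaligned(unsigned int fontheight, unsigned int bpp)
{
    return (fontheight * bpp) % 32 != 0;
}
```

With a 16-pixel-high font the product is a multiple of 32 at 8, 16 and 32
bpp, which is consistent with the 8x16 workaround; a 14-pixel-high font at
8 bpp gives 112, which is not.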
Speed:
The scrolling speed difference between 0 and 180 degrees is minimal,
somewhere around 1-2%. At 90 or 270 degrees, speed drops down to a vicinity
of 30-40%. This is understandable because the blit direction is across the
framebuffer "direction." Scrolling at these angles will be helped by setting
xpanstep to a nonzero value, using 8x16 fonts, and setting
xres_virtual >= xres * 2.
Note: The code is tested on little-endian only, so I don't know if it will
work in big-endian. Please let me know; it will take less than a minute
of your time.
This patch prepares fbcon for console rotation and contains the following
changes:
- add rotate field in struct fbcon_ops to keep fbcon's current rotation
angle
- add con_rotate field in struct display to store per-display rotation angle
- create a private copy of the current var to fbcon. This will prevent
fbcon from directly manipulating info->var, especially the fields xoffset,
yoffset and vmode.
- add ability to swap pertinent axes (xres, yres; xpanstep, ypanstep; etc)
depending on the rotation angle
- change global update_var() (function that sets the screen start address)
as an fbcon method update_start. This is required because the axes, start
offset, and/or direction can be reversed depending on the rotation angle.
- add fbcon method rotate_font() which will rotate each character bitmap to
the correct angle of rotation.
- add fbcon boot option 'rotate' to select the angle of rotation at boot time.
Currently does nothing until all patches are applied.
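As an illustration of the axis swap listed above, the following sketch
derives the geometry the console sees from the hardware geometry and the
rotation code (0 to 3, as in the rotate option). struct geom and
console_geom() are hypothetical stand-ins for this example, not the
structures or functions used by fbcon.

```c
#include <stdint.h>

/* Rotation codes: 0 = normal, 1 = 90, 2 = 180, 3 = 270 degrees. */
enum rot { ROT_0 = 0, ROT_90 = 1, ROT_180 = 2, ROT_270 = 3 };

/* Hypothetical subset of the mode fields that come in x/y pairs. */
struct geom {
    uint32_t xres, yres;
    uint32_t xres_virtual, yres_virtual;
    uint32_t xpanstep, ypanstep;
};

/*
 * Return the geometry as seen by the console. At 90 and 270 degrees the
 * x and y members trade places; at 0 and 180 degrees they are unchanged
 * (only origin and direction differ, which is not modelled here).
 */
static struct geom console_geom(struct geom hw, enum rot r)
{
    struct geom con = hw;

    if (r == ROT_90 || r == ROT_270) {
        con.xres = hw.yres;
        con.yres = hw.xres;
        con.xres_virtual = hw.yres_virtual;
        con.yres_virtual = hw.xres_virtual;
        con.xpanstep = hw.ypanstep;
        con.ypanstep = hw.xpanstep;
    }
    return con;
}
```

Working on a copy rather than on the hardware values mirrors the idea of
fbcon keeping a private var instead of manipulating info->var directly.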
The driver unconditionally sets xpanstep to 2. However, a value of 4
empirically works better at bpp = 8, and 2 for 16 and 32. This buglet was
exposed by the rotation code.
The second fix removes the unconditional call to update_start(), which was
made without verifying that the offsets are correct. The call is not
necessary, and it causes a crash with invalid values.
Nick Piggin [Wed, 9 Nov 2005 05:39:04 +0000 (21:39 -0800)]
[PATCH] sched: resched and cpu_idle rework
Make some changes to the NEED_RESCHED and POLLING_NRFLAG flags to reduce
confusion, and make their semantics rigid. Improves efficiency of
resched_task and some cpu_idle routines.
* In resched_task:
- TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
and as we hold it during resched_task, then there is no need for an
atomic test and set there. The only other time this should be set is
when the task's quantum expires, in the timer interrupt - this is
protected against because the rq lock is irq-safe.
- If TIF_NEED_RESCHED is set, then we don't need to do anything. It
won't get unset until the task gets schedule()d off.
- If we are running on the same CPU as the task we resched, then set
TIF_NEED_RESCHED and no further action is required.
- If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
after TIF_NEED_RESCHED has been set, then we need to send an IPI.
Using these rules, we are able to remove the test and set operation in
resched_task, and make clear the previously vague semantics of
POLLING_NRFLAG.
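The resched_task rules above can be modelled in user space roughly as
follows. This is a sketch using C11 atomics, not the scheduler code: struct
task_model, the flag values and resched_task_model() are hypothetical, the
caller is assumed to hold the task's runqueue lock, and the function returns
whether an IPI would be needed instead of sending one.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for the thread-info flags. */
#define NEED_RESCHED   (1u << 0)
#define POLLING_NRFLAG (1u << 1)

/* Hypothetical model of the per-task state used by the rules. */
struct task_model {
    atomic_uint flags;
    int cpu;            /* CPU the task is running on */
};

/*
 * Model of the rules. Assumes the runqueue lock is held, so NEED_RESCHED
 * cannot be cleared underneath us. Returns true if a resched IPI would
 * have to be sent to t->cpu.
 */
static bool resched_task_model(struct task_model *t, int this_cpu)
{
    /* Already set: nothing to do until the task is scheduled off. */
    if (atomic_load(&t->flags) & NEED_RESCHED)
        return false;

    /* Plain set; the old value is deliberately ignored (no test-and-set). */
    atomic_fetch_or(&t->flags, NEED_RESCHED);

    /* Same CPU: the flag alone is enough. */
    if (t->cpu == this_cpu)
        return false;

    /* Remote CPU: check POLLING_NRFLAG only after NEED_RESCHED is set. */
    return !(atomic_load(&t->flags) & POLLING_NRFLAG);
}
```

The ordering in the last step matters: a polling idle loop that sees
NEED_RESCHED will leave the loop on its own, so the IPI is only needed when
the remote CPU is not polling.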
* In idle routines:
- Enter cpu_idle with preempt disabled. When the need_resched() condition
becomes true, explicitly call schedule(). This makes things a bit clearer
(IMO), but I haven't updated all architectures yet.
- Many do a test and clear of TIF_NEED_RESCHED for some reason. According
to the resched_task rules, this isn't needed (and actually breaks the
assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
held). So remove that. Generally one less locked memory op when switching
to the idle thread.
- Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the
innermost polling idle loops. The above resched_task semantics allow it to
stay set right up to the last need_resched() check before going into
a halt requiring interrupt wakeup.
Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
can be always left set, completely eliminating resched IPIs when rescheduling
the idle task.
POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>