Herbert Xu [Mon, 3 Oct 2005 21:35:55 +0000 (14:35 -0700)]
[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl
The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.
1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().
There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition. I've marked it as such so that we remember to fix it.
This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 3 Oct 2005 21:18:10 +0000 (14:18 -0700)]
[IPV4]: Fix "Proxy ARP seems broken"
Meelis Roos <mroos@linux.ee> wrote:
> RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3,
> RK> it appears to be completely broken. The firewall is 212.18.232.186,
>
> Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to
> ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging
> yet since it'a a production gateway.
The breakage is caused by the change to use the CB area for flagging
whether a packet has been queued due to proxy_delay. This area gets
cleared every time arp_rcv gets called. Unfortunately packets delayed
due to proxy_delay also go through arp_rcv when they are reprocessed.
In fact, I can't think of a reason why delayed proxy packets should go
through netfilter again at all. So the easiest solution is to bypass
that and go straight to arp_process.
This is essentially what would've happened before netfilter support
was added to ARP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
During the build for ARM machine type "fortunet", this error occurred:
CC net/sysctl_net.o
net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function)
It appears that the following configuration settings cause this error
due to a missing include:
CONFIG_SYSCTL=y
CONFIG_NET=y
# CONFIG_INET is not set
core_table appears to be declared in net/sock.h. if CONFIG_INET were
defined, net/sock.h would have been included via:
sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h
so include it directly.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 3 Oct 2005 21:13:38 +0000 (14:13 -0700)]
[INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)
(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)
1) First some performance data :
--------------------------------
tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
The most time critical code is :
sk_for_each(sk, node, &head->chain) {
if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
goto hit; /* You sunk my battleship! */
}
The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.
As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.
This can be problematic if some chains are very long.
2) The goal
-----------
The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.
3) Description of the patch
---------------------------
Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.
struct sock_common {
unsigned short skc_family;
volatile unsigned char skc_state;
unsigned char skc_reuse;
int skc_bound_dev_if;
struct hlist_node skc_node;
struct hlist_node skc_bind_node;
atomic_t skc_refcnt;
+ unsigned int skc_hash;
struct proto *skc_prot;
};
Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.
Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)
- Adds a prefetch(head->chain.first) in
__inet_lookup_established()/__tcp_v4_check_established() and
__inet6_lookup_established()/__tcp_v6_check_established() and
__dccp_v4_check_established() to bring into cache the first element of the
list, before the {read|write}_lock(&head->lock);
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 3 Oct 2005 21:02:39 +0000 (14:02 -0700)]
[TG3]: Refine AMD K8 write-reorder chipset test.
Test for VIA K8T800 north bridge instead of AMD K8 HyperTransport
bridge based on new information from Andi Kleen. The AMD
HyperTransport interface is not responsible for PCI transactions
and so the re-ordering is more likely done by the VIA north bridge.
This code is subject to change if we get more information from AMD
or VIA.
PCI Express devices are excluded from doing the read flush since all
chipsets in the write_reorder list are PCI chipsets.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 3 Oct 2005 20:57:23 +0000 (13:57 -0700)]
[NET]: Fix packet timestamping.
I've found the problem in general. It affects any 64-bit
architecture. The problem occurs when you change the system time.
Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base. You then wind the clock back
by a day. From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.
In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.
When we do overflow, we'll need a better solution of course.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
[PATCH] x86_64: Fix numa node topology detection for srat based x86_64 boxes
2.6.14-rc2 does not assign cpus to proper nodeids on our em64t numa boxen.
Our boxes use acpi srat for parsing the numa information.
srat_detect_node() used phys_proc_id[] to get to the cpu's local apic id,
but phys_proc_id[] represents the cpu<->initial_apic_id mapping. The
following patch fixes this problem. Now apicid_to_node[] is properly
indexed with the local apic id.
James Bottomley [Sat, 1 Oct 2005 14:38:05 +0000 (09:38 -0500)]
[SCSI] Legacy MegaRAID: Fix READ CAPACITY
Some Legacy megaraid cards can't actually cope with the scatter/gather
version of the READ CAPACITY command (which is what we now send them
since altering all SCSI internal I/O to go via the block layer). Fix
this (and a few other broken megaraid driver assumptions) by sending
the non-sg version of the command if the sg list only has a single
element.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Catalin Marinas [Sun, 2 Oct 2005 21:34:35 +0000 (22:34 +0100)]
[ARM] 2943/1: Clear the exclusive monitor in v6_early_abort
Patch from Catalin Marinas
Data abort caused by ldrex/strex can leave the exclusive monitor in an
unpredictable state. It is recommended that a clrex/strex is performed to
clear this state.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
James Bottomley [Sun, 2 Oct 2005 20:22:35 +0000 (15:22 -0500)]
[SCSI] aic7xxx/aic79xx: fix module removal path not to panic
In these drivers, scsi_remove_host() is called too late, at the point
it is called, the driver has already shut down too far to accept any
I/O that the shutdown might generate. Any generated I/O actually
triggers a panic.
Fix this by calling scsi_remove_host() as early as possible and not
calling scsi_host_put() until just before we kfree the ahc_softc.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
James Bottomley [Sun, 2 Oct 2005 17:59:49 +0000 (12:59 -0500)]
[SCSI] fix potential panic with proc on module removal
There's a problem in our host release in that it calls
scsi_proc_hostdir_rm(). However, if you hold a reference to the host as
you remove the module, the host template (which proc uses) will be freed
and the system will panic when the host device is finally released.
Fix this by moving scsi_proc_hostdir_rm() to where it should be: in
scsi_remove_host().
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Russell King [Sun, 2 Oct 2005 17:12:03 +0000 (18:12 +0100)]
[ARM] Fix init printk for EBSA110 network driver, and link timer
Arrange for the initialisation printks to happen after we've
registered the network interface, so we know what name the
device is. Also, check the link every 500ms (and use
msecs_to_jiffies.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sven Henkel [Sat, 1 Oct 2005 22:30:33 +0000 (08:30 +1000)]
[PATCH] ppc32: Add new iBook 12" to PowerMac models table
This adds the new iBook G4 (manufactured after July 2005) to the
PowerMac models table. The model name (PowerBook6,7) is taken from a
12" iBook, I don't know if it also matches the 14" version. The patch
applies to a vanilla 2.6.13.2 kernel.
Signed-off-by: Sven Henkel <shenkel@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sven Henkel [Sat, 1 Oct 2005 22:29:18 +0000 (08:29 +1000)]
[PATCH] pmac/radeonfb: Add suspend support for M11 chip in new iBook 12"
This adds suspend support for the Radeon M11 chip in 12" iBooks
manufactured after July 2005. I don't know if the new 14" iBooks also
have that chip, so they might also be supported.
The chip identifies itself as "RV350 NV" (pci id 0x4e56), revision 0x80.
Apple calls it "Snowy", xfree86 names it "ATI FireGL Mobility T2 (M11)
NV (AGP)". So, we seem to be lucky here: The suspend-code for the M10
(which also is a "RV350 NV") works flawless for that chip.
Signed-off-by: Sven Henkel <shenkel@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sat, 1 Oct 2005 18:04:18 +0000 (11:04 -0700)]
Fix inequality comparison against "task->state"
We should always use bitmask ops, rather than depend on some ordering of
the different states. With the TASK_NONINTERACTIVE flag, the inequality
doesn't really work.
Oleg Nesterov argues (likely correctly) that this test is unnecessary in
the first place. However, the minimal fix for now is to at least make
it work in the presense of TASK_NONINTERACTIVE. Waiting for consensus
from Roland & co on potential bigger cleanups.
[PATCH] ARM: Fix IXP2000 serial port resource range. For real this time.
Serial port only needs 32 bytes of resource space but we are currently
asking for 64K.
Signed-off-by: Deepak Saxena <dsaxena@plexity.net>
[ diff went missing first time due to corrupted patch ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As an alternative, people can use the boot time 'debug' option and/or use
'ethtool -s ethX msglvl xyz'. The different messages are listed at:
http://www.zoreil.com/~romieu/r8169/doc/msglvl.txt
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The old 550Mhz titanium powerbook can switch to a lower frequency
(500Mhz). A user has been repeately reporting overtemp conditions on his
machine at high speed so this simple patch adds support to PowerMac
cpufreq for this machine. The difference in frequency isn't big but seem
enough to fix that user's problems. The patch has been around for some
time now and doesn't seem to cause any problem, so I suppose it could go
in now.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alain RICHARD <alain.richard@equation.fr> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP] is
not setup early enough by numa_init_array due to the x86_64 changes in
2.6.14-rc*, and unfortunately set wrongly by the work around code in
numa_init_array(). cpu_to_node[0] gets set with 1 early and later gets set
properly to 0 during identify_cpu() when all cpus are brought up, but
confusing the numa slab in the process.
Here is a quick fix for this. The right fix obviously is to have
cpu_to_node[bsp] setup early for numa_init_array(). The following patch
will fix the problem now, and the code can stay on even when
cpu_to_node{BP] gets fixed early correctly.
Thanks to Petr for access to his box.
Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix the BP node_to_cpumask. 2.6.14-rc* broke the boot cpu bit as the
cpu_to_node(0) is now not setup early enough for numa_init_array.
cpu_to_node[] is setup much later at srat_detect_node on acpi srat based
em64t machines. This seems like a problem on amd machines too, Tested on
em64t though. /sys/devices/system/node/node0/cpumap shows up sanely after
this patch.
Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] utilization of kprobe_mutex is incorrect on x86_64
The up()/down() orders are incorrect in arch/x86_64/kprobes.c file.
kprobe_mutext is used to protect the free kprobe instruction slot list.
arch_prepare_kprobe applies for a slot from the free list, and
arch_remove_kprobe returns a slot to the free list. The incorrect up()/down()
orders to operate on kprobe_mutex fail to protect the free list. If 2 threads
try to get/return kprobe instruction slot at the same time, the free slot list
might be broken, or a free slot might be applied by 2 threads.
[PATCH] eth1394: workaround limitation in rawiso routines
Work around limitation in rawiso routines. Required with 1394b cards on
architectures where PAGE_SIZE is 4096. Based on a previous patch by Ben
Collins.
Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Cc: Ben Collins <bcollins@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ieee1394: skip unnecessary pause when scanning config ROMs
Skip a superfluous pause that occured when the config ROM of a node was
scanned unsuccessfully. This also occurs if a node without link wrongly
enables its "link active" self ID flag. A GWCTech 6-port hub does this.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Cc: Ben Collins <bcollins@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ieee1394: reorder activities after bus reset (fixes device detection)
Units were not detected if the local IRM performed a bus reset. ("The root
node is not cycle master capable; selecting a new root node and resetting...",
often seen with iPods and other SBP-2 devices). Rearrange the order of IRM
duties and node scanning. TODO: Audit the ROM caching and parsing code for
underlying issues.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Cc: Ben Collins <bcollins@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set serialize_io=1 by default. This is safer and required by seemingly more
and more hardware. It causes little or no performance loss for S400 devices.
Performance of S800 1394b devices may drop by 25...30%. Therefore make the
parameter's description and dmesg message clearer about performance impact.
Update description of the max_speed parameter too. IEEE1394_SPEED_MAX is
currently S800.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Cc: Ben Collins <bcollins@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] sbp2: fix deadlocks and delays on device removal/rmmod
Fixes for deadlocks of the ieee1394 and scsi subsystems and long delays in
futile error recovery attempts when SBP-2 devices are removed or drivers are
unloaded.
- Complete commands quickly with DID_NO_CONNECT if the 1394 node is gone or if
the 1394 low-level driver was unloaded.
- Skip unnecessary work in the eh_abort_handler and eh_device_reset_handler if
the node or 1394 low-level driver is gone.
- Let scsi's high-level shut down gracefully when sbp2 is being unloaded or
detached from the 1394 unit. A call to scsi_remove_device is added for this
purpose, which requires us to store a scsi_device pointer.
- scsi_device pointer is obtained from slave_alloc hook and cleared by
slave_destroy. This avoids usage of the pointer after the scsi device was
deleted e.g. by the user via scsi_mod's sysfs interface.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jody McIntyre <scjody@steamballoon.com> Cc: Ben Collins <bcollins@debian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This change removes a bogus error message from the IOC4 serial driver
interrupt handler.
This error message is bogus for two reasons. First, it can never occur
given that current code takes care to initialize IOC4 in such a way that
these "unknown" interrupts could never occur. Second, this code fails to
take into account that other drivers can share the IOC4 interrupt mechanism
through SA_SHIRQ, and thus this driver is not in-fact "all-knowing".
Finally, this error message triggers every time some "unknown" interrupt
occurs -- it's not rate limited or repetition limited in any way, thereby
effectively denying use of the console device. Given its bogosity in the
first place, it's best to just get rid of it entirely.
Acked-by: Pat Gefre <pfg@sgi.com> Signed-off-by: Brent Casavant <bcasavan@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Check O_DIRECT and return -EINVAL error in open. dentry_open() also checks
this but only after the open method is called. This patch optimizes away
the unnecessary upcalls in this case.
It could be a correctness issue too: if filesystem has open() with side
effect, then it should fail before doing the open, not after.
Calling truncate() on hostfs spits a kernel warning "Something isn't
implemented here", but it still works fine.
Indeed, hostfs i_op->truncate doesn't do anything. But hostfs_setattr() ->
set_attr() correctly detects ATTR_SIZE and calls truncate() on the host. So
we should be safe (using ftruncate() may be better, in case the file is
unlinked on the host, but we aren't sure to have the file open for writing,
and reopening it would cause the same races; plus nobody should expect UML to
be so careful).
So, the warning is wrong, because the current implementation is working. Al,
am I correct, and can the warning be therefore dropped?
CC: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a) sysrq may be run when the scheduler is non-functioning
b) the warning I wanted to fix actually came from the fault handler run in
atomic context. But I fixed that not to take the semaphore in a separate
patch.
c) the fault handler is run because of a fault, and that fault was
unaffected by this patch.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] uml: clear SKAS0/3 flags when running in TT mode
SEGV_MAYBE_FIXABLE tests ptrace_faultinfo, and depends on it being 1 only in
SKAS3 mode, while currently when running with mode=tt it will be 1 anyway.
Fix this, and do the same for proc_mm.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I hadn't been running a SKAS3 host when testing the "uml: fix hang in TT mode
on fault" patch (commit 546fe1cbf91d4d62e3849517c31a2327c992e5c5), and I
didn't think enough to the missing trap_no in SKAS3 mode.
In fact, the resulting kernel doesn't work at all in SKAS3 mode.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Krufky [Fri, 30 Sep 2005 18:58:58 +0000 (11:58 -0700)]
[PATCH] v4l: DViCO FusionHDTV5 Lite GPIO Fix
GPIO fix for the composite and tv mute states of bt8xx card #135: DViCO
FusionHDTV5 Lite. Without this patch, selecting one of these states could
produce unexpected behavior.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Fri, 30 Sep 2005 18:58:57 +0000 (11:58 -0700)]
[PATCH] x86: hw_irq.h warning fix
include/asm/hw_irq.h:70: warning: `struct hw_interrupt_type' declared inside parameter list
include/asm/hw_irq.h:70: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zach Brown [Fri, 30 Sep 2005 18:58:56 +0000 (11:58 -0700)]
[PATCH] aio: avoid extra aio_{read,write} call when ki_left == 0
Recently aio_p{read,write} changed to perform retries internally rather
than returning -EIOCBRETRY. This inadvertantly resulted in always calling
aio_{read,write} with ki_left at 0 which would in turn immediately return
0. Harmless, but we can avoid this call by checking in the caller.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zach Brown [Fri, 30 Sep 2005 18:58:55 +0000 (11:58 -0700)]
[PATCH] aio: remove unlocked task_list test and resulting race
Only one of the run or kick path is supposed to put an iocb on the run
list. If both of them do it than one of them can end up referencing a
freed iocb. The kick path could delete the task_list item from the wait
queue before getting the ctx_lock and putting the iocb on the run list.
The run path was testing the task_list item outside the lock so that it
could catch ki_retry methods that return -EIOCBRETRY *without* putting the
iocb on a wait queue and promising to call kick_iocb. This unlocked check
could then race with the kick path to cause both to try and put the iocb on
the run list.
The patch stops the run path from testing task_list by requring that any
ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
called in the future. aio_p{read,write}, the only in-tree -EIOCBRETRY
users, are updated.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zach Brown [Fri, 30 Sep 2005 18:58:54 +0000 (11:58 -0700)]
[PATCH] aio: lock around kiocbTryKick()
Only one of the run or kick path is supposed to put an iocb on the run
list. If both of them do it than one of them can end up referencing a
freed iocb. The kick patch could set the Kicked bit before acquiring the
ctx_lock and putting the iocb on the run list. The run path, while holding
the ctx_lock, could see this partial kick and mistake it for a kick that
was deferred while it was doing work with the run_list NULLed out. It
would then race with the kick thread to add the iocb to the run list.
This patch moves the kick setting under the ctx_lock so that only one of
the kick or run path queues the iocb on the run list, as intended.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As requested by Thomas Gleixner <tglx@linutronix.de>:
"5d3d0f7704ed0bc7eaca0501eeae3e5da1ea6c87 breaks a couple of ARM
boards, which depend on the historical bootmem allocation order.
There is a cleaner solution around to remove the pgdat list
completely, but this is a topic for post 2.6.14
James Morris [Fri, 30 Sep 2005 18:24:34 +0000 (14:24 -0400)]
[PATCH] SELinux - fix SCTP socket bug and general IP protocol handling
The following patch updates the way SELinux classifies and handles IP
based protocols.
Currently, IP sockets are classified by SELinux as being either TCP, UDP
or 'Raw', the latter being a default for IP socket that is not TCP or UDP.
The classification code is out of date and uses only the socket type
parameter to socket(2) to determine the class of IP socket. So, any
socket created with SOCK_STREAM will be classified by SELinux as TCP, and
SOCK_DGRAM as UDP. Also, other socket types such as SOCK_SEQPACKET and
SOCK_DCCP are currently ignored by SELinux, which classifies them as
generic sockets, which means they don't even get basic IP level checking.
This patch changes the SELinux IP socket classification logic, so that
only an IPPROTO_IP protocol value passed to socket(2) classify the socket
as TCP or UDP. The patch also drops the check for SOCK_RAW and converts
it into a default, so that socket types like SOCK_DCCP and SOCK_SEQPACKET
are classified as SECCLASS_RAWIP_SOCKET (instead of generic sockets).
Note that protocol-specific support for SCTP, DCCP etc. is not addressed
here, we're just getting these protocols checked at the IP layer.
This fixes a reported problem where SCTP sockets were being recognized as
generic SELinux sockets yet still being passed in one case to an IP level
check, which then fails for generic sockets.
It will also fix bugs where any SOCK_STREAM socket is classified as TCP or
any SOCK_DGRAM socket is classified as UDP.
This patch also unifies the way IP sockets classes are determined in
selinux_socket_bind(), so we use the already calculated value instead of
trying to recalculate it.
Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Fri, 30 Sep 2005 16:34:42 +0000 (02:34 +1000)]
[PATCH] i386: include linux/irq.h rather than asm/hw_irq.h
I need the following patch to compile -git8 here, otherwise these
files fail to compile (asm/hw_irq.h needs definitions from
linux/irq.h and that file provides the required include ordering).
I did not do a full audit, though there looks to be many other
places that should get the same treatment, if this is the right
way to do it.
Al Viro [Fri, 30 Sep 2005 02:36:50 +0000 (03:36 +0100)]
[PATCH] bogus BUILD_BUG_ON() in bpa_iommu
BUILD_BUG_ON(1) is asking for trouble (and getting it) when used in that
manner - dead code elimination happens after we parse it and invalid
type is invalid type, dead code or not.
It might be version-dependent, but at least 4.0.1 refuses to accept
that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Gen FUKATSU [Fri, 30 Sep 2005 15:09:17 +0000 (16:09 +0100)]
[ARM] 2940/1: Fix BTB entry flush in arch/arm/mm/cache-v6.S
Patch from Gen FUKATSU
Invalidate BTB entry instruction flushes two instruction
at a time. Therefore this instruction should be done four
times after invalidate instruction cache line.
Signed-off-by: Gen Fukatsu Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[ARM] 2939/1: Fix compilation error in arch/arm/mm/flush.c
Patch from Catalin Marinas
When CONFIG_CPU_CACHE_VIPT is defined, the flush_pfn_alias() function is
implicitely declared and it later conflicts with its actual definition.
This patch moves the function definition to the beginning of the file.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
David S. Miller [Fri, 30 Sep 2005 01:50:34 +0000 (18:50 -0700)]
[SPARC64]: Fix several bugs in flush_ptrace_access().
1) Use cpudata cache line sizes, not magic constants.
2) Align start address in cheetah case so we do not get
unaligned address traps. (pgrep was good at triggering
this, via /proc/${pid}/cmdline accesses)
Signed-off-by: David S. Miller <davem@davemloft.net>
[TCP]: Don't over-clamp window in tcp_clamp_window()
From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Handle better the case where the sender sends full sized
frames initially, then moves to a mode where it trickles
out small amounts of data at a time.
This known problem is even mentioned in the comments
above tcp_grow_window() in tcp_input.c, specifically:
...
* The scheme does not work when sender sends good segments opening
* window and then starts to feed us spagetti. But it should work
* in common situations. Otherwise, we have to rely on queue collapsing.
...
When the sender gives full sized frames, the "struct sk_buff" overhead
from each packet is small. So we'll advertize a larger window.
If the sender moves to a mode where small segments are sent, this
ratio becomes tilted to the other extreme and we start overrunning
the socket buffer space.
tcp_clamp_window() tries to address this, but it's clamping of
tp->window_clamp is a wee bit too aggressive for this particular case.
Fix confirmed by Ion Badulescu.
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kuznetsov has explained the situation as follows:
--------------------
I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is
not continuous: f.e. for mss=1095 it needs initial window 1095*4, but
for mss=1096 it is 1096*3. We do not know exactly what mss sender used
for calculations. If we advertised 1096 (and calculate initial window
3*1096), the sender could limit it to some value < 1096 and then it
will need window his_mss*4 > 3*1096 to send initial burst.
See?
So, the honest function for inital rcv_wnd derived from
tcp_init_cwnd() is:
init_rcv_wnd(mss)=
min { init_cwnd(mss1)*mss1 for mss1 <= mss }
It is something sort of:
if (mss < 1096)
return mss*4;
if (mss < 1096*2)
return 1096*4;
return mss*2;
(I just scrablled a graph of piece of paper, it is difficult to see or
to explain without this)
I selected it differently giving more window than it is strictly
required. Initial receive window must be large enough to allow sender
following to the rfc (or just setting initial cwnd to 2) to send
initial burst. But besides that it is arbitrary, so I decided to give
slack space of one segment.
Actually, the logic was:
If mss is low/normal (<=ethernet), set window to receive more than
initial burst allowed by rfc under the worst conditions
i.e. mss*4. This gives slack space of 1 segment for ethernet frames.
For msses slighlty more than ethernet frame, take 3. Try to give slack
space of 1 frame again.
If mss is huge, force 2*mss. No slack space.
Value 1460*3 is really confusing. Minimal one is 1096*2, but besides
that it is an arbitrary value. It was meant to be ~4096. 1460*3 is
just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so
that I guess hands typed this themselves.
--------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
[ARM] 2941/1: Fix running legacy binaries from a soft-float root filesystem with CONFIG_IWMMXT.
Patch from Daniel Jacobowitz
Thread flags are inherited on fork(). In order for a binary which has
the iWMMXt coprocessor enabled to run a binary which needs the FPA
emulation, we need to explicitly clear TIF_USING_IWMMXT if we are not
going to set it.
Signed-off-by: Daniel Jacobowitz <dan@codesourcery.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The SMU driver has a small mistake in the locking of the interrupt code,
if polled access and interrupt access race, interrupt may take a lock
and return without releasing it. This fixes it. With that patch, the
driver is rock solid with my experimental thermal control (which bangs
it pretty hard) racing with real time clock and cpufreq handling.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] readv/writev syscalls are not checked by lsm
it seems that readv(2)/writev(2) syscalls do not call
file_permission callback. Looks like this is overlook.
I have filled the issue into redhat bugzilla as
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=169433
and got the recommendation to post this on lsm mailing list.
The following trivial patch solves the problem.
Signed-off-by: Kostik Belousov <kostikbel@gmail.com> Signed-off-by: Chris Wright <chrisw@osdl.org>
Mike Waychison [Thu, 29 Sep 2005 22:01:27 +0000 (00:01 +0200)]
[PATCH] x86_64: Fix mce_log
The attempt to fixup the lockless mce log buffer introduced an infinite loop
when trying to find a free entry.
And:
Using rcu_dereference() to load mcelog.next doesn't seem to be sufficient
enough to ensure that mcelog.next is loaded each time around the loop in
mce_log(). Instead, use an explicit rmb() to ensure that the compiler gets it
right.
AK: turned the smp_wmbs into true wmbs to make sure they are not
reordered by the compiler on UP.
Roland McGrath [Thu, 29 Sep 2005 21:54:42 +0000 (14:54 -0700)]
[PATCH] Fix task state testing properly in do_signal_stop()
Any tests using < TASK_STOPPED or the like are left over from the time
when the TASK_ZOMBIE and TASK_DEAD bits were in the same word, and it
served to check for "stopped or dead". I think this one in
do_signal_stop is the only such case. It has been buggy ever since
exit_state was separated, and isn't testing the exit_state value.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>