git.proxmox.com Git - mirror_ubuntu-focal-kernel.git/log

PCI: revert "pcie: utilize pcie transaction pending bit"

Revert as it is reported to cause problems for people.

commit 4348a2dc49f9baecd34a9b0904245488c6189398
Author: Shaohua Li <shaohua.li@intel.com>
Date:   Wed Oct 24 10:45:08 2007 +0800

    pcie: utilize pcie transaction pending bit

    PCIE has a mechanism to wait for Non-Posted request to complete. I think
    pci_disable_device is a good place to do this.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Due to the regression reported at
http://bugzilla.kernel.org/show_bug.cgi?id=10065

Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Soeren Sonnenburg <kernel@nn7.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

PCI: iova: lockdep false alarm fix

lockdep goes off on the iova copy_reserved_iova() because it and a function
it calls grabs locks in the from, and the to of the copy operation.

The function grab locks of the same lock classes triggering the warning. The
first lock grabbed is for the constant reserved areas that is never accessed
after early boot. Technically you could do without grabbing the locks for the
"from" structure its copying reserved areas from.

But dropping the from locks to me looks wrong, even though it would be ok.

The affected code only runs in early boot as its setting up the DMAR
engines.

This patch gives the reserved_ioval_list locks special lockdep classes.

Signed-off-by: Mark Gross <mgross@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] mpc5200: Fix incorrect compatible string for the mdio node
[POWERPC] Update some defconfigs

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] ahci: SB600 workaround is suspect... play it safe for now
  sata_promise: fix hardreset hotplug events, take 2
  libata: improve HPA error handling
  libata: assume no device is attached if both IDENTIFYs are aborted
  pata_it821x: use raw nbytes in check_atapi_dma
  libata: implement ata_qc_raw_nbytes()

[libata] ahci: SB600 workaround is suspect... play it safe for now

At least one report claims that a878539ef994787c447a98c2e3ba0fe3dad984ec
failed to solve lockups, whereas the old limit-to-32-bit trick worked.

Restore the 32-bit limit, but also leave the 255-sector limit in place,
because we know that's needed as well.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-linus:
kbuild: soften modpost checks when doing cross builds

Merge branch 'linux-2.6' into merge

sata_promise: fix hardreset hotplug events, take 2

A Promise SATA controller will signal hotplug events when a hard
reset (COMRESET) is done on a port. These events aren't masked by
the driver, and the unexpected interrupts will cause a sequence
of failed reset attempts util libata's EH finally gives up.

This has not been a common problem so far, but the pending libata
hardreset-by-default changes makes it a critical issue.

The solution is to disable hotplug events before a reset, and to
reenable them afterwards. (Promise's driver does this too.)

This patch adds SATA-specific versions of ->freeze() and ->thaw()
that also disable and enable hotplug events. PATA ports continue
to use the old versions of ->freeze() and ->thaw().

Accesses to the hotplug register must be serialised via host->lock.
We rely on ap->lock == &ap->host->lock and that libata takes this
lock before ->freeze() and ->thaw(). Document this requirement.
The interrupt handler is adjusted so its hotplug register accesses
are inside the region protected by host->lock.

Tested on various chips (SATA300TX4, SATA300TX2plus, SATAII150TX4,
FastTrack TX4000) with various combinations of SATA and PATA disks,
with and without the pending hardreset-by-default changes.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Acked-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Make printk() console semaphore accesses sensible

The printk() logic on when/how to get the console semaphore was
unreadable, this splits the code up into a few helper functions and
makes it easier to follow what is going on.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bsd_acct: using task_struct->tgid is not right in pid-namespaces

In case we're accounting from a sub-namespace, the tgids reported will not
refer to the right namespace.

Save the pid_namespace we're accounting in on the acct_glbs and use it in
do_acct_process.

Two less :) places using the task_struct.tgid member.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bsd_acct: plain current->real_parent access is not always safe

This is minor, but dereferencing even current real_parent is not safe on debug
kernels, since the memory, this points to, can be unmapped - RCU protection is
required.

Besides, the tgid field is deprecated and is to be replaced with task_tgid_xxx
call (the 2nd patch), so RCU will be required anyway.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

revert "kswapd should only wait on IO if there is IO"

Revert commit f1a9ee758de7de1e040de849fdef46e6802ea117:

  Author: Rik van Riel <riel@redhat.com>
  Date:   Thu Feb 7 00:14:08 2008 -0800

    kswapd should only wait on IO if there is IO

    The current kswapd (and try_to_free_pages) code has an oddity where the
    code will wait on IO, even if there is no IO in flight.  This problem is
    notable especially when the system scans through many unfreeable pages,
    causing unnecessary stalls in the VM.

    Additionally, tasks without __GFP_FS or __GFP_IO in the direct reclaim path
    will sleep if a significant number of pages are encountered that should be
    written out.  This gives kswapd a chance to write out those pages, while
    the direct reclaim task sleeps.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because of large latencies and interactivity problems reported by Carlos,
here: http://lkml.org/lkml/2008/3/22/211

Cc: Rik van Riel <riel@redhat.com>
Cc: "Carlos R. Mafra" <crmafra2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hw_random doc updates

Update documentation for the hw_random support to be current:

- Documentation/hw_random.txt has been updated to reflect the
   current code:  it's a framework now, a "core" with a small
   sysfs interface, that hardware-specific drivers plug in to.
   Text specific to Intel hardware is now at the end.

- Kconfig now references the Documentation/hw_random.txt file
   and better explains what this really does.

Both chunks of documentation now higlight the fact that the kernel entropy
pool is maintained by "rngd", and this driver has nothing directly to do with
that important task.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

smackfs: remove redundant lock, fix open(,O_RDWR)

Older smackfs was parsing MAC rules by characters, thus a need of locking
write sessions on open() was needed. This lock is no longer useful now since
each rule is handled by a single write() call.

This is also a bugfix since seq_open() was not called if an open() O_RDWR flag
was given, leading to a seq_read() without an initialized seq_file, thus an
Oops.

Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
Reported-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

markers: remove ACCESS_ONCE

As Paul pointed out, the ACCESS_ONCE are not needed because we already have
the explicit surrounding memory barriers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

markers: update preempt_disable. call_rcu, rcu_barrier comments

Add comments requested by Andrew.

Updated comments about synchronize_sched(). Since we use call_rcu and
rcu_barrier now, these comments were out of sync with the code.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Adrian Bunk <adrian.bunk@movial.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: fix boundary checking in free_bootmem_core

With numa enabled, some callers could have a range of memory on one node
but try to free that on other node.  This can cause some pages to be
freed wrongly.

For example: when we try to allocate 128g boot ram early for
gart/swiotlb, and free that range later so gart/swiotlb can get some
range afterwards.

With this patch, we don't need to care which node holds the range, just
loop to call free_bootmem_node for all online nodes.

This patch makes free_bootmem_core() more robust by trimming the sidx
and eidx according the ram range that the node has.

And make the free_bootmem_core handle this out of range case.  We could
use bdata_list to make sure the range can be freed for sure.  So next
time, we don't need to loop online nodes and could use free_bootmem
directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mtd: memory corruption in block2mtd.c

The block2mtd driver (drivers/mtd/devices/block2mtd.c) will kfree an on-stack
pointer when handling an invalid argument line (e.g.
block2mtd=/dev/loop0,xxx).

The kfree was added some time ago when "name" was dynamically allocated.

Signed-off-by: Ingo van Lil <inguin@gmx.de>
Acked-by: Joern Engel <joern@lazybastard.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: <stable@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel-parameters.txt: document memmap option better

Provide example for memmap exclude option (it is slightly strange and
non-trivial) and provide nice small HOWTO for people with bad memory.

Signed-off-by: Jan-Simon Moeller <dl9pf@gmx.de>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[POWERPC] mpc5200: Fix incorrect compatible string for the mdio node

The MDIO node in the lite5200b.dts file needs to also claim compatibility
with the older mpc5200 chip. Otherwise the driver won't find the device.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>

libata: improve HPA error handling

There's no point in retrying and eventually failing device detection
when the device rejects READ_NATIVE_MAX[_EXT]. Disable HPA unlocking
if READ_NATIVE_MAX[_EXT] is rejected as done when SET_MAX[_EXT] is
rejected.

This allows some old drives to work even if they aren't blacklisted.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

libata: assume no device is attached if both IDENTIFYs are aborted

This is to fix bugzilla #10254. QSI cdrom attached to pata_sis as
secondary master appears as phantom device for the slave.
Interestingly, instead of not setting DRQ after IDENTIFY which
triggers NODEV_HINT, it aborts both IDENTIFY and IDENTIFY PACKET which
makes EH retry.

Modify EH such that it assumes no device is attached if both flavors
of IDENTIFY are aborted by the device. There really isn't much point
in retrying when the device actively aborts the commands.

While at it, convert NODEV detection message to ata_dev_printk() to
help debugging obscure detection problems.

This problem was reported by Jan Bücken.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Jan Bücken <jb.faq@gmx.de>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

pata_it821x: use raw nbytes in check_atapi_dma

pata_it821x needs to look at raw request size in check_atapi_dma().

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

libata: implement ata_qc_raw_nbytes()

Implement ata_qc_raw_nbytes() which determines the raw user-requested
size of a PC command.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Fix Oops with TQM5200 on TQM5200
  [POWERPC] mpc5200: Fix null dereference if bestcomm fails to initialize
  [POWERPC] mpc5200-fec: Fix possible NULL dereference in mdio driver
  [POWERPC] Fix crash in init_ipic_sysfs on efika
  [POWERPC] Don't use 64k pages for ioremap on pSeries

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  [SPARC64]: exec PT_DTRACE
  [SPARC64]: Use shorter list_splice_init() for brevity.
  [SPARC64]: Remove most limitations to kernel image size.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  sch_htb: fix "too many events" situation
  connector: convert to single-threaded workqueue
  [ATM]: When proc_create() fails, do some error handling work and return -ENOMEM.
  [SUNGEM]: Fix NAPI assertion failure.
  BNX2X: prevent ethtool from setting port type
  [9P] net/9p/trans_fd.c: remove unused variable
  [IPV6] net/ipv6/ndisc.c: remove unused variable
  [IPV4] fib_trie: fix warning from rcu_assign_poinger
  [TCP]: Let skbs grow over a page on fast peers
  [DLCI]: Fix tiny race between module unload and sock_ioctl.
  [SCTP]: Fix build warnings with IPV6 disabled.
  [IPV4]: Fix null dereference in ip_defrag

SVCRDMA: Use only 1 RDMA read scatter entry for iWARP adapters

The iWARP protocol limits RDMA read requests to a single scatter
entry.  NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block.  In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.

In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:

    svcrdma: Error posting send = -22
    svcrdma : RDMA_READ error = -22

and things go downhill from there.

The trivial fix is to delete the #ifdef guard.  The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended.  In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.

Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:

    ld: final link failed: Software caused connection abort

With the patch applied I was able to complete a kernel build on the
same setup.

(Tom Tucker says this is "actually an _ancient_ remnant when it had to
compile against iWARP vs. non-iWARP enabled OFA trees.")

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86-32: Pass the full resource data to ioremap()

It appears that 64-bit PCI resources cannot possibly ever have worked on
x86-32 even when the RESOURCES_64BIT config option was set, because any
driver that tried to [pci_]ioremap() the resource would have been unable
to do so because the high 32 bits would have been silently dropped on
the floor by the ioremap() routines that only used "unsigned long".

Change them to use "resource_size_t" instead, which properly encodes the
whole 64-bit resource data if RESOURCES_64BIT is enabled.

Acked-by: H. Peter Anvin <hpa@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Don't 'printk()' while holding xtime lock for writing

The printk() can deadlock because it can wake up klogd(), and
task enqueueing will try to read the time in order to set a hrtimer.

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Debugged-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[POWERPC] Update some defconfigs

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>

[POWERPC] Fix Oops with TQM5200 on TQM5200

The "bestcomm-core" driver defines its of_match table as follows

static struct of_device_id mpc52xx_bcom_of_match[] = {
{ .type = "dma-controller", .compatible = "fsl,mpc5200-bestcomm", },
{ .type = "dma-controller", .compatible = "mpc5200-bestcomm", },
{},
};

so while registering the driver, the driver's probe function won't be
called, because the device tree node doesn't have a device_type
property. Thus the driver's bcom_engine structure won't be allocated.
Referencing this structure later causes observed Oops.

Checking bcom_eng pointer for NULL before referencing data pointed
by it prevents oopsing, but fec driver still doesn't work (because
of the lost bestcomm match and resulted task allocation failure).
Actually the compatible property exists and should match and so
the fec driver should work.

This removes .type = "dma-controller" from the bestcomm driver's
mpc52xx_bcom_of_match table to solve the problem.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] mpc5200: Fix null dereference if bestcomm fails to initialize

If the bestcomm initialization fails, calls to the task allocate
function should fail gracefully instead of oopsing with a NULL deref.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] mpc5200-fec: Fix possible NULL dereference in mdio driver

If the reg property is missing from the phy node (unlikely, but possible),
then the kernel will oops with a NULL pointer dereference. This fixes
it by checking the pointer first.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] Fix crash in init_ipic_sysfs on efika

The global primary_ipic in arch/powerpc/sysdev/ipic.c can remain NULL
if ipic_init() fails, which will happen on machines that don't have an
ipic interrupt controller. init_ipic_sysfs() will crash in that case.

Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>

[POWERPC] Don't use 64k pages for ioremap on pSeries

On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
adapter using 64k pages, and thus the ehea driver will fail if 64k
pages are configured. This works around the problem by always
using 4k pages for ioremap on pSeries (but not on other platforms).
A better fix would be to check whether the partition could ever
have an eHEA adapter, and only force 4k pages if it could, but this
will do for 2.6.25.

This is based on an earlier patch by Tony Breeds.

Signed-off-by: Paul Mackerras <paulus@samba.org>

[SPARC64]: exec PT_DTRACE

The PT_DTRACE flag is meaningless and obsolete.
Don't touch it.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SPARC64]: Use shorter list_splice_init() for brevity.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_htb: fix "too many events" situation

HTB is event driven algorithm and part of its work is to apply
scheduled events at proper times. It tried to defend itself from
livelock by processing only limited number of events per dequeue.
Because of faster computers some users already hit this hardcoded
limit.

This patch limits processing up to 2 jiffies (why not 1 jiffie ?
because it might stop prematurely when only fraction of jiffie
remains).

Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

connector: convert to single-threaded workqueue

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

We don't need one cqueue thread for each CPU. cqueue is used for
receiving userspace datagrams, which are very rare and thus will
happily live with a single queue.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[ATM]: When proc_create() fails, do some error handling work and return -ENOMEM.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

kbuild: soften modpost checks when doing cross builds

The module alias support in the kernel have a consistency
check where it is checked that the size of a structure
in the kernel and on the build host are the same.
For cross builds this check does not make sense so detect
when we do cross builds and silently skip the check in these
situations.
This fixes a build bug for a wireless driver when cross building
for arm.

Acked-by: Michael Buesch <mb@bu3sch.de>
Tested-by: Gordon Farquharson <gordonfarquharson@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: stable@kernel.org

[SUNGEM]: Fix NAPI assertion failure.

As reported by Johannes Berg:

I started getting this warning with recent kernels:

[ 773.908927] ------------[ cut here ]------------
[ 773.908954] Badness at net/core/dev.c:2204
...

If we loop more than once in gem_poll(), we'll
use more than the real budget in our gem_rx()
calls, thus eventually trigger the caller's
assertions in net_rx_action().

Subtract "work_done" from "budget" for the second
arg to gem_rx() to fix the bug.

Signed-off-by: David S. Miller <davem@davemloft.net>

BNX2X: prevent ethtool from setting port type

On 10GBaseT boards setting the type to TP will cause the driver to try
to configure 1GBaseT.
Since there are currently no boards that support setting of the port
type, disable this for now.

Signed-off-by: Eliezer Tamir <eliezert@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[9P] net/9p/trans_fd.c: remove unused variable

The variable cb is initialized but never used otherwise.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
type T;
identifier i;
constant C;
@@

(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV6] net/ipv6/ndisc.c: remove unused variable

The variable hlen is initialized but never used otherwise.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
type T;
identifier i;
constant C;
@@

(
extern T i;
|
- T i;
<+... when != i
- i = C;
...+>
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4] fib_trie: fix warning from rcu_assign_poinger

This gets rid of a warning caused by the test in rcu_assign_pointer.
I tried to fix rcu_assign_pointer, but that devolved into a long set
of discussions about doing it right that came to no real solution.
Since the test in rcu_assign_pointer for constant NULL would never
succeed in fib_trie, just open code instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
RDMA/nes: Fix MSS calculation on RDMA path

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  Revert "ide-tape: schedule driver for removal after 6 months"
  ide: mark "hdx=remap" and "hdx=remap63" kernel parameters as obsoleted
  ide: mark "hdx=[driver_name]" and "hdx=scsi" kernel parameters as obsoleted
  ide: Documentation/ide/ide.txt fixes
  ide: mark special "ide0=" kernel parameters as obsoleted
  ide: remove commented out entries from ide_pio_blacklist[]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
  [CIFS] Fix mem leak on dfs referral
  [CIFS] file create with acl support enabled is slow
  [CIFS] Fix mtime on cp -p when file data cached but written out too late
  [CIFS] Fix build problem
  [CIFS] cifs: replace remaining __FUNCTION__ occurrences
  [CIFS]  DFS patch that connects inode with dfs handling ops

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: revert: reserve dma32 early for gart

Change pagemap output format to allow for future reporting of huge pages

Change pagemap output format to allow for future reporting of huge pages.

(Format comment and minor cleanups: mpm@selenic.com)

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mmc: use sysfs groups to handle conditional attributes

Suppressing uevents turned out to be a bad idea as it screws up the
order of events, making user space very confused. Change the system to
use sysfs groups instead.

This is a regression that, for some odd reason, has gone unnoticed for
some time. It confuses hal so that the block devices (which have the
mmc device as a parent) are not registered. End result being that
desktop magic when cards are inserted won't work.

Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

PNP: increase the number of PnP memory resources from 12 to 24

Increase the number of PnP memory resources from 12 to 24.

This removes an "exceeded the max num of mem resources" warning on boot. I
also noticed the reservation of two more iomem ranges on the computer on
which this was tested.

Signed-off-by: Darren Salt <linux@youmustbejoking.demon.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ISAPNP: fix limits of logical device register set

PNP_MAX_MEM and PNP_MAX_PORT are mainly used to size tables of PNP
device resources.  In 2.6.24, we increased their values to accomodate
ACPI devices that have many resources:

                 2.6.23    2.6.24
                 ------    ------
  PNP_MAX_MEM       4         12
  PNP_MAX_PORT      8         40

However, ISAPNP also used these constants as the size of parts of the
logical device register set.  This register set is fixed by hardware,
so increasing the constants meant that we were reading and writing
unintended parts of the register set.

This patch changes ISAPNP to use the correct register set sizes (the
same values we used prior to 2.6.24).

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[CIFS] Fix mem leak on dfs referral

Signed-off-by: Igor Mammedov <niallain@gmail.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

[TCP]: Let skbs grow over a page on fast peers

While testing the virtio-net driver on KVM with TSO I noticed
that TSO performance with a 1500 MTU is significantly worse
compared to the performance of non-TSO with a 16436 MTU.  The
packet dump shows that most of the packets sent are smaller
than a page.

Looking at the code this actually is quite obvious as it always
stop extending the packet if it's the first packet yet to be
sent and if it's larger than the MSS.  Since each extension is
bound by the page size, this means that (given a 1500 MTU) we're
very unlikely to construct packets greater than a page, provided
that the receiver and the path is fast enough so that packets can
always be sent immediately.

The fix is also quite obvious.  The push calls inside the loop
is just an optimisation so that we don't end up doing all the
sending at the end of the loop.  Therefore there is no specific
reason why it has to do so at MSS boundaries.  For TSO, the
most natural extension of this optimisation is to do the pushing
once the skb exceeds the TSO size goal.

This is what the patch does and testing with KVM shows that the
TSO performance with a 1500 MTU easily surpasses that of a 16436
MTU and indeed the packet sizes sent are generally larger than
16436.

I don't see any obvious downsides for slower peers or connections,
but it would be prudent to test this extensively to ensure that
those cases don't regress.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

x86: revert: reserve dma32 early for gart

Revert

commit f62f1fc9ef94f74fda2b456d935ba2da69fa0a40
Author: Yinghai Lu <yhlu.kernel@gmail.com>
Date: Fri Mar 7 15:02:50 2008 -0800

x86: reserve dma32 early for gart

The patch has a dependency on bootmem modifications which are not .25
material that late in the -rc cycle. The problem which is addressed by
the patch is limited to machines with 256G and more memory booted with
NUMA disabled. This is not a .25 regression and the audience which is
affected by this problem is very limited, so it's safer to do the
revert than pulling in intrusive bootmem changes right now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Revert "ide-tape: schedule driver for removal after 6 months"

This reverts commit d48567dd43868b3d2e1fcc33ee76dc2d38a1ddeb.

Borislav is working on ide-tape "light" version instead.

Cc: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

ide: mark "hdx=remap" and "hdx=remap63" kernel parameters as obsoleted

Mark "hdx=remap" and "hdx=remap63" kernel parameters as obsoleted
(they are layering violation and should be dealt with in the same
way as done by libata - device-mapper should be used instead).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

ide: mark "hdx=[driver_name]" and "hdx=scsi" kernel parameters as obsoleted

Mark "hdx=[driver_name]" and "hdx=scsi" kernel parameters as obsoleted
(nowadays device-driver binding can be changed at runtime through sysfs
and it can also be dealt with using per device driver parameters).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

ide: Documentation/ide/ide.txt fixes

* "hdx=cyls,heads,sects,wpcom,irq" should be "hdx=cyls,heads,sects".

* "hdx=" is for "x" from 'a' to 'u', "idex=" is for "x" from '0' to '9'.

* "idex=noautotune" is long gone.

* Obsoleted "ide0=" parameters were already removed from the documentation.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

ide: mark special "ide0=" kernel parameters as obsoleted

Mark "ide0=ali14xx|cmd640_vlb|dtc2278|ht6560b|qd65xx|umc8672" kernel
parameters as obsoleted (per host driver replacements have been available
for a long time).

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

ide: remove commented out entries from ide_pio_blacklist[]

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

[SPARC64]: Remove most limitations to kernel image size.

Currently kernel images are limited to 8MB in size, and this causes
problems especially when enabling features that take up a lot of
kernel image space such as lockdep.

The code now will align the kernel image size up to 4MB and map that
many locked TLB entries.  So, the only practical limitation is the
number of available locked TLB entries which is 16 on Cheetah and 64
on pre-Cheetah sparc64 cpus.  Niagara cpus don't actually have hw
locked TLB entry support.  Rather, the hypervisor transparently
provides support for "locked" TLB entries since it runs with physical
addressing and does the initial TLB miss processing.

Fully utilizing this change requires some help from SILO, a patch for
which will be submitted to the maintainer.  Essentially, SILO will
only currently map up to 8MB for the kernel image and that needs to be
increased.

Note that neither this patch nor the SILO bits will help with network
booting.  The openfirmware code will only map up to a certain amount
of kernel image during a network boot and there isn't much we can to
about that other than to implemented a layered network booting
facility.  Solaris has this, and calls it "wanboot" and we may
implement something similar at some point.

Signed-off-by: David S. Miller <davem@davemloft.net>

[DLCI]: Fix tiny race between module unload and sock_ioctl.

This is a narrow pedantry :) but the dlci_ioctl_hook check and call
should not be parted with the mutex lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SCTP]: Fix build warnings with IPV6 disabled.

Introduced by 270637abff0cdf848b910b9f96ad342e1da61c66
("[SCTP]: Fix a race between module load and protosw access")

Reported by Gabriel C:

In file included from net/sctp/sm_statetable.c:50:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
In file included from net/sctp/sm_statefuns.c:62:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
...

Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Fix null dereference in ip_defrag

Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
Offending line in ip_defrag is here:

net = skb->dev->nd_net

where dev is NULL. Bisected the problem down to commit
ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the
inet_frag_queue lookup work in namespaces).

Below patch (idea from Patrick McHardy) fixes the problem for me.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDMA/nes: Fix MSS calculation on RDMA path

Fix the calculation of the MSS for RDMA connections: we need to
allow space in frames for a VLAN tag too.

Signed-off-by: Chien Tung <ctung@neteffect.com>
Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>

ocfs2: MAINTAINERS update

Change my e-mail address, add Joel as new co-maintainer.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel:
  sched: add arch_update_cpu_topology hook.
  sched: add exported arch_reinit_sched_domains() to header file.
  sched: remove double unlikely from schedule()
  sched: cleanup old and rarely used 'debug' features.

x86_64: free_bootmem should take phys

so use nodedata_phys directly.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sync_bitops: fix wrong comments [Bug 10247]

Fix wrong function name and references to non-x86 architectures.

Signed-off-by: Matti Linnanvuori mattilinnanvuori@yahoo.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: trim mtrr don't close gap for resource allocation.

fix the bug reported here:

http://bugzilla.kernel.org/show_bug.cgi?id=10232

use update_memory_range() instead of add_memory_range() directly
to avoid closing the gap.

( the new code only affects and runs on systems where the MTRR
workaround triggers. )

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix reboot problem with Dell Optiplex 745, 0KW626 board

we have seen a little problem in rebooting Dell Optiplex 745 with the
0KW626 board. Here is a small patch enabling reboot with this board,
which forces the default reboot path it into the BIOS reboot mode.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix fault_msg nul termination

The fault_msg text is not explictly nul terminated now in startup
assembly. Do so by converting .ascii to .asciz.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix long standing bug with usb after hibernation with 4GB ram

aperture_64.c takes a piece of memory and makes it into iommu
window... but such window may not be saved by swsusp -- that leads to
oops during hibernation.

Signed-off-by: Pavel Machek <pavel@suse.cz>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: hpet clock enable quirk on nVidia nForce 430

this patch allows hpet=force on nVidia nForce 430 southbridge.
This patch was tested by me on my old Asus A8N-VM CSM (where bios does not
support hpet and does not advertise it via acpi entry). My nForce430 version:
lspci -nn | grep LPC
00:0a.0 ISA bridge [0601]: nVidia Corporation MCP51 LPC Bridge [10de:0260]
(rev a2)

Kernel 2.6.24.3 after patching and using hpet=force reports this:
dmesg | grep -i hpet
Kernel command line: root=/dev/sda8 ro vga=773 video=vesafb:mtrr:4,ywrap
vt.default_utf8=0 hpet=force
Force enabled HPET at base address 0xfed00000
hpet clockevent registered
Time: hpet clocksource has been installed.

grep -i hpet /proc/timer_list
Clock Event Device: hpet
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode

grep Clock /proc/timer_list (before patching)
Clock Event Device: pit
Clock Event Device: lapic

grep Clock /proc/timer_list (after patching)
Clock Event Device: hpet
Clock Event Device: lapic

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: reserve dma32 early for gart

a system with 256 GB of RAM, when NUMA is disabled crashes the
following way:

Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Cannot allocate aperture memory hole (ffff8101c0000000,65536K)
Kernel panic - not syncing: Not enough memory for aperture
Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33

Call Trace:
[<ffffffff84037c62>] panic+0xb2/0x190
[<ffffffff840381fc>] ? release_console_sem+0x7c/0x250
[<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90
[<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50
[<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680
[<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310
[<ffffffff84506a2f>] ? _etext+0x0/0x1
[<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40
[<ffffffff847ac795>] mem_init+0x45/0x1a0
[<ffffffff8479ff35>] start_kernel+0x295/0x380
[<ffffffff8479f1c2>] _sinittext+0x1c2/0x230

the root cause is : memmap PMD is too big,
[ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0
almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G.

solution will be:
1. make memmap allocation get memory above 4G...
2. reserve some dma32 range early before we try to set up memmap for all.
and release that before pci_iommu_alloc, so gart or swiotlb could get some
range under 4g limit for sure.

the patch is using method 2.
because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP

will get
Your BIOS doesn't leave a aperture memory hole
Please enable the IOMMU option in the BIOS setup
This costs you 64 MB of RAM
Mapping aperture over 65536 KB of RAM @ 4000000
Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init)

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: add the DFF (Desktop Form Factor) Dell Optiplex 745 to the reboot errata list

We recently got some of the "Desktop Form Factor" Optiplex 745's in. I
noticed that there's an entry for the SFF one's, but the BIOS model number
of the DFF differs from that of the SFF. We have been reliably
experiencing the same (as far as I can tell) reboot bug as the SFF boxes.

Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86/visws: fix printk format warnings

Fix visws printk format warnings:

/local/linsrc/linux-2.6.24-git15/arch/x86/mach-visws/traps.c:50: warning: format '%#lx' expects type 'long unsigned int', but argument 2 has type 'u32'
/local/linsrc/linux-2.6.24-git15/arch/x86/mach-visws/traps.c:50: warning: format '%#lx' expects type 'long unsigned int', but argument 3 has type 'u32'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix {clear,copy}_user_page() declarations in page.h

Clean up: eliminate some compiler noise on x86 when building with strict
warnings enabled, introduced by commit 345b904c.

In file included from include2/asm/thread_info_64.h:12,
                 from include2/asm/thread_info.h:4,
                 from
/home/cel/src/linux/nfs-2.6/include/linux/thread_info.h:35,
                 from
/home/cel/src/linux/nfs-2.6/include/linux/preempt.h:9,
                 from
/home/cel/src/linux/nfs-2.6/include/linux/spinlock.h:49,
                 from /home/cel/src/linux/nfs-2.6/include/linux/mmzone.h:7,
                 from /home/cel/src/linux/nfs-2.6/include/linux/gfp.h:4,
                 from /home/cel/src/linux/nfs-2.6/include/linux/slab.h:14,
                 from /home/cel/src/linux/nfs-2.6/fs/nfsd/nfs4acl.c:40:
include2/asm/page.h:55: warning: `inline' is not at beginning of
declaration
include2/asm/page.h:61: warning: `inline' is not at beginning of
declaration

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: cast cmpxchg and cmpxchg_local result for 386 and 486

mm/slub.c: In function 'slab_alloc':
mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
mm/slub.c:1637: warning: assignment makes pointer from integer without a cast
mm/slub.c: In function 'slab_free':
mm/slub.c:1796: warning: assignment makes pointer from integer without a cast
mm/slub.c:1796: warning: assignment makes pointer from integer without a cast

A cast is needed in the 386 and 486 code because the type is a pointer. In
every other integer case the original cmpxchg code (and the cmpxchg_local
which has been copied from it) worked fine, but since we touch a pointer,
the type needs to be casted in the cmpxchg_local and cmpxchg macros.

The more recent code (586+) does not have this problem (the cast is already
there).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: tight online check in setup_per_cpu_areas

when numa disabled I got this compile warning:

arch/x86/kernel/setup64.c: In function setup_per_cpu_areas:
arch/x86/kernel/setup64.c:147: warning: the address of
contig_page_data will always evaluate as true

it seems we missed checking if the node is online before we try to refer
NODE_DATA. Fix it.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

x86: fix dma_alloc_pages

memory-less node support:

this patch uses updated dev_to_node, because dev_to_node already makes sure
it returns an online node.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

sched: add arch_update_cpu_topology hook.

Will be called each time the scheduling domains are rebuild.
Needed for architectures that don't have a static cpu topology.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add exported arch_reinit_sched_domains() to header file.

Needed so it can be called from outside of sched.c.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove double unlikely from schedule()

Combine two unlikely's

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cleanup old and rarely used 'debug' features.

TREE_AVG and APPROX_AVG are initial task placement policies that have been
disabled for a long while.. time to remove them.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix atomic backoff limit.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  [NET] ifb: set separate lockdep classes for queue locks
  [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
  [TCP]: Fix shrinking windows with window scaling
  netpoll: zap_completion_queue: adjust skb->users counter
  bridge: use time_before() in br_fdb_cleanup()
  [TG3]: Fix build warning on sparc32.
  MAINTAINERS: bluez-devel is subscribers-only
  audit: netlink socket can be auto-bound to pid other than current->pid (v2)
  [NET]: Fix permissions of /proc/net
  [SCTP]: Fix a race between module load and protosw access
  [NETFILTER]: ipt_recent: sanity check hit count
  [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
  [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
  [IPV4]: esp_output() misannotations
  [8021Q]: vlan_dev misannotations
  xfrm: ->eth_proto is __be16
  [IPV4]: ipv4_is_lbcast() misannotations
  [SUNRPC]: net/* NULL noise
  [SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
  [PKT_SCHED]: annotate cls_u32
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.25

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6.25:
  sh: Use relative paths for mach/cpu symlinks.
  SH: Use newer, non-deprecated __SPIN_LOCK_UNLOCKED macro.
  sh: Fix more user header breakage from sh64 integration.
  sh: Fix uImage build error.
  sh: Fix up the timer IRQ definition for SH7203.
  sh: Fix up the address error exception handler for SH-2.
  serial: sh-sci: Fix fifo stall on SH7760/SH7780/SH7785 SCIF.

sh: Use relative paths for mach/cpu symlinks.

When building the kernel without passing the O= command line parameter
there's no point to use absolute paths for them.

Usually relative paths are preferred because they survive directory
moves, work across networked file systems and chrooted environments.

Absolute paths are still used if an output directory is given.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

SH: Use newer, non-deprecated __SPIN_LOCK_UNLOCKED macro.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

[NET] ifb: set separate lockdep classes for queue locks

[   10.536424] =======================================================
[   10.536424] [ INFO: possible circular locking dependency detected ]
[   10.536424] 2.6.25-rc3-devel #3
[   10.536424] -------------------------------------------------------
[   10.536424] swapper/0 is trying to acquire lock:
[   10.536424]  (&dev->queue_lock){-+..}, at: [<c0299b4a>]
dev_queue_xmit+0x175/0x2f3
[   10.536424]
[   10.536424] but task is already holding lock:
[   10.536424]  (&p->tcfc_lock){-+..}, at: [<f8a67154>] tcf_mirred+0x20/0x178
[act_mirred]
[   10.536424]
[   10.536424] which lock already depends on the new lock.

lockdep warns of locking order while using ifb with sch_ingress and
act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
is at the beginning). This patch is only to tell lockdep that ifb is
a different device (e.g. from eth) and has its own pair of queue
locks. (This warning is a false-positive in common scenario of using
ifb; yet there are possible situations, when this order could be
dangerous; lockdep should warn in such a case.) (With suggestions by
David S. Miller)

Reported-and-tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV6] KCONFIG: Fix description about IPV6_TUNNEL.

Based on notice from "Colin" <colins@sjtu.edu.cn>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TCP]: Fix shrinking windows with window scaling

When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.

The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
below the last end:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40

If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receivers checks in tcp_sequence() however since it
is before it's tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.

Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: zap_completion_queue: adjust skb->users counter

zap_completion_queue() retrieves skbs from completion_queue where they have
zero skb->users counter. Before dev_kfree_skb_any() it should be non-zero
yet, so it's increased now.

Reported-and-tested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: use time_before() in br_fdb_cleanup()

In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they
should be compared using the time_after() macro.

Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[TG3]: Fix build warning on sparc32.

Sparc MAC address support should be protected consistently
with CONFIG_SPARC, but there was a stray CONFIG_SPARC64
case.

Bump driver version and release date.

Reported by Andrew Morton.

Signed-off-by: David S. Miller <davem@davemloft.net>