git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log

intel-iommu: Don't use identity mapping for PCI devices behind bridges

Our current strategy for pass-through mode is to put all devices into
the 1:1 domain at startup (which is before we know what their dma_mask
will be), and only _later_ take them out of that domain, if it turns out
that they really can't address all of memory.

However, when there are a bunch of PCI devices behind a bridge, they all
end up with the same source-id on their DMA transactions, and hence in
the same IOMMU domain. This means that we _can't_ easily move them from
the 1:1 domain into their own domain at runtime, because there might be DMA
in-flight from their siblings.

So we have to adjust our pass-through strategy: For PCI devices not on
the root bus, and for the bridges which will take responsibility for
their transactions, we have to start up _out_ of the 1:1 domain, just in
case.

This fixes the BUG() we see when we have 32-bit-capable devices behind a
PCI-PCI bridge, and use the software identity mapping.

It does mean that we might end up using 'normal' mapping mode for some
devices which could actually live with the faster 1:1 mapping -- but
this is only for PCI devices behind bridges, which presumably aren't the
devices for which people are most concerned about performance.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Use iommu_should_identity_map() at startup time too.

At boot time, the dma_mask won't have been set on any devices, so we
assume that all devices will be 64-bit capable (and thus get a 1:1 map).

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: No mapping for non-PCI devices

This should fix kernel.org bug #11821, where the dcdbas driver makes up
a platform device and then uses dma_alloc_coherent() on it, in an
attempt to get memory < 4GiB.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Restore DMAR_BROKEN_GFX_WA option for broken graphics drivers

We need to give people a little more time to fix the broken drivers.
Re-introduce this, but tied in properly with the 'iommu=pt' support this
time. Change the config option name and make it default to 'no' too.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Add iommu_should_identity_map() function

We do this twice, and it's about to get more complicated. This makes the
code slightly clearer about what it's doing, too.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Fix reattaching of devices to identity mapping domain

When we reattach a device to the si_domain (because it's been removed
from a VM), we weren't calling domain_context_mapping() to actually tell
the hardware about that.

We should really put the call to domain_context_mapping() into
domain_add_dev_info() -- we never call the latter without also doing the
former, and we can keep the error paths simple that way. But that's a
cleanup which can wait for 2.6.32 now.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Don't set identity mapping for bypassed graphics devices

We should check iommu_dummy() _first_, because that means it's attached
to an iommu that we've just disabled completely. At the moment, we might
try to put the device into the identity mapping domain.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Fix dma vs. mm page confusion with aligned_nrpages()

The aligned_nrpages() function rounds up to the next VM page, but
returns its result as a number of DMA pages.

Purely theoretical except on IA64, which doesn't boot with VT-d right
now anyway.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Merge git://git.infradead.org/iommu-2.6

* git://git.infradead.org/iommu-2.6: (38 commits)
  intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()
  intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops
  intel-iommu: Use cmpxchg64_local() for setting PTEs
  intel-iommu: Warn about unmatched unmap requests
  intel-iommu: Kill superfluous mapping_lock
  intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386
  intel-iommu: Make iommu=pt work on i386 too
  intel-iommu: Performance improvement for dma_pte_free_pagetable()
  intel-iommu: Don't free too much in dma_pte_free_pagetable()
  intel-iommu: dump mappings but don't die on pte already set
  intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()
  intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()
  intel-iommu: Simplify __intel_alloc_iova()
  intel-iommu: Performance improvement for domain_pfn_mapping()
  intel-iommu: Performance improvement for dma_pte_clear_range()
  intel-iommu: Clean up iommu_domain_identity_map()
  intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs
  intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument
  intel-iommu: Change aligned_size() to aligned_nrpages()
  intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()
  ...

x86: add boundary check for 32bit res before expand e820 resource to alignment

fix hang with HIGHMEM_64G and 32bit resource. According to hpa and
Linus, use (resource_size_t)-1 to fend off big ranges.

Analyzed by hpa

Reported-and-tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: fix power-of-2 round_up/round_down macros

These macros had two bugs:
- the type of the mask was not correctly expanded to the full size of
   the argument being expanded, resulting in possible loss of high bits
   when mixing types.
- the alignment argument was evaluated twice, despite the macro looking
   like a fancy function (but it really does need to be a macro, since
   it works on arbitrary integer types)

Noticed by Peter Anvin, and with a fix that is a modification of his
suggestion (bug noticed by Yinghai Lu).

Cc: Peter Anvin <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()

Check dma_pte_present() and only free the page if there _is_ one.
Kind of surprising that there was no warning about this.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops

On Wed, 2009-07-01 at 16:59 -0700, Linus Torvalds wrote:
> I also _really_ hate how you do
>
> (unsigned long)pte >> VTD_PAGE_SHIFT ==
> (unsigned long)first_pte >> VTD_PAGE_SHIFT

Kill this, in favour of just looking to see if the incremented pte
pointer has 'wrapped' onto the next page. Which means we have to check
it _after_ incrementing it, not before.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

FRV: Add basic performance counter support

Add basic performance counter support to the FRV arch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

FRV: Implement atomic64_t

Implement atomic64_t and its ops for FRV. Tested with the following patch:

diff --git a/arch/frv/kernel/setup.c b/arch/frv/kernel/setup.c
index 55e4fab..086d50d 100644
--- a/arch/frv/kernel/setup.c
+++ b/arch/frv/kernel/setup.c
@@ -746,6 +746,52 @@ static void __init parse_cmdline_early(char *cmdline)

} /* end parse_cmdline_early() */

+static atomic64_t xxx;
+
+static void test_atomic64(void)
+{
+ atomic64_set(&xxx, 0x12300000023LL);
+
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000023LL);
+ mb();
+ if (atomic64_inc_return(&xxx) != 0x12300000024LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x12300000024LL);
+ mb();
+ if (atomic64_sub_return(0x36900000050LL, &xxx) != -0x2460000002cLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002cLL);
+ mb();
+ if (atomic64_dec_return(&xxx) != -0x2460000002dLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != -0x2460000002dLL);
+ mb();
+ if (atomic64_add_return(0x36800000001LL, &xxx) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x123456789abcdefLL, 0x121ffffffd4LL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x121ffffffd4LL);
+ mb();
+ if (atomic64_cmpxchg(&xxx, 0x121ffffffd4LL, 0x123456789abcdefLL) != 0x121ffffffd4LL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0x123456789abcdefLL);
+ mb();
+ if (atomic64_xchg(&xxx, 0xabcdef123456789LL) != 0x123456789abcdefLL)
+ BUG();
+ mb();
+ BUG_ON(atomic64_read(&xxx) != 0xabcdef123456789LL);
+ mb();
+}
+
/*****************************************************************************/
/*
*
@@ -845,6 +891,8 @@ void __init setup_arch(char **cmdline_p)
// asm volatile("movgs %0,timerd" :: "r"(10000000));
// __set_HSR(0, __get_HSR(0) | HSR0_ETMD);

+ test_atomic64();
+
} /* end setup_arch() */

#if 0

Note that this doesn't cover all the trivial wrappers, but does cover all the
substantial implementations.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

intel-iommu: Use cmpxchg64_local() for setting PTEs

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

intel-iommu: Warn about unmatched unmap requests

This would have found the bug in i386 pci_unmap_addr() a long time ago.
We shouldn't just silently return without doing anything.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: LCDC dcache flush for deferred io
  sh: Fix compiler error and include the definition of IS_ERR_VALUE
  sh: re-add LCDC fbdev support to the Migo-R defconfig
  sh: fix se7724 ceu names
  sh: ms7724se: Enable sh_eth in defconfig.
  arch/sh/boards/mach-se/7206/io.c: Remove unnecessary semicolons
  sh: ms7724se: Add sh_eth support
  nommu: provide follow_pfn().
  sh: Kill off unused DEBUG_BOOTMEM symbol.
  perf_counter tools: add cpu_relax()/rmb() definitions for sh.
  sh64: Hook up page fault events for software perf counters.
  sh: Hook up page fault events for software perf counters.
  sh: make set_perf_counter_pending() static inline.
  clocksource: sh_tmu: Make undefined TCOR behaviour less undefined.

intel-iommu: Kill superfluous mapping_lock

Since we're using cmpxchg64() anyway (because that's the only way to do
an atomic 64-bit store on i386), we might as well ditch the extra
locking and just use cmpxchg64() to ensure that we don't add the page
twice.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

sh: LCDC dcache flush for deferred io

Since writenotify on uncached vmas is unsupported in 2.6.31,
live with cached framebuffer memory in the deferred io
case for now and flush the dcache before forcing refresh.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Magnus damm <damm@igel.co.jp>

sh: Fix compiler error and include the definition of IS_ERR_VALUE

When arch/sh/include/asm/syscall_32.h is included from a file that
doesn't also include linux/err.h the following error is produced,

In file included from /home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall.h:5,
from kernel/trace/trace_syscalls.c:3:
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h: In function 'syscall_get_error':
/home/matt/src/kernels/sh-2.6/arch/sh/include/asm/syscall_32.h:28: error: implicit declaration of function 'IS_ERR_VALUE'
make[2]: *** [kernel/trace/trace_syscalls.o] Error 1
make[1]: *** [kernel/trace] Error 2
make: *** [kernel] Error 2

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

kernel-doc: move ignoring kmemcheck

Somehow I managed to generate a diff that put these 2 lines
into the wrong function: should have been in dump_struct()
instead of in dump_enum().

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6:
  mtd: nand: fix build failure and incorrect return from omap_wait()
  mtd: Use BLOCK_NIL consistently in NFTL/INFTL
  mtd: m25p80 timeout too short for worst-case m25p16 devices
  mtd: atmel_nand: Fix typo s/parititions/partitions/
  mtd: cmdlineparts: Use 64-bit format when printing a debug message.
  mtd: maps: Remove BUS_ID_SIZE from integrator_flash
  jffs2: fix another potential leak on error path in scan.c

intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: invalidation reverse calls
  fuse: allow umask processing in userspace
  fuse: fix bad return value in fuse_file_poll()
  fuse: fix return value of fuse_dev_write()

Fix iommu address space allocation

This fixes kernel.org bug #13584. The IOVA code attempted to optimise
the insertion of new ranges into the rbtree, with the unfortunate result
that some ranges just didn't get inserted into the tree at all. Then
those ranges would be handed out more than once, and things kind of go
downhill from there.

Introduced after 2.6.25 by ddf02886cbe665d67ca750750196ea5bf524b10b
("PCI: iova RB tree setup tweak").

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: mark gross <mgross@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix pci_unmap_addr() et al on i386.

We can run a 32-bit kernel on boxes with an IOMMU, so we need
pci_unmap_addr() etc. to work -- without it, drivers will leak mappings.

To be honest, this whole thing looks like it's more pain than it's
worth; I'm half inclined to remove the no-op #else case altogether.

But this is the minimal fix, which just does the right thing if
CONFIG_DMAR is set.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org [ for 2.6.30 ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

elf: fix one check-after-use

Check before use it.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

intel-iommu: Make iommu=pt work on i386 too

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()
  blocK: Restore barrier support for md and probably other virtual devices.
  block: get rid of queue-private command filter
  block: Create bip slabs with embedded integrity vectors
  cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()
  cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()
  Trivial typo fixes in Documentation/block/data-integrity.txt.

Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: use interruptible wait when duration is controlled by userspace.
  md/raid5: suspend shouldn't affect read requests.
  md: tidy up error paths in md_alloc
  md: fix error path when duplicate name is found on md device creation.
  md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.
  md: Use new topology calls to indicate alignment and I/O sizes

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
  Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"
  igb: return PCI_ERS_RESULT_DISCONNECT on permanent error
  e1000e: io_error_detected callback should return PCI_ERS_RESULT_DISCONNECT
  e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error
  e1000: fix unmap bug
  igb: fix unmap length bug
  ixgbe: fix unmap length bug
  ixgbe: Fix link capabilities during adapter resets
  ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
  ixgbe: Fix SFP log messages
  usbnet: Remove private stats structure
  usbnet: Use netdev stats structure
  smsc95xx: Use netdev stats structure
  rndis_host: Use netdev stats structure
  net1080: Use netdev stats structure
  dm9601: Use netdev stats structure
  cdc_eem: Use netdev stats structure
  ipv4: Fix fib_trie rebalancing, part 3
  bnx2x: Fix the behavior of ethtool when ONBOOT=no
  sctp: xmit sctp packet always return no route error
  ...

kmemleak: Fix scheduling-while-atomic bug

One of the kmemleak changes caused the following
scheduling-while-holding-the-tasklist-lock regression on x86:

BUG: sleeping function called from invalid context at mm/kmemleak.c:795
in_atomic(): 1, irqs_disabled(): 0, pid: 1737, name: kmemleak
2 locks held by kmemleak/1737:
#0:  (scan_mutex){......}, at: [<c10c4376>] kmemleak_scan_thread+0x45/0x86
#1:  (tasklist_lock){......}, at: [<c10c3bb4>] kmemleak_scan+0x1a9/0x39c
Pid: 1737, comm: kmemleak Not tainted 2.6.31-rc1-tip #59266
Call Trace:
[<c105ac0f>] ? __debug_show_held_locks+0x1e/0x20
[<c102e490>] __might_sleep+0x10a/0x111
[<c10c38d5>] scan_yield+0x17/0x3b
[<c10c3970>] scan_block+0x39/0xd4
[<c10c3bc6>] kmemleak_scan+0x1bb/0x39c
[<c10c4331>] ? kmemleak_scan_thread+0x0/0x86
[<c10c437b>] kmemleak_scan_thread+0x4a/0x86
[<c104d73e>] kthread+0x6e/0x73
[<c104d6d0>] ? kthread+0x0/0x73
[<c100959f>] kernel_thread_helper+0x7/0x10
kmemleak: 834 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

The bit causing it is highly dubious:

static void scan_yield(void)
{
        might_sleep();

        if (time_is_before_eq_jiffies(next_scan_yield)) {
                schedule();
                next_scan_yield = jiffies + jiffies_scan_yield;
        }
}

It called deep inside the codepath and in a conditional way,
and that is what crapped up when one of the new scan_block()
uses grew a tasklist_lock dependency.

This minimal patch removes that yielding stuff and adds the
proper cond_resched().

The background scanning thread could probably also be reniced
to +10.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()

With the changes for falling back to an oom_cfqq, we never fail
to find/allocate a queue in cfq_get_queue(). So remove the check.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

blocK: Restore barrier support for md and probably other virtual devices.

The next_ordered flag is only meaningful for devices that use __make_request.
So move the test against next_ordered out of generic code and in to
__make_request

Since this test was added, barriers have not worked on md or any
devices that don't use __make_request and so don't bother to set
next_ordered. (dm explicitly sets something other than
QUEUE_ORDERED_NONE since
commit 99360b4c18f7675b50d283301d46d755affe75fd
but notes in the comments that it is otherwise meaningless).

Cc: Ken Milmore <ken.milmore@googlemail.com>
Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: get rid of queue-private command filter

The initial patches to support this through sysfs export were broken
and have been if 0'ed out in any release. So lets just kill the code
and reclaim some space in struct request_queue, if anyone would later
like to fixup the sysfs bits, the git history can easily restore
the removed bits.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

block: Create bip slabs with embedded integrity vectors

This patch restores stacking ability to the block layer integrity
infrastructure by creating a set of dedicated bip slabs.  Each bip slab
has an embedded bio_vec array at the end.  This cuts down on memory
allocations and also simplifies the code compared to the original bvec
version.  Only the largest bip slab is backed by a mempool.  The pool is
contained in the bio_set so stacking drivers can ensure forward
progress.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.(none)>

cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()

Setup an emergency fallback cfqq that we allocate at IO scheduler init
time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just
punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue()
never fails without having to ensure free memory.

On cfqq lookup, always try to allocate a new cfqq if the given cfq io
context has the oom_cfqq assigned. This ensures that we only temporarily
punt to this shared queue.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()

We're going to be needing that init code outside of that function
to get rid of the __GFP_NOFAIL in cfqq allocation.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Trivial typo fixes in Documentation/block/data-integrity.txt.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

sh: re-add LCDC fbdev support to the Migo-R defconfig

Re-add LCDC fbdev support to the Migo-R defconfig.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: fix se7724 ceu names

Use "ceu0" and "ceu1" as CEU names instead of "ceu".
This fixes "memchunk" kernel command line selection
on the solution engine 7724 board.

With this patch applied use "memchunk.ceu0=1m" or
"memchunk.ceu1=1m" on kernel command line to override
physically memory size to one meg for CEU0 or CEU1.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

md: use interruptible wait when duration is controlled by userspace.

User space can set various limits on an md array so that resync waits
when it gets to a certain point, or so that I/O is blocked for a short
while.
When md is waiting against one of these limit, it should use an
interruptible wait so as not to add to the load average, and so are
not to trigger a warning if the wait goes on for too long.

Signed-off-by: NeilBrown <neilb@suse.de>

md/raid5: suspend shouldn't affect read requests.

md allows write to regions on an array to be suspended temporarily.
This allows user-space to participate is aspects of reshape.
In particular, data can be copied with not risk of a race.
We should not be blocking read requests though, so don't.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"

This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f.

After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.

The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.

It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!

Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: return PCI_ERS_RESULT_DISCONNECT on permanent error

PCI drivers that implement the io_error_detected callback should return
PCI_ERS_RESULT_DISCONNECT if the state passed in is
pci_channel_io_perm_failure. This patch fixes the issue for igb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: io_error_detected callback should return PCI_ERS_RESULT_DISCONNECT

on permanent failure

PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is not
checked in many of the network drivers.

This patch fixes the omission in the e1000e driver.

Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error

PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is
not checked in many of the network drivers.

The patch fixes the omission in the e1000 driver.

Based on Mike Mason's similar patch for e1000e.

Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
CC: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: fix unmap bug

as reported by kerneloops.org

[  121.781161] ------------[ cut here ]------------
[  121.781171] WARNING: at lib/dma-debug.c:793 check_unmap+0x14e/0x577()
[  121.781173] Hardware name: S5520HC
[  121.781177] e1000 0000:0a:00.0: DMA-API: device driver tries to free DMA
memory it has not allocated [device address=0x00000001d688b0fa] [size=1522
bytes]
[  121.781180] Modules linked in: e1000 mdio  dca [last unloaded: ixgbe]
[  121.781187] Pid: 4793, comm: bash Tainted: P 2.6.30-master-06161113 #3
[  121.781190] Call Trace:
[  121.781195]  [<ffffffff8123056f>] ? check_unmap+0x14e/0x577
[  121.781201]  [<ffffffff81057a19>] warn_slowpath_common+0x77/0x8f
[  121.781205]  [<ffffffff81057ae1>] warn_slowpath_fmt+0x9f/0xa1
[  121.781212]  [<ffffffff81477ce2>] ? _spin_lock_irqsave+0x3f/0x49
[  121.781216]  [<ffffffff8122fa97>] ? get_hash_bucket+0x28/0x33
[  121.781220]  [<ffffffff8123056f>] check_unmap+0x14e/0x577
[  121.781225]  [<ffffffff810e4f48>] ? check_bytes_and_report+0x38/0xcb
[  121.781230]  [<ffffffff81230bbf>] debug_dma_unmap_page+0x80/0x92
[  121.781234]  [<ffffffff8122e549>] ? unmap_single+0x1a/0x4e
[  121.781239]  [<ffffffff813901e1>] ? __kfree_skb+0x74/0x78
[  121.781250]  [<ffffffffa00662ef>] pci_unmap_single+0x64/0x6d [e1000]
[  121.781259]  [<ffffffffa0066344>] e1000_clean_rx_ring+0x4c/0xbf [e1000]
[  121.781268]  [<ffffffffa00663df>] e1000_clean_all_rx_rings+0x28/0x36 [e1000]
[  121.781277]  [<ffffffffa0067464>] e1000_down+0x138/0x141 [e1000]
[  121.781286]  [<ffffffffa00681c2>] __e1000_shutdown+0x6b/0x198 [e1000]
[  121.781296]  [<ffffffffa0068405>] e1000_suspend+0x17/0x50 [e1000]
[  121.781301]  [<ffffffff81237665>] pci_legacy_suspend+0x3b/0xbe
[  121.781305]  [<ffffffff81237bc6>] pci_pm_suspend+0x3e/0xf1
[  121.781310]  [<ffffffff812eaf1c>] pm_op+0x57/0xde
[  121.781314]  [<ffffffff812eb444>] dpm_suspend_start+0x31e/0x470
[  121.781319]  [<ffffffff810877da>] suspend_devices_and_enter+0x3e/0x1a2
[  121.781323]  [<ffffffff81087a0f>] enter_state+0xd1/0x127
[  121.781327]  [<ffffffff8108717a>] state_store+0xa7/0xc9
[  121.781332]  [<ffffffff81221843>] kobj_attr_store+0x17/0x19
[  121.781336]  [<ffffffff8113c01e>] sysfs_write_file+0xe5/0x121
[  121.781341]  [<ffffffff810ed165>] vfs_write+0xab/0x105
[  121.781344]  [<ffffffff810ed279>] sys_write+0x47/0x6d
[  121.781349]  [<ffffffff81027aab>] system_call_fastpath+0x16/0x1b
[  121.781352] ---[ end trace 97bacaaac2ed7786 ]---

Fix is to correctly zero out internal ->dma value when unmapping
and make sure never to unmap unless there specifically was a mapping done.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix unmap length bug

driver was mixing NET_IP_ALIGN count bytes in map/unmap calls
unevenly. Only map the bytes that the hardware might dma into

also fix unmap related bug where ->dma was not being cleared
after unmap

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix unmap length bug

This patch addresses three WARN_ON statements from DMA-API debug code

ixgbe is mapping more than it unmaps, reduce the length of the map call and
remove the "used once" local variable.

found by Joerg Roedel <joerg.roedel@amd.com> in 2.6.30, so is a candidate
for -stable.

in addition, fix missing ->dma = 0 after unmap to prevent double free with
pci_unmap_single

and lastly, don't unmap (half) pages that aren't mapped.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix link capabilities during adapter resets

Adapter link advertisement capabilities were not persistent during
adapter resets. While configuring multispeed fiber link check for
phy autoneg_advertised settings before overwriting with default
link capabilities

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix device capabilities of 82599 single speed fiber NICs.

82599 single speed fiber modules only support 10G/Full. Return
proper device capabilities while querrying the adapter and error
while changing device advertisement/speed/duplex capabilities.

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Fix SFP log messages

We had a wide range of log messages for the same sort of SFP
failure. This patch makes them all more similar and less
confusing along with converting them to dev_err.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: Remove private stats structure

Now that nothing uses the private stats structure we can remove it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc95xx: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

rndis_host: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net1080: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

dm9601: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_eem: Use netdev stats structure

Now that netdev has its own stats structure we should use that
instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

md: tidy up error paths in md_alloc

As the recent bug in md_alloc showed, having a single exit path for
unlocking and putting is a good idea. So restructure md_alloc to have
a single mutex_unlock and mddev_put, and use gotos where necessary.

Found-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: fix error path when duplicate name is found on md device creation.

When an md device is created by name (rather than number) we need to
check that the name is not already in use. If this check finds a
duplicate, we return an error without dropping the lock or freeing
the newly create mddev.
This patch fixes that.

Cc: stable@kernel.org
Found-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6

* 'kmemleak' of git://linux-arm.org/linux-2.6:
  kmemleak: Inform kmemleak about pid_hash
  kmemleak: Do not warn if an unknown object is freed
  kmemleak: Do not report new leaked objects if the scanning was stopped
  kmemleak: Slightly change the policy on newly allocated objects
  kmemleak: Do not trigger a scan when reading the debug/kmemleak file
  kmemleak: Simplify the reports logged by the scanning thread
  kmemleak: Enable task stacks scanning by default
  kmemleak: Allow the early log buffer to be configurable.

Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm table: fix blk_stack_limits arg to use bytes not sectors
dm exception store: really fix type lookup

Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
  perf report: Add --symbols parameter
  perf report: Add --comms parameter
  perf report: Add --dsos parameter
  perf_counter tools: Adjust only prelinked symbol's addresses
  perf_counter: Provide a way to enable counters on exec
  perf_counter tools: Reduce perf stat measurement overhead/skew
  perf stat: Use percentages for scaling output
  perf_counter, x86: Update x86_pmu after WARN()
  perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
  perf stat: Improve output
  perf stat: Fix multi-run stats
  perf stat: Add -n/--null option to run without counters
  perf_counter tools: Remove dead code
  perf_counter: Complete counter swap
  perf report: Print sorted callchains per histogram entries
  perf_counter tools: Prepare a small callchain framework
  perf record: Fix unhandled io return value
  perf_counter tools: Add alias for 'l1d' and 'l1i'
  perf-report: Add bare minimum PERF_EVENT_READ parsing
  perf-report: Add modes for inherited stats and no-samples
  ...

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  Add Fenghua Yu as temporary co-maintainer for ia64
  [IA64] address compiler warnings perfmon.c/salinfo.c
  [IA64] Remove unnecessary semicolons
  [IA64] sprintf should not be used with same source & destination address

MN10300: Wire up new syscalls

Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

FRV: Wire up new syscalls

Wire up new syscalls rt_tgsigqueueinfo and perf_counter_open.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hostfs: set maximum filesize in superblock for proper LFS support

Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13531

Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

floppy: fix lock imbalance

A crappy macro prevents us unlocking on a fail path.

Expand the macro and unlock appropriatelly.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bfin: delay IRQ registration until driver is ready

Make sure we do not actually request the RTC IRQ until the device driver
is fully ready to handle and process any interrupt. This way a spurious
interrupt won't crash the system (which may happen if the bootloader was
poking the RTC right before booting Linux).

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

atyfb: fix alignment for block writes

Block writes require 64 byte alignment. Since block writes could be used
with SGRAM or WRAM also refine the memory type detection to check for
either type before deciding to use the 64 byte alignment.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

atyfb: fix HP OmniBook 500 reboot hang

Apparently HP OmniBook 500's BIOS doesn't like the way atyfb reprograms
the hardware. The BIOS will simply hang after a reboot. Fix the problem
by restoring the hardware to it's original state on reboot.

Signed-off-by: Ville Syrjala <syrjala@sci.fi>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gpio: pl061: fix IRQ handling for GPIOs >= PL061_GPIO_NR

IRQ handling is wrong for any GPIO >= PL061_GPIO_NR.

Fix this by implementing and using a proper .to_irq method.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gpio: pl061: fix probe error handling code

Note that IRQ has not been initialized when kmalloc() fails.

Also, use DECLARE_BITMAP() to make the code clearer.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: only clear node_states for 64bit

Nathan reported that

| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date:   Tue Jun 16 15:33:00 2009 -0700
|
|    page-allocator: clear N_HIGH_MEMORY map before we set it again
|
|    SRAT tables may contains nodes of very small size.  The arch code may
|    decide to not activate such a node.  However, currently the early boot
|    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
|    active although these nodes have no present pages.
|
|    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too

unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.

Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
and need to do save/restore for that in find_zone_movable_pfn

Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cpusets: document adding/removing cpus to cpuset elaborately

By writing a tasks's pid to the file, a process adds that task to that
cgroup/cpuset. But to add a cpu/mem to a cpuset, the new list of cpus
should be written to the cpuset.mems file which would replace the old list
of cpus. Make this clearer in the documentation.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: prevent balance_dirty_pages() from doing too much work

balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.

balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit,so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.

This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.

A simple fio test easily demonstrates this problem.

fio --name=f1 --directory=/disk1 --size=2G -rw=write --name=f2 --directory=/disk2 --size=1G --rw=write --startdelay=10

This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.

Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bsdacct: fix access to invalid filp in acct_on()

The file opened in acct_on and freshly stored in the ns->bacct struct can
be closed in acct_file_reopen by a concurrent call after we release
acct_lock and before we call mntput(file->f_path.mnt).

Record file->f_path.mnt in a local variable and use this variable only.

Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com>
Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: STARFIRE/DURALAN update

Ion's cs.columbia.edu email address no longer works.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Ion Badulescu <ionut@badula.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kernel/resource.c: fix sign extension in reserve_setup()

When the 32-bit signed quantities get assigned to the u64 resource_size_t,
they are incorrectly sign-extended.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13253
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9905

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reported-by: Leann Ogasawara <leann@ubuntu.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Reported-by: <pablomme@googlemail.com>
Tested-by: <pablomme@googlemail.com>
Cc: <stable@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spi: bitbang bugfix in message setup

Bugfix to spi_bitbang infrastructure: make sure to always set transfer
parameters on the first pass through the message's per-transfer loop.
This can matter with drivers that replace the per-word or per-buffer
transfer primitives, on busses with multiple SPI devices.

Previously, this could have started messages using the settings left after
previous messages. The problem was observed when a high speed chip
(m25p80 type flash) was running very slowly because a low speed device
(avr8 microcontroller) had previously used the bus. Similar faults could
have driven the low speed device too fast, or used an unexpected word
size.

Acked-by: Steven A. Falco <sfalco@harris.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbdev: add mutex for fb_mmap locking

Add a mutex to avoid a circular locking problem between the mm layer
semaphore and fbdev ioctl mutex through the fb_mmap() call.

Also, add mutex to all places where smem_start and smem_len fields change
so the mutex inside the fb_mmap() is actually used.  Changing of these
fields before calling the framebuffer_register() are not mutexed.

This is 2.6.31 material.  It removes one lockdep (fb_mmap() and
register_framebuffer()) but there is still another one (fb_release() and
register_framebuffer()).  It also cleans up handling of the smem_start and
smem_len fields used by mutexed section of the fb_mmap().

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spi: add spi_master flag word

Add a new spi_master.flags word listing constraints relevant to that
controller.  Define the first constraint bit: a half duplex restriction.
Include that constraint in the OMAP1 MicroWire controller driver.

Have the mmc_spi host be the first customer of this flag.  Its coding
relies heavily on full duplex transfers, so it must fail when the
underlying controller driver won't perform them.

(The spi_write_then_read routine could use it too: use the
temporarily-withdrawn full-duplex speedup unless this flag is set, in
which case the existing code applies.  Similarly, any spi_master
implementing only SPI_3WIRE should set the flag.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

spi: new spi->mode bits

Add two new spi_device.mode bits to accomodate more protocol options, and
pass them through to usermode drivers:

* SPI_NO_CS ... a second 3-wire variant, where the chipselect
   line is removed instead of a data line; transfers are still
   full duplex.

   This obviously has STRONG protocol implications since the
   chipselect transitions can't be used to synchronize state
   transitions with the SPI master.

* SPI_READY ... defines open drain signal that's pulled low
   to pause the clock.  This defines a 5-wire variant (normal
   4-wire SPI plus READY) and two 4-wire variants (READY plus
   each of the 3-wire flavors).

   Such hardware flow control can be a big win.  There are ADC
   converters and flash chips that expose READY signals, but not
   many host controllers support it today.

The spi_bitbang code should be changed to use SPI_NO_CS instead of its
current nonportable hack.  That's a mode most hardware can easily support
(unlike SPI_READY).

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Paulraj, Sandeep" <s-paulraj@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dmapools: protect page_list walk in show_pools()

show_pools() walks the page_list of a pool w/o protection against the list
modifications in alloc/free. Take pool->lock to avoid stomping into
nirvana.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext2: return -EIO not -ESTALE on directory traversal through deleted inode

ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.

The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.

This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.

Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

elf: limit max map count to safe value

With ELF, at generating coredump, some more headers other than used
vmas are added.

When max_map_count == 65536, a core generated by following kinds of
code can be unreadable because the number of ELF's program header is
written in 16bit in Ehdr (please see elf.h) and the number overflows.

==
... = mmap(); (munmap, mprotect, etc...)
if (failed)
abort();
==

This can happen in mmap/munmap/mprotect/etc...which calls split_vma().

I think 65536 is not safe as _default_ and reduce it to 65530 is good
for avoiding unexpected corrupted core.

Anyway, max_map_count can be enlarged by sysctl if a user is brave..

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Jakub Jelinek <jakub@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: add DDR3 memory type for MPC85xx EDAC

Since some new MPC85xx SOCs support DDR3 memory now, so add DDR3 memory
type for MPC85xx EDAC.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parport/serial: add support for NetMos 9901 Multi-IO card

Add support for the PCI-Express NetMos 9901 Multi-IO card.

0001:06:00.0 Serial controller [0700]: NetMos Technology Device [9710:9901] (prog-if 02 [16550])
        Subsystem: Device [a000:1000]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 65
        Region 0: I/O ports at 0030 [size=8]
        Region 1: Memory at 80105000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 80104000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: serial
        Kernel modules: 8250_pci

0001:06:00.1 Serial controller [0700]: NetMos Technology Device [9710:9901] (prog-if 02 [16550])
        Subsystem: Device [a000:1000]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin B routed to IRQ 65
        Region 0: I/O ports at 0020 [size=8]
        Region 1: Memory at 80103000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 80102000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: serial
        Kernel modules: 8250_pci

0001:06:00.2 Parallel controller [0701]: NetMos Technology Device [9710:9901] (prog-if 03 [IEEE1284])
        Subsystem: Device [a000:2000]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin C routed to IRQ 65
        Region 0: I/O ports at 0010 [size=8]
        Region 1: I/O ports at <unassigned>
        Region 2: Memory at 80101000 (32-bit, non-prefetchable) [size=4K]
        Region 4: Memory at 80100000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <access denied>
        Kernel driver in use: parport_pc
        Kernel modules: parport_pc

[   16.760181] PCI parallel port detected: 416c:0100, I/O at 0x812010(0x0), IRQ 65
[   16.760225] parport0: PC-style at 0x812010, irq 65 [PCSPP,TRISTATE,EPP]
[   16.851842] serial 0001:06:00.0: enabling device (0004 -> 0007)
[   16.883776] 0001:06:00.0: ttyS0 at I/O 0x812030 (irq = 65) is a ST16650V2
[   16.893832] serial 0001:06:00.1: enabling device (0004 -> 0007)
[   16.926537] 0001:06:00.1: ttyS1 at I/O 0x812020 (irq = 65) is a ST16650V2

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcov: fix documentation

Commonly available versions of cp and tar don't work well with special
files created using seq_file. Mention this problem in the gcov
documentation and update the helper script example to work around these
problems.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: fix percpu build breakage

alpha percpu access requires custom SHIFT_PERCPU_PTR() definition for
modules to work around addressing range limitation.  This is done via
generating inline assembly using C preprocessing which forces the
assembler to generate external reference.  This happens behind the
compiler's back and makes the compiler think that static percpu variables
in modules are unused.

This used to be worked around by using __unused attribute for percpu
variables which prevent the compiler from omitting the variable; however,
recent declare/definition attribute unification change broke this as
__used can't be used for declaration.  Also, in the process,
PER_CPU_ATTRIBUTES definition in alpha percpu.h got broken.

This patch adds PER_CPU_DEF_ATTRIBUTES which is only used for definitions
and make alpha use it to add __used for percpu variables in modules.  This
also fixes the PER_CPU_ATTRIBUTES double definition bug.

Signed-off-by: Tejun Heo <tj@kernel.org>
Tested-by: maximilian attems <max@stro.at>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fbdev: work around old compiler bug

When building with a 4.1.x compiler on powerpc64 (at least) we get this
error:

drivers/video/logo/logo_linux_mono.c:81: error: logo_linux_mono causes a section type conflict

This was introduced by commit ae52bb2384f721562f15f719de1acb8e934733cb
("fbdev: move logo externs to header file").  This is a partial revert of
that commit sufficient to not hit the compiler bug.

Also convert _clut arrays from __initconst to __initdata.

Sam said:

  Al analysed this some time ago.  When we say something is const then
  _sometimes_ gcc annotate the section as const(?) - sometimes not.  So if
  we have two variables/functions annotated __*const and gcc decides to
  annotate the section const only in one case we get a section type
  conflict.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: update EDAC-I82975X

As per Ranganathan's request.

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: Arvind R. <arvind@jetztechnologies.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gcov: fix __ctors_start alignment

The ctors section for each object file is eight byte aligned (on 64 bit).
However the __ctors_start symbol starts at an arbitrary address dependent
on the size of the previous sections.

Therefore the linker may add some zeroes after __ctors_start to make sure
the ctors contents are properly aligned.  However the extra zeroes at the
beginning aren't expected by the code.  When walking the functions
pointers contained in there and extra zeroes are added this may result in
random jumps.  So make sure that the __ctors_start symbol is always
aligned as well.

Fixes this crash on an allyesconfig on s390:

[    0.582482] Kernel BUG at 0000000000000012 [verbose debug info unavailable]
[    0.582489] illegal operation: 0001 [#1] SMP DEBUG_PAGEALLOC
[    0.582496] Modules linked in:
[    0.582501] CPU: 0 Tainted: G        W  2.6.31-rc1-dirty #273
[    0.582506] Process swapper (pid: 1, task: 000000003f218000, ksp: 000000003f2238e8)
[    0.582510] Krnl PSW : 0704200180000000 0000000000000012 (0x12)
[    0.582518]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
[    0.582524] Krnl GPRS: 0000000000036727 0000000000000010 0000000000000001 0000000000000001
[    0.582529]            00000000001dfefa 0000000000000000 0000000000000000 0000000000000040
[    0.582534]            0000000001fff0f0 0000000001790628 0000000002296048 0000000002296048
[    0.582540]            00000000020c438e 0000000001786000 0000000002014a66 000000003f223e60
[    0.582553] Krnl Code:>0000000000000012: 0000                unknown
[    0.582559]            0000000000000014: 0000                unknown
[    0.582564]            0000000000000016: 0000                unknown
[    0.582570]            0000000000000018: 0000                unknown
[    0.582575]            000000000000001a: 0000                unknown
[    0.582580]            000000000000001c: 0000                unknown
[    0.582585]            000000000000001e: 0000                unknown
[    0.582591]            0000000000000020: 0000                unknown
[    0.582596] Call Trace:
[    0.582599] ([<0000000002014a46>] kernel_init+0x622/0x7a0)
[    0.582607]  [<0000000000113e22>] kernel_thread_starter+0x6/0xc
[    0.582615]  [<0000000000113e1c>] kernel_thread_starter+0x0/0xc
[    0.582621] INFO: lockdep is turned off.
[    0.582624] Last Breaking-Event-Address:
[    0.582627]  [<0000000002014a64>] kernel_init+0x640/0x7a0

Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eventfd: revised interface and cleanups

Change the eventfd interface to de-couple the eventfd memory context, from
the file pointer instance.

Without such change, there is no clean way to racely free handle the
POLLHUP event sent when the last instance of the file* goes away. Also,
now the internal eventfd APIs are using the eventfd context instead of the
file*.

This patch is required by KVM's IRQfd code, which is still under
development.

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Gregory Haskins <ghaskins@novell.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.

If we try to modify one of the md/ sysfs files
suspend_lo or suspend_hi
when the array is not active, we dereference a NULL.
Protect against that.

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>

md: Use new topology calls to indicate alignment and I/O sizes

Switch MD over to the new disk_stack_limits() function which checks for
aligment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

perf report: Add --symbols parameter

So that we can filter by symbol name.

The 'pfunct' utility in the 'dwarves' package can be used to
create a file with the functions one wants.

Example:

[acme@doppio pahole]$ pfunct /usr/lib/debug/usr/lib64/libdw-0.141.so.debug | grep dwarf > /tmp/dwarf.symbols
[acme@doppio pahole]$ wc -l /tmp/dwarf.symbols
93 /tmp/dwarf.symbols
[acme@doppio pahole]$ head -3 /tmp/dwarf.symbols
dwfl_addrdwarf
dwfl_module_getdwarf
dwfl_getdwarf
[acme@doppio pahole]$ perf report --sort comm,dso,symbol --comms pahole --dsos /usr/lib64/libdw-0.141.so --symbols file:///tmp/dwarf.symbols

    33.99%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_tag
    29.07%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_decl_file
    27.71%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_getsrclines
     4.54%            pahole  /usr/lib64/libdw-0.141.so  0x00000000007400
     3.93%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_decl_line
     0.46%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_getlocation
     0.18%            pahole  /usr/lib64/libdw-0.141.so  [.] __libdwarf_next_prime
     0.13%            pahole  /usr/lib64/libdw-0.141.so  [.] dwarf_diecu

[acme@doppio pahole]$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246399282-20934-4-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>