dcache: Translating dentry into pathname without taking rename_lock
The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop. Unfortunately the inner
declaration of error was not removed. Resulting in a version of
__dentry_path that will never return an error.
Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.
Cc: stable@vger.kernel.org Cc: Waiman Long <Waiman.Long@hp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs: Is mounted should be testing mnt_ns for NULL or error.
A bug was introduced with the is_mounted helper function in
commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat Jun 9 00:59:08 2012 -0400
get rid of ->mnt_longterm
it's enough to set ->mnt_ns of internal vfsmounts to something
distinct from all struct mnt_namespace out there; then we can
just use the check for ->mnt_ns != NULL in the fast path of
mntput_no_expire()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The intent was to test if the real_mount(vfsmount)->mnt_ns was
NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
and always returning true.
The result is d_absolute_path returning paths it should be hiding.
Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
So far I've had one ACK for this, and no other comments. So I think it
is probably time to send this via some suitable tree. I'm guessing that
the vfs tree would be the most appropriate route, but not sure that
there is one at the moment (don't see anything recent at kernel.org)
so in that case I think -mm is the "back up plan". Al, please let me
know if you will take this?
Steve.
---------------------
Following on from the "Re: [PATCH v3] vfs: fix a bug when we do some dio
reads with append dio writes" thread on linux-fsdevel, this patch is my
current version of the fix proposed as option (b) in that thread.
Removing the i_size test from the direct i/o read path at vfs level
means that filesystems now have to deal with requests which are beyond
i_size themselves. These I've divided into three sets:
a) Those with "no op" ->direct_IO (9p, cifs, ceph)
These are obviously not going to be an issue
b) Those with "home brew" ->direct_IO (nfs, fuse)
I've been told that NFS should not have any problem with the larger
i_size, however I've added an extra test to FUSE to duplicate the
original behaviour just to be on the safe side.
c) Those using __blockdev_direct_IO()
These call through to ->get_block() which should deal with the EOF
condition correctly. I've verified that with GFS2 and I believe that
Zheng has verified it for ext4. I've also run the test on XFS and it
passes both before and after this change.
The part of the patch in filemap.c looks a lot larger than it really is
- there are only two lines of real change. The rest is just indentation
of the contained code.
There remains a test of i_size though, which was added for btrfs. It
doesn't cause the other filesystems a problem as the test is performed
after ->direct_IO has been called. It is possible that there is a race
that does matter to btrfs, however this patch doesn't change that, so
its still an overall improvement.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When using the per-superblock xattr handlers permission checking is
done by the generic code. hfsplus just needs to check for the magic
osx attribute not to leak into protected namespaces.
Also given that the code was obviously copied from JFS the proper
attribution was missing.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Remove the boilerplate code to marshall and unmarhall ACL objects into
xattrs and operate on the posix_acl objects directly. Also move all
the ACL handling code into nfs?acl.c where it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
nfs: use generic posix ACL infrastructure for v3 Posix ACLs
This causes a small behaviour change in that we don't bother to set
ACLs on file creation if the mode bit can express the access permissions
fully, and thus behaving identical to local filesystems.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Also don't bother to set up a .get_acl method for symlinks as we do not
support access control (ACLs or even mode bits) for symlinks in Linux,
and create inodes with the proper mode instead of fixing it up later.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Oleg Nesterov [Mon, 13 Jan 2014 15:48:40 +0000 (16:48 +0100)]
fs: factor out common code in fget_light() and fget_raw_light()
Apart from FMODE_PATH check fget_light() and fget_raw_light() are
identical, shift the code into the new helper, __fget_light(fd, mask).
Saves 208 bytes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Oleg Nesterov [Sat, 11 Jan 2014 18:19:32 +0000 (19:19 +0100)]
introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty()
rcu_dereference_check_fdtable() looks very wrong,
1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix
RCU-lockdep false positive due to /proc" but it doesn't really
fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
hit the same race with get_files_struct().
And otoh rcu_my_thread_group_empty() can suppress the correct
warning if the caller is the CLONE_FILES (without CLONE_THREAD)
task.
2. files->count == 1 check is not really right too. Even if this
files_struct is not shared it is not safe to access it lockless
unless the caller is the owner.
Otoh, this check is sub-optimal. files->count == 0 always means
it is safe to use it lockless even if files != current->files,
but put_files_struct() has to take rcu_read_lock(). See the next
patch.
This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.
fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 22 Nov 2013 06:53:47 +0000 (01:53 -0500)]
afs: get rid of junk in fs/afs/proc.c
kill pointless method instances and don't bother with ->owner - it's
ignored for procfs files anyway, make use of remove_proc_subtree() for
removal, get rid of cell->proc_dir.
Al Viro [Thu, 12 Dec 2013 04:07:51 +0000 (23:07 -0500)]
btrfs: sanitize BTRFS_IOC_FILE_EXTENT_SAME
* don't assume that ->dest_count won't change between copy_from_user()
and memdup_user()
* use fdget instead of fget
* don't bother comparing superblocks when we'd already compared vfsmounts
* get rid of excessive goto
* use file_inode() instead of open-coding the sucker
Al Viro [Wed, 11 Dec 2013 00:48:58 +0000 (19:48 -0500)]
qnx4: clean qnx4_fill_super() up
* pass on-disk superblock to qnx4_chkroot() explicitly
* don't leave stale (and unused) pointers in qnx4_super_block
* free stuff in ->kill_sb(); ->put_super() becomes empty and dies
* simplify failure exits
Ilia Mirkin [Sun, 19 Jan 2014 15:30:32 +0000 (10:30 -0500)]
drm/nouveau/mxm: fix null deref on load
Since commit 61b365a505d6 ("drm/nouveau: populate master subdev pointer
only when fully constructed"), the nouveau_mxm(bios) call will return
NULL, since it's still being called from the constructor. Instead, pass
the mxm pointer via the unused data field.
See https://bugs.freedesktop.org/show_bug.cgi?id=73791
Reported-by: Andreas Reis <andreas.reis@gmail.com> Tested-by: Andreas Reis <andreas.reis@gmail.com> Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 20 Jan 2014 01:18:13 +0000 (17:18 -0800)]
Merge tag 'acpi-3.13-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull last-minute ACPI fix from Rafael Wysocki:
"This reverts a commit that causes the Alan Cox' ASUS T100TA to "crash
and burn" during boot if the Baytrail pinctrl driver is compiled in"
* tag 'acpi-3.13-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs"
Linus Torvalds [Sun, 19 Jan 2014 21:06:51 +0000 (13:06 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
- an s2ram related fix on AMD systems
- a perf fault handling bug that is relatively old but which has become
much easier to trigger in v3.13 after commit e00b12e64be9 ("perf/x86:
Further optimize copy_from_user_nmi()")
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
x86, mm, perf: Allow recursive faults from interrupts
Revert "ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs"
This reverts commit f6308b36c411 (ACPI: Add BayTrail SoC GPIO and LPSS
ACPI IDs), because it causes the Alan Cox' ASUS T100TA to "crash and
burn" during boot if the Baytrail pinctrl driver is compiled in.
Fixes: f6308b36c411 (ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs) Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Requested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1) The value choosen for the new SO_MAX_PACING_RATE socket option on
parisc was very poorly choosen, let's fix it while we still can.
From Eric Dumazet.
2) Our generic reciprocal divide was found to handle some edge cases
incorrectly, part of this is encoded into the BPF as deep as the JIT
engines themselves. Just use a real divide throughout for now.
From Eric Dumazet.
3) Because the initial lookup is lockless, the TCP metrics engine can
end up creating two entries for the same lookup key. Fix this by
doing a second lookup under the lock before we actually create the
new entry. From Christoph Paasch.
4) Fix scatter-gather list init in usbnet driver, from Bjørn Mork.
5) Fix unintended 32-bit truncation in cxgb4 driver's bit shifting.
From Dan Carpenter.
6) Netlink socket dumping uses the wrong socket state for timewait
sockets. Fix from Neal Cardwell.
7) Fix netlink memory leak in ieee802154_add_iface(), from Christian
Engelmayer.
8) Multicast forwarding in ipv4 can overflow the per-rule reference
counts, causing all multicast traffic to cease. Fix from Hannes
Frederic Sowa.
9) via-rhine needs to stop all TX queues when it resets the device,
from Richard Weinberger.
10) Fix RDS per-cpu accesses broken by the this_cpu_* conversions. From
Gerald Schaefer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
parisc: fix SO_MAX_PACING_RATE typo
ipv6: simplify detection of first operational link-local address on interface
tcp: metrics: Avoid duplicate entries with the same destination-IP
net: rds: fix per-cpu helper usage
e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
bpf: do not use reciprocal divide
be2net: add dma_mapping_error() check for dma_map_page()
bnx2x: Don't release PCI bars on shutdown
net,via-rhine: Fix tx_timeout handling
batman-adv: fix batman-adv header overhead calculation
qlge: Fix vlan netdev features.
net: avoid reference counter overflows on fib_rules in multicast forwarding
dm9601: add USB IDs for new dm96xx variants
MAINTAINERS: add virtio-dev ML for virtio
ieee802154: Fix memory leak in ieee802154_add_iface()
net: usbnet: fix SG initialisation
inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
cxgb4: silence shift wrapping static checker warning
Heiko Carstens [Fri, 17 Jan 2014 08:37:15 +0000 (09:37 +0100)]
s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.
The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).
To avoid all these issues simply use unsigned divide instructions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Jan 2014 19:15:12 +0000 (11:15 -0800)]
parisc: fix SO_MAX_PACING_RATE typo
SO_MAX_PACING_RATE definition on parisc got a typo.
Its not too late to fix it, before 3.13 is official.
Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.
Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.
Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") Reported-by: Jiri Pirko <jiri@resnulli.us> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Paasch [Thu, 16 Jan 2014 19:01:21 +0000 (20:01 +0100)]
tcp: metrics: Avoid duplicate entries with the same destination-IP
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.
This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.
Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerald Schaefer [Thu, 16 Jan 2014 15:54:48 +0000 (16:54 +0100)]
net: rds: fix per-cpu helper usage
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.
Linus Torvalds [Sat, 18 Jan 2014 01:29:36 +0000 (17:29 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
"This is a set of 3 regression fixes.
This fixes /proc/mounts when using "ip netns add <netns>" to display
the actual mount point.
This fixes a regression in clone that broke lxc-attach.
This fixes a regression in the permission checks for mounting /proc
that made proc unmountable if binfmt_misc was in use. Oops.
My apologies for sending this pull request so late. Al Viro gave
interesting review comments about the d_path fix that I wanted to
address in detail before I sent this pull request. Unfortunately a
bad round of colds kept from addressing that in detail until today.
The executive summary of the review was:
Al: Is patching d_path really sufficient?
The prepend_path, d_path, d_absolute_path, and __d_path family of
functions is a really mess.
Me: Yes, patching d_path is really sufficient. Yes, the code is mess.
No it is not appropriate to rewrite all of d_path for a regression
that has existed for entirely too long already, when a two line
change will do"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Fix a regression in mounting proc
fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
vfs: In d_path don't call d_dname on a mount point
David S. Miller [Fri, 17 Jan 2014 01:16:43 +0000 (17:16 -0800)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- properly compute the batman-adv header overhead. Such
result is later used to initialize the hard_header_len
member of the soft-interface netdev object
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 17 Jan 2014 00:33:27 +0000 (11:33 +1100)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Revert "arm64: Fix memory shareability attribute for ioremap_wc/cache"
We noticed that it breaks ioremap (and earlyprintk) with 64K page
configuration"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Revert "arm64: Fix memory shareability attribute for ioremap_wc/cache"
Hugh Dickins [Thu, 16 Jan 2014 23:26:48 +0000 (15:26 -0800)]
percpu_counter: unbreak __percpu_counter_add()
Commit 74e72f894d56 ("lib/percpu_counter.c: fix __percpu_counter_add()")
looked very plausible, but its arithmetic was badly wrong: obvious once
you see the fix, but maddening to get there from the weird tmpfs ENOSPCs
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Fan Du <fan.du@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mika Westerberg [Thu, 16 Jan 2014 12:39:39 +0000 (14:39 +0200)]
e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
Commit 7509963c703b (e1000e: Fix a compile flag mis-match for
suspend/resume) moved suspend and resume hooks to be available when
CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set
causing following warnings to be emitted:
drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning:
‘e1000_suspend’ defined but not used [-Wunused-function]
drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning:
‘e1000_resume’ defined but not used [-Wunused-function]
To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set
and remove CONFIG_PM wrapping from driver ops because this is already
handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS().
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Dave Ertman <davidx.m.ertman@intel.com> Cc: Aaron Brown <aaron.f.brown@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The above commit breaks the mapping type for Device memory because
pgprot_default already contains a Normal memory type. pgprot_default is
also not initialised early enough for earlyprintk resulting in an
inconsistent memory mapping with 64K PAGE_SIZE configuration.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Will Deacon <will.deacon@arm.com> Acked-by: Will Deacon <will.deacon@arm.com>
Robert Richter [Wed, 15 Jan 2014 14:57:29 +0000 (15:57 +0100)]
perf/x86/amd/ibs: Fix waking up from S3 for AMD family 10h
On AMD family 10h we see following error messages while waking up from
S3 for all non-boot CPUs leading to a failed IBS initialization:
Enabling non-boot CPUs ...
smpboot: Booting Node 0 Processor 1 APIC 0x1
[Firmware Bug]: cpu 1, try to use APIC500 (LVT offset 0) for vector 0x400, but the register is already in use for vector 0xf9 on another cpu
perf: IBS APIC setup failed on cpu #1
process: Switch to broadcast mode on CPU1
CPU1 is up
...
ACPI: Waking up from system sleep state S3
Reason for this is that during suspend the LVT offset for the IBS
vector gets lost and needs to be reinialized while resuming.
The offset is read from the IBSCTL msr. On family 10h the offset needs
to be 1 as offset 0 is used for the MCE threshold interrupt, but
firmware assings it for IBS to 0 too. The kernel needs to reprogram
the vector. The msr is a readonly node msr, but a new value can be
written via pci config space access. The reinitialization is
implemented for family 10h in setup_ibs_ctl() which is forced during
IBS setup.
This patch fixes IBS setup after waking up from S3 by adding
resume/supend hooks for the boot cpu which does the offset
reinitialization.
Marking it as stable to let distros pick up this fix.
Peter Zijlstra [Fri, 10 Jan 2014 20:06:03 +0000 (21:06 +0100)]
x86, mm, perf: Allow recursive faults from interrupts
Waiman managed to trigger a PMI while in a emulate_vsyscall() fault,
the PMI in turn managed to trigger a fault while obtaining a stack
trace. This triggered the sig_on_uaccess_error recursive fault logic
and killed the process dead.
Fix this by explicitly excluding interrupts from the recursive fault
logic.
Reported-and-Tested-by: Waiman Long <waiman.long@hp.com> Fixes: e00b12e64be9 ("perf/x86: Further optimize copy_from_user_nmi()") Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140110200603.GJ7572@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Thu, 16 Jan 2014 01:33:21 +0000 (08:33 +0700)]
Merge branches 'sched-urgent-for-linus' and 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler and timer fixes from Ingo Molnar:
"Contains a fix for a scheduler bug that manifested itself as a 3D
performance regression and a crash fix for the ARM Cadence TTC clock
driver"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Calculate effective load even if local weight is 0
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource: cadence_ttc: Fix mutex taken inside interrupt context
Linus Torvalds [Thu, 16 Jan 2014 01:31:55 +0000 (08:31 +0700)]
Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Two fixes from lockdep coverage of seqlocks, which fix deadlocks on
lockdep-enabled ARM systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched_clock: Disable seqlock lockdep usage in sched_clock()
seqlock: Use raw_ prefix instead of _no_lockdep
Linus Torvalds [Thu, 16 Jan 2014 01:26:44 +0000 (08:26 +0700)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix attribute length problem in coretemp driver"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Fix truncated name of alarm attributes
Eric Dumazet [Wed, 15 Jan 2014 14:50:07 +0000 (06:50 -0800)]
bpf: do not use reciprocal divide
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c
He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Wed, 15 Jan 2014 10:05:30 +0000 (12:05 +0200)]
bnx2x: Don't release PCI bars on shutdown
The bnx2x driver in its pci shutdown() callback releases its pci bars (in the
same manner it does during its pci remove() callback).
During a system reboot while VFs are enabled, its possible for the VF's remove
to be called (as a result of pci_disable_sriov()) after its shutdown callback
has already finished running; This will cause a paging request fault as the VF
tries to access the pci bar which it has previously released, crashing the
system.
This patch further differentiates the shutdown and remove callbacks, preventing the
pci release procedures from being called during shutdown.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rhine_reset_task() misses to disable the tx scheduler upon reset,
this can lead to a crash if work is still scheduled while we're resetting
the tx queue.
Fixes:
[ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6
Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net>
Batman-adv prepends a full ethernet header in addition to its own
header. This has to be reflected in the MTU calculation, especially
since the value is used to set dev->hard_header_len.
Reported-by: cmsv <cmsv@wirelesspt.net> Reported-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Andrew Jones [Wed, 15 Jan 2014 12:39:59 +0000 (13:39 +0100)]
kvm: x86: fix apic_base enable check
Commit e66d2ae7c67bd moved the assignment
vcpu->arch.apic_base = value above a condition with
(vcpu->arch.apic_base ^ value), causing that check
to always fail. Use old_value, vcpu->arch.apic_base's
old value, in the condition instead.
Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Wed, 15 Jan 2014 08:42:11 +0000 (15:42 +0700)]
Merge branch 'akpm' (incoming from Andrew)
Merge patches from Andrew Morton:
"Six fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
lib/percpu_counter.c: fix __percpu_counter_add()
crash_dump: fix compilation error (on MIPS at least)
mm: fix crash when using XFS on loopback
MIPS: fix blast_icache32 on loongson2
MIPS: fix case mismatch in local_r4k_flush_icache_range()
nilfs2: fix segctor bug that causes file system corruption
Linus Torvalds [Wed, 15 Jan 2014 08:07:36 +0000 (15:07 +0700)]
Merge tag 'md/3.13-fixes' of git://neil.brown.name/md
Pull late md fixes from Neil Brown:
"Half a dozen md bug fixes.
All of these fix real bugs the people have hit, and are tagged for
-stable. Sorry they are late .... Christmas holidays and all that.
Hopefully they can still squeak into 3.13"
* tag 'md/3.13-fixes' of git://neil.brown.name/md:
md: fix problem when adding device to read-only array with bitmap.
md/raid10: fix bug when raid10 recovery fails to recover a block.
md/raid5: fix a recently broken BUG_ON().
md/raid1: fix request counting bug in new 'barrier' code.
md/raid10: fix two bugs in handling of known-bad-blocks.
md/raid5: Fix possible confusion when multiple write errors occur.
Linus Torvalds [Wed, 15 Jan 2014 08:06:14 +0000 (15:06 +0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau regression fix on older cards, i915 black screen fixes,
and a revert for a strange G33 intel problem"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: fix null ptr dereferences on some boards
Revert "drm: copy mode type in drm_mode_connector_list_update()"
drm/i915/bdw: make sure south port interrupts are enabled properly v2
drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
drm/i915: fix DDI PLLs HW state readout code
Ming Lei [Wed, 15 Jan 2014 01:56:42 +0000 (17:56 -0800)]
lib/percpu_counter.c: fix __percpu_counter_add()
__percpu_counter_add() may be called in softirq/hardirq handler (such
as, blk_mq_queue_exit() is typically called in hardirq/softirq handler),
so we need to call this_cpu_add()(irq safe helper) to update percpu
counter, otherwise counts may be lost.
This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.
This patch is the v1 of previous one of "lib/percpu_counter.c:
disable local irq when updating percpu couter", and takes Andrew's
approach which may be more efficient for ARCHs(x86, s390) that
have optimized this_cpu_add().
Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Fan Du <fan.du@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Wed, 15 Jan 2014 01:56:40 +0000 (17:56 -0800)]
mm: fix crash when using XFS on loopback
Commit 8456a648cf44 ("slab: use struct page for slab management") causes
a crash in the LVM2 testsuite on PA-RISC (the crashing test is
fsadm.sh). The testsuite doesn't crash on 3.12, crashes on 3.13-rc1 and
later.
Kernel panic - not syncing: Bad Address (null pointer deref?)
Commit 8456a648cf44 changes the page structure so that the slab
subsystem reuses the page->mapping field.
The crash happens in the following way:
* XFS allocates some memory from slab and issues a bio to read data
into it.
* the bio is sent to the loopback device.
* lo_receive creates an actor and calls splice_direct_to_actor.
* lo_splice_actor copies data to the target page.
* lo_splice_actor calls flush_dcache_page because the page may be
mapped by userspace. In that case we need to flush the kernel cache.
* flush_dcache_page asks for the list of userspace mappings, however
that page->mapping field is reused by the slab subsystem for a
different purpose. This causes the crash.
Note that other architectures without coherent caches (sparc, arm, mips)
also call page_mapping from flush_dcache_page, so they may crash in the
same way.
This patch fixes this bug by testing if the page is a slab page in
page_mapping and returning NULL if it is.
The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
earlier kernels in the same scenario on architectures without cache
coherence when CONFIG_DEBUG_VM is enabled - so it should be backported
to stable kernels.
In the old kernels, the function page_mapping is placed in
include/linux/mm.h, so you should modify the patch accordingly when
backporting it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: John David Anglin <dave.anglin@bell.net>] Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaro Koskinen [Wed, 15 Jan 2014 01:56:38 +0000 (17:56 -0800)]
MIPS: fix blast_icache32 on loongson2
Commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery
all over arch/mips") failed to add Loongson2 specific blast_icache32
functions. Fix that.
The patch fixes the following crash seen with 3.13-rc1:
Huacai Chen [Wed, 15 Jan 2014 01:56:37 +0000 (17:56 -0800)]
MIPS: fix case mismatch in local_r4k_flush_icache_range()
Currently, Loongson-2 call protected_blast_icache_range() and others
call protected_loongson23_blast_icache_range(), but I think the correct
behavior should be the opposite. BTW, Loongson-3's cache-ops is
compatible with MIPS64, but not compatible with Loongson-2. So, rename
xxx_loongson23_yyy things to xxx_loongson2_yyy.
The patch fixes early boot hang with 3.13-rc1, introduced in commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery all over
arch/mips").
Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Acked-by: John Crispin <blogic@openwrt.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andreas Rohner [Wed, 15 Jan 2014 01:56:36 +0000 (17:56 -0800)]
nilfs2: fix segctor bug that causes file system corruption
There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean. It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.
The problem shows itself with the following kernel log message:
nilfs_sufile_do_cancel_free: segment 6533 must be clean
Usually a few hours later the file system gets corrupted:
The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.
This is what happens:
1. The cleaner starts the segment construction
2. nilfs_segctor_collect is called
3. sc_stage is on NILFS_ST_SUFILE and segments are freed
4. sc_stage is on NILFS_ST_DAT current segment is full
5. nilfs_segctor_extend_segments is called, which
allocates a new segment
6. The new segment is one of the segments freed in step 3
7. nilfs_sufile_cancel_freev is called and produces an error message
8. Loop around and the collection starts again
9. sc_stage is on NILFS_ST_SUFILE and segments are freed
including the newly allocated segment, which will contain active
data and can be allocated at a later time
10. A few hours later another segment construction allocates the
segment and causes file system corruption
This can be prevented by simply reordering the statements. If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Wed, 15 Jan 2014 06:39:30 +0000 (07:39 +0100)]
Merge branch 'clockevents/3.13-fixes' of git://git.linaro.org/people/daniel.lezcano/linux into timers/urgent
Pull clock driver fix from Daniel Lezcano:
" * Soren Brinkmann fixed the cadence_ttc driver where a call to
clk_get_rate happens in an interrupt context. More precisely in an IPI
when the broadcast timer is initialized for each cpu in the cpuidle
driver. "
net: avoid reference counter overflows on fib_rules in multicast forwarding
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.
This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.
Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.
Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables") Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Sun, 12 Jan 2014 22:15:51 +0000 (23:15 +0100)]
dm9601: add USB IDs for new dm96xx variants
A number of new dm96xx variants now exist.
Reported-by: Joseph Chang <joseph_chang@davicom.com.tw> Signed-off-by: Peter Korsgaard <peter@korsgaard.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since virtio is an OASIS standard draft now, virtio implementation
discussions are taking place on the virtio-dev OASIS mailing list.
Update MAINTAINERS.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Tue, 14 Jan 2014 14:59:55 +0000 (15:59 +0100)]
hwmon: (coretemp) Fix truncated name of alarm attributes
When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.
Reported-by: Andreas Hollmann <hollmann@in.tum.de> Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Stephen Warren [Mon, 13 Jan 2014 21:29:04 +0000 (14:29 -0700)]
i2c: Re-instate body of i2c_parent_is_i2c_adapter()
The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.
Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.
One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.
Fixes: 3923172b3d70 ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case") Cc: Phil Carmody <phil.carmody@partner.samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Bjørn Mork [Fri, 10 Jan 2014 22:10:17 +0000 (23:10 +0100)]
net: usbnet: fix SG initialisation
Commit 60e453a940ac ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.
Fixes: 60e453a940ac ("USBNET: fix handling padding packet") Reported-by: Thomas Kear <thomas@kear.co.nz> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Tested-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Fri, 10 Jan 2014 20:34:45 +0000 (15:34 -0500)]
inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.
This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.
Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.
This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").
Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Wed, 11 Dec 2013 23:13:33 +0000 (10:13 +1100)]
md: fix problem when adding device to read-only array with bitmap.
If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.
If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.
However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update. We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.
This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag. If that is set,
then the device will not be added to a read-only array.
added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.
So move the freeing of r10bio down where it is safe.
simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.
When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero. As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.
So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.
The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4
Fixes: 6d183de4077191d1201283a9035ce57a9b05254d Cc: stable@vger.kernel.org (any release to which above was applied) Signed-off-by: NeilBrown <neilb@suse.de>
NeilBrown [Tue, 14 Jan 2014 00:56:14 +0000 (11:56 +1100)]
md/raid1: fix request counting bug in new 'barrier' code.
The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.
allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.
However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.
So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.
But was introduced in 3.13-rc1.
Reported-by: Bruno Wolff III <bruno@wolff.to>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061 Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761 Cc: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>