David S. Miller [Fri, 23 Dec 2011 22:33:03 +0000 (17:33 -0500)]
netlink: Undo const marker in netlink_is_kernel().
We can't do this without propagating the const to nlk_sk()
too, otherwise:
net/netlink/af_netlink.c: In function ‘netlink_is_kernel’:
net/netlink/af_netlink.c:103:2: warning: passing argument 1 of ‘nlk_sk’ discards ‘const’ qualifier from pointer target type [enabled by default]
net/netlink/af_netlink.c:96:36: note: expected ‘struct sock *’ but argument is of type ‘const struct sock *’
Signed-off-by: David S. Miller <davem@davemloft.net>
The new netem loss model is configured with nested netlink messages.
This code is being overly strict about sizes, and is easily confused
by padding (or possible future expansion). Also message
for gemodel is incorrect.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 23 Dec 2011 05:19:20 +0000 (05:19 +0000)]
sch_hfsc: report backlog information
Add backlog (byte count) information in hfsc classes and qdisc, so that
"tc -s" can report it to user, instead of 0 values :
qdisc hfsc 1: root refcnt 6 default 20
Sent 45141660 bytes 30545 pkt (dropped 0, overlimits 91751 requeues 0)
rate 1492Kbit 126pps backlog 103226b 74p requeues 0
...
class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 400000bit ls m1 0bit d 0us m2 200000bit
Sent 49534912 bytes 33519 pkt (dropped 0, overlimits 0 requeues 0)
backlog 81822b 56p requeues 0
period 23 work 49451576 bytes rtwork 13277552 bytes level 0
...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: John A. Sullivan III <jsullivan@opensourcedevel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
allan [Thu, 22 Dec 2011 20:38:51 +0000 (20:38 +0000)]
drivers/net/usb/asix: fixed asix_get_wol reported wrong wol status issue
Fixed the asix_get_wol() routine reported wrong wol status issue.
Signed-off-by: Allan Chou <allan@asix.com.tw> Tested-by: Eugene <elubarsky@gmail.com>; Allan Chou <allan@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
Krishna Gudipati [Thu, 22 Dec 2011 13:30:19 +0000 (13:30 +0000)]
bna: Add debugfs interface.
Change details:
- Add debugfs support to obtain firmware trace, saved firmware trace on
an IOC crash, driver info and read/write to registers.
- debugfs hierarchy:
bna/pci_dev:<pci_name>
where the pci_name corresponds to the one under /sys/bus/pci/drivers/bna
- Following are the new debugfs entries added:
fwtrc: collect current firmware trace.
fwsave: collect last saved fw trace as a result of firmware crash.
regwr: write one word to chip register
regrd: read one or more words from chip register.
drvinfo: collect the driver information.
Signed-off-by: Krishna Gudipati <kgudipat@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Krishna Gudipati [Thu, 22 Dec 2011 13:29:45 +0000 (13:29 +0000)]
bna: Added flash sub-module and ethtool eeprom entry points.
Change details:
- The patch adds flash sub-module to the bna driver.
- Added ethtool set_eeprom() and get_eeprom() entry points to
support flash partition read/write operations.
Signed-off-by: Krishna Gudipati <kgudipat@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the following warning raised
when compile:
WARNING: modpost: missing MODULE_LICENSE()
in drivers/net/ethernet/stmicro/stmmac/stmmac.o
Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
"! --connbytes 23:42" should match if the packet/byte count is not in range.
As there is no explict "invert match" toggle in the match structure,
userspace swaps the from and to arguments
(i.e., as if "--connbytes 42:23" were given).
However, "what <= 23 && what >= 42" will always be false.
Change things so we use "||" in case "from" is larger than "to".
This change may look like it breaks backwards compatibility when "to" is 0.
However, older iptables binaries will refuse "connbytes 42:0",
and current releases treat it to mean "! --connbytes 0:42",
so we should be fine.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After a follow up discussion with Michal, it was agreed it would
be better to leave the kmem controller with just the tcp files,
deferring the behavior of the other general memory.kmem.* files
for a later time, when more caches are controlled. This is because
generic kmem files are not used by tcp accounting and it is
not clear how other slab caches would fit into the scheme.
We are reverting the original commit so we can track the reference.
Part of the patch is kept, because it was used by the later tcp
code. Conflicts are shown in the bottom. init/Kconfig is removed from
the revert entirely.
Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: David S. Miller <davem@davemloft.net>
Conflicts:
Documentation/cgroups/memory.txt
mm/memcontrol.c Signed-off-by: David S. Miller <davem@davemloft.net>
This is caused by bridge netfilter special dst_entry (fake_rtable), a
special shared entry, where attaching an inetpeer makes no sense.
Problem is present since commit 87c48fa3b46 (ipv6: make fragment
identifications less predictable)
Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and
__ip_select_ident() fallback to the 'no peer attached' handling.
Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Thu, 22 Dec 2011 02:05:07 +0000 (02:05 +0000)]
mqprio: Avoid panic if no options are provided
Userspace may not provide TCA_OPTIONS, in fact tc currently does
so not do so if no arguments are specified on the command line.
Return EINVAL instead of panicing.
Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 21 Dec 2011 20:00:32 +0000 (20:00 +0000)]
bridge: provide a mtu() method for fake_dst_ops
Commit 618f9bc74a039da76 (net: Move mtu handling down to the protocol
depended handlers) forgot the bridge netfilter case, adding a NULL
dereference in ip_fragment().
Reported-by: Chris Boot <bootc@bootc.net> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 22 Dec 2011 20:59:47 +0000 (12:59 -0800)]
Merge branch 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
* 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: Fix usb/isp1760 build on sparc
usb: gadget: epautoconf: do not change number of streams
usb: dwc3: core: fix cached revision on our structure
usb: musb: fix reset issue with full speed device
Stephen Rothwell [Thu, 22 Dec 2011 06:03:29 +0000 (17:03 +1100)]
ipv4: using prefetch requires including prefetch.h
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Add a flow_cache_flush_deferred function
ipv4: reintroduce route cache garbage collector
net: have ipconfig not wait if no dev is available
sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
asix: new device id
davinci-cpdma: fix locking issue in cpdma_chan_stop
sctp: fix incorrect overflow check on autoclose
r8169: fix Config2 MSIEnable bit setting.
llc: llc_cmsg_rcv was getting called after sk_eat_skb.
net: bpf_jit: fix an off-one bug in x86_64 cond jump target
iwlwifi: update SCD BC table for all SCD queues
Revert "Bluetooth: Revert: Fix L2CAP connection establishment"
Bluetooth: Clear RFCOMM session timer when disconnecting last channel
Bluetooth: Prevent uninitialized data access in L2CAP configuration
iwlwifi: allow to switch to HT40 if not associated
iwlwifi: tx_sync only on PAN context
mwifiex: avoid double list_del in command cancel path
ath9k: fix max phy rate at rate control init
nfc: signedness bug in __nci_request()
iwlwifi: do not set the sequence control bit is not needed
Linus Torvalds [Thu, 22 Dec 2011 02:29:05 +0000 (18:29 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: atmel/ac97c: using software reset instead hardware reset if not available
Linus Torvalds [Thu, 22 Dec 2011 02:28:52 +0000 (18:28 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Include linux/io.h to jz4740-adc
mfd: Use request_threaded_irq for twl4030-irq instead of irq_set_chained_handler
mfd: Base interrupt for twl4030-irq must be one-shot
mfd: Handle tps65910 clear-mask correctly
mfd: add #ifdef CONFIG_DEBUG_FS guard for ab8500_debug_resources
mfd: Fix twl-core oops while calling twl_i2c_* for unbound driver
mfd: include linux/module.h for ab5500-debugfs
mfd: Update wm8994 active device checks for WM1811
mfd: Set tps6586x bits if new value is different from the old one
mfd: Set da903x bits if new value is different from the old one
mfd: Set adp5520 bits if new value is different from the old one
mfd: Add missed free_irq in da903x_remove
Dave Kleikamp [Wed, 21 Dec 2011 17:05:48 +0000 (11:05 -0600)]
vfs: __read_cache_page should use gfp argument rather than GFP_KERNEL
lockdep reports a deadlock in jfs because a special inode's rw semaphore
is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not
used when __read_cache_page() calls add_to_page_cache_lru().
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-greg' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus
* 'for-greg' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb:
usb: gadget: epautoconf: do not change number of streams
usb: dwc3: core: fix cached revision on our structure
usb: musb: fix reset issue with full speed device
usb/isp1760: Let OF bindings depend on general CONFIG_OF instead of PPC_OF .
To be able to use the driver on other OF-aware architectures, too.
And add necessary OF related #includes to fix compilation error.
Signed-off-by: Joachim Foerster <joachim.foerster@missinglinkelectronics.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
enabled the build on all CONFIG_OF architectures, but it cannot do
this.
This driver depends upon CONFIG_OF_IRQ but not all CONFIG_OF platforms
support that infrastructure, in particular Sparc does not so the
build fails.
Please push a patch like the following to Linus so that this code only
gets built where it actually should.
--------------------
usb/isp1760: Add missing CONFIG_OF_IRQ dependency on OF code.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Steffen Klassert [Wed, 21 Dec 2011 21:48:08 +0000 (16:48 -0500)]
net: Add a flow_cache_flush_deferred function
flow_cach_flush() might sleep but can be called from
atomic context via the xfrm garbage collector. So add
a flow_cache_flush_deferred() function and use this if
the xfrm garbage colector is invoked from within the
packet path.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 21 Dec 2011 20:47:16 +0000 (15:47 -0500)]
ipv4: reintroduce route cache garbage collector
Commit 2c8cec5c10b (ipv4: Cache learned PMTU information in inetpeer)
removed IP route cache garbage collector a bit too soon, as this gc was
responsible for expired routes cleanup, releasing their neighbour
reference.
As pointed out by Robert Gladewitz, recent kernels can fill and exhaust
their neighbour cache.
Reintroduce the garbage collection, since we'll have to wait our
neighbour lookups become refcount-less to not depend on this stuff.
Reported-by: Robert Gladewitz <gladewitz@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the PCI support (as EXPERIMENTAL)
this has been also tested on XLINX XC2V3000 FF1152AMT0221 D1215994A VIRTEX FPGA board.
To support the PCI bus the main part has been reworked
and both the platform and the PCI specific parts have
been moved into different files.
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 21 Dec 2011 03:30:11 +0000 (03:30 +0000)]
sch_sfq: rehash queues in perturb timer
A known Out Of Order (OOO) problem hurts SFQ when timer changes
perturbation value, since all new packets delivered to SFQ enqueue might
end on different slots than previous in-flight packets.
With round robin delivery, we can thus deliver packets in a different
order.
Since SFQ is limited to small amount of in-flight packets, we can rehash
packets so that this OOO problem is fixed.
This rehashing is performed only if internal flow classifier is in use.
We now store in skb->cb[] the "struct flow_keys" so that we dont call
skb_flow_dissect() again while rehashing.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 21 Dec 2011 02:39:37 +0000 (18:39 -0800)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
mtd: plat_ram: call mtd_device_register only if partition data exists
mtd: pxa2xx-flash.c: It used to fall back to provided table.
mtd: gpmi: add missing include 'module.h'
mtd: ndfc: fix typo in structure dereference
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Tue, 20 Dec 2011 19:42:38 +0000 (11:42 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time/clocksource: Fix kernel-doc warnings
rtc: m41t80: Workaround broken alarm functionality
rtc: Expire alarms after the time is set.
Linus Torvalds [Tue, 20 Dec 2011 19:41:17 +0000 (11:41 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
Linus Torvalds [Tue, 20 Dec 2011 19:40:48 +0000 (11:40 -0800)]
Merge branch 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
Linus Torvalds [Tue, 20 Dec 2011 19:31:56 +0000 (11:31 -0800)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a regression in nfs_file_llseek()
NFSv4: Do not accept delegated opens when a delegation recall is in effect
NFSv4: Ensure correct locking when accessing the 'lock_states' list
NFSv4.1: Ensure that we handle _all_ SEQUENCE status bits.
NFSv4: Don't error if we handled it in nfs4_recovery_handle_error
SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot
SUNRPC: Fix the execution time statistics in the face of RPC restarts
Linus Torvalds [Tue, 20 Dec 2011 19:31:44 +0000 (11:31 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
vmwgfx: Clip cliprects against screen boundaries in present and dirty
vmwgfx: Resend the cursor after legacy modeset
vmwgfx: Do better culling of presents
vmwgfx: Refactor kms code to use vmw_user_lookup_handle helper
vmwgfx: Add helper function to get surface or dmabuf
vmwgfx: Refactor cursor update
vmwgfx: Remove dmabuf check in present ioctl
vmwgfx: Use the revised fifo hw version register when present
Before waiting (predefined value 120s), check that at least one device
was successfully brought up. Otherwise (e.g. buggy bootloader
which does not set the MAC address) there is no point in waiting
for carrier.
Cc: Micha Nelissen <micha@neli.hopto.org> Cc: Holger Brunck <holger.brunck@keymile.com> Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Einar Lueck [Mon, 19 Dec 2011 22:56:36 +0000 (22:56 +0000)]
qeth: recovery through asynchronous delivery
If recovery is triggered in presence of pending asynchronous
deliveries of storage blocks we do a forced cleanup after
the corresponding tasklets are completely stopped and trigger
appropriate notifications for the correspondingerror state.
Signed-off-by: Einar Lueck <elelueck@de.ibm.com> Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Frank Blaschka [Mon, 19 Dec 2011 22:56:35 +0000 (22:56 +0000)]
qeth: improve recovery during resource shortage
In case there are no system resources to run a recovery we have to clear
recovery bitmasks so a further automatic or manual driven recovery can
fix up the device.
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:34 +0000 (22:56 +0000)]
netiucv: allow multiple interfaces to same peer
The NETIUCV device driver allows to connect a Linux guest on z/VM to
another z/VM guest based on the z/VM communication facility IUCV.
Multiple output paths to different guests are possible, as well as
multiple input paths from different guests.
With this feature, you can configure multiple point-to-point NETIUCV
interfaces between your Linux on System z instance and another z/VM
guest.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The patch forbids start of a recovery once qeth shutdown is running.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:32 +0000 (22:56 +0000)]
qeth: suspicious rcu_dereference_check in recovery
qeth layer3 recovery invokes its set_multicast_list function, which
invokes function __vlan_find_dev_deep requiring rcu_read_lock or
rtnl lock. This causes kernel messages:
The patch makes sure the rtnl lock is held once qeth recovery invokes
its set_multicast_list function.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:31 +0000 (22:56 +0000)]
af_iucv: get rid of state IUCV_SEVERED
af_iucv differs unnecessarily between state IUCV_SEVERED and
IUCV_DISCONN. This patch removes state IUCV_SEVERED.
While simplifying af_iucv, this patch removes the 2nd invocation of
cpcmd as well.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:30 +0000 (22:56 +0000)]
af_iucv: remove unused timer infrastructure
af_iucv contains timer infrastructure which is not exploited.
This patch removes the timer related code parts.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:29 +0000 (22:56 +0000)]
af_iucv: release reference to HS device
For HiperSockets transport skbs sent are bound to one of the
available HiperSockets devices. Add missing release of reference to
a HiperSockets device before freeing an skb.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:28 +0000 (22:56 +0000)]
af_iucv: accelerate close for HS transport
Closing an af_iucv socket may wait for confirmation of outstanding
send requests. This patch adds confirmation code for the new
HiperSockets transport.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Mon, 19 Dec 2011 22:56:27 +0000 (22:56 +0000)]
af_iucv: support ancillary data with HS transport
The AF_IUCV address family offers support for ancillary data.
This patch enables usage of ancillary data with the new
HiperSockets transport.
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Mon, 19 Dec 2011 04:11:40 +0000 (04:11 +0000)]
sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.
The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.
When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.
Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.
The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.
Yevgeny Petrilin [Mon, 19 Dec 2011 21:53:38 +0000 (21:53 +0000)]
mlx4_en: FIX: Setting default_qpn before using it
When UDP RSS is enabled, we use same QPN for TCP and UDP ranges
The bug is that the default_qpn was used base UDP qpn before it
was set.
Fixes bug introduced in commit: 1202d460b1df3a77fda66f4ba5f90ae3527dd42f
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab usin __getname()
The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.
This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname(). Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e. if the name was
allocated with getname() rather than the naked __getname(). So,
__getname() followed by putname() ends up leaking memory.
Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000). So there could be an int overflow while computing
176 points *= 1000;
and points may have negative value. Meaning the oom score for a mem hog task
will be one.
196 if (points <= 0)
197 return 1;
For example:
[ 3366] 0 3366 3539048024303939 5 0 0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.
In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.
The points variable should be of type long instead of int to prevent the
int overflow.
Haogang Chen [Tue, 20 Dec 2011 01:11:56 +0000 (17:11 -0800)]
nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
There is a potential integer overflow in nilfs_ioctl_clean_segments().
When a large argv[n].v_nmembs is passed from the userspace, the subsequent
call to vmalloc() will allocate a buffer smaller than expected, which
leads to out-of-bound access in nilfs_ioctl_move_blocks() and
lfs_clean_segments().
The following check does not prevent the overflow because nsegs is also
controlled by the userspace and could be very large.
if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
goto out_free;
This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
returns -EINVAL when overflow.
David Rientjes [Tue, 20 Dec 2011 01:11:52 +0000 (17:11 -0800)]
cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.
c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by 89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Paul Menage <paul@paulmenage.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Fri, 9 Dec 2011 03:27:55 +0000 (11:27 +0800)]
mfd: Include linux/io.h to jz4740-adc
Include linux/io.h to fix below build error:
CC drivers/mfd/jz4740-adc.o
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_irq_demux':
drivers/mfd/jz4740-adc.c:73: error: implicit declaration of function 'readb'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_enabled':
drivers/mfd/jz4740-adc.c:110: error: implicit declaration of function 'writeb'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_set_config':
drivers/mfd/jz4740-adc.c:146: error: implicit declaration of function 'readl'
drivers/mfd/jz4740-adc.c:151: error: implicit declaration of function 'writel'
drivers/mfd/jz4740-adc.c: In function 'jz4740_adc_probe':
drivers/mfd/jz4740-adc.c:249: error: implicit declaration of function 'ioremap_nocache'
drivers/mfd/jz4740-adc.c:249: warning: assignment makes pointer from integer without a cast
drivers/mfd/jz4740-adc.c:289: warning: passing argument 3 of 'mfd_add_devices' discards qualifiers from pointer target type
include/linux/mfd/core.h:93: note: expected 'struct mfd_cell *' but argument is of type 'const struct mfd_cell *'
drivers/mfd/jz4740-adc.c:299: error: implicit declaration of function 'iounmap'
make[2]: *** [drivers/mfd/jz4740-adc.o] Error 1
make[1]: *** [drivers/mfd] Error 2
make: *** [drivers] Error 2
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NeilBrown [Sat, 26 Nov 2011 20:17:41 +0000 (07:17 +1100)]
mfd: Use request_threaded_irq for twl4030-irq instead of irq_set_chained_handler
irq_set_chained_handler sets 'desc->handle_irq'.
However this irq is called by handle_nested_irq from handle_twl4030_pih,
and that uses action->thread_fn.
So the handled set with irq_set_chained_handler is never called.
So change to use request_threaded_irq instead - that sets the correct field.
Tested on GTA04 Phoenux.
Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NeilBrown [Sat, 26 Nov 2011 20:17:41 +0000 (07:17 +1100)]
mfd: Base interrupt for twl4030-irq must be one-shot
As the interrupt source is only cleared by the threaded interrupt
service routine, we need to make the base interrupt IRQF_ONESHOT.
Without this, the first interrupt from the TWL4030 cause the CPU to
enter an infinite loop trying to handle to interrupt but never
clearing it.
Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Felipe Contreras <felipe.contreras@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Mon, 7 Nov 2011 03:20:09 +0000 (11:20 +0800)]
mfd: include linux/module.h for ab5500-debugfs
Include linux/module.h to fix below build error:
CC drivers/mfd/ab5500-debugfs.o
drivers/mfd/ab5500-debugfs.c:571: error: 'THIS_MODULE' undeclared here (not in a function)
make[2]: *** [drivers/mfd/ab5500-debugfs.o] Error 1
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Mon, 31 Oct 2011 06:24:30 +0000 (14:24 +0800)]
mfd: Set tps6586x bits if new value is different from the old one
It does not make sense to write new value only when all the bit_mask
bits are zero.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Mon, 31 Oct 2011 06:23:03 +0000 (14:23 +0800)]
mfd: Set da903x bits if new value is different from the old one
It does not make sense to write new value only when all the bit_mask
bits are zero.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Axel Lin [Mon, 31 Oct 2011 03:00:06 +0000 (11:00 +0800)]
mfd: Set adp5520 bits if new value is different from the old one
Current code checks if all the bit_mask bits are all zero is wrong.
We need to write new value if the bit mask fields of new value is
not equal to old value.
Signed-off-by: Axel Lin <axel.lin@gmail.com> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Dmitry Kasatkin [Mon, 5 Dec 2011 11:17:41 +0000 (13:17 +0200)]
evm: key must be set once during initialization
On multi-core systems, setting of the key before every caclculation,
causes invalid HMAC calculation for other tfm users, because internal
state (ipad, opad) can be invalid before set key call returns.
It needs to be set only once during initialization.
Rusty Russell [Mon, 19 Dec 2011 14:08:01 +0000 (14:08 +0000)]
module_param: make bool parameters really bool (net & drivers/net)
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.
It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.
(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).
Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Rusty Russell [Thu, 15 Dec 2011 03:04:50 +0000 (13:34 +1030)]
mmc: vub300: fix type of firmware_rom_wait_states module parameter
You didn't mean this to be a bool.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Tony Olech <tony.olech@elandigitalsystems.com> Cc: <stable@kernel.org> Signed-off-by: Chris Ball <cjb@laptop.org>
Ohad Ben-Cohen [Mon, 19 Dec 2011 23:51:38 +0000 (15:51 -0800)]
Revert "mmc: enable runtime PM by default"
When SDIO runtime PM was originally introduced, we immediately faced
two regressions with two different chipsets, and in response decided
not to enable it by default.
With the recent work on the 8686 we hoped we found all the gotchas,
so 08da834 did make sense (at least experimentally).
Unfortunately we now see that some setups out there still refuse to
work when SDIO runtime PM is enabled by default (see
http://www.spinics.net/lists/linux-mmc/msg11161.html), and obviously
we can't live with these kind of regressions.
Linus Torvalds [Mon, 19 Dec 2011 23:13:53 +0000 (15:13 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/qib: Correct sense on freectxts increment and decrement
RDMA/cma: Verify private data length
IB/mlx4: Fix shutdown crash accessing a non-existent bitmap