Yi Zou [Fri, 16 Mar 2012 23:08:11 +0000 (23:08 +0000)]
net: do not do gso for CHECKSUM_UNNECESSARY in netif_needs_gso
This is related to fixing the bug of dropping FCoE frames when disabling tx ip
checksum by 'ethtool -K ethx tx off'. The FCoE protocol stack driver would
use CHECKSUM_UNNECESSARY on tx path instead of CHECKSUM_PARTIAL (as indicated in
the 2/2 of this series). To do so, netif_needs_gso() has to be changed here to
not do gso for both CHECKSUM_PARTIAL and CHECKSUM_UNNECESSARY.
Ref. to original discussion thread:
http://patchwork.ozlabs.org/patch/146567/
Signed-off-by: Yi Zou <yi.zou@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 16 Mar 2012 23:07:34 +0000 (23:07 +0000)]
ixgbe: Fix issues with SR-IOV loopback when flow control is disabled
This patch allows us to avoid a Tx hang when SR-IOV is enabled. This hang
can be triggered by sending small packets at a rate that was triggering Rx
missed errors from the adapter while the internal Tx switch and at least
one VF are enabled.
This was all due to the fact that under heavy stress the Rx FIFO never
drained below the flow control high water mark. This resulted in the Tx
FIFO being head of line blocked due to the fact that it relies on the flow
control high water mark to determine when it is acceptable for the Tx to
place a packet in the Rx FIFO.
The resolution for this is to set the FCRTH value to the RXPBSIZE - 32 so
that even if the ring is almost completely full we can still place Tx
packets on the Rx ring and drop incoming Rx traffic if we do not have
sufficient space available in the Rx FIFO.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Mon, 19 Mar 2012 21:27:06 +0000 (17:27 -0400)]
net/hyperv: Fix the code handling tx busy
Instead of dropping the packet, we keep the skb buffer, and return
NETDEV_TX_BUSY to let upper layer retry send. This will not cause
endless loop, because the host is taking data away from ring buffer,
and we have called the stop_queue before returning NETDEV_TX_BUSY.
The stop_queue was called in the function netvsc_send() in file
netvsc.c, then it returns to rndis_filter_send(), which returns to
netvsc_start_xmit() in file netvsc_drv.c. So the NETDEV_TX_BUSY is
indeed returned AFTER queue is stopped.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Sat, 18 Feb 2012 07:08:14 +0000 (07:08 +0000)]
ixgbe: fix namespace issues when FCoE/DCB is not enabled
Resolve namespace issues when FCoE or DCB is not enabled.
The issue is with certain configurations we end up with namespace
problems. A simple example:
ixgbe_main.c
- defines func A()
- uses func A()
ixgbe_fcoe.c
- uses func A()
ixgbe.h
- has prototype for func A()
For default (FCoE included) all is good. But when it isn't the namespace
checker complains about how func A() could be static.
To resolve this, created a ixgbe_lib file to contain functions used
by DCB/FCoE and their helper functions so that they are always in
namespace whether or not DCB/FCoE is enabled.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Eric Dumazet [Sun, 18 Mar 2012 11:07:47 +0000 (11:07 +0000)]
tcp: reduce out_of_order memory use
With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.
Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.
When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.
Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.
Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.
This also reduced average memory used by tcp sockets.
With help from Neal Cardwell.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 18 Mar 2012 11:06:44 +0000 (11:06 +0000)]
tcp: introduce tcp_data_queue_ofo
Split tcp_data_queue() in two parts for better readability.
tcp_data_queue_ofo() is responsible for queueing incoming skb into out
of order queue.
Change code layout so that the skb_set_owner_r() is performed only if
skb is not dropped.
This is a preliminary patch before "reduce out_of_order memory use"
following patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 18 Mar 2012 10:33:44 +0000 (10:33 +0000)]
bnx2x: consistent statistics for old FW
Previously applied patch making the bnx2x statistics consistent
did not apply to old FWs. This remedies it, extending the consistent
behaviour to all drivers.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Reported-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 18 Mar 2012 10:33:43 +0000 (10:33 +0000)]
bnx2x: changed iscsi/fcoe mac init and macros
This includes changes in macros to better distinguish between the two
protocols, and slightly changed the way their macs are set.
Notice this file contains string print lines with more than 80 characters,
as to not break prints.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 18 Mar 2012 10:33:41 +0000 (10:33 +0000)]
bnx2x: changed initial dcb configuration
The changes were mostly made to enable back-to-back data flow with dcb.
Other changes were simply deemed as a better 'clean' initial configuration.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 18 Mar 2012 10:33:40 +0000 (10:33 +0000)]
bnx2x: removed dcb unused code
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 18 Mar 2012 10:33:39 +0000 (10:33 +0000)]
bnx2x: reduced sparse warnings
This patch reduces sparse warnings in the bnx2x code,
mostly by changing functions into static and changing
initialization of structures.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merav Sicron [Sun, 18 Mar 2012 10:33:38 +0000 (10:33 +0000)]
bnx2x: revised driver prints
We've revised driver prints, changing the mask of existing prints
to allow better control over the debug messages, added prints to
error scenarios, removed unnecessary prints and corrected some spelling.
Please note that this patch contains lines with over 80 characters,
as string messages were kept in a single line.
Signed-off-by: Merav Sicron <meravs@broadcom.com> Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 18 Mar 2012 10:33:37 +0000 (10:33 +0000)]
bnx2x: added 'likely' to fast-path skb existence
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Sat, 11 Feb 2012 07:18:57 +0000 (07:18 +0000)]
ixgbe: Correct flag values set by ixgbe_fix_features
This patch replaces the variable name data with the variable name features
for ixgbe_fix_features and ixgbe_set_features. This helps to make some
issues more obvious such as the fact that we were disabling Rx VLAN tag
stripping when we should have been forcing it to be enabled when DCB is
enabled.
In addition there was deprecated code present that was disabling the LRO
flag if we had the itr value set too low. I have updated this logic so
that we will now allow the LRO flag to be set, but will not enable RSC
until the rx-usecs value is high enough to allow enough time for Rx packet
coalescing.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:53 +0000 (07:51 +0000)]
ixgbe: Add support for enabling UDP RSS via the ethtool rx-flow-hash command
This patch adds support for enabling or disabling UDP RSS via the
ethtool -N rx-flow-hash command.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:47 +0000 (07:51 +0000)]
ixgbe: Whitespace cleanups
This patch contains several fixes for formatting in regards to whitespace.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:42 +0000 (07:51 +0000)]
ixgbe: Two minor fixes for RSS and FDIR set queues functions
This change fixes two minor issues. The first was the fact that we were
setting the return value to false twice in the set_rss_queues function.
The second is the fact that we should have been using "min_t(int," instead
of "min((int)" in set_fdir_queues.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:37 +0000 (07:51 +0000)]
ixgbe: drop err_eeprom tag which is at same location as err_sw_init
The err_eeprom and err_sw_init tags both go to the same location. So
instead of maintaining two tags this patch combines them so we only use
err_sw_init.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:27 +0000 (07:51 +0000)]
ixgbe: Move poll routine in order to improve readability
This change relocates the ixgbe_poll routine so it is right next to the
interrupt routine that schedules and calls it.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:22 +0000 (07:51 +0000)]
ixgbe: cleanup logic for the service timer and VF hang detection
This change just cleans up some of the logic in the service_timer function
so that we can avoid unnecessary swapping of the ready value between true to
false and back to true.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:16 +0000 (07:51 +0000)]
ixgbe: Update layout of ixgbe_ring structure to improve cache performance
This change makes it so that only the 2nd cache line in the ring structure
should see frequent updates. The advantage to this is that it should
reduce the amount of cross CPU cache bouncing since only the 2nd cache line
will be changing between most network transactions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:11 +0000 (07:51 +0000)]
ixgbe: Store Tx flags and protocol information to tx_buffer sooner
This change makes it so that we store the tx_flags and protocol information
to the tx_buffer_info structure sooner. This allows us to avoid unnecessary
read/write transactions since we are placing the data in the final location
earlier.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jason Baron [Fri, 16 Mar 2012 20:34:03 +0000 (16:34 -0400)]
Don't limit non-nested epoll paths
Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).
The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.
This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578
Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking changes from David Miller:
"1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
crashes, particularly during shutdown. Reported by Dave Jones and
fixed by Eric Dumazet.
2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
already freed the SKB, which causes crashes as to the caller this
means requeue the packet. Fixes from Eric Dumazet.
3) usbnet driver doesn't allocate the right amount of headroom on
fresh RX SKBs, fix from Eric Dumazet.
4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
abolutely should not take a reference to 'dev', this leads to
leaks. Fix from RonQing Li.
5) Fix netfilter ctnetlink race between delete and timeout expiration.
From Pablo Neira Ayuso.
6) Revert SFQ change which causes regressions, specifically queueing
to tail can lead to unavoidable flow starvation. From Eric
Dumazet.
7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
from Michal Schmidt."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
netfilter: ctnetlink: fix race between delete and timeout expiration
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
net/hyperv: fix erroneous NETDEV_TX_BUSY use
net/usbnet: reserve headroom on rx skbs
bnx2x: fix memory leak in bnx2x_init_firmware()
bnx2x: fix a crash on corrupt firmware file
sch_sfq: revert dont put new flow at the end of flows
ipv6: fix icmp6_dst_alloc()
Linus Torvalds [Sat, 17 Mar 2012 16:54:16 +0000 (09:54 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools, x86: Build perf on older user-space as well
perf tools: Use scnprintf where applicable
perf tools: Incorrect use of snprintf results in SEGV
netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.
After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com> Reported-by: Kerin Millar <kerframil@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 8 Feb 2012 07:51:06 +0000 (07:51 +0000)]
ixgbe: always write DMA for single_mapped value with skb
This change makes it so that we always write the DMA address for the skb
itself on the same tx_buffer struct that the skb is written on. This way
we don't need the MAPPED_AS_PAGE flag and we always know it will be the
first DMA value that we will have to unmap.
In addition I have found an issue in which we were leaking a DMA mapping if
the value happened to be 0 which is possible on some platforms. In order
to resolve that I have updated the transmit path to use the length instead
of the DMA mapping in order to determine if a mapping is actually present.
One other tweak in this patch is that it only writes the olinfo information
on the first descriptor. As it turns out it isn't necessary to write it
for anything but the first descriptor so there is no need to carry it
forward.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:51:01 +0000 (07:51 +0000)]
ixgbe: Write gso_segs and bytcount to the ring sooner
This change makes it so that gso_segs and bytecount are written to the ring
sooner. This helps to simplify the logic for the two since segmentation
offloads can now update them within their own function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:50:56 +0000 (07:50 +0000)]
ixgbe: Place skb on first buffer_info structure to avoid using stack space
Instead of keeping a local copy of the skb on the stack for as long as long
as we do it makes sense to instead just place it on the first tx_buffer
structure so that we can save space on the stack and avoid unnecessary
read/write operations copying the pointer out of the stack and onto the
ring later.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:50:51 +0000 (07:50 +0000)]
ixgbe: Use packets to track Tx completions instead of a seperate value
A separate value was added to track Tx completions in order to determine if
the Tx unit was hung. However we can do the same thing using the number of
packets completed without having to add another stat to the Tx ring.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:50:45 +0000 (07:50 +0000)]
ixgbe: Modify setup of descriptor flags to avoid conditional jumps
This change makes it more likely that the descriptor flags setup will use
cmov instructions instead of conditional jumps when setting up the flags.
The advantage to this is that the code should just flow a bit more
smoothly.
To do this it is necessary to set the TX_FLAGS_CSUM bit in tx_flags when
doing TSO so that we also do the checksum in addition to the segmentation
offload.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:50:40 +0000 (07:50 +0000)]
ixgbe: Make certain that all frames fit minimum size requirements
This change makes certain that any packet we attempt to transmit will meet
minimum size requirements for the hardware.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Wed, 8 Feb 2012 07:50:35 +0000 (07:50 +0000)]
ixgbe: cleanup logic in ixgbe_change_mtu
This change is meant to just cleanup the logic in ixgbe_change_mtu since we
are making it unnecessarily complex due to a workaround required for 82599
when SR-IOV is enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Sat, 3 Mar 2012 02:35:52 +0000 (02:35 +0000)]
ixgbe: Replace standard receive path with a page based receive
This patch replaces the existing Rx hot-path in the ixgbe driver with a new
implementation that is based on performing a double buffered receive. The
ixgbe driver already had something similar in place for its' packet split
path, however in that case we were still receiving the header for the
packet into the sk_buff. The big change here is the entire receive path
will receive into pages only, and then pull the header out of the page and
copy it into the sk_buff data. There are several motivations behind this
approach.
First, this allows us to avoid several cache misses as we were taking a
set of cache misses for allocating the sk_buff and then another set for
receiving data into the sk_buff. We are able to avoid these misses on
receive now as we allocate the sk_buff when data is available.
Second we are able to see a considerable performance gain when an IOMMU is
enabled because we are no longer unmapping every buffer on receive.
Instead we can delay the unmap until we are unable to use the page, and
instead we can simply call sync_single_range on the half of the page that
contains new data.
Finally we are able to drop a considerable amount of code from the driver
as we no longer have to support 2 different receive modes, packet split and
one buffer. This allows us to optimize the Rx path further since less
branching is required.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Tested-by: Stephen Ko <stephen.s.ko@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Greear [Thu, 8 Mar 2012 08:28:41 +0000 (08:28 +0000)]
ixgbe: Support RX-ALL feature flag.
This allows the NIC to receive all frames available, including
those with bad FCS, ethernet control frames, and more.
Tested by sending frames with bad FCS.
Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Greear [Tue, 6 Mar 2012 09:42:04 +0000 (09:42 +0000)]
ixgbe: Support sending custom Ethernet FCS.
Including bad FCS, used generate frames with bad FCS
to test other system's handling of RX of bad packets.
Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Greear [Tue, 6 Mar 2012 09:41:58 +0000 (09:41 +0000)]
igb: Support RX-ALL feature flag.
This allows the NIC to receive all frames available, including
those with bad FCS, un-matched vlans, ethernet control frames,
and more.
Tested by sending frames with bad FCS.
Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ben Greear [Tue, 6 Mar 2012 09:41:53 +0000 (09:41 +0000)]
igb: Support sending custom Ethernet FCS.
Including bad FCS, used generate frames with bad FCS
to test other system's handling of RX of bad packets.
Signed-off-by: Ben Greear <greearb@candelatech.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Philipp Zabel [Thu, 15 Mar 2012 08:19:29 +0000 (08:19 +0000)]
net/irda: add clk_prepare/clk_unprepare to pxaficp_ir
This patch adds clk_prepare/clk_unprepare calls to the pxaficp_ir
driver by using the helper functions clk_prepare_enable and
clk_disable_unprepare.
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Cc: Eric Miao <eric.y.miao@gmail.com> Cc: Haojian Zhuang <haojian.zhuang@marvell.com> Cc: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Thu, 15 Mar 2012 05:25:58 +0000 (05:25 +0000)]
arp: allow arp processing to honor per interface arp_accept sysctl
I found recently that the arp_process function which handles all of our received
arp frames, is using IPV4_DEVCONF_ALL macro to check the state of the arp_process
flag. This seems wrong, as it implies that either none or all of the network
interfaces accept gratuitous arps. This patch corrects that, allowing
per-interface arp_accept configuration to deviate from the all setting. Note
this also brings us into line with the way the arp_filter setting is handled
during arp_process execution.
Tested this myself on my home network, and confirmed it works as expected.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Orishko [Wed, 14 Mar 2012 04:00:24 +0000 (04:00 +0000)]
usbnet: use netif_tx_wake_queue instead of netif_start_queue
If host is going to autosuspend function with two interfaces and
if IP packet has arrived in-between of two usbnet_suspend() callbacks,
i.e usbnet_resume() is called in-between, tx data flow is stopped.
When autosuspend timer expires and device is put to autosuspend
again, tx queue is waked up and data can be sent again.
This behavior might be repeated several times in a row.
Tested on Intel/ARM.
Reviewed-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Tested-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com> Signed-off-by: Alexey Orishko <alexey.orishko@stericsson.com> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Santosh Nayak [Mon, 12 Mar 2012 22:58:24 +0000 (22:58 +0000)]
netxen: qlogic ethernet : Fix endian bug.
Change the datatype of "ip_addr" to __be32 as 'ip' should be in
big endian format.
Adapter needs "ip address" in big endian format stored at lower 32bit
of req.word[1]. netxen_config_ipaddr() now receives 'ip' in big endian
format. To satisfy adapter's need, use memcpy() to copy byte by byte
of 'ip' into lower 32bit of req.word[1].
Mac address and serial number of adapter need to be in little endian format.
Change the data type of the related variables to __le32 / __le64 or cast it
explicitly to __le32 / __le64 depending upon the requirement.
Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing.Li [Thu, 15 Mar 2012 22:54:14 +0000 (22:54 +0000)]
ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.
[ bug introduced in 96b52e61be1 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 17 Mar 2012 00:14:55 +0000 (17:14 -0700)]
Merge branch 'akpm' (more patches from Andrew)
Merge some more email patches from Andrew Morton:
"A couple of nilfs fixes"
* emailed from Andrew Morton <akpm@linux-foundation.org>:
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
nilfs2: clamp ns_r_segments_percentage to [1, 99]
Ryusuke Konishi [Sat, 17 Mar 2012 00:08:39 +0000 (17:08 -0700)]
nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:
Haogang Chen [Sat, 17 Mar 2012 00:08:38 +0000 (17:08 -0700)]
nilfs2: clamp ns_r_segments_percentage to [1, 99]
ns_r_segments_percentage is read from the disk. Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation. This patch reports error when mounting such
bogus volumes.
Linus Torvalds [Sat, 17 Mar 2012 00:04:02 +0000 (17:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull maintainer update from James Morris:
"Please pull this patch which adds Serge as maintainer of the
capabilities code, as discussed on lwn and the lsm list.
New capabilities must be signed off by the maintainer, and new uses of
any capabilities should at be cc'd to the maintainer."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
MAINTAINERS: Add Serge as maintainer of capabilities
Linus Torvalds [Sat, 17 Mar 2012 00:03:15 +0000 (17:03 -0700)]
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull c6x bugfix from Mark Salter:
"Remove dead code from entry.S which causes a build failure when using
a newer assembler (v2.22 complains about it, v2.20 ignores it)."
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: remove dead code from entry.S
Mark Salter [Fri, 16 Mar 2012 13:27:57 +0000 (09:27 -0400)]
C6X: remove dead code from entry.S
The ENDPROC() on sys_fadvise64_c6x() in arch/c6x/kernel/entry.S is
outside of the conditional block with the matching ENTRY() macro. This
leads a newer (v2.22 vs. v2.20) assembler to complain:
/tmp/ccGZBaPT.s: Assembler messages:
/tmp/ccGZBaPT.s: Error: .size expression for sys_fadvise64_c6x does not evaluate to a constant
The conditional block became dead code when c6x switched to generic
unistd.h and should be removed along with the offending ENDPROC().
Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: David Howells <dhowells@redhat.com>
Alexey Orishko [Wed, 14 Mar 2012 11:26:12 +0000 (11:26 +0000)]
cdc_ncm: avoid discarding datagrams in rx path
Changes:
- removed a limit for amount of datagrams for IN NTB
- using pointer to traverse NTB in rx_fixup()
- renamed "temp" to "len" in rx_fixup()
- do NTB sequence number check in rx path
Tested on Intel/ARM.
Reviewed-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Tested-by: Dmitry Tarnyagin <Dmitry.Tarnyagin@stericsson.com> Signed-off-by: Alexey Orishko <alexey.orishko@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Orishko [Wed, 14 Mar 2012 11:26:11 +0000 (11:26 +0000)]
cdc_ncm: fix MTU and max_datagram_size handling
Changes/fixes:
- inform device if max_datagram_size was changed by host
- max_datagram_size can't be bigger MTU in ETH func descr
- fix constants definitions to enable running CAIF service over NCM
Tested on Intel/ARM.
Reviewed-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Tested-by: Dmitry Tarnyagin <Dmitry.Tarnyagin@stericsson.com> Signed-off-by: Alexey Orishko <alexey.orishko@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Orishko [Wed, 14 Mar 2012 11:26:10 +0000 (11:26 +0000)]
cdc_ncm: reduce driver latency in the data path
Changes:
- use high resolution timer
latency became approx. 10-15 times less than before
- use taklet for sending remaining data in tx path
Tested on Intel/ARM.
Reviewed-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Tested-by: Dmitry Tarnyagin <Dmitry.Tarnyagin@stericsson.com> Signed-off-by: Alexey Orishko <alexey.orishko@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 09:21:44 +0000 (09:21 +0000)]
wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Also increments tx_dropped counter
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 06:56:25 +0000 (06:56 +0000)]
net/usbnet: reserve headroom on rx skbs
network drivers should reserve some headroom on incoming skbs so that we
dont need expensive reallocations, eg forwarding packets in tunnels.
This NET_SKB_PAD padding is done in various helpers, like
__netdev_alloc_skb_ip_align() in this patch, combining NET_SKB_PAD and
NET_IP_ALIGN magic.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Oliver Neukum <oneukum@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:29 +0000 (14:08 +0000)]
bnx2x: fix memory leak in bnx2x_init_firmware()
When cycling the interface down and up, bnx2x_init_firmware() knows that
the firmware is already loaded, but nevertheless it allocates certain
arrays anew (init_data, init_ops, init_ops_offsets, iro_arr). The old
arrays are leaked.
Fix the leaks by returning early if the firmware was already loaded.
Because if the firmware is loaded, so are the arrays.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Thu, 15 Mar 2012 14:08:28 +0000 (14:08 +0000)]
bnx2x: fix a crash on corrupt firmware file
If the requested firmware is deemed corrupt and then released, reset the
pointer to NULL in order to avoid double-freeing it in
bnx2x_release_firmware() or dereferencing it in bnx2x_init_firmware().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 13 Mar 2012 18:04:25 +0000 (18:04 +0000)]
sch_sfq: revert dont put new flow at the end of flows
This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)
As Jesper found out, patch sounded great but has bad side effects.
In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.
It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.
A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk> Cc: Dave Taht <dave.taht@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 14 Mar 2012 20:18:32 +0000 (20:18 +0000)]
asix: asix_rx_fixup surgery to reduce skb truesizes
asix_rx_fixup() is complex, and does some unnecessary memory copies (at
least on x86 where NET_IP_ALIGN is 0)
Also, it tends to provide skbs with a big truesize (4096+256 with
MTU=1500) to upper stack, so incoming trafic consume a lot of memory and
I noticed early packet drops because we hit socket rcvbuf too fast.
Switch to a different strategy, using copybreak so that we provide nice
skbs to upper stack (including the NET_SKB_PAD to avoid future head
reallocations in some paths)
With this patch, I no longer see packets drops or tcp collapses on
various tcp workload with a AX88772 adapter.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Aurelien Jacobs <aurel@gnuage.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Trond Wuellner <trond@chromium.org> Cc: Grant Grundler <grundler@chromium.org> Cc: Paul Stewart <pstew@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 16 Mar 2012 00:16:22 +0000 (17:16 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
"Nine patches - some bug fixes and some MAINTAINERS fiddling."
* emailed from Andrew Morton <akpm@linux-foundation.org>:
drivers/video/backlight/s6e63m0.c: fix corruption storing gamma mode
MAINTAINERS: add entry for exynos mipi display drivers
MAINTAINERS: fix link to Gustavo Padovans tree
MAINTAINERS: add Johan to Bluetooth maintainers
MAINTAINERS: Gustavo has moved
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
rapidio/tsi721: fix bug in register offset definitions
MAINTAINERS: update ST's Mailing list for SPEAr
memcg: free mem_cgroup by RCU to fix oops
Linus Torvalds [Fri, 16 Mar 2012 00:14:35 +0000 (17:14 -0700)]
Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull i2c subsystem fixes from Jean Delvare.
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-algo-bit: Fix spurious SCL timeouts under heavy load
i2c-core: Comment says "transmitted" but means "received"
Linus Torvalds [Fri, 16 Mar 2012 00:13:39 +0000 (17:13 -0700)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck.
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (zl6100) Enable interval between chip accesses for all chips
hwmon: (w83627ehf) Describe undocumented pwm attributes
hwmon: (w83627ehf) Fix temp2 source for W83627UHG
hwmon: (w83627ehf) Fix memory leak in probe function
hwmon: (w83627ehf) Fix writing into fan_stop_time for NCT6775F/NCT6776F
Linus Torvalds [Fri, 16 Mar 2012 00:07:25 +0000 (17:07 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm exynos/intel updates from Dave Airlie:
"Two minor updates from Jesse for Intel SNB fixes, and a few fixes from
Samsung for exynos. The pull req has Alan's commit in it since Intel
based their tree on my tree at that time, but it all seems fine wrt
merging."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm exynos: use drm_fb_helper_set_par directly
drm/exynos: Fix fb_videomode <-> drm_mode_modeinfo conversion
drm/exynos: fix runtime_pm fimd device state on probe
drm/exynos: use correct 'exynos-drm' name for platform device
drm/i915: support 32 bit BGR formats in sprite planes
drm/i915: fix color order for BGR formats on SNB
drm/gma500: Fix Cedarview boot failures in 3.3-rc
Linus Torvalds [Fri, 16 Mar 2012 00:06:05 +0000 (17:06 -0700)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"For 4 fixes for 3.3 (all trivial):
- uvc video driver: fixes a division by zero;
- davinci: add module.h to fix compilation;
- smsusb: fix the delivery system setting;
- smsdvb: the get_frontend implementation there is broken.
The smsdvb patch has 127 lines, but it is trivial: instead of
returning a cache of the set_frontend (with is wrong, as it doesn't
have the updated values for the data, and the implementation there is
buggy), it copies the information of the detected DVB parameters from
the smsdvb private structures into the corresponding DVBv5 struct
fields."
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] smsdvb: fix get_frontend
[media] smsusb: fix the default delivery system setting
[media] media: davinci: added module.h to resolve unresolved macros
[media] [FOR,v3.3] uvcvideo: Avoid division by 0 in timestamp calculation
Linus Torvalds [Fri, 16 Mar 2012 00:04:56 +0000 (17:04 -0700)]
Merge branch '3.3-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target fixes from Nicholas Bellinger:
"This series addresses two recently reported regression bugs related to
legacy SCSI reservation usage in target core, and iscsi-target
reservation conflict handling.
The second patch in particular addresses possible data-corruption with
SCSI reservations that is specific to iscsi-target fabric LUNs with
multiple client writers. Both patches need to go into v3.2 stable
ASAP, and the branch based on the last target-pending/3.3-rc-fixes
HEAD.
Again, thanks to Martin Svec for his help to identify and address this
regression bug with iscsi-target."
strict_strtoul() writes a long but ->gamma_mode only has space to store an
int, so on 64 bit systems we end up scribbling over ->gamma_table_count as
well. I've changed it to use kstrtouint() instead.
Donghwa Lee [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add entry for exynos mipi display drivers
I'd like to add Inki Dae, Donghwa Lee and Kyungmin Park as maintainers
who developers for exynos mipi display drivers for
video/driver/exynos/exynos_mipi* and include/video/exynos_mipi*.
Signed-off-by: Donghwa Lee <dh09.lee@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Kukjin Kim <kgene.kim@samsung.com> Cc: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johan Hedberg [Thu, 15 Mar 2012 22:17:11 +0000 (15:17 -0700)]
MAINTAINERS: add Johan to Bluetooth maintainers
I've been coordinating Bluetooth patches in my tree for some time and
it's possible I'll do it in the future too, so add myself to the
Bluetooth sections as well as mention my tree there.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: "Gustavo F. Padovan" <padovan@profusion.mobi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Thu, 15 Mar 2012 22:17:10 +0000 (15:17 -0700)]
prctl: use CAP_SYS_RESOURCE for PR_SET_MM option
CAP_SYS_ADMIN is already overloaded left and right, so to have more
fine-grained access control use CAP_SYS_RESOURCE here.
The CAP_SYS_RESOUCE is chosen because this prctl option allows a current
process to adjust some fields of memory map descriptor which rather
represents what the process owns: pointers to code, data, stack
segments, command line, auxiliary vector data and etc.
Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721: fix bug in register offset definitions
Fix indexed register offset definitions that use decimal (wrong) instead
of hexadecimal (correct) notation for indexing multipliers.
Incorrect definitions do not affect Tsi721 driver in its current default
configuration because it uses only IDB queue 0. Loss of inbound
doorbell functionality should be observed if queue other than 0 is used.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Chul Kim <chul.kim@idt.com> Cc: <stable@vger.kernel.org> [3.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Viresh Kumar [Thu, 15 Mar 2012 22:17:09 +0000 (15:17 -0700)]
MAINTAINERS: update ST's Mailing list for SPEAr
We have created a ST's Mailing list for SPEAr. This can be accessed
from non-st email ids. I want people to cc this list, when they have
changes specific to SPEAr. So, its better to get this updated in
MAINTAINERS file.
linux-arm-kernel@lists.infradead.org is also added for SPEAr.
Signed-off-by: Viresh Kumar <viresh.kumar@st.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Thu, 15 Mar 2012 22:17:07 +0000 (15:17 -0700)]
memcg: free mem_cgroup by RCU to fix oops
After fixing the GPF in mem_cgroup_lru_del_list(), three times one
machine running a similar load (moving and removing memcgs while
swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
this is where a struct mem_cgroup is first accessed after being chosen
by mem_cgroup_iter().
Just what protects a struct mem_cgroup from being freed, in between
mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
what if that memory is freed and reused for something else, which sets
"refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
left at zero but flags are cleared.
It's tempting to move the css_tryget() into css_get_next(), to make it
really "get" the css, but I don't think that actually solves anything:
the same difficulty in moving from css_id found to stable css remains.
But we already have rcu_read_lock() around the two, so it's easily fixed
if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.
However, a big struct mem_cgroup is allocated with vzalloc() instead of
kzalloc(), and we're not allowed to vfree() at interrupt time: there
doesn't appear to be a general vfree_rcu() to help with this, so roll
our own using schedule_work(). The compiler decently removes
vfree_work() and vfree_rcu() when the config doesn't need them.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ying Han <yinghan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>