WANG Cong [Thu, 6 May 2010 07:48:51 +0000 (00:48 -0700)]
bonding: make bonding support netpoll
Based on Andy's work, but I modified a lot.
Similar to the patch for bridge, this patch does:
1) implement the 2 methods to support netpoll for bonding;
2) modify netpoll during forwarding packets via bonding;
3) disable netpoll support of bonding when a netpoll-unabled device
is added to bonding;
4) enable netpoll support when all underlying devices support netpoll.
Cc: Andy Gospodarek <gospo@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 6 May 2010 07:48:24 +0000 (00:48 -0700)]
bridge: make bridge support netpoll
Based on the previous patch, make bridge support netpoll by:
1) implement the 2 methods to support netpoll for bridge;
2) modify netpoll during forwarding packets via bridge;
3) disable netpoll support of bridge when a netpoll-unabled device
is added to bridge;
4) enable netpoll support when all underlying devices support netpoll.
Cc: David Miller <davem@davemloft.net> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 6 May 2010 07:47:21 +0000 (00:47 -0700)]
netpoll: add generic support for bridge and bonding devices
This whole patchset is for adding netpoll support to bridge and bonding
devices. I already tested it for bridge, bonding, bridge over bonding,
and bonding over bridge. It looks fine now.
To make bridge and bonding support netpoll, we need to adjust
some netpoll generic code. This patch does the following things:
1) introduce two new priv_flags for struct net_device:
IFF_IN_NETPOLL which identifies we are processing a netpoll;
IFF_DISABLE_NETPOLL is used to disable netpoll support for a device
at run-time;
2) introduce one new method for netdev_ops:
->ndo_netpoll_cleanup() is used to clean up netpoll when a device is
removed.
3) introduce netpoll_poll_dev() which takes a struct net_device * parameter;
export netpoll_send_skb() and netpoll_poll_dev() which will be used later;
4) hide a pointer to struct netpoll in struct netpoll_info, ditto.
5) introduce ->real_dev for struct netpoll.
6) introduce a new status NETDEV_BONDING_DESLAE, which is used to disable
netconsole before releasing a slave, to avoid deadlocks.
Cc: David Miller <davem@davemloft.net> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Wed, 5 May 2010 19:57:49 +0000 (19:57 +0000)]
ixgbevf: Cache PF ack bit in interrupt
When the PF acks a message from the VF the VF gets an interrupt. It
must cache the ack bit so that polling SW will not miss the ack. Also
avoid reading the message buffer on acks because that also will clear
the ack bit.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Wed, 5 May 2010 19:57:30 +0000 (19:57 +0000)]
ixgbe: Streamline MC filter setup for VFs
The driver was calling the set Rx mode function for every multicast
filter set by the VF. When starting many VMs where each might have
multiple VLAN interfaces this would result in the function being
called hundreds or even thousands of times. This is unnecessary
for the case of the imperfect filters used in the MTA and has been
streamlined to be more efficient.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Wed, 5 May 2010 19:57:10 +0000 (19:57 +0000)]
ixgbe: Remove unneeded register writes in VF VLAN setup
The driver is unnecessarily writing values to VLAN control registers.
These writes already done elsewhere and are superfluous here.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Nunley [Tue, 4 May 2010 21:58:07 +0000 (21:58 +0000)]
igb: reduce cache misses on tx cleanup
This patch reduces the number of skb cache misses in the
clean_tx_irq path, and results in an overall increase
in tx packet throughput.
Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sebastien Jan [Wed, 5 May 2010 08:45:54 +0000 (08:45 +0000)]
ks8851: companion eeprom access through ethtool
Accessing ks8851 companion eeprom permits modifying the ks8851 stored
MAC address.
Example how to change the MAC address using ethtool, to set the
01:23:45:67:89:AB MAC address:
$ echo "0:AB8976452301" | xxd -r > mac.bin
$ sudo ethtool -E eth0 magic 0x8870 offset 2 < mac.bin
Signed-off-by: Sebastien Jan <s-jan@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 5 May 2010 18:15:21 +0000 (18:15 +0000)]
forcedeth: Account for consumed budget in napi poll
Repeated calls to nv_rx_process in napi poll routine do not take
portion of budget that has been consumed in previous calls. Fix by
subtracting the number of packets processed.
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Wed, 5 May 2010 13:03:11 +0000 (13:03 +0000)]
netdev: octeon_mgmt: Free TX skbufs in a timely manner.
We also reduce the high water mark to 1 so skbufs are not stranded for
long periods of time. Since we are cleaning after each packet, no
need to do it in the transmit path.
Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Tue, 4 May 2010 22:12:06 +0000 (22:12 +0000)]
ixgbe: Add support for VF MAC and VLAN configuration
Add support for the "ip link set" and "ip link show" commands that allow
configuration of the virtual functions' MAC and port VLAN via user space
command line.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Greg Rose [Tue, 4 May 2010 22:11:46 +0000 (22:11 +0000)]
ixgbe: Add boolean parameter to ixgbe_set_vmolr
Add a boolean parameter to ixgbe-set_vmolr so that the caller can
specify whether the pool should accept untagged packets. Required
for a follow on patch to enable administrative configuration of port
VLAN for virtual functions.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
e1000/e1000e: implement a simple interrupt moderation
Back before e1000-7.3.20, the e1000 driver had a simple algorithm that
managed interrupt moderation. The driver was updated in 7.3.20 to
have the new "adaptive" interrupt moderation but we have customer
requests to redeploy the old way as an option. This patch adds the
old functionality back. The new functionality can be enabled via
module parameter or at runtime via ethtool.
Module parameter: (InterruptThrottleRate=4) to use this new
moderation method.
Ethtool method: ethtool -C ethX rx-usecs 4
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 4 May 2010 22:25:42 +0000 (22:25 +0000)]
e1000e: increase rx fifo size to 36K on 82574 and 82583
This change increases the RX fifo size to 36K for standard frames and
decreases the TX fifo size to 4K. The reason for this change is that on
slower systems the RX is much more likely to backfill and need space than
the TX is. As long as the TX fifo is twice the size of the MTU we should
have more than enough TX fifo.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 5 May 2010 14:03:32 +0000 (14:03 +0000)]
e1000e: Save irq into netdev structure
Set net->devirq to pdev->irq. This should be consistent with other
drivers.
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 5 May 2010 14:03:11 +0000 (14:03 +0000)]
e1000e: Remove unnessary log message
Remove e_info message printed whenever TSO is enabled or disabled.
This is not very useful and just clutters dmesg.
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 5 May 2010 14:02:49 +0000 (14:02 +0000)]
e1000e: reduce writes of RX producer ptr
Reduce number of writes to RX producer pointer. When alloc'ing RX
buffers, only write the RX producer pointer once every
E1000_RX_BUFFER_WRITE (16) buffers created.
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 5 May 2010 14:02:27 +0000 (14:02 +0000)]
e1000e: save skb counts in TX to avoid cache misses
In e1000_tx_map, precompute number of segements and bytecounts which
are derived from fields in skb; these are stored in buffer_info. When
cleaning tx in e1000_clean_tx_irq use the values in the associated
buffer_info for statistics counting, this eliminates cache misses
on skb fields.
Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 5 May 2010 08:07:37 +0000 (01:07 -0700)]
net: __alloc_skb() speedup
With following patch I can reach maximum rate of my pktgen+udpsink
simulator :
- 'old' machine : dual quad core E5450 @3.00GHz
- 64 UDP rx flows (only differ by destination port)
- RPS enabled, NIC interrupts serviced on cpu0
- rps dispatched on 7 other cores. (~130.000 IPI per second)
- SLAB allocator (faster than SLUB in this workload)
- tg3 NIC
- 1.080.000 pps without a single drop at NIC level.
Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch
first sk_buff cache line, the second to prefetch the shinfo part.
Also using one memset() to initialize all skb_shared_info fields instead
of one by one to reduce number of instructions, using long word moves.
All skb_shared_info fields before 'dataref' are cleared in
__alloc_skb().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Inspection of the Ralink vendor driver shows that the TX_BAND_CFG register
and BBP register 3 are about HT40- indication, not about HT40+ indication.
Inverse the meaning of these fields in the code.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reinette Chatre [Mon, 3 May 2010 17:55:07 +0000 (10:55 -0700)]
iwlwifi: recalculate average tpt if not current
We currently have this check as a BUG_ON, which is being hit by people.
Previously it was an error with a recalculation if not current, return that
code.
... the portion adding the BUG_ON is reverted since we are encountering the error
and BUG_ON was created with assumption that error is not encountered.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
David S. Miller [Tue, 4 May 2010 06:33:05 +0000 (23:33 -0700)]
forcedeth: Kill NAPI config options.
All distributions enable it, therefore no significant body of users
are even testing the driver with it disabled. And making NAPI
configurable is heavily discouraged anyways.
I left the MSI-X interrupt enabling thing in an "#if 0" block
so hopefully someone can debug that and it can get re-enabled.
Probably it was just one of the NVIDIA chipset MSI erratas that
we work handle these days in the PCI quirks (see drivers/pci/quirks.c
and stuff like nvenet_msi_disable()).
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 4 May 2010 06:18:14 +0000 (23:18 -0700)]
net: skb_free_datagram_locked() fix
Commit 4b0b72f7dd617b ( net: speedup udp receive path )
introduced a bug in skb_free_datagram_locked().
We should not skb_orphan() skb if we dont have the guarantee we are the
last skb user, this might happen with MSG_PEEK concurrent users.
To keep socket locked for the smallest period of time, we split
consume_skb() logic, inlined in skb_free_datagram_locked()
Reported-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 3 May 2010 10:50:14 +0000 (10:50 +0000)]
net: rcu fixes
Add hlist_for_each_entry_rcu_bh() and
hlist_for_each_entry_continue_rcu_bh() macros, and use them in
ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps
warnings.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Korsgaard [Mon, 3 May 2010 10:01:26 +0000 (10:01 +0000)]
dm9601: fix phy/eeprom write routine
Use correct bit positions in DM_SHARED_CTRL register for writes.
Michael Planes recently encountered a 'KY-RS9600 USB-LAN converter', which
came with a driver CD containing a Linux driver. This driver turns out to
be a copy of dm9601.c with symbols renamed and my copyright stripped.
That aside, it did contain 1 functional change in dm_write_shared_word(),
and after checking the datasheet the original value was indeed wrong
(read versus write bits).
On Michaels HW, this change bumps receive speed from ~30KB/s to ~900KB/s.
On other devices the difference is less spectacular, but still significant
(~30%).
Reported-by: Michael Planes <michael.planes@free.fr> CC: stable@kernel.org Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Arlott [Mon, 3 May 2010 10:20:27 +0000 (10:20 +0000)]
ppp_generic: handle non-linear skbs when passing them to pppd
Frequently when using PPPoE with an interface MTU greater than 1500,
the skb is likely to be non-linear. If the skb needs to be passed to
pppd then the skb data must be read correctly.
The previous commit fixes an issue with accidentally sending skbs
to pppd based on an invalid read of the protocol type. When that
error occurred pppd was reading invalid skb data too.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
wireless: rt2x00: rt2800usb: be in sync with latest windows drivers.
0x07d1,0x3c17 D-Link Wireless N 150 USB Adapter DWA-125
0x1b75,0x3071 Ovislink Airlive WN-301USB
0x1d4d,0x0011 Pegatron Ralink RT3072 802.11b/g/n Wireless Lan USB Device
0x083a,0xf511 Arcadyan 802.11 USB Wireless LAN Card
0x13d3,0x3322 AzureWave 802.11 n/g/b USB Wireless LAN Card
Signed-off-by: Xose Vazquez Perez <xose.vazquez@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Mon, 3 May 2010 06:49:48 +0000 (08:49 +0200)]
mac80211: improve IBSS scanning
When IBSS is fixed to a frequency, it can still
scan to try to find the right BSSID. This makes
sense if the BSSID isn't also fixed, but it need
not scan all channels -- just one is sufficient.
Make it do that by moving the scan setup code to
ieee80211_request_internal_scan() and include
a channel variable setting.
Note that this can be further improved to start
the IBSS right away if both frequency and BSSID
are fixed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Sat, 1 May 2010 16:53:51 +0000 (18:53 +0200)]
mac80211: allow controlling aggregation manually
This allows enabling TX and disabling both TX and
RX aggregation sessions manually in debugfs. It is
very useful for debugging session initiation and
teardown problems since with this you don't have
to force a lot of traffic to get aggregation and
thus have less data to analyse.
Also, to debug mac80211 code itself, make hwsim
"support" aggregation sessions. It will still just
transfer the frame, but go through the setup and
teardown handshakes.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
David Kilroy [Sat, 1 May 2010 13:05:41 +0000 (14:05 +0100)]
orinoco: add orinoco_usb driver
This driver uses the core orinoco modules for the bulk of
the functionality. The low level hermes routines (for local bus
cards) are replaced, the driver supplies its own ndo_xmit_start
function, and locking is done with the _bh variant.
Some recent functionality is not available to the USB cards yet
(firmware loading and WPA).
Out-of-tree driver originally written by Manuel Estrada Sainz.
Thanks to Mark Davis for supplying hardware to test the updates.
Signed-off-by: David Kilroy <kilroyd@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
David Kilroy [Sat, 1 May 2010 13:05:40 +0000 (14:05 +0100)]
orinoco: encapsulate driver locking
Local bus and USB drivers will need to do locking differently.
The original orinoco_usb patches had a boolean variable controlling
whether spin_lock_bh was used, or irq based locking. This version
provides wrappers for the lock functions and the drivers specify the
functions pointers needed.
This will introduce a performance penalty, but I'm not expecting it to
be noticable.
Signed-off-by: David Kilroy <kilroyd@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
David Kilroy [Sat, 1 May 2010 13:05:39 +0000 (14:05 +0100)]
orinoco: allow driver to specify netdev_ops
Allow the main drivers to specify a custom version of the net_device_ops
structure. This is required by orinoco_usb to supply a separate transmit
function.
Export existing net_device_ops callbacks so that the drivers can reuse
some of the existing code.
Signed-off-by: David Kilroy <kilroyd@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Fri, 30 Apr 2010 11:48:36 +0000 (13:48 +0200)]
mac80211: fix ieee80211_find_sta[_by_hw]
Both of these functions can currently return
a station pointer that, to the driver, is
invalid (in IBSS mode only) because adding
the station failed. Check for that, and also
make ieee80211_find_sta() properly use the
per interface station search.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
"iwl: cleanup: remove unneeded error handling" missed the one in
if_sdio_debugfs_init().
I don't think we even need to check -ENODEV ourselves because if
DEBUG_FS is not compiled in, all the debugfs utility functions will
become no-op.
Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Dan Carpenter <error27@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
virtio added mergeable buffers mode where 2 bytes of extra info is put
after vnet header but before actual data (tun does not need this data).
In hindsight, it would have been better to add the new info *before* the
packet: as it is, users need a lot of tricky code to skip the extra 2
bytes in the middle of the iovec, and in fact applications seem to get
it wrong, and only work with specific iovec layout. The fact we might
need to split iovec also means we might in theory overflow iovec max
size.
This patch adds a simpler way for applications to handle this,
and future proofs the interface against further extensions,
by making the size of the virtio net header configurable
from userspace. As a result, tun driver will simply
skip the extra 2 bytes on both input and output.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David S. Miller <davem@davemloft.net>
Changli Gao [Sun, 2 May 2010 05:42:16 +0000 (05:42 +0000)]
net: fix softnet_stat
Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context
without any protection. And enqueue_to_backlog should update the netdev_rx_stat
of the target CPU.
This patch renames softnet_data.total to softnet_data.processed: the number of
packets processed in uppper levels(IP stacks).
softnet_stat data is moved into softnet_data.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
include/linux/netdevice.h | 17 +++++++----------
net/core/dev.c | 26 ++++++++++++--------------
net/sched/sch_generic.c | 2 +-
3 files changed, 20 insertions(+), 25 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elina Pasheva [Wed, 28 Apr 2010 13:28:24 +0000 (13:28 +0000)]
net/usb: initiate sync sequence in sierra_net.c driver
The following patch adds the initiation of the sync sequence to
"sierra_net_bind()". If this step is omitted, the modem will never sync up
with the host and it will not be possible to establish a data connection.
Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com> Signed-off-by: Rory Filer <rfiler@sierrawireless.com> Tested-by: Elina Pasheva <epasheva@sierrawireless.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 29 Apr 2010 11:01:49 +0000 (11:01 +0000)]
net: sock_def_readable() and friends RCU conversion
sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.
RCU conversion is pretty much needed :
1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).
[Future patch will add a list anchor for wakeup coalescing]
2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().
3) Respect RCU grace period when freeing a "struct socket_wq"
4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct
socket_wq"
5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep
6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.
7) Change all sk_has_sleeper() callers to :
- Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
- Use wq_has_sleeper() to eventually wakeup tasks.
- Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
8) sock_wake_async() is modified to use rcu protection as well.
9) Exceptions :
macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They dont need rcu freeing.
Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: Tag messages that can be Nagle delayed at creation.
When we create the sctp_datamsg and fragment the user data,
we know exactly if we are sending full segments or not and
how they might be bundled. During this time, we can mark
messages a Nagle capable or not. This makes the check at
transmit time much simpler.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: Optimize computation of highest new tsn in SACK.
Right now, if the highest tsn in the SACK doesn't change, we'll
end up scanning the transmitted lists on the transports twice:
once for locating the highest _new_ tsn, and once for actually
tagging chunks as acked. This is a waste, since we can record
the highest _new_ tsn at the same time as tagging chunks. Long
ago this was not possible because we would try to mark chunks
as missing at the same time as tagging them acked and this approach
didn't work. Now that the two steps are separate, we can re-use
the old approach.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: correctly mark missing chunks in fast recovery
According to RFC 4960 Section 7.2.4:
If an endpoint is in Fast
Recovery and a SACK arrives that advances the Cumulative TSN Ack
Point, the miss indications are incremented for all TSNs reported
missing in the SACK.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: rwnd_press should be cumulative
rwnd_press tracks the pressure on the recieve window. Every
timer the receive buffer overlows, we truncate the receive
window and then grow it back. However, if we don't track
the cumulative presser, it's possible to reach a situation
when receive buffer is empty, but rwnd stays truncated.
Vlad Yasevich [Sat, 1 May 2010 02:41:10 +0000 (22:41 -0400)]
sctp: update transport initializations
Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes. So we switch to calling kzalloc()
which makes things much simpler.
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: Do not force T3 timer on fast retransmissions.
We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.
Vlad Yasevich [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: remove 'resent' bit from the chunk
The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks. However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra. Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.
added code to make sure that we do not select unconfirmed paths
for data transmission. This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable. In that case, the next
retransmit path returned is NULL and that causes a kernel crash.
The solution is to only change retransmit paths if we found one to use.
Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>
Wei Yongjun [Sat, 1 May 2010 02:41:09 +0000 (22:41 -0400)]
sctp: implement sctp association probing module
This patch implement sctp association probing module, the module
will be called sctp_probe.
This module allows for capturing the changes to SCTP association
state in response to incoming packets. It is used for debugging
SCTP congestion control algorithms.
Vlad Yasevich [Sat, 1 May 2010 02:39:26 +0000 (22:39 -0400)]
sctp: Do no select unconfirmed transports for retransmissions
An unconfirmed transport is one that we have not been
able to reach since the beginning. There is no point in
trying to retrasnmit data on those transports. Also, the
specification forbids it due to security issues.
Reported-by: Frank Schuster <frank.schuster01@web.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Wei Yongjun [Sat, 1 May 2010 02:38:53 +0000 (22:38 -0400)]
sctp: fix to retranmit at least one DATA chunk
While doing retranmit, if control chunk exists, such as
FORWARD TSN chunk, and the DATA chunk can not be bundled with
this control chunk because of PMTU limit, no DATA chunk
will be retranmitted in the current implementation. This
patch makes sure to retranmit at least one DATA chunk in this case.
Elina Pasheva [Sat, 1 May 2010 02:05:28 +0000 (19:05 -0700)]
net/usb: remove default in Kconfig for sierra_net driver
The following patch removes the default from the Kconfig entry for sierra_net
driver as recommended.
Signed-off-by: Elina Pasheva <epasheva@sierrawireless.com> Signed-off-by: Rory Filer <rfiler@sierrawireless.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 1 May 2010 01:42:44 +0000 (21:42 -0400)]
sctp: missing set src and dest port while lookup output route
While lookup the output route, we do not set the src and dest
port. This will cause we got a wrong route if we had set the
outbund transport to IPsec with src or dst port.
Wei Yongjun [Sat, 1 May 2010 01:42:44 +0000 (21:42 -0400)]
sctp: discard ABORT chunk with zero verification tag in COOKIE-WAIT state
In current implementation if ABORT chunk is received with T flag is set
and zero verification tag in COOKIE-WAIT state, the ABORT chunk will be
always accepted. This is because in COOKIE-WAIT state, the endpoint does
not know the peer's verification tag, and it's zero in the endpoint.
Wei Yongjun [Sat, 1 May 2010 01:42:43 +0000 (21:42 -0400)]
sctp: assure at least one T3-rtx timer is running if a FORWARD TSN is sent
PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
C5) If a FORWARD TSN is sent, the sender MUST assure that at
least one T3-rtx timer is running.
So this patch fix to assure at least one T3-rtx timer is running
if a FORWARD TSN is or will to sent.
Vlad Yasevich [Sat, 1 May 2010 01:42:43 +0000 (21:42 -0400)]
sctp: send SHUTDOWN-ACK chunk back to the source.
SHUTDOWN-ACK is alaways sent to the primary path at the first time,
but should better transmit SHUTDOWN-ACK chunk to the same destination
transport address from which it received the SHUTDOWN chunk.
Based on the work from Wei Yongjun <yjwei@cn.fujitsu.com>.
Dan Carpenter [Fri, 30 Apr 2010 23:42:08 +0000 (16:42 -0700)]
ipv6: cleanup: remove unneeded null check
We dereference "sk" unconditionally elsewhere in the function.
This was left over from: b30bd282 "ip6_xmit: remove unnecessary NULL
ptr check". According to that commit message, "the sk argument to
ip6_xmit is never NULL nowadays since the skb->priority assigment
expects a valid socket."
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>