Chen-Yu Tsai [Fri, 17 Jan 2014 13:24:40 +0000 (21:24 +0800)]
net: stmmac: Enable stmmac main clock when probing hardware
The stmmac driver does not enable the main clock during the probe phase.
If the clock was not enabled by the boot loader or was disabled by the
kernel, hardware features and the MDIO bus would not be probed properly.
Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Thu, 16 Jan 2014 22:32:13 +0000 (01:32 +0300)]
DT: net: davinci_emac: "phy-handle" property is actually optional
Though described as required, the "phy-handle" property for the DaVinci EMAC
binding is actually optional, as the driver will happily function without it,
assuming 100/FULL link; the property is not specified either in the example
device node, or in the actual EMAC device nodes for DA850 and AM3517 device
trees.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Weilong Chen [Thu, 16 Jan 2014 09:24:31 +0000 (17:24 +0800)]
net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.
Reported-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Libo Chen <chenlibo@huawei.com> Signed-off-by: Weilong Chen <chenweilong@huawei.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 15 Jan 2014 23:49:30 +0000 (15:49 -0800)]
net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 20 Jan 2014 03:25:13 +0000 (11:25 +0800)]
net: document accel_priv parameter for __dev_queue_xmit()
To silent "make htmldocs" warning.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Sun, 19 Jan 2014 22:24:53 +0000 (15:24 -0700)]
sctp: fix missing SCTP mailing list address update
The commit 91705c61b5202 ("net: sctp: trivial: update mailing list
address") updated almost all the SCTP mailing list address from
"lksctp-developers@lists.sourceforge.net"
to
"linux-sctp@vger.kernel.org"
except for the one in include/linux/sctp.h file. Fix this way trivial
one so that all is updated.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6_link_dev_addr sorts newly added addresses by scope in
ifp->addr_list. Smaller scope addresses are added to the tail of the
list. Use this fact to iterate in reverse over addr_list and break out
as soon as a higher scoped one showes up, so we can spare some cycles
on machines with lot's of addresses.
The ordering of the addresses is not relevant and we are more likely to
get the eui64 generated address with this change anyway.
Suggested-by: Brian Haley <brian.haley@hp.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fengguang Wu [Sun, 19 Jan 2014 19:37:12 +0000 (11:37 -0800)]
qlcnic: fix sparse warnings
Previous patch changed prototypes, but forgot functions.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.
This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.
ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.
I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.
Reported-by: Gert Doering <gert@space.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Florent Fourcot [Fri, 17 Jan 2014 16:15:05 +0000 (17:15 +0100)]
ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.
Changelog of V3:
* rename ip6_flowlabel_consistency to flowlabel_consistency
* use net_info_ratelimited()
* checkpatch cleanups
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Florent Fourcot [Fri, 17 Jan 2014 16:15:04 +0000 (17:15 +0100)]
ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Florent Fourcot [Fri, 17 Jan 2014 16:15:03 +0000 (17:15 +0100)]
ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
With this option, the socket will reply with the flow label value read
on received packets.
The goal is to have a connection with the same flow label in both
direction of the communication.
Changelog of V4:
* Do not erase the flow label on the listening socket. Use pktopts to
store the received value
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Hurrle [Fri, 17 Jan 2014 21:53:15 +0000 (22:53 +0100)]
net: add build-time checks for msg->msg_name size
This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").
DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.
Signed-off-by: Steffen Hurrle <steffen@hurrle.net> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Sekletar [Fri, 17 Jan 2014 16:09:45 +0000 (17:09 +0100)]
net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f85d ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.
Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.
As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.
Signed-off-by: Michal Sekletar <msekleta@redhat.com> Cc: David Miller <davem@davemloft.net> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.
Minor overlapping changes in bnx2x driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) The value choosen for the new SO_MAX_PACING_RATE socket option on
parisc was very poorly choosen, let's fix it while we still can.
From Eric Dumazet.
2) Our generic reciprocal divide was found to handle some edge cases
incorrectly, part of this is encoded into the BPF as deep as the JIT
engines themselves. Just use a real divide throughout for now.
From Eric Dumazet.
3) Because the initial lookup is lockless, the TCP metrics engine can
end up creating two entries for the same lookup key. Fix this by
doing a second lookup under the lock before we actually create the
new entry. From Christoph Paasch.
4) Fix scatter-gather list init in usbnet driver, from Bjørn Mork.
5) Fix unintended 32-bit truncation in cxgb4 driver's bit shifting.
From Dan Carpenter.
6) Netlink socket dumping uses the wrong socket state for timewait
sockets. Fix from Neal Cardwell.
7) Fix netlink memory leak in ieee802154_add_iface(), from Christian
Engelmayer.
8) Multicast forwarding in ipv4 can overflow the per-rule reference
counts, causing all multicast traffic to cease. Fix from Hannes
Frederic Sowa.
9) via-rhine needs to stop all TX queues when it resets the device,
from Richard Weinberger.
10) Fix RDS per-cpu accesses broken by the this_cpu_* conversions. From
Gerald Schaefer.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
parisc: fix SO_MAX_PACING_RATE typo
ipv6: simplify detection of first operational link-local address on interface
tcp: metrics: Avoid duplicate entries with the same destination-IP
net: rds: fix per-cpu helper usage
e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
bpf: do not use reciprocal divide
be2net: add dma_mapping_error() check for dma_map_page()
bnx2x: Don't release PCI bars on shutdown
net,via-rhine: Fix tx_timeout handling
batman-adv: fix batman-adv header overhead calculation
qlge: Fix vlan netdev features.
net: avoid reference counter overflows on fib_rules in multicast forwarding
dm9601: add USB IDs for new dm96xx variants
MAINTAINERS: add virtio-dev ML for virtio
ieee802154: Fix memory leak in ieee802154_add_iface()
net: usbnet: fix SG initialisation
inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
cxgb4: silence shift wrapping static checker warning
David S. Miller [Sat, 18 Jan 2014 03:16:02 +0000 (19:16 -0800)]
Merge branch 'ixgbevf'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains updates from Emil to ixgbevf.
He cleans up the code by removing the adapter structure as a
parameter from multiple functions in favor of using the ixgbevf_ring
structure and moves hot-path specific statistic int the ring
structure for anticipated performance gains.
He also removes the Tx/Rx counters for checksum offload and adds
counters for tx_restart_queue and tx_timeout_count.
Next he makes it so that the first tx_buffer structure acts as a
central storage location for most the skb info we are about to
transmit, then takes advantage of the dma buffer always being
present in the first descriptor and mapped as single allowing a
call to dma_unmap_single which alleviates the need to check for
DMA mapping in ixgbevf_clean_tx_irq().
Finally he merges the ixgbevf_tx_map call and the ixgbevf_tx_queue
call into a single function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:05 +0000 (18:30 -0800)]
ixgbevf: merge ixgbevf_tx_map and ixgbevf_tx_queue into a single function
This change merges the ixgbevf_tx_map call and the ixgbevf_tx_queue call
into a single function. In order to make room for this setting of cmd_type
and olinfo flags is done in separate functions.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:04 +0000 (18:30 -0800)]
ixgbevf: redo dma mapping using the tx buffer info
This patch takes advantage of the dma buffer always being present in the
first descriptor and mapped as single. As such we can call dma_unmap_single
and don't need to check for DMA mapping in ixgbevf_clean_tx_irq().
In addition this patch makes use of the DMA API.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:03 +0000 (18:30 -0800)]
ixgbevf: make the first tx_buffer a repository for most of the skb info
This change makes it so that the first tx_buffer structure acts as a
central storage location for most of the info about the skb we are about
to transmit.
In addition this patch makes tx_flags part of the ixgbevf_tx_buffer struct.
This allows us to use the flags directly from the stucture and as result
removes the tx_flags parameter from some functions. Also as a cleanup
mapped_as_page is folded into tx_flags and some unused flags were removed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:02 +0000 (18:30 -0800)]
ixgbevf: add tx counters
This patch adds counters for tx_restart_queue and tx_timeout_count.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:01 +0000 (18:30 -0800)]
ixgbevf: remove counters for Tx/Rx checksum offload
This patch removes the Tx/Rx counters for checksum offload.
The Tx counter was never updated and the Rx counter is of limited use.
This is in effort to clean up the counters and make them consistent
with the counters shown by ixgbe.
Also this patch removes some members of the adapter structure that were
never used and shuffles others to reduce number of holes.
before:
/* size: 1568, cachelines: 25, members: 48 */
/* sum members: 1519, holes: 10, sum holes: 43 */
/* padding: 6 */
/* last cacheline: 32 bytes */
after:
/* size: 1480, cachelines: 24, members: 43 */
/* sum members: 1479, holes: 1, sum holes: 1 */
/* last cacheline: 8 bytes */
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:30:00 +0000 (18:30 -0800)]
ixgbevf: move ring specific stats into ring specific structure
This patch moves hot-path specific statistics into the ring structure.
This allows us to drop the adapter structure in some functions and should
help with performance.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Sat, 18 Jan 2014 02:29:59 +0000 (18:29 -0800)]
ixgbevf: make use of the dev pointer in the ixgbevf_ring struct
This patch cleans up the code by removing the adapter structure as
parameter from multiple functions. The adapter structure was previously
being used to access the dev pointer, but this can also be done via the
ixgbevf_ring structure. This way we can drop the adapter as parameter from
these functions.
This patch also includes small cleanups in some error code paths.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Jan 2014 03:14:05 +0000 (19:14 -0800)]
Merge branch 'i40e'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e.
Neerav implements DCB and DCBNL support and adds DCB options
to Kconfig. DCB is disabled by default.
Anjali refactors flow control director to fix inconsistencies
that were preventing clean unloads of the driver, move the
queues for handling flow director error into their own hardware
VSI and implement a corrected version of the basic ethtool add
ntuple rule.
Jesse provides fixes for a compiler warning, firmware workaround,
white space fixes and renames some defines.
Shannon reworks the device ID #defines to follow the
DEV_ID_ convention followed by our other drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Warren [Fri, 17 Jan 2014 19:29:24 +0000 (12:29 -0700)]
Bluetooth: remove direct compilation of 6lowpan_iphc.c
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1
(this change probably simply wasn't "git add"d to a53d34c3465b)
Fixes: a53d34c3465b ("net: move 6lowpan compression code to separate module") Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices") Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Fri, 17 Jan 2014 23:36:39 +0000 (15:36 -0800)]
i40e: Fix device ID define names to align to standard
Rework the device ID #defines to follow the _DEV_ID convention
already established in the other Intel drivers.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neerav Parikh [Fri, 17 Jan 2014 23:36:38 +0000 (15:36 -0800)]
i40e: add DCB option to Kconfig
Allow compiling DCB related files if I40E_DCB option
is supported in the kernel configuration.
DCB is disabled by default.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Tested-By: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neerav Parikh [Fri, 17 Jan 2014 23:36:37 +0000 (15:36 -0800)]
i40e: add DCB and DCBNL support
This patch adds capability to configure DCB on i40e network
interfaces using Intel XL710 adapter firmware APIs.
By default all VSIs are only enabled for the default traffic
class enabled by firmware for any given PF. The driver would
query the firmware for the traffic classes that are enabled for
the port and reconfigure the LAN VSI to match to the port traffic
class settings. All other VSIs are only enabled for the default
traffic class settings for now.
The driver registers and listens to firmware events that may
require change in the DCB settings. It may reconfigure the VSI
settings based on these events.
This patch exposes IEEE DCBNL interfaces for the i40e driver to
allow any application to query the DCB settings on the adapter.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-By: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neerav Parikh [Fri, 17 Jan 2014 23:36:36 +0000 (15:36 -0800)]
i40e: implement DCB support infastructure
Intel XL710 series of adapters support QoS as per the
IEEE 802.1 DCB (Data Center Bridging) standard.
This is supported in conjuction with:
- Enhanced Transmission Selection (ETS) - IEEE 802.1Qaz
- Priority Flow Control (PFC) - IEEE 802.1Qbb
- DCB eXchange Protocol (DCBX) - IEEE 802.1Qaz
On Intel XL710 adapters DCBX is performed by the adapter
firmware. The firmware runs DCBX in willing mode and configures
the port as per the DCB settings recommended by it's link
partner.
By default in absence of any DCBX; firmware would configure the
port with a single traffic class and all of the port bandwith
will be allocated to that traffic class.
This patch adds functions and calls to support querying and
configuring DCB using firmware APIs.
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-By: Jack Morgan<jack.morgan@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e hardware was generating some inconsistent results
when using current programming methods. This refactor
fixes the inconsistencies that were preventing clean
unloads of the driver, and moves the queues for handling
flow director errors into their own hardware VSI.
This patch also implements a corrected version of the
basic ethtool add ntuple rule, which will disable
the driver's automatic flow programming. A future patch
adds remove/replay/list support for ntuple.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 17 Jan 2014 23:36:33 +0000 (15:36 -0800)]
i40e: whitespace fixes
Fix more whitespace issues, including making some locals declared
in a nicer order.
Also update Copyright string printed when the driver loads.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 17 Jan 2014 23:36:32 +0000 (15:36 -0800)]
i40e: Change firmware workaround
Remove a workaround that is no longer necessary and implement
a better understanding of what firmware is returning in the MSI-X
vector count. This makes it so that the driver ends up with the
right amount of queues when using all available MSI-X vectors.
Change-ID: I34e60cc71dcfb1b5412f37df956fedcc49ade187 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Fri, 17 Jan 2014 23:36:31 +0000 (15:36 -0800)]
i40e: fix compile warning on checksum_local
Compile testing with higher warning levels found this complaint:
i40e_nvm.c: warning: 'checksum_local' may be used uninitialized in
this function
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Warren [Fri, 17 Jan 2014 19:29:24 +0000 (12:29 -0700)]
Bluetooth: remove direct compilation of 6lowpan_iphc.c
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1
(this change probably simply wasn't "git add"d to a53d34c3465b)
Fixes: a53d34c3465b ("net: move 6lowpan compression code to separate module") Fixes: 18722c247023 ("Bluetooth: Enable 6LoWPAN support for BT LE devices") Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dalton [Fri, 17 Jan 2014 09:27:08 +0000 (01:27 -0800)]
virtio-net: fix build error when CONFIG_AVERAGE is not enabled
Commit ab7db91705e9 ("virtio-net: auto-tune mergeable rx buffer size for
improved performance") introduced a virtio-net dependency on EWMA.
The inclusion of EWMA is controlled by CONFIG_AVERAGE. Fix build error
when CONFIG_AVERAGE is not enabled by adding select AVERAGE to
virtio-net's Kconfig entry.
Build failure reported using config make ARCH=s390 defconfig.
Signed-off-by: Michael Dalton <mwdalton@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Jan 2014 02:57:35 +0000 (18:57 -0800)]
Merge branch 'ixgbe'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains an updates to ixgbe and ixgbevf.
Jacob add braces around some ixgbe_qv_lock_* calls lto better adhere
to Kernel style guidelines. Don bumps the versions on ixgbe and
ixgbevf to match internal driver functionality better.
====================
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 17 Jan 2014 09:21:38 +0000 (01:21 -0800)]
ixgbevf: bump version
Bump the version number to better match functionality provided with out of
tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 17 Jan 2014 09:21:37 +0000 (01:21 -0800)]
ixgbe: bump version number
Bump the version number to better match functionality provided with out
of tree driver of the same version.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Fri, 17 Jan 2014 09:21:36 +0000 (01:21 -0800)]
ixgbe: add braces around else condition in ixgbe_qv_lock_* calls
This patch adds braces around the ixgbe_qv_lock_* calls which previously only
had braces around the if portion. Kernel style guidelines for this require
parenthesis around all conditions if they are required around one. In addition
the comment while not illegal C syntax makes the code look wrong at a cursory
glance. This patch corrects the style and adds braces so that the full if-else
block is uniform.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Carstens [Fri, 17 Jan 2014 08:37:15 +0000 (09:37 +0100)]
s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions
The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.
The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).
To avoid all these issues simply use unsigned divide instructions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Jan 2014 02:52:58 +0000 (18:52 -0800)]
Merge branch 'bonding_slave_sysfs'
Scott Feldman says:
====================
bonding: add slave netlink and sysfs support
v2:
- Address review comment from Ding (and Veacesiav): handle kobj cleanup
if sysfs_create_file() fails when adding slave attribute nodes.
v1:
The following series adds bonding slave netlink and sysfs interfaces.
Slave interfaces get a new IFLA_SLAVE set of netlink attributes, along
with RTM_NEWLINK notification when slave's active status changes. The
sysfs interface adds read-only nodes for slave attributes under a /slave
dir, simliar to how bond interfaces get a /bonding dir for bonding
attributes.
====================
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest. Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.
Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes. Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently loopback device is being registered first and thus we
receive an event notification when vxlan_net is not ready. Hence,
when we call net_generic() and request vxlan_net_id, we seem to
access garbage at that point in time. In setup_net() where we set
up a newly allocated network namespace, we traverse the list of
pernet ops ...
... and loopback_net_init() is invoked first here, so in the middle
of setup_net() we get this notification in vxlan. As currently we
only care about devices that unregister, move access through
net_generic() there. Fix is based on Cong Wang's proposal, but
only changes what is needed here. It sucks a bit as we only work
around the actual cure: right now it seems the only way to check if
a netns actually finished traversing all init ops would be to check
if it's part of net_namespace_list. But that I find quite expensive
each time we go through a notifier callback. Anyway, did a couple
of tests and it seems good for now.
Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 18 Jan 2014 02:38:13 +0000 (18:38 -0800)]
Merge branch 'ixgbe'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains updates to ixgbe Ethan Zhao. The first one replaces
the magic number "63" with a macro, IXGBE_MAX_VFS_DRV_LIMIT, the second
moves the call to set driver_max_VFS to before SRIOV is enabled.
The code of these patches match the v3 (1/2) and v2 (2/2) versions sent
to the e1000-devel and netdev mailing lists. The intermediate versions
(v4, v5) are from sorting out style issues, mostly tabs to spaces and
split lines probably introduced via mailer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
So should set driver_max_VFs with pci_sriov_set_totalvfs() before
enable VFs with ixgbe_enable_sriov().
V2: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ethan.zhao [Fri, 17 Jan 2014 03:41:04 +0000 (19:41 -0800)]
ixgbe: define IXGBE_MAX_VFS_DRV_LIMIT macro and cleanup const 63
Because ixgbe driver limit the max number of VF
functions could be enabled to 63, so define one macro IXGBE_MAX_VFS_DRV_LIMIT
and cleanup the const 63 in code.
v3: revised for net-next tree.
Signed-off-by: Ethan Zhao <ethan.kernel@gmail.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Jan 2014 00:41:19 +0000 (16:41 -0800)]
ipv4: fix a dst leak in tunnels
This patch :
1) Remove a dst leak if DST_NOCACHE was set on dst
Fix this by holding a reference only if dst really cached.
2) Remove a lockdep warning in __tunnel_dst_set()
This was reported by Cong Wang.
3) Remove usage of a spinlock where xchg() is enough
4) Remove some spurious inline keywords.
Let compiler decide for us.
Fixes: 7d442fab0a67 ("ipv4: Cache dst in tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 17 Jan 2014 00:22:28 +0000 (09:22 +0900)]
sh_eth: Add support for r7s72100
The r7s72100 SoC includes a fast ethernet controller.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Fri, 17 Jan 2014 00:22:27 +0000 (09:22 +0900)]
sh_eth: Use bool as return type of sh_eth_is_gether()
Return a boolean from sh_eth_is_gether() and refactor it as a one-liner.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Thu, 16 Jan 2014 21:27:59 +0000 (19:27 -0200)]
ipv6: send Change Status Report after DAD is completed
The RFC 3810 defines two type of messages for multicast
listeners. The "Current State Report" message, as the name
implies, refreshes the *current* state to the querier.
Since the querier sends Query messages periodically, there
is no need to retransmit the report.
On the other hand, any change should be reported immediately
using "State Change Report" messages. Since it's an event
triggered by a change and that it can be affected by packet
loss, the rfc states it should be retransmitted [RobVar] times
to make sure routers will receive timely.
Currently, we are sending "Current State Reports" after
DAD is completed. Before that, we send messages using
unspecified address (::) which should be silently discarded
by routers.
This patch changes to send "State Change Report" messages
after DAD is completed fixing the behavior to be RFC compliant
and also to pass TAHI IPv6 testsuite.
Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 16 Jan 2014 19:15:12 +0000 (11:15 -0800)]
parisc: fix SO_MAX_PACING_RATE typo
SO_MAX_PACING_RATE definition on parisc got a typo.
Its not too late to fix it, before 3.13 is official.
Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.
Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.
Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses") Reported-by: Jiri Pirko <jiri@resnulli.us> Cc: Flavio Leitner <fbl@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Flavio Leitner <fbl@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Paasch [Thu, 16 Jan 2014 19:01:21 +0000 (20:01 +0100)]
tcp: metrics: Avoid duplicate entries with the same destination-IP
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.
This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.
Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerald Schaefer [Thu, 16 Jan 2014 15:54:48 +0000 (16:54 +0100)]
net: rds: fix per-cpu helper usage
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.
David S. Miller [Sat, 18 Jan 2014 01:30:55 +0000 (17:30 -0800)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
Please pull this batch of updates for the 3.14 stream!
For the mac80211 bits, Johannes says:
"This time I have uAPSD fixes since I was working on that, hwsim
improvements to make dynamic radios possible for the test suite, the
evidently long-overdue channel_change_time removal and a few other small
collected fix and improvements."
For the iwlwifi bits, Emmanuel says:
"Besides a few trivial patches, I have an important workaround for a HW
issue that has kept me busy for a long time. Along with it, a fix that
prevents an error from being printed.
Eyal fixes our behavior against SISO APs and Ilan fixes an issue with
multiple interface scenarios.
Eliad fixes an error path in our init flow.
We also have a few 'static analyzers' fix."
For the NFC bits, Samuel says:
"It includes:
* A new NFC driver for Marvell's 8897, and a few NCI fixes and
improvements needed to support this chipset.
* An LLCP fix for how we were setting the default MIU on a p2p link. If
there is no explicit MIU extension announced at connection time, we
must use the default one and not the one announced at LLCP link
establishement time.
* A pn544 EEPROM config update. Some of the currently EEPROM configured
values are overwriting the firmware ones while other should not be set
by the driver itself.
* Some NFC digital stack fixes and improvements. Asynchronous functions
are better documented, RF technologies and CRC functions are set upon
PSL_REQ reception, and a few minor bugs are fixed.
* Minor and miscelaneous pn533, mei_phy and port100 fixes."
For the ath bits, Kalle says:
"Janusz added Kconfig option for DFS. The DFS code was there already, but
after fixes to mac80211 we can now enable it.
Bartosz added a runtime firmware feature flag to disable P2P. Our 10.1
firmware branch doesn't support P2P and ath10k can now disable that. He
also added a limit for how many clients can connect to ath10k AP.
Michal fixed WEP shared authentication, in case someone still uses it.
And I added firmware debug log to help the firmware engineers."
Along with that is a small batch of ath9k updates and a few other bits
here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 18 Jan 2014 01:29:36 +0000 (17:29 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace fixes from Eric Biederman:
"This is a set of 3 regression fixes.
This fixes /proc/mounts when using "ip netns add <netns>" to display
the actual mount point.
This fixes a regression in clone that broke lxc-attach.
This fixes a regression in the permission checks for mounting /proc
that made proc unmountable if binfmt_misc was in use. Oops.
My apologies for sending this pull request so late. Al Viro gave
interesting review comments about the d_path fix that I wanted to
address in detail before I sent this pull request. Unfortunately a
bad round of colds kept from addressing that in detail until today.
The executive summary of the review was:
Al: Is patching d_path really sufficient?
The prepend_path, d_path, d_absolute_path, and __d_path family of
functions is a really mess.
Me: Yes, patching d_path is really sufficient. Yes, the code is mess.
No it is not appropriate to rewrite all of d_path for a regression
that has existed for entirely too long already, when a two line
change will do"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Fix a regression in mounting proc
fork: Allow CLONE_PARENT after setns(CLONE_NEWPID)
vfs: In d_path don't call d_dname on a mount point
The virtio-net device currently uses aligned MTU-sized mergeable receive
packet buffers. Network throughput for workloads with large average
packet size can be improved by posting larger receive packet buffers.
However, due to SKB truesize effects, posting large (e.g, PAGE_SIZE)
buffers reduces the throughput of workloads that do not benefit from GRO
and have no large inbound packets.
This patchset introduces virtio-net mergeable buffer size auto-tuning,
with buffer sizes ranging from aligned MTU-size to PAGE_SIZE. Packet
buffer size is chosen based on a per-receive queue EWMA of incoming
packet size.
To unify mergeable receive buffer memory allocation and improve
SKB frag coalescing, all mergeable buffer memory allocation is
migrated to per-receive queue page frag allocators.
The per-receive queue mergeable packet buffer size is exported via
sysfs, and the network device sysfs layer has been extended to add
support for device-specific per-receive queue sysfs attribute groups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add initial support for per-rx queue sysfs attributes to virtio-net. If
mergeable packet buffers are enabled, adds a read-only mergeable packet
buffer size sysfs attribute for each RX queue.
Suggested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dalton [Fri, 17 Jan 2014 06:23:29 +0000 (22:23 -0800)]
lib: Ensure EWMA does not store wrong intermediate values
To ensure ewma_read() without a lock returns a valid but possibly
out of date average, modify ewma_add() by using ACCESS_ONCE to prevent
intermediate wrong values from being written to avg->internal.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dalton [Fri, 17 Jan 2014 06:23:28 +0000 (22:23 -0800)]
net-sysfs: add support for device-specific rx queue sysfs attributes
Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.
Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dalton [Fri, 17 Jan 2014 06:23:27 +0000 (22:23 -0800)]
virtio-net: auto-tune mergeable rx buffer size for improved performance
Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page frag
allocators") changed the mergeable receive buffer size from PAGE_SIZE to
MTU-size, introducing a single-stream regression for benchmarks with large
average packet size. There is no single optimal buffer size for all
workloads. For workloads with packet size <= MTU bytes, MTU + virtio-net
header-sized buffers are preferred as larger buffers reduce the TCP window
due to SKB truesize. However, single-stream workloads with large average
packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
are used.
This commit auto-tunes the mergeable receiver buffer packet size by
choosing the packet buffer size based on an EWMA of the recent packet
sizes for the receive queue. Packet buffer sizes range from MTU_SIZE +
virtio-net header len to PAGE_SIZE. This improves throughput for
large packet workloads, as any workload with average packet size >=
PAGE_SIZE will use PAGE_SIZE buffers.
These optimizations interact positively with recent commit ba275241030c ("virtio-net: coalesce rx frags when possible during rx"),
which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
optimizations benefit buffers of any size.
Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs
with all offloads & vhost enabled. All VMs and vhost threads run in a
single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
in the system will not be scheduled on the benchmark CPUs. Trunk includes
SKB rx frag coalescing.
Michael Dalton [Fri, 17 Jan 2014 06:23:26 +0000 (22:23 -0800)]
virtio-net: use per-receive queue page frag alloc for mergeable bufs
The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
mergeable rx buffer allocations. This commit migrates virtio-net to use
per-receive queue page frags for GFP_ATOMIC allocation. This change unifies
mergeable rx buffer memory allocation, which now will use skb_refill_frag()
for both atomic and GFP-WAIT buffer allocations.
To address fragmentation concerns, if after buffer allocation there
is too little space left in the page frag to allocate a subsequent
buffer, the remaining space is added to the current allocated buffer
so that the remaining space can be used to store packet data.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Dalton [Fri, 17 Jan 2014 06:23:25 +0000 (22:23 -0800)]
net: allow > 0 order atomic page alloc in skb_page_frag_refill
skb_page_frag_refill currently permits only order-0 page allocs
unless GFP_WAIT is used. Change skb_page_frag_refill to attempt
higher-order page allocations whether or not GFP_WAIT is used. If
memory cannot be allocated, the allocator will fall back to
successively smaller page allocs (down to order-0 page allocs).
This change brings skb_page_frag_refill in line with the existing
page allocation strategy employed by netdev_alloc_frag, which attempts
higher-order page allocations whether or not GFP_WAIT is set, falling
back to successively lower-order page allocations on failure. Part
of migration of virtio-net to per-receive queue page frag allocators.
Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michael Dalton <mwdalton@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Fri, 17 Jan 2014 01:53:20 +0000 (09:53 +0800)]
net_sched: fix error return code in fw_change_attrs()
The error code was not set if change indev fail, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.
Fixes: 2519a602c273 ('net_sched: optimize tcf_match_indev()') Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Jan 2014 03:11:22 +0000 (19:11 -0800)]
Merge branch 'tipc'
Ying Xue says:
====================
tipc: align TIPC behaviours of waiting for events with other stacks
Comparing the current implementations of waiting for events in TIPC
socket layer with other stacks, TIPC's behaviour is very different
because wait_event_interruptible_timeout()/wait_event_interruptible()
are always used by TIPC to wait for events while relevant socket or
port variables are fed to them as their arguments. As socket lock has
to be released temporarily before the two routines of waiting for
events are called, their arguments associated with socket or port
structures are out of socket lock protection. This might cause
serious issues where the process of calling socket syscall such as
sendsmg(), connect(), accept(), and recvmsg(), cannot be waken up
at all even if proper event arrives or improperly be woken up
although the condition of waking up the process is not satisfied
in practice.
Therefore, aligning its behaviours with similar functions implemented
in other stacks, for instance, sk_stream_wait_connect() and
inet_csk_wait_for_connect() etc, can avoid above risks for us.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 17 Jan 2014 01:50:07 +0000 (09:50 +0800)]
tipc: standardize recvmsg routine
Standardize the behaviour of waiting for events in TIPC recvmsg()
so that all variables of socket or port structures are protected
within socket lock, allowing the process of calling recvmsg() to
be woken up at appropriate time.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 17 Jan 2014 01:50:06 +0000 (09:50 +0800)]
tipc: standardize sendmsg routine of connected socket
Standardize the behaviour of waiting for events in TIPC send_packet()
so that all variables of socket or port structures are protected within
socket lock, allowing the process of calling sendmsg() to be woken up
at appropriate time.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 17 Jan 2014 01:50:05 +0000 (09:50 +0800)]
tipc: standardize sendmsg routine of connectionless socket
Comparing the behaviour of how to wait for events in TIPC sendmsg()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, sk_sleep()
and tport->congested variables associated with socket are exposed
without socket lock protection while wait_event_interruptible_timeout()
accesses them. So standardizing it with similar implementation
in other stacks can help us correct these errors which the process
of calling sendmsg() cannot be woken up event if an expected event
arrive at socket or improperly woken up although the wake condition
doesn't match.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 17 Jan 2014 01:50:04 +0000 (09:50 +0800)]
tipc: standardize accept routine
Comparing the behaviour of how to wait for events in TIPC accept()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. As sk_sleep() and
sk->sk_receive_queue variables associated with socket are not
protected by socket lock, the process of calling accept() may be
woken up improperly or sometimes cannot be woken up at all. After
standardizing it with inet_csk_wait_for_connect routine, we can
get benefits including: avoiding 'thundering herd' phenomenon,
adding a timeout mechanism for accept(), coping with a pending
signal, and having sk_sleep() and sk->sk_receive_queue being
always protected within socket lock scope and so on.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Fri, 17 Jan 2014 01:50:03 +0000 (09:50 +0800)]
tipc: standardize connect routine
Comparing the behaviour of how to wait for events in TIPC connect()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, as both
sock->state and sk_sleep() are directly fed to
wait_event_interruptible_timeout() as its arguments, and socket lock
has to be released before we call wait_event_interruptible_timeout(),
the two variables associated with socket are exposed out of socket
lock protection, thereby probably getting stale values so that the
process of calling connect() cannot be woken up exactly even if
correct event arrives or it is woken up improperly even if the wake
condition is not satisfied in practice. Therefore, standardizing its
behaviour with sk_stream_wait_connect routine can avoid these risks.
Additionally the implementation of connect routine is simplified as a
whole, allowing it to return correct values in all different cases.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wangweidong [Thu, 16 Jan 2014 08:25:19 +0000 (16:25 +0800)]
sctp: remove the unnecessary assignment
When go the right path, the status is 0, no need to assign it again.
So just remove the assignment.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 16 Jan 2014 06:45:24 +0000 (14:45 +0800)]
virtio-net: drop rq->max and rq->num
It looks like there's no need for those two fields:
- Unless there's a failure for the first refill try, rq->max should be always
equal to the vring size.
- rq->num is only used to determine the condition that we need to do the refill,
we could check vq->num_free instead.
- rq->num was required to be increased or decreased explicitly after each
get/put which results a bad API.
So this patch removes them both to make the code simpler.
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Thu, 16 Jan 2014 06:05:41 +0000 (11:35 +0530)]
net: davinci_mdio: Fix sparse warning
This patch fixes following sparse warning
davinci_mdio.c:85:27: warning: symbol 'default_pdata' was not declared. Should it be static?
Also makes the default_pdata as a constant.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 16 Jan 2014 01:04:29 +0000 (02:04 +0100)]
bonding: handle slave's name change with primary_slave logic
Currently, if a slave's name change, we just pass it by. However, if the
slave is a current primary_slave, then we end up with using a slave, whose
name != params.primary, for primary_slave. And vice-versa, if we don't have
a primary_slave but have params.primary set - we will not detected a new
primary_slave.
Fix this by catching the NETDEV_CHANGENAME event and setting primary_slave
accordingly. Also, if the primary_slave was changed, issue a reselection of
the active slave, cause the priorities have changed.
Reported-by: Ding Tianhong <dingtianhong@huawei.com> CC: Ding Tianhong <dingtianhong@huawei.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 15 Jan 2014 23:38:43 +0000 (15:38 -0800)]
net_sched: act: pick a different type for act_xt
In tcf_register_action() we check either ->type or ->kind to see if
there is an existing action registered, but ipt action registers two
actions with same type but different kinds. They should have different
types too.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 15 Jan 2014 23:23:26 +0000 (15:23 -0800)]
net_sched: act: use tcf_hash_release() in net/sched/act_police.c
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 15 Jan 2014 23:18:24 +0000 (15:18 -0800)]
i40e: updates to AdminQ interface
Refinements to cloud support in the Firmware API.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 15 Jan 2014 23:18:23 +0000 (15:18 -0800)]
i40e: check desc pointer before printing
Check that the descriptors were allocated before trying to dump
them to the logfile. While we're there, de-trick-ify the code
so as to be easier to read and not abusing the types and unions.
Change-ID: I22898f4b22cecda3582d4d9e4018da9cd540f177 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 17 Jan 2014 01:16:43 +0000 (17:16 -0800)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- properly compute the batman-adv header overhead. Such
result is later used to initialize the hard_header_len
member of the soft-interface netdev object
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 15 Jan 2014 23:02:19 +0000 (00:02 +0100)]
team: block mtu change before it happens via NETDEV_PRECHANGEMTU
Now it catches the NETDEV_CHANGEMTU notification, which is signaled after
the actual change happened on the device, and returns NOTIFY_BAD, so that
the change on the device is reverted.
This might be quite costly and messy, so use the new NETDEV_PRECHANGEMTU to
catch the MTU change before the actual change happens and signal that it's
forbidden to do it.
CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 15 Jan 2014 23:02:18 +0000 (00:02 +0100)]
net: add NETDEV_PRECHANGEMTU to notify before mtu change happens
Currently, if a device changes its mtu, first the change happens (invloving
all the side effects), and after that the NETDEV_CHANGEMTU is sent so that
other devices can catch up with the new mtu. However, if they return
NOTIFY_BAD, then the change is reverted and error returned.
This is a really long and costy operation (sometimes). To fix this, add
NETDEV_PRECHANGEMTU notification which is called prior to any change
actually happening, and if any callee returns NOTIFY_BAD - the change is
aborted. This way we're skipping all the playing with apply/revert the mtu.
CC: "David S. Miller" <davem@davemloft.net> CC: Jiri Pirko <jiri@resnulli.us> CC: Eric Dumazet <edumazet@google.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 17 Jan 2014 00:33:27 +0000 (11:33 +1100)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Revert "arm64: Fix memory shareability attribute for ioremap_wc/cache"
We noticed that it breaks ioremap (and earlyprintk) with 64K page
configuration"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
Revert "arm64: Fix memory shareability attribute for ioremap_wc/cache"
Hugh Dickins [Thu, 16 Jan 2014 23:26:48 +0000 (15:26 -0800)]
percpu_counter: unbreak __percpu_counter_add()
Commit 74e72f894d56 ("lib/percpu_counter.c: fix __percpu_counter_add()")
looked very plausible, but its arithmetic was badly wrong: obvious once
you see the fix, but maddening to get there from the weird tmpfs ENOSPCs
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Fan Du <fan.du@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian Fainelli [Wed, 15 Jan 2014 21:04:26 +0000 (13:04 -0800)]
r6040: use ETH_ZLEN instead of MISR for SKB length checking
Ever since this driver was merged the following code was included:
if (skb->len < MISR)
skb->len = MISR;
MISR is defined to 0x3C which is also equivalent to ETH_ZLEN, but use
ETH_ZLEN directly which is exactly what we want to be checking for.
Reported-by: Marc Volovic <marcv@ezchip.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Wed, 15 Jan 2014 21:04:25 +0000 (13:04 -0800)]
r6040: add delays in MDIO read/write polling loops
On newer and faster machines (Vortex X86DX) using the r6040 driver, it
was noticed that the driver was returning an error during probing traced
down to being the MDIO bus probing and the inability to complete a MDIO
read operation in time. It turns out that the MDIO operations on these
faster machines usually complete after ~2140 iterations which is bigger
than 2048 (MAC_DEF_TIMEOUT) and results in spurious timeouts depending
on the system load.
Update r6040_phy_read() and r6040_phy_write() to include a 1
micro second delay in each busy-looping iteration of the loop which is a
much safer operation than incrementing MAC_DEF_TIMEOUT.
Reported-by: Nils Koehler <nils.koehler@ibt-interfaces.de> Reported-by: Daniel Goertzen <daniel.goertzen@gmail.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Wed, 15 Jan 2014 17:30:33 +0000 (17:30 +0000)]
xen-netfront: add support for IPv6 offloads
This patch adds support for IPv6 checksum offload and GSO when those
features are available in the backend.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>