Add netlink directives and ndo entry to trust VF user.
This controls the special permission of VF user.
The administrator will dedicatedly trust VF user to use some features
which impacts security and/or performance.
The administrator never turn it on unless VF user is fully trusted.
CC: Sy Jong Choi <sy.jong.choi@intel.com> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jean Sacren [Tue, 13 Oct 2015 07:06:32 +0000 (01:06 -0600)]
i40e: fix unconditional execution of cpu_to_le16()
The commit 3092e5e4cc79 ("i40e: add little endian conversion for
checksum") fixed the checksum bug on big-endian architecture.
But we should not execute cpu_to_le16() unconditionally. Thus, put
cpu_to_le16() under certain condition.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jean Sacren [Tue, 13 Oct 2015 07:06:31 +0000 (01:06 -0600)]
i40e: clean up local variable initialization
In both i40e_calc_nvm_checksum() and i40e_update_nvm_checksum(), the
local variables designated by 'ret_code' are overwritten immediately. As
such, they should merely be declared.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jean Sacren [Tue, 13 Oct 2015 07:06:30 +0000 (01:06 -0600)]
i40evf: clean up local variable initialization
In i40evf_msix_aq(), the first two lines of rd32() are mainly to clear
the registers. If we initialize 'val' at this point, it will be
overwritten immediately. We shall simply discard the return value here.
When we initialize 'val', we might as well include the mask in one step.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If a call to enable SR-IOV in the kernel failed, we need to disable
I40E_FLAG_VEB_MODE_ENABLED, so that bridge mode could fall back to
VEPA, which is a default.
Change-ID: I12b6f776769506db85b29bea94b9c88d0b5ee65e Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 1 Oct 2015 18:37:39 +0000 (14:37 -0400)]
i40e: Fix an incorrect OEM version string
This patch fixes a problem where the driver output of the OEM
version string varied from the other tools. The mask value
and the order of operations were incorrect, per the original
change request. Without this patch, the version string will
appear incorrect from the driver.
Change-ID: Ie1ca6485284b4ce3b57e5a99b18b7641617c7ef7 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Helin Zhang [Thu, 1 Oct 2015 18:37:38 +0000 (14:37 -0400)]
i40e: fix inconsistent statuses after a PF reset
This patch fixes a problem of possibly getting inconsistent flow control
statuses after a PF reset. Requested_mode was being set with a default
value during probing, but the initial HW state could be different from
this mode.
Change-ID: I772bf07b78616e87086418d4bd87954b66fa17cd Signed-off-by: Helin Zhang <helin.zhang@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 1 Oct 2015 18:37:37 +0000 (14:37 -0400)]
i40evf: use correct struct for list manipulation
Not sure how this compiles at all. Use the correct struct for
manipulating the VLAN filter list. Without this, the VLAN filter
list doesn't get processed correctly, and VLAN filters will not
be re-enabled after any kind of reset.
Change-ID: Iceff2dc089f303058fb71ecb08419eed471e0e90 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix i40e_is_vsi_uplink_mode_veb to check if bridge is actually
in VEB mode before allowing LB in the add VSI routine, instead of
unconditionally returning VEB bridge mode.
Change-ID: I162397b1bdd02367735fe9baaeb51465be2a3ce9 Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: Add a workaround to drop all flow control frames
This patch adds a workaround to drop any flow control frames from being
transmitted from any VSI. FW can still send flow control frames if flow
control is enabled.
With this patch in place a malicious VF cannot send flow control or PFC
packets out on the wire.
Change-ID: I4303b24e98b93066d2767fec24dfe78be591c277 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Paolo Abeni [Tue, 20 Oct 2015 08:28:45 +0000 (10:28 +0200)]
ipv4: implement support for NOPREFIXROUTE ifa flag for ipv4 address
Currently adding a new ipv4 address always cause the creation of the
related network route, with default metric. When a host has multiple
interfaces on the same network, multiple routes with the same metric
are created.
If the userspace wants to set specific metric on each routes, i.e.
giving better metric to ethernet links in respect to Wi-Fi ones,
the network routes must be deleted and recreated, which is error-prone.
This patch implements the support for IFA_F_NOPREFIXROUTE for ipv4
address. When an address is added with such flag set, no associated
network route is created, no network route is deleted when
said IP is gone and it's up to the user space manage such route.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 22 Oct 2015 20:01:17 +0000 (16:01 -0400)]
bnxt_en: New Broadcom ethernet driver.
Broadcom ethernet driver for the new family of NetXtreme-C/E
ethernet devices.
v5:
- Removed empty blank lines at end of files (noted by David Miller).
- Moved busy poll helper functions to bnxt.h to at least make the
.c file look less cluttered with #ifdef (noted by Stephen Hemminger).
v4:
- Broke up 2 long message strings with "\n" (suggested by John Linville)
- Constify an array of strings (suggested by Stephen Hemminger)
- Improve bnxt_vf_pciid() (suggested by Stephen Hemminger)
- Use PCI_VDEVICE() to populate pci_device_id table for more compact
source.
v3:
- Fixed 2 more sparse warnings.
- Removed some unused structures in .h files.
v2:
- Fixed all kbuild test robot reported warnings.
- Fixed many of the checkpatch.pl errors and warnings.
- Fixed the Kconfig description (noted by Dmitry Kravkov).
Acked-by: Eddie Wai <eddie.wai@broadcom.com> Acked-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Prashant Sreedharan <prashant@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 14:39:07 +0000 (07:39 -0700)]
Merge branch 'dsa-port_fdb_dump'
Vivien Didelot says:
====================
net: dsa: implement port_fdb_dump in drivers
Not all switch chips provide a Get Next kind of operation to dump FDB entries.
It is preferred to let the driver handle the dump operation the way it works
best for the chip. Thus, drop port_fdb_getnext and implement the port_fdb_dump
operation in DSA, which pushes the switchdev FDB dump callback down to the
drivers. mv88e6xxx is the only driver affected and is updated accordingly.
v3 -> v4: fix rejects on latest net-next
v2 -> v3: opencode switchdev_obj_dump_cb_t to avoid multiple typedef;
use ether_addr_copy in fdb_dump
v1 -> v2: fix a few "return err" instead of "goto unlock" in mv88e6xxx.c
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 22 Oct 2015 13:34:40 +0000 (09:34 -0400)]
net: dsa: mv88e6xxx: write MAC outside of ATU Get Next code
There is no need to write the MAC address before every Get Next
operation, since ATU MAC registers are not cleared between calls.
Move the _mv88e6xxx_atu_mac_write call outside of _mv88e6xxx_atu_getnext
so future code could call ATU Get Next multiple times and save a few
register access.
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Thu, 22 Oct 2015 13:34:39 +0000 (09:34 -0400)]
net: dsa: mv88e6xxx: write VID outside of VTU Get Next code
There is no need to write the VLAN ID before every Get Next operation,
since the VTU VID register is not cleared between calls.
Move the VID write call in a _mv88e6xxx_vtu_vid_write function outside
of _mv88e6xxx_vtu_getnext so future code could call VTU Get Next
multiple times and save a few register accesses.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 14:28:41 +0000 (07:28 -0700)]
Merge tag 'mac80211-next-for-davem-2015-10-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Here's another set of patches for the current cycle:
* I merged net-next back to avoid a conflict with the
* cfg80211 scheduled scan API extensions
* preparations for better scan result timestamping
* regulatory cleanups
* mac80211 statistics cleanups
* a few other small cleanups and fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
yankejian [Wed, 21 Oct 2015 09:57:44 +0000 (17:57 +0800)]
net: hisilicon: deals with the sub ctrl by syscon
the global Soc configuration is treated by syscon, and sub ctrl bus is
Soc bus. it has to be treated by syscon.
Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 14:04:08 +0000 (07:04 -0700)]
Merge branch 'cxgb4-trivial-fixes'
Hariprasad Shenai says:
====================
Trivial fixes for cxgb4 driver
This patch series updates driver description for next gen. adapters, updates
firmware info., returns error for setup_rss error case, restores L1
configuration in case of FW rejects new config, updates and aligns ethtool
get stats settings, etc
This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Wed, 21 Oct 2015 06:00:10 +0000 (23:00 -0700)]
openvswitch: Use dev_queue_xmit for vport send.
With use of lwtunnel, we can directly call dev_queue_xmit()
rather than calling netdev vport send operation.
Following change make tunnel vport code bit cleaner.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Wed, 21 Oct 2015 03:47:46 +0000 (20:47 -0700)]
openvswitch: Fix incorrect type use.
Patch fixes following sparse warning.
net/openvswitch/flow_netlink.c:583:30: warning: incorrect type in assignment (different base types)
net/openvswitch/flow_netlink.c:583:30: expected restricted __be16 [usertype] ipv4
net/openvswitch/flow_netlink.c:583:30: got int
Fixes: 6b26ba3a7d ("openvswitch: netlink attributes for IPv6 tunneling") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 Oct 2015 13:42:23 +0000 (06:42 -0700)]
Merge branch 'bpf-perf'
Alexei Starovoitov says:
====================
bpf_perf_event_output helper
Over the last year there were multiple attempts to let eBPF programs
output data into perf events by He Kuang and Wangnan.
The last one was:
https://lkml.org/lkml/2015/7/20/736
It was almost perfect with exception that all bpf programs would sent
data into one global perf_event.
This patch set takes different approach by letting user space
open independent PERF_COUNT_SW_BPF_OUTPUT events, so that program
output won't collide.
Wangnan is working on corresponding perf patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Performance test and example of bpf_perf_event_output().
kprobe is attached to sys_write() and trivial bpf program streams
pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event.
Usage:
$ sudo ./bld_x64/samples/bpf/trace_output
recv 2968913 events per sec
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This helper is used to send raw data from eBPF program into
special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
User space needs to perf_event_open() it (either for one or all cpus) and
store FD into perf_event_array (similar to bpf_perf_event_read() helper)
before eBPF program can send data into it.
Today the programs triggered by kprobe collect the data and either store
it into the maps or print it via bpf_trace_printk() where latter is the debug
facility and not suitable to stream the data. This new helper replaces
such bpf_trace_printk() usage and allows programs to have dedicated
channel into user space for post-processing of the raw data collected.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Brenden Blanco [Tue, 20 Oct 2015 23:47:33 +0000 (16:47 -0700)]
ipvlan: read direct ifindex instead of iflink
In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is
being grabbed from dev_get_iflink. This works for the physical device
case, since as the documentation of that function notes: "Physical
interfaces have the same 'ifindex' and 'iflink' values.". However, if
the master device is a veth, and the pairs are in separate net
namespaces, the route lookup will fail with -ENODEV due to outer veth
pair being in a separate namespace from the ipvlan master/routing
namespace.
ns0 | ns1 | ns2
veth0a--|--veth0b--|--ipvl0
In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above
configuration will pass fl.flowi4_oif == veth0a to
ip_route_output_flow(), but *net == ns1.
Notice also that ipv6 processing is not using iflink. Since there is a
discrepancy in usage, fixup both v4 and v6 case to use local dev
variable.
Tested this with l3 ipvlan on top of veth, as well as with single
physical interface in the top namespace.
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 20 Oct 2015 20:17:40 +0000 (13:17 -0700)]
tcp: fastopen: limit max_qlen
Allowing an application to set whatever limit for
the list of recently RST fastopen sessions [1] is not wise,
as it open ways to deplete kernel memory.
Cap the user provided limit by somaxconn sysctl,
like listen() backlog.
Eric Dumazet [Tue, 20 Oct 2015 03:17:59 +0000 (20:17 -0700)]
net: dummy: add more features
While testing my SIT/GRO patch using netfilter TEE module and a dummy
device, I found some features were missing :
TSO IPv6, UFO, and encapsulated traffic.
ethtool -k dummy0 now gives :
...
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp6-segmentation: on
udp-fragmentation-offload: on
...
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arad, Ronen [Mon, 19 Oct 2015 16:23:28 +0000 (09:23 -0700)]
netlink: Rightsize IFLA_AF_SPEC size calculation
if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().
The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).
This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.
Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.
Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Mon, 19 Oct 2015 12:37:25 +0000 (15:37 +0300)]
Adding switchdev ageing notification on port bridged
Configure ageing time to the HW for newly bridged device
CC: Scott Feldman <sfeldma@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Oct 2015 14:00:59 +0000 (07:00 -0700)]
Merge branch 'tcp-rack'
Yuchung Cheng says:
====================
RACK loss detection
RACK (Recent ACK) loss recovery uses the notion of time instead of
packet sequence (FACK) or counts (dupthresh).
It's inspired by the FACK heuristic in tcp_mark_lost_retrans(): when a
limited transmit (new data packet) is sacked in recovery, then any
retransmission sent before that newly sacked packet was sent must have
been lost, since at least one round trip time has elapsed.
But that existing heuristic from tcp_mark_lost_retrans()
has several limitations:
1) it can't detect tail drops since it depends on limited transmit
2) it's disabled upon reordering (assumes no reordering)
3) it's only enabled in fast recovery but not timeout recovery
RACK addresses these limitations with a core idea: an unacknowledged
packet P1 is deemed lost if a packet P2 that was sent later is is
s/acked, since at least one round trip has passed.
Since RACK cares about the time sequence instead of the data sequence
of packets, it can detect tail drops when a later retransmission is
s/acked, while FACK or dupthresh can't. For reordering RACK uses a
dynamically adjusted reordering window ("reo_wnd") to reduce false
positives on ever (small) degree of reordering, similar to the delayed
Early Retransmit.
In the current patch set RACK is only a supplemental loss detection
and does not trigger fast recovery. However we are developing RACK
to replace or consolidate FACK/dupthresh, early retransmit, and
thin-dupack. These heuristics all implicitly bear the time notion.
For example, the delayed Early Retransmit is simply applying RACK
to trigger the fast recovery with small inflight.
RACK requires measuring the minimum RTT. Tracking a global min is less
robust due to traffic engineering pathing changes. Therefore it uses a
windowed filter by Kathleen Nichols. The min RTT can also be useful
for various other purposes like congestion control or stat monitoring.
This patch has been used on Google servers for well over 1 year. RACK
has also been implemented in the QUIC protocol. We are submitting an
IETF draft as well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:47 +0000 (21:57 -0700)]
tcp: use RACK to detect losses
This patch implements the second half of RACK that uses the the most
recent transmit time among all delivered packets to detect losses.
tcp_rack_mark_lost() is called upon receiving a dubious ACK.
It then checks if an not-yet-sacked packet was sent at least
"reo_wnd" prior to the sent time of the most recently delivered.
If so the packet is deemed lost.
The "reo_wnd" reordering window starts with 1msec for fast loss
detection and changes to min-RTT/4 when reordering is observed.
We found 1msec accommodates well on tiny degree of reordering
(<3 pkts) on faster links. We use min-RTT instead of SRTT because
reordering is more of a path property but SRTT can be inflated by
self-inflicated congestion. The factor of 4 is borrowed from the
delayed early retransmit and seems to work reasonably well.
Since RACK is still experimental, it is now used as a supplemental
loss detection on top of existing algorithms. It is only effective
after the fast recovery starts or after the timeout occurs. The
fast recovery is still triggered by FACK and/or dupack threshold
instead of RACK.
We introduce a new sysctl net.ipv4.tcp_recovery for future
experiments of loss recoveries. For now RACK can be disabled by
setting it to 0.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:46 +0000 (21:57 -0700)]
tcp: track the packet timings in RACK
This patch is the first half of the RACK loss recovery.
RACK loss recovery uses the notion of time instead
of packet sequence (FACK) or counts (dupthresh). It's inspired by the
previous FACK heuristic in tcp_mark_lost_retrans(): when a limited
transmit (new data packet) is sacked, then current retransmitted
sequence below the newly sacked sequence must been lost,
since at least one round trip time has elapsed.
But it has several limitations:
1) can't detect tail drops since it depends on limited transmit
2) is disabled upon reordering (assumes no reordering)
3) only enabled in fast recovery ut not timeout recovery
RACK (Recently ACK) addresses these limitations with the notion
of time instead: a packet P1 is lost if a later packet P2 is s/acked,
as at least one round trip has passed.
Since RACK cares about the time sequence instead of the data sequence
of packets, it can detect tail drops when later retransmission is
s/acked while FACK or dupthresh can't. For reordering RACK uses a
dynamically adjusted reordering window ("reo_wnd") to reduce false
positives on ever (small) degree of reordering.
This patch implements tcp_advanced_rack() which tracks the
most recent transmission time among the packets that have been
delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp
is the key to determine which packet has been lost.
Consider an example that the sender sends six packets:
T1: P1 (lost)
T2: P2
T3: P3
T4: P4
T100: sack of P2. rack.mstamp = T2
T101: retransmit P1
T102: sack of P2,P3,P4. rack.mstamp = T4
T205: ACK of P4 since the hole is repaired. rack.mstamp = T101
We need to be careful about spurious retransmission because it may
falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK
to falsely mark all packets lost, just like a spurious timeout.
We identify spurious retransmission by the ACK's TS echo value.
If TS option is not applicable but the retransmission is acknowledged
less than min-RTT ago, it is likely to be spurious. We refrain from
using the transmission time of these spurious retransmissions.
The second half is implemented in the next patch that marks packet
lost using RACK timestamp.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:45 +0000 (21:57 -0700)]
tcp: skb_mstamp_after helper
a helper to prepare the first main RACK patch.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:44 +0000 (21:57 -0700)]
tcp: add tcp_tsopt_ecr_before helper
a helper to prepare the main RACK patch
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:43 +0000 (21:57 -0700)]
tcp: remove tcp_mark_lost_retrans()
Remove the existing lost retransmit detection because RACK subsumes
it completely. This also stops the overloading the ack_seq field of
the skb control block.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:42 +0000 (21:57 -0700)]
tcp: track min RTT using windowed min-filter
Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.
The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.
Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.
Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code
The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.
The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 17 Oct 2015 04:57:41 +0000 (21:57 -0700)]
tcp: apply Kern's check on RTTs used for congestion control
Currently ca_seq_rtt_us does not use Kern's check. Fix that by
checking if any packet acked is a retransmit, for both RTT used
for RTT estimation and congestion control.
Fixes: 5b08e47ca ("tcp: prefer packet timing to TS-ECR for RTT") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 Oct 2015 13:29:56 +0000 (06:29 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-19
This series contains updates to i40e and i40evf only.
Kiran adds a spinlock around code accessing VSI MAC filter list to
ensure that we are synchronizing access to the filter list, otherwise
we can end up with multiple accesses at the same time which can cause
the VSI MAC filter list to get in an unstable or corrupted state.
Jesse fixes overlong BIT defines, where the RSS enabling call were
mistakenly missed. Also fixes a bug where the enable function was
enabling the interrupt twice while trying to update the two interrupt
throttle rate thresholds for Rx and Tx, while refactoring the IRQ
enable function to simplify reading the flow. Addressed the high
CPU utilization of some small streaming workloads that the driver should
reduce CPU in.
Anjali fixes two X722 issues with respect to EEPROM checksum verify and
reading NVM version info. Fixed where a mask value was accidentally
replaced with a bit mask causing Flow Director sideband to be broken.
Alex Duyck fixes areas of the drivers which run from hard interrupt
context or with interrupts already disabled in netpoll, so use
napi_schedule_irqoff() instead of napi_schedule().
Mitch fixes the VF drivers to not easily give up when it is not able
to communicate with the PF driver.
Carolyn fixes a problem where our tools MAC loopback test, after driver
unbind would fail because the hardware was configured for multiqueue and
unbind operation did not clear this configuration. Also fixed a issue
where the NVMUpdate tool gets bad data from the PHY when using the PHY
NVM feature because of contention on the MDIO interface from getting
PHY capability calls from the driver during regular operations.
Catherine fixed an issue where we were checking if autoneg was allowed
to change before checking if autoneg was changing, these checks need to
be in the reverse order.
Jean Sacren fixes up an function header comment to align the kernel-docs
with the actual code.
v2: Cleaned up the use of spin_is_locked() in patch 1 based on feedback
from David Miller, since it always evaluates to zero on uni-processor
builds
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Fri, 16 Oct 2015 15:18:11 +0000 (17:18 +0200)]
mac80211: move beacon_loss_count into ifmgd
There's little point in keeping (and even sending to userspace)
the beacon_loss_count value per station, since it can only apply
to the AP on a managed-mode connection. Move the value to ifmgd,
advertise it only in managed mode, and remove it from ethtool as
it's available through better interfaces.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 16 Oct 2015 14:55:51 +0000 (16:55 +0200)]
mac80211: remove sta->last_ack_signal
This file only feeds a debugfs file that isn't very useful, so remove
it. If necessary, we can add other ways to get this information, for
example in the NL80211_CMD_PROBE_CLIENT response.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jean Sacren [Sat, 19 Sep 2015 11:08:45 +0000 (05:08 -0600)]
i40e: declare rather than initialize int object
'err' would be overwritten immediately, so we should declare it only
rather than initialize it to zero.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jean Sacren [Sat, 19 Sep 2015 11:08:43 +0000 (05:08 -0600)]
i40e: fix kernel-doc argument name
The second argument name in the kernel-doc argument list for
i40e_features_check() was slightly off. Fix it for the kernel doc.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is an error coming back from get_phy_capabilities that does not
seem to have any functional implications. We will continue looking into
why this error message is occurring, but in the meantime, we will move it
to debug to avoid confusion.
Change-ID: I9091754bf62c066ddedeb249923d85606e2d68ed Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Fix order of checks when enabling/disabling autoneg in ethtool
We were previously checking if autoneg was allowed to change before
checking if autoneg was changing. We need to do this in the other order
or else we will erroneously return EINVAL when autoneg is not changing.
Change-ID: Iff9f7d1c9bddc1ad1e5d227d4f42754f90155410 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a problem where the NVMUpdate Tool, when using the PHY
NVM feature, gets bad data from the PHY because of contention on the
MDIO interface from get PHY capability calls from the driver during
regular operations. The problem is fixed by adding a check if media
is available before calling get PHY capability function because that
bit is not set when device is in PHY interaction mode.
Change-ID: Ib89991b0f841808dd92410f5e8683d6ee3301cd0 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Fix for Tools loopback test failing after driver load
This patch fixes a problem where our Tools MAC Loopback test, after
driver unbind would fail. This was because the hw was configured
for multiqueue and unbind operation did not clear this configuration.
The problem is fixed by resetting this configuration in i40e_remove.
Change-ID: I130c05138319182ed1476d3a0b5222d6a6320af9 Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e/i40evf: adjust interrupt throttle less frequently
The adaptive ITR (interrupt throttle rate) algorithm was adjusting
the hardware's interrupt rate too frequently. This caused a lot
of variation in the interrupt rate for fairly constant workloads.
Change the code to have a counter and adjust only once every N
number of interrupts.
Change-ID: I0460f1f86571037484eca5aca36ac4d889cb8389 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The dynamic algorithm, while now working, doesn't have good
performance in 40G mode.
One part of this patch addresses the high CPU utilization of some small
streaming workloads that the driver should reduce CPU in.
It also changes the minimum ITR that the dynamic algorithm
will settle on, causing our minimum latency to go from 12us
to about 14us, when using adaptive mode.
It also changes the BULK interrupt rate to allow maximum throughput
on a 40Gb connection with a single thread of transmit, clamping
interrupt rate to 8000 for TX makes single thread traffic go too
slow.
The new ULTRA bulk setting is introduced and is used
when the Rx packet rate on this queue exceeds 40000 packets per
second. This value of 40000 was chosen because the automatic tuning
of minimum ITR=20us means that a single queue can't quite achieve
that many packets per second from a round-robin test.
Change-ID: Icce8faa128688ca5fd2c4229bdd9726877a92ea2 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change moves a multi-line register setting into a function
which simplifies reading the flow of the enable function.
This also fixes a bug where the enable function was enabling
the interrupt twice while trying to update the two interrupt
throttle rate thresholds for Rx and Tx.
Change-ID: Ie308f9d0d48540204590cb9d7a5a7b1196f959bb Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 28 Sep 2015 18:16:50 +0000 (14:16 -0400)]
i40evf: don't give up
When the VF driver is unable to communicate with the PF, it just gives
up and never tries again. Aside from the obvious character flaw that
this shows, it's also a lousy user experience.
When PF communications fail, wait five seconds, and try again. And
again. Don't give up, little VF driver! Your prince will come!
Change-ID: Ia1378a39879883563b8faffce819f375821f9585 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 29 Sep 2015 22:19:50 +0000 (15:19 -0700)]
i40e/i40evf: use napi_schedule_irqoff()
The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.
They can use napi_schedule_irqoff() instead of napi_schedule()
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acquire NVM, before issuing an AQ read nvm command for X722.
We need to acquire the NVM before issuing an AQ read to the NVM
otherwise we will get EBUSY from the FW. Also release when done.
This fixes the two X722 issues with respect to eeprom checksum verify
and reading NVM version info.
With this patch in place, i40e driver will provide basic support
for X722 devices.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a spinlock which is to be used for synchronizing
access to VSI's MAC filter list.
This patch also synchronizes execution of other codepaths which are
accessing VSI's MAC filter list with execution of
service_task:sync_vsi_filters.
In function i40e_add_vsi, copied out LAA MAC address instead of cloning
MAC filter entry because only MAC address is needed to remove MAC VLAN
filter from FW/HW.
Change-ID: I0e10ac7c715d44aa994239642aa4d57c998573a2 Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1) Account for extra headroom in ath9k driver, from Felix Fietkau.
2) Fix OOPS in pppoe driver due to incorrect socket state transition,
from Guillaume Nault.
3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang.
4) Power management fixes for iwlwifi, from Johannes Berg.
5) Fix races in reqsk_queue_unlink(), from Eric Dumazet.
6) Fix dst_entry usage in ARP replies, from Jiri Benc.
7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann.
8) Missing allocation failure check in amd-xgbe, from Tom Lendacky.
9) Various resource allocation/freeing cures in DSA< from Neil
Armstrong.
10) A series of bug fixes in the openvswitch conntrack support, from
Joe Stringer.
11) Fix two cases (BPF and act_mirred) where we have to clean the sender
cpu stored in the SKB before transmitting. From WANG Cong and
Alexei Starovoitov.
12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from
Achiad Shochat.
13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this
configuration via ethtool. From Yuval Mintz.
14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when
'dev' is NULL, from Eric Biederman.
15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy.
16) kcalloc() gstrings ethtool buffer before having driver fill it in,
in order to prevent kernel memory leaking. From Joe Perches.
17) Fix mixxing rt6_info initialization for blackhole routes, from
Martin KaFai Lau.
18) Kill VLAN regression in via-rhine, from Andrej Ota.
19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet.
20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad.
21) Scrube SKBs when pushing them between namespaces in openvswitch,
from Joe Stringer.
22) bcmgenet enables link interrupts too early, fix from Florian
Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net: bcmgenet: Fix early link interrupt enabling
tunnels: Don't require remote endpoint or ID during creation.
openvswitch: Scrub skb between namespaces
xen-netback: correctly check failed allocation
net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter
netlink: Trim skb to alloc size to avoid MSG_TRUNC
net: add pfmemalloc check in sk_add_backlog()
via-rhine: fix VLAN receive handling regression.
ipv6: Initialize rt6_info properly in ip6_blackhole_route()
ipv6: Move common init code for rt6_info to a new function rt6_info_init()
Bluetooth: Fix initializing conn_params in scan phase
Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup
Bluetooth: Fix remove_device behavior for explicit connects
Bluetooth: Fix LE reconnection logic
Bluetooth: Fix reference counting for LE-scan based connections
Bluetooth: Fix double scan updates
mlxsw: core: Fix race condition in __mlxsw_emad_transmit
tipc: move fragment importance field to new header position
ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
tipc: eliminate risk of stalled link synchronization
...
Florian Fainelli [Sat, 17 Oct 2015 21:22:46 +0000 (14:22 -0700)]
net: bcmgenet: Fix early link interrupt enabling
Link interrupts are enabled in init_umac(), which is too early for us to
process them since we do not yet have a valid PHY device pointer. On
BCM7425 chips for instance, we will crash calling phy_mac_interrupt()
because phydev is NULL.
Fix this by moving the link interrupts enabling in
bcmgenet_netif_start(), under a specific function:
bcmgenet_link_intr_enable() and while at it, update the comments
surrounding the code.
Fixes: 6cc8e6d4dcb36 ("net: bcmgenet: Delay PHY initialization to bcmgenet_open()") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. Most relevantly, updates for the nfnetlink_log to integrate with
conntrack, fixes for cttimeout and improvements for nf_queue core, they are:
1) Remove useless ifdef around static inline function in IPVS, from
Eric W. Biederman.
2) Simplify the conntrack support for nfnetlink_queue: Merge
nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back
to nfnetlink_queue.c
3) Use y2038 safe timestamp from nfnetlink_queue.
4) Get rid of dead function definition in nf_conntrack, from Flavio
Leitner.
5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA.
This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that
controls enabling both nfqueue and nflog integration with conntrack.
The userspace application can request this via NFULNL_CFG_F_CONNTRACK
configuration flag.
6) Remove unused netns variables in IPVS, from Eric W. Biederman and
Simon Horman.
7) Don't put back the refcount on the cttimeout object from xt_CT on success.
8) Fix crash on cttimeout policy object removal. We have to flush out
the cttimeout extension area of the conntrack not to refer to an unexisting
object that was just removed.
9) Make sure rcu_callback completion before removing nfnetlink_cttimeout
module removal.
10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and
nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann.
11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is
requested. Again from Ken-ichirou MATSUZAWA.
12) Don't use pointer to previous hook when reinjecting traffic via
nf_queue with NF_REPEAT verdict since it may be already gone. This
also avoids a deadloop if the userspace application keeps returning
NF_REPEAT.
13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris.
14) Consolidate logger instance existence check in nfulnl_recv_config().
15) Fix broken atomicity when applying configuration updates to logger
instances in nfnetlink_log.
16) Get rid of the .owner attribute in our hook object. We don't need
this anymore since we're dropping pending packets that have escaped
from the kernel when unremoving the hook. Patch from Florian Westphal.
17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always
assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian.
18) Use static inline function instead of macros to define NF_HOOK() and
NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini found hang with rds-ping while testing RDS over TCP. Its
a corner case and doesn't happen always. The issue is not reproducible
with IB transport. Its clear from below dump why we see it with RDS TCP.
This happens because rds_send_xmit() chain wants to take
sock_lock which is already taken by tcp_v4_rcv() on its
way to rds_tcp_data_ready(). Commit db6526dcb51b ("RDS: use
rds_send_xmit() state instead of RDS_LL_SEND_FULL") which
was trying to opportunistically finish the send request
in same thread context.
But because of above recursive lock hang with RDS TCP,
the send work from rds_send_pong() needs to deferred to
worker to avoid lock up. Given RDS ping is more of connectivity
test than performance critical path, its should be ok even
for transport like IB.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Fri, 16 Oct 2015 23:36:00 +0000 (16:36 -0700)]
tunnels: Don't require remote endpoint or ID during creation.
Before lightweight tunnels existed, it really didn't make sense to
create a tunnel that was not fully specified, such as without a
destination IP address - the resulting packets would go nowhere.
However, with lightweight tunnels, the opposite is true - it doesn't
make sense to require this information when it will be provided later
on by the route. This loosens the requirements for this information.
An alternative would be to allow the relaxed version only when
COLLECT_METADATA is enabled. However, since there are several
variations on this theme (such as NBMA tunnels in GRE), just dropping
the restrictions seems the most consistent across tunnels and with
the existing configuration.
CC: John Linville <linville@tuxdriver.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing rule to export mpls iptunnel header needed by iproute2
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Fri, 16 Oct 2015 18:08:18 +0000 (11:08 -0700)]
openvswitch: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.
This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 05:23:33 +0000 (22:23 -0700)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-10-16
First of all, sorry for the late set of patches for the 4.3 cycle. We
just finished an intensive week of testing at the Bluetooth UnPlugFest
and discovered (and fixed) issues there. Unfortunately a few issues
affect 4.3-rc5 in a way that they break existing Bluetooth LE mouse and
keyboard support.
The regressions result from supporting LE privacy in conjunction with
scanning for Resolvable Private Addresses before connecting. A feature
that has been tested heavily (including automated unit tests), but sadly
some regressions slipped in. The UnPlugFest with its multitude of test
platforms is a good battle testing ground for uncovering every corner
case.
The patches in this pull request focus only on fixing the regressions in
4.3-rc5. The patches look a bit larger since we also added comments in
the critical sections of the fixes to improve clarity.
I would appreciate if we can get these regression fixes to Linus
quickly. Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 10:00:51 +0000 (12:00 +0200)]
net: hix5hd2_gmac: avoid integer overload warning
BITS_RX_EN is an 'unsigned long' constant, so the ones complement of that
has bits set that do not fit into a 32-bit variable on 64-bit architectures,
which causes a harmless gcc warning:
drivers/net/ethernet/hisilicon/hix5hd2_gmac.c: In function 'hix5hd2_port_disable':
drivers/net/ethernet/hisilicon/hix5hd2_gmac.c:374:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
writel_relaxed(~(BITS_RX_EN | BITS_TX_EN), priv->base + PORT_EN);
This adds a cast to (u32) to tell gcc that the code is indeed fine.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 09:33:49 +0000 (11:33 +0200)]
net: hisilicon: add OF dependency
The HNS MDIO driver fails to build on older ARM machines that are not
yet converted to CONFIG_OF:
drivers/net/ethernet/hisilicon/hns_mdio.c: In function 'hns_mdio_bus_name':
drivers/net/ethernet/hisilicon/hns_mdio.c:405:14: error: 'OF_BAD_ADDR' undeclared (first use in this function)
u64 taddr = OF_BAD_ADDR;
^
drivers/net/ethernet/hisilicon/hns_mdio.c:405:14: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/hisilicon/hns_mdio.c:409:11: error: implicit declaration of function 'of_translate_address' [-Werror=implicit-function-declaration]
taddr = of_translate_address(np, addr);
^
This clarifies the dependency to ensure we don't attempt to build these
drivers without CONFIG_OF, but also adds a COMPILE_TEST alternative to
give us better build coverage testing.
Build-tested on x86 as well to ensure this actually works.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 09:30:56 +0000 (11:30 +0200)]
net: hisilicon: include linux/vmalloc.h in dsaf
Some configurations fail to build the hns dsaf code because of
a missing header file:
ethernet/hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_init':
ethernet/hisilicon/hns/hns_dsaf_main.c:1096:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
priv->soft_mac_tbl = vzalloc(sizeof(*priv->soft_mac_tbl)
This adds the correct #include.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 02:57:12 +0000 (19:57 -0700)]
Merge branch 'hns-fixes'
yankejian says:
====================
net: hns: fixes two bugs in hns driver
This patchset fixes two bugs in hns driver.
- fixes timeout when received pause frame from the connective ports
- should be set by using ethtool -s when the devices are link down
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
lisheng [Fri, 16 Oct 2015 09:03:20 +0000 (17:03 +0800)]
net: hns: fixes a bug about timeout by pause frame
this patch fixes the bug triggered timeout sequence. when the connective
ports cannot accept the packets with higher speed, they will send out the
pause frame to the Soc's mac. At that time, the driver resets the relevant
of the Soc, then it causes the packets cannot be sent out immediately.
this patch fixes the issue.
Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chenny Xu [Fri, 16 Oct 2015 09:03:19 +0000 (17:03 +0800)]
net: hns: fixes the issue by using ethtool -s
before this patch, hns driver only permits user to set the net device
by using ethtool -s when the device is link up. it is obviously not so
good. it needs to be set no matter it is link up or down. so this patch
fixes this issue.
Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Chenny Xu <chenny.xu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 02:54:45 +0000 (19:54 -0700)]
Merge branch 'hsi-fixes'
huangdaode says:
====================
net: hisilicon fix some bugs in HNS drivers
This patchset fixes the two bugs in HNS driver, one is remove the hnae sysfs interface
according to the review comments from Arnd Bergmann <arnd@arndb.de>, another
is fixing the wrong mac_id judgement bug which is found during internal tests.
change log:
v3:
remove the hnae sysfs interface.
v2:
1) remove first bug fix, which is fixed in another patch submitted by
Arnd Bergmann <arnd@arndb.de>
2) change the code sytyle according to Joe.
v1:
initial version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Fri, 16 Oct 2015 03:54:17 +0000 (11:54 +0800)]
net: hisilicon fix a bug on Hisilicon Network Subsystem
This patch fixes the wrong judgement of mac_id when get port num.
Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>