git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log

netpoll: Don't drop all received packets.

Change the strategy of netpoll from dropping all packets received
during netpoll_poll_dev to calling napi poll with a budget of 0
(to avoid processing drivers rx queue), and to ignore packets received
with netif_rx (those will safely be placed on the backlog queue).

All of the netpoll supporting drivers have been reviewed to ensure
either thay use netif_rx or that a budget of 0 is supported by their
napi poll routine and that a budget of 0 will not process the drivers
rx queues.

Not dropping packets makes NETPOLL_RX_DROP unnecesary so it is removed.

npinfo->rx_flags is removed as rx_flags with just the NETPOLL_RX_ENABLED
flag becomes just a redundant mirror of list_empty(&npinfo->rx_np).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: Add netpoll_rx_processing

Add a helper netpoll_rx_processing that reports when netpoll has
receive side processing to perform.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: Warn if more packets are processed than are budgeted

There is already a warning for this case in the normal netpoll path,
but put a copy here in case how netpoll calls the poll functions
causes a differenet result.

netpoll will shortly call the napi poll routine with a budget 0 to
avoid any rx packets being processed. As nothing does that today
we may encounter drivers that have problems so a netpoll specific
warning seems desirable.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: Visit all napi handlers in poll_napi

In poll_napi loop through all of the napi handlers even when the
budget falls to 0 to ensure that we process all of the tx_queues, and
so that we continue to call into drivers when our initial budget is 0.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: Pass budget into poll_napi

This moves the control logic to the top level in netpoll_poll_dev
instead of having it dispersed throughout netpoll_poll_dev,
poll_napi and poll_one_napi.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: move setting of NETPOLL_RX_DROP into netpoll_poll_dev

Today netpoll depends on setting NETPOLL_RX_DROP before networking
drivers receive packets in interrupt context so that the packets can
be dropped. Move this setting into netpoll_poll_dev from
poll_one_napi so that if ndo_poll_controller happens to receive
packets we will drop the packets on the floor instead of letting the
packets bounce through the networking stack and potentially cause problems.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next,
most relevantly they are:

* cleanup to remove double semicolon from stephen hemminger.

* calm down sparse warning in xt_ipcomp, from Fan Du.

* nf_ct_labels support for nf_tables, from Florian Westphal.

* new macros to simplify rcu dereferences in the scope of nfnetlink
  and nf_tables, from Patrick McHardy.

* Accept queue and drop (including reason for drop) to verdict
  parsing in nf_tables, also from Patrick.

* Remove unused random seed initialization in nfnetlink_log, from
  Florian Westphal.

* Allow to attach user-specific information to nf_tables rules, useful
  to attach user comments to rule, from me.

* Return errors in ipset according to the manpage documentation, from
  Jozsef Kadlecsik.

* Fix coccinelle warnings related to incorrect bool type usage for ipset,
  from Fengguang Wu.

* Add hash:ip,mark set type to ipset, from Vytas Dauksa.

* Fix message for each spotted by ipset for each netns that is created,
  from Ilia Mirkin.

* Add forceadd option to ipset, which evicts a random entry from the set
  if it becomes full, from Josh Hunt.

* Minor IPVS cleanups and fixes from Andi Kleen and Tingwei Liu.

* Improve conntrack scalability by removing a central spinlock, original
  work from Eric Dumazet. Jesper Dangaard Brouer took them over to address
  remaining issues. Several patches to prepare this change come in first
  place.

* Rework nft_hash to resolve bugs (leaking chain, missing rcu synchronization
  on element removal, etc. from Patrick McHardy.

* Restore context in the rule deletion path, as we now release rule objects
  synchronously, from Patrick McHardy. This gets back event notification for
  anonymous sets.

* Fix NAT family validation in nft_nat, also from Patrick.

* Improve scalability of xt_connlimit by using an array of spinlocks and
  by introducing a rb-tree of hashtables for faster lookup of accounted
  objects per network. This patch was preceded by several patches and
  refactorizations to accomodate this change including the use of kmem_cache,
  from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: connlimit: use rbtree for per-host conntrack obj storage

With current match design every invocation of the connlimit_match
function means we have to perform (number_of_conntracks % 256) lookups
in the conntrack table [ to perform GC/delete stale entries ].
This is also the reason why ____nf_conntrack_find() in perf top has
> 20% cpu time per core.

This patch changes the storage to rbtree which cuts down the number of
ct objects that need testing.

When looking up a new tuple, we only test the connections of the host
objects we visit while searching for the wanted host/network (or
the leaf we need to insert at).

The slot count is reduced to 32.  Increasing slot count doesn't
speed up things much because of rbtree nature.

before patch (50kpps rx, 10kpps tx):
+  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
+  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
+  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
+   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
+   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
+   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw

after (90kpps, 51kpps tx):
+  17.24%       swapper  [nf_conntrack]    [k] ____nf_conntrack_find
+   6.60%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
+   2.73%       swapper  [nf_conntrack]    [k] hash_conntrack_raw
+   2.36%       swapper  [xt_connlimit]    [k] count_tree

Obvious disadvantages to previous version are the increase in code
complexity and the increased memory cost.

Partially based on Eric Dumazets fq scheduler.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: connlimit: make same_source_net signed

currently returns 1 if they're the same. Make it work like mem/strcmp
so it can be used as rbtree search function.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: connlimit: use keyed locks

connlimit currently suffers from spinlock contention, example for
4-core system with rps enabled:

+  20.84%   ksoftirqd/2  [kernel.kallsyms] [k] _raw_spin_lock_bh
+  20.76%   ksoftirqd/1  [kernel.kallsyms] [k] _raw_spin_lock_bh
+  20.42%   ksoftirqd/0  [kernel.kallsyms] [k] _raw_spin_lock_bh
+   6.07%   ksoftirqd/2  [nf_conntrack]    [k] ____nf_conntrack_find
+   6.07%   ksoftirqd/1  [nf_conntrack]    [k] ____nf_conntrack_find
+   5.97%   ksoftirqd/0  [nf_conntrack]    [k] ____nf_conntrack_find
+   2.47%   ksoftirqd/2  [nf_conntrack]    [k] hash_conntrack_raw
+   2.45%   ksoftirqd/0  [nf_conntrack]    [k] hash_conntrack_raw
+   2.44%   ksoftirqd/1  [nf_conntrack]    [k] hash_conntrack_raw

May allow parallel lookup/insert/delete if the entry is hashed to
another slot.  With patch:

+  20.95%  ksoftirqd/0  [nf_conntrack] [k] ____nf_conntrack_find
+  20.50%  ksoftirqd/1  [nf_conntrack] [k] ____nf_conntrack_find
+  20.27%  ksoftirqd/2  [nf_conntrack] [k] ____nf_conntrack_find
+   5.76%  ksoftirqd/1  [nf_conntrack] [k] hash_conntrack_raw
+   5.39%  ksoftirqd/2  [nf_conntrack] [k] hash_conntrack_raw
+   5.35%  ksoftirqd/0  [nf_conntrack] [k] hash_conntrack_raw
+   2.00%  ksoftirqd/1  [kernel.kallsyms] [k] __rcu_read_unlock

Improved rx processing rate from ~35kpps to ~50 kpps.

Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'napi_budget_zero'

Eric W. Biederman says:

====================
Don't receive packets when the napi budget == 0

After reading through all 120 drivers supporting netpoll I have found 16
more that process at least received packet when the napi budget == 0.

Processing more packets than your budget has always been a bug but
we haven't cared before so it looks like these drivers slipped through,
and need fixes.

As netpoll will shortly be using a budget of 0 to get the tx queue
processing with the rx queue processing we now care.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxge: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tc35815: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tilepro: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tilegx: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s2io: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sky2: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmveth: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fs_enet: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd8111e: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
Doorbell drop Avoidance Bug fix for iw_cxgb4

This patch series provides fixes for Chelsio T4/T5 adapters
related to DB Drop avoidance and other small fix related to keepalive on
iw-cxgb4.

The patches series is created against David Miller's 'net-next' tree.
And includes patches on cxgb4 and iw_cxgb4 driver.

We would like to request this patch series to get merged via David Miller's
'net-next' tree.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes

The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues.  Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count.  This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps.  This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP.  Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED.  As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events.  Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list.  This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example).  QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
quickly and overflowing the FIFO.  Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops.  As the load lightens though, we
resume to normal DB writes directly by user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice

Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.c

Both the r8152 and netback conflicts were simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'alb_learning'

Veaceslav Falico says:

====================
bonding: use correct ether type for alb

There have been reports that, while using the ETH_P_LOOP ether type
(0x0060), the ether type is treated as its packet length.

To avoid that and to not break already existing apps - add new ether type
ETH_P_LOOPBACK that contains the correct id - 0x9000.
====================

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>

bonding: use the correct ether type for alb

Currently it's using the wrong ETH_P_LOOP type, which is sometimes treated
as packet length instead of ether type (because it's 0x0060).

Use the new ETH_P_LOOPBACK type.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ether: add loopback type ETH_P_LOOPBACK

Per IEEE 802.3*, the correct packet type for loopback 0x9000. There's
already one ETH_P_LOOP 0x0060, which has been there for ages, however it's
plainly wrong as anything that small is considered a length field.

We can't remove it because legacy, so add a new type which corresponds to
the correct id.

http://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml

CC: "David S. Miller" <davem@davemloft.net>
CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
CC: Simon Wunderlich <sw@simonwunderlich.de>
CC: Neil Jerram <Neil.Jerram@metaswitch.com>
CC: Simon Horman <horms@verge.net.au>
CC: Arvid Brodin <Arvid.Brodin@xdin.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to igb, i40e and i40evf.

I provide a code comment fix which David Miller noticed in the last
series of patches I submitted.

Shannon provides a patch to cleanup the NAPI structs when deleting the
netdev.

Anjali provides several patches for i40e, first fixes a bug in the update
filter logic which was causing a kernel panic.  Then provides a fix to
rename an error bit to correctly indicate the error.  Adds a definition
for a new state variable to keep track of features automatically disabled
due to hardware resource limitations versus user enforced feature disabled.
Anjali provides a patch to add code to handle when there is a filter
programming error due to a full table, which also resolves a previous
compile warning about an unused "*pf" variable introduced in the last i40e
series patch submission.

Jesse provides three i40e patches to cleanup strings to make more
consistent and to align with other Intel drivers.

Akeem cleans up a misleading function header comment for i40e.

Mitch provides a fix for i40e/i40evf to use the correctly reported number
of MSI-X vectors in the PF an VF.  Then provides a patch to use
dma_set_mask_and_coherent() which was introduced in v3.13 and simplifies
the DMA mapping code a bit.

v2:
- dropped the 2 ixgbe patches from Emil based on feedback from David Miller,
  where the 2 fixes should be handled in the net core to fix all drivers
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ieee802154-next'

Phoebe Buckheister says:

====================
ieee802154: fix endianness and header handling

This patch set enforces network byte order on all internal operations and
fields of the 802.15.4 stack and adds a general representation of 802.15.4
headers with operations to create and parse those headers. This reduces code
duplication in the current stack and also allows for upper layers to read
headers of packets they have just received; it is also necessary for 802.15.4
link layer security, which requires header mangling.

Changes since v1:
* fixed lowpan packet rx after reassembly. Control blocks were used to
retrieve source/dest addresses, but the CB is clobbered by reassembly.
Instead, parse the header anew in lowpan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: add proper length checks to header creations

Have mac802154 header_ops.create fail with -EMSGSIZE if the length
passed will be too large to fit a frame. Since 6lowpan will ensure that
no packet payload will be too large, pass a length of 0 there. 802.15.4
dgram sockets will also return -EMSGSIZE on payloads larger than the
device MTU instead of -EINVAL.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

6lowpan: move lowpan frag_info out of 802.15.4 headers

Fragmentation and reassembly information for 6lowpan is independent from
the 802.15.4 stack and used only by the 6lowpan reassembly process. Move
the ieee802154_frag_info struct to a private are, it needn't be in the
802.15.4 skb control block.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: use ieee802154_addr instead of *_sa variants

Change all internal uses of ieee802154_addr_sa to ieee802154_addr,
except for those instances that communicate directly with userspace.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: use header operations to create/parse headers

Use the operations on 802.15.4 header structs introduced in a previous
patch to create and parse all headers in the mac802154 stack. This patch
reduces code duplication between different parts of the mac802154 stack
that needed information from headers, and also fixes a few bugs that
seem to have gone unnoticed until now:

* 802.15.4 dgram sockets would return a slightly incorrect value for
   the SIOCINQ ioctl
* mac802154 would not drop frames with the "security enabled" bit set,
   even though it does not support security, in violation of the
   standard

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: add header structs with endiannes and operations

This patch provides a set of structures to represent 802.15.4 MAC
headers, and a set of operations to push/pull/peek these structs from
skbs. We cannot simply pointer-cast the skb MAC header pointer to these
structs, because 802.15.4 headers are wildly variable - depending on the
first three bytes, virtually all other fields of the header may be
present or not, and be present with different lengths.

The new header creation/parsing routines also support 802.15.4 security
headers, which are currently not supported by the mac802154
implementation of the protocol.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: enforce consistent endianness in the 802.15.4 stack

Enable sparse warnings about endianness, replace the remaining fields
regarding network operations without explicit endianness annotations
with such that are annotated, and propagate this through the entire
stack.

Uses of ieee802154_addr_sa are not changed yet, this patch is only
concerned with all other fields (such as address filters, operation
parameters and the likes).

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: add address struct with proper endiannes and some operations

Add a replacement ieee802154_addr struct with proper endianness on
fields. Short address fields are stored as __le16 as on the network,
extended (EUI64) addresses are __le64 as opposed to the u8[8] format
used previously. This disconnect with the netdev address, which is
stored as big-endian u8[8], is intentional.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: rename struct ieee802154_addr to *_sa

The struct as currently defined uses host byte order for some fields,
and most big endian/EUI display byte order for other fields. Inside the
stack, endianness should ideally match network byte order where possible
to minimize the number of byteswaps done in critical paths, but this
patch does not address this; it is only preparatory.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Two x86 fixes: Suresh's eager FPU fix, and a fix to the NUMA quirk for
  AMD northbridges.

  This only includes Suresh's fix patch, not the "mostly a cleanup"
  patch which had __init issues"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd/numa: Fix northbridge quirk to assign correct NUMA node
  x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU

Merge tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"Three of these are regression fixes, for two recent regressions and
  one introduced during the 3.13 cycle, and the fourth one is a working
  version of the fix that had to be reverted last time.

  Specifics:

   - A recent ACPI resources handling fix overlooked the fact that it
     had to update the ACPI PNP subsystem's resources parsing too and
     caused confusing warning messages to be printed during system
     intialization on some systems (with arguably buggy ACPI tables).
     Fix from Zhang Rui.

   - Moving the early ACPI initialization before timekeeping_init()
     earlier in this cycle broke fast TSC calibration on at least one
     system, so it needs to be done later, but still before
     efi_enter_virtual_mode() to allow the EFI initialization to refer
     to ACPI.

   - A change related to code duplication reduction in the cpufreq core
     inadvertently caused cpufreq intialization to fail for some CPUs
     handled by intel_pstate by adding checks that may fail for that
     driver, but aren't even necessary when it is used.  The issue is
     addressed by preventing those checks from run in the configurations
     in which they aren't needed.

   - If the Hardware Reduced ACPI flag is set in the ACPI tables, system
     suspend, hibernation and ACPI power off will only work when special
     sleep control and sleep status registeres are provided (their
     addresses in the ACPI tables are not zero).  If those registers are
     not available, the features in question have no chances to work, so
     they shouldn't even be regarded as supported.  That helps with
     power off in particular, because alternative power off methods may
     be used then and they may actually work"

* tag 'pm+acpi-3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states
  ACPI / init: Invoke early ACPI initialization later
  cpufreq: Skip current frequency initialization for ->setpolicy drivers
  PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures

Merge tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device-mapper fixes form Mike Snitzer:
"Two small fixes for the DM cache target:

   - fix corruption with >2TB fast device due to truncation bug
   - fix access beyond end of origin device due to a partial block"

* tag 'dm-3.14-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix access beyond end of origin device
  dm cache: fix truncation bug when copying a block to/from >2TB fast device

i40e/i40evf: Use dma_set_mask_and_coherent

In Linux 3.13, dma_set_mask_and_coherent was introduced, and we have
been encouraged to use it. It simplifies the DMA mapping code a bit as
well.

Change-ID: I66e340245af7d0dedfa8b40fec1f5e352754432e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Use correct number of VF vectors

Now that the 2.4 firmware reports the correct number of MSI-X vectors,
use this value correctly when communicating with the VF, and when
setting up the interrupt linked list.

The PF has always reported the correct number of MSI-X vectors, so we
should never increment the value in the vf driver.

Change-ID: Ifeefc631c321390192219ce2af9ada6180c1492f
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Let MDD events be handled by MDD handler

We have a separate handler for MDD events, a generic reset is not required.

Change-ID: I77858e2d479e4e65c52aede67109464649ea0253
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Bug fix for FDIR replay logic

The FDIR replay logic was being run a little too soon (before the
queues were enabled) and hence the tail bump was not effective till
a later transaction happened on the queue.

Change-ID: Icfd7cd2e79fc3cae3cbd3f703a2b3a148b4e7bf6
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add code to handle FD table full condition

Add code to enforce the following policy:
- If the HW reports filter programming error, we check if it's due to a
  full table.
- If so, we go ahead and turn off new rule addition for ATR and then SB
  in that order.
- We monitor the programmed filter count, if enough room is created due
  to filter deletion/reset, we then re-enable SB and ATR new rule addition.

Change-ID: I69d24b29e5c45bc4fa861258e11c2fa7b8868748
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Define a new state variable to keep track of feature auto disable

This variable is a bit mask. It is needed to differentiate between
user enforced feature disables and auto disable of features due to
HW resource limitations.

Change-ID: Ib4b4f6ae1bb2668c12e482d2555100bc8ad713d5
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix function comments

Correct misleading function comment.

Change-ID: I3f66cff5cc00250a285756b6500a58fad8eba4b5
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: simplified init string

In a similar way to how ixgbe works, print a short one-line string
showing what features and number of queues the driver and hardware has
enabled at probe time.

Example (wrapped for the commit message):
i40e 0000:06:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 32 FDir RSS
ATR NTUPLE DCB

Change-ID: I177bf7f93d1c4c921529c92fdf66e614f6b4f755
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: cleanup strings

This patch cleans up the strings that the driver prints during normal
operation and moves many strings into dev_dbg. It also cleans up
strings printed during reset.

Change-ID: I1835cc4e3c3b22596182b683284e6bb87eac61b2
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: make string references to q be queue

This cleans up strings for consistency, q is replaced with queue.

Change-ID: Ia5f9dfae9af261f4c24485854264e02363729cf3
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Some flow director HW definition fixes

1) Fix a name of the error bit to correctly indicate the error.
2) Added a fd_id field in the 32 byte desc at the place(qw0) where it gets
reported in the programming error desc WB. In a normal data desc
the fd_id field is reported in qw3.

Change-ID: Ide9a24bff7273da5889c36635d629bc3b5212010
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix a bug in the update logic for FDIR SB filter.

The update filter logic was causing a kernel panic in the original code.
We need to compare the input set to decide whether or not to delete a
filter since we do not have a hash stored. This new design helps fix the issue.

Change-ID: I2462b108e58ca4833312804cda730b4660cc18c9
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: delete netdev after deleting napi and vectors

We've been deleting the netdev before getting around to deleting the napi
structs. Unfortunately, we then didn't delete the napi structs because we
have a check for netdev, thus we were leaving garbage around in the system.

Change-ID: Ife540176f6c9f801147495b3f2d2ac2e61ddcc58
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Fix code comment

Recently added code comment was missing a space that is needed.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

x86/amd/numa: Fix northbridge quirk to assign correct NUMA node

For systems with multiple servers and routed fabric, all
northbridges get assigned to the first server. Fix this by also
using the node reported from the PCI bus. For single-fabric
systems, the northbriges are on PCI bus 0 by definition, which
are on NUMA node 0 by definition, so this is invarient on most
systems.

Tested on fam10h and fam15h single and multi-fabric systems and
candidate for stable.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1394710981-3596-1-git-send-email-daniel@numascale.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Pretty minor set of fixes for radeon, ttm and vmwgfx.  The ttm ones
  are a regression and an oops seen on server chipsets"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/vmwgfx: Fix a surface reference corner-case in legacy emulation mode
  drm/radeon/cik: properly set compute ring status on disable
  drm/radeon/cik: stop the sdma engines in the enable() function
  drm/radeon/cik: properly set sdma ring status on disable
  drm/radeon: fix runpm disabling on non-PX harder
  drm/ttm: don't oops if no invalidate_caches()
  drm/ttm: Work around performance regression with VM_PFNMAP

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c Kconfig fix from Wolfram Sang.

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: Remove usage of orphaned symbol OF_I2C

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"I know this is a bit more than you want to see, and I've told the
  wireless folks under no uncertain terms that they must severely scale
  back the extent of the fixes they are submitting this late in the
  game.

  Anyways:

   1) vmxnet3's netpoll doesn't perform the equivalent of an ISR, which
      is the correct implementation, like it should.  Instead it does
      something like a NAPI poll operation.  This leads to crashes.

      From Neil Horman and Arnd Bergmann.

   2) Segmentation of SKBs requires proper socket orphaning of the
      fragments, otherwise we might access stale state released by the
      release callbacks.

      This is a 5 patch fix, but the initial patches are giving
      variables and such significantly clearer names such that the
      actual fix itself at the end looks trivial.

      From Michael S.  Tsirkin.

   3) TCP control block release can deadlock if invoked from a timer on
      an already "owned" socket.  Fix from Eric Dumazet.

   4) In the bridge multicast code, we must validate that the
      destination address of general queries is the link local all-nodes
      multicast address.  From Linus Lüssing.

   5) The x86 BPF JIT support for negative offsets puts the parameter
      for the helper function call in the wrong register.  Fix from
      Alexei Starovoitov.

   6) The descriptor type used for RTL_GIGA_MAC_VER_17 chips in the
      r8169 driver is incorrect.  Fix from Hayes Wang.

   7) The xen-netback driver tests skb_shinfo(skb)->gso_type bits to see
      if a packet is a GSO frame, but that's not the correct test.  It
      should use skb_is_gso(skb) instead.  Fix from Wei Liu.

   8) Negative msg->msg_namelen values should generate an error, from
      Matthew Leach.

   9) at86rf230 can deadlock because it takes the same lock from it's
      ISR and it's hard_start_xmit method, without disabling interrupts
      in the latter.  Fix from Alexander Aring.

  10) The FEC driver's restart doesn't perform operations in the correct
      order, so promiscuous settings can get lost.  Fix from Stefan
      Wahren.

  11) Fix SKB leak in SCTP cookie handling, from Daniel Borkmann.

  12) Reference count and memory leak fixes in TIPC from Ying Xue and
      Erik Hugne.

  13) Forced eviction in inet_frag_evictor() must strictly make sure all
      frags are deleted, otherwise module unload (f.e.  6lowpan) can
      crash.  Fix from Florian Westphal.

  14) Remove assumptions in AF_UNIX's use of csum_partial() (which it
      uses as a hash function), which breaks on PowerPC.  From Anton
      Blanchard.

      The main gist of the issue is that csum_partial() is defined only
      as a value that, once folded (f.e.  via csum_fold()) produces a
      correct 16-bit checksum.  It is legitimate, therefore, for
      csum_partial() to produce two different 32-bit values over the
      same data if their respective alignments are different.

  15) Fix endiannes bug in MAC address handling of ibmveth driver, also
      from Anton Blanchard.

  16) Error checks for ipv6 exthdrs offload registration are reversed,
      from Anton Nayshtut.

  17) Externally triggered ipv6 addrconf routes should count against the
      garbage collection threshold.  Fix from Sabrina Dubroca.

  18) The PCI shutdown handler added to the bnx2 driver can wedge the
      chip if it was not brought up earlier already, which in particular
      causes the firmware to shut down the PHY.  Fix from Michael Chan.

  19) Adjust the sanity WARN_ON_ONCE() in qdisc_list_add() because as
      currently coded it can and does trigger in legitimate situations.
      From Eric Dumazet.

  20) BNA driver fails to build on ARM because of a too large udelay()
      call, fix from Ben Hutchings.

  21) Fair-Queue qdisc holds locks during GFP_KERNEL allocations, fix
      from Eric Dumazet.

  22) The vlan passthrough ops added in the previous release causes a
      regression in source MAC address setting of outgoing headers in
      some circumstances.  Fix from Peter Boström"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
  ipv6: Avoid unnecessary temporary addresses being generated
  eth: fec: Fix lost promiscuous mode after reconnecting cable
  bonding: set correct vlan id for alb xmit path
  at86rf230: fix lockdep splats
  net/mlx4_en: Deregister multicast vxlan steering rules when going down
  vmxnet3: fix building without CONFIG_PCI_MSI
  MAINTAINERS: add networking selftests to NETWORKING
  net: socket: error on a negative msg_namelen
  MAINTAINERS: Add tools/net to NETWORKING [GENERAL]
  packet: doc: Spelling s/than/that/
  net/mlx4_core: Load the IB driver when the device supports IBoE
  net/mlx4_en: Handle vxlan steering rules for mac address changes
  net/mlx4_core: Fix wrong dump of the vxlan offloads device capability
  xen-netback: use skb_is_gso in xenvif_start_xmit
  r8169: fix the incorrect tx descriptor version
  tools/net/Makefile: Define PACKAGE to fix build problems
  x86: bpf_jit: support negative offsets
  bridge: multicast: enable snooping on general queries only
  bridge: multicast: add sanity check for general query destination
  tcp: tcp_release_cb() should release socket ownership
  ...

i2c: Remove usage of orphaned symbol OF_I2C

The symbol is an orphan, don't depend on it anymore.

Signed-off-by: Richard Weinberger <richard@nod.at>
[wsa: enhanced commit message]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: 687b81d083c0 (i2c: move OF helpers into the core)
Cc: stable@kernel.org

Merge branches 'pnp', 'acpi-init', 'acpi-sleep' and 'pm-cpufreq'

* pnp:
  PNP / ACPI: proper handling of ACPI IO/Memory resource parsing failures

* acpi-init:
  ACPI / init: Invoke early ACPI initialization later

* acpi-sleep:
  ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states

* pm-cpufreq:
  cpufreq: Skip current frequency initialization for ->setpolicy drivers

ACPI / sleep: Add extra checks for HW Reduced ACPI mode sleep states

If the HW Reduced ACPI mode bit is set in the FADT, ACPICA uses
the optional sleep control and sleep status registers for making
the system enter sleep states (including S5), so it is not possible
to use system sleep states or power it off using ACPI if the HW
Reduced ACPI mode bit is set and those registers are not available.

For this reason, add a new function, acpi_sleep_state_supported(),
checking if the HW Reduced ACPI mode bit is set and whether or not
system sleep states are usable in that case in addition to checking
the return value of acpi_get_sleep_type_data() and make the ACPI
sleep setup routines use that function to check the availability of
system sleep states.

Among other things, this prevents the kernel from attempting to
use ACPI for powering off HW Reduced ACPI systems without the sleep
control and sleep status registers, because ACPI power off doesn't
have a chance to work on them. That allows alternative power off
mechanisms that may actually work to be used on those systems. The
affected machines include Dell Venue 8 Pro, Asus T100TA, Haswell
Desktop SDP and Ivy Bridge EP Demo depot.

References: https://bugzilla.kernel.org/show_bug.cgi?id=70931
Reported-by: Adam Williamson <awilliam@redhat.com>
Tested-by: Aubrey Li <aubrey.li@linux.intel.com>
Cc: 3.4+ <stable@vger.kernel.org> # 3.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

6lowpan: reassembly: un-export local functions

most of these are only used locally, make them static.
fold lowpan_expire_frag_queue into its caller, its small enough.

Cc: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Avoid unnecessary temporary addresses being generated

tmp_prefered_lft is an offset to ifp->tstamp, not now. Therefore
age needs to be added to the condition.

Age calculation in ipv6_create_tempaddr is different from the one
in addrconf_verify and doesn't consider ADDRCONF_TIMER_FUZZ_MINUS.
This can cause age in ipv6_create_tempaddr to be less than the one
in addrconf_verify and therefore unnecessary temporary address to
be generated.
Use age calculation as in addrconf_modify to avoid this.

Signed-off-by: Heiner Kallweit <heiner.kallweit@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: update OF PHY registeration

If the sh_eth device is registered using OF, then the driver
should call of_mdiobus_register() to register the PHYs described
in the devicetree and then use of_phy_connect() to connect the
PHYs to the device.

This ensures that any PHYs registered in the device tree are
appropriately connected to the parent devices nodes so that
the PHY drivers can access their OF properties.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

eth: fec: Fix lost promiscuous mode after reconnecting cable

If the Freescale fec is in promiscuous mode and network cable is
reconnected then the promiscuous mode get lost. The problem is caused
by a too soon call of set_multicast_list to re-enable promisc mode.
The FEC_R_CNTRL register changes are overwritten by fec_restart.

This patch fixes this by moving the call behind the init of FEC_R_CNTRL
register in fec_restart.

Successful tested on a i.MX28 board.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: set correct vlan id for alb xmit path

The commit d3ab3ffd1d728d7ee77340e7e7e2c7cfe6a4013e
(bonding: use rlb_client_info->vlan_id instead of ->tag)
remove the rlb_client_info->tag, but occur some issues,
The vlan_get_tag() will return 0 for success and -EINVAL for
error, so the client_info->vlan_id always be set to 0 if the
vlan_get_tag return 0 for success, so the client_info would
never get a correct vlan id.

We should only set the vlan id to 0 when the vlan_get_tag return error.

Fixes: d3ab3ffd1d7 (bonding: use rlb_client_info->vlan_id instead of ->tag)
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: fix lockdep splats

This patch fix a lockdep in the at86rf230 driver, otherwise we get:

[   30.206517] =================================
[   30.211078] [ INFO: inconsistent lock state ]
[   30.215647] 3.14.0-20140108-1-00994-g32e9426 #163 Not tainted
[   30.221660] ---------------------------------
[   30.226222] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[   30.232514] systemd-udevd/157 [HC1[1]:SC0[0]:HE0:SE1] takes:
[   30.238439]  (&(&lp->lock)->rlock){?.+...}, at: [<c03600f8>] at86rf230_isr+0x18/0x44
[   30.246621] {HARDIRQ-ON-W} state was registered at:
[   30.251728]   [<c0061ce4>] __lock_acquire+0x7a4/0x18d8
[   30.257135]   [<c0063500>] lock_acquire+0x68/0x7c
[   30.262071]   [<c0588820>] _raw_spin_lock+0x28/0x38
[   30.267203]   [<c0361240>] at86rf230_xmit+0x1c/0x144
[   30.272412]   [<c057ba6c>] mac802154_xmit_worker+0x88/0x148
[   30.278271]   [<c0047844>] process_one_work+0x274/0x404
[   30.283761]   [<c00484c0>] worker_thread+0x228/0x374
[   30.288971]   [<c004cfb8>] kthread+0xd0/0xe4
[   30.293455]   [<c000dac8>] ret_from_fork+0x14/0x2c
[   30.298493] irq event stamp: 8948
[   30.301963] hardirqs last  enabled at (8947): [<c00cb290>] __kmalloc+0xb4/0x110
[   30.309636] hardirqs last disabled at (8948): [<c00115d4>] __irq_svc+0x34/0x5c
[   30.317215] softirqs last  enabled at (8452): [<c0037324>] __do_softirq+0x1dc/0x264
[   30.325243] softirqs last disabled at (8439): [<c0037638>] irq_exit+0x80/0xf4

We use the lp->lock inside the isr of at86rf230, that's why we need the
irqsave spinlock calls.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: return nla_nest_end() instead of skb->len

nla_nest_end() already has return skb->len, so replace
return skb->len with return nla_nest_end instead().

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcm63xx_enet: Stop pretending to support netpoll

bcm_enet_netpoll does not exist, and causing
bcm63xx_net to fail to build when NET_POLL_CONTROLLER
is defined.

Remove the bogus .ndo_poll_controller = bcm_enet_netpoll

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'napi_budget_zero'

Eric W. Biederman says:

====================
Don't receive packets when the napi budget == 0

To the best of understanding processing any received packets when the
napi budget == 0 is broken driver behavior. At the same time I don't
think we have ever cared before so there are a handful of drivers that
need fixes.

I care now as I will shortly be using htis in netpoll to get the
tx queue processing without the rx queue processing.

Drivers that need fixes are few and far between, and so far I have only
found two of them. More similar patches later if I find more drivers
that need fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

8139cp: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Don't receive packets when the napi budget == 0

Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.

This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

consolidate duplicate code is skb_checksum_setup() helpers

consolidate duplicate code is skb_checksum_setup() helpers

Realizing that the skb_maybe_pull_tail() calls in the IP-protocol
specific portions of both helpers are terminal ones (i.e. no further
pulls are expected), their maximum size to be pulled can be made match
their minimal size needed, thus making the code identical and hence
possible to be moved into another helper.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to igb, e1000e, ixgbe and ixgbevf.

Tom Herbert provides changes to e1000e, igb and ixgbe to call skb_set_hash()
to set the hash and its type in an skbuff.

Carolyn provides a fix for igb where using ethtool for EEE settings, which
was not working correctly.

Jacob provides some trivial cleanups and fixes for ixgbe which mainly
dealt with the file headers.

Julia Lawall provides a one fix for ixgbevf where the driver did not need
to adjust the power state on suspend, so the call to pci_set_power_state()
in the resume function was a no-op.

v2:
- dropped patches 4-6 from original series which implemented debugfs for
igb from Carolyn based on feed back from David Miller and Or Gerlitz
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
For the mac80211 bits, Johannes says:

"This time, I have a number of small fixes and improvements, and those
are fairly straight-forward. More interesting changes come from Luca
with some preparations for the CSA work, mostly around interface/channel
combinations checking. One other possibly interesting change is a small
one by myself to add NAPI support back to mac80211, which can help
improve TCP behaviour through GRO."

For the Bluetooth bits, Gustavo says:

"This is our first pull request for 3.15, the main feature here is the addition of
the privacy feature for low energy devices. Other than that we have a bunch of small
improvements, fixes, and clean ups all over the tree."

And...

"Another pull request to 3.15. Here we have the second part of the LE private
feature, the LE auto-connect feature and improvements to the power off
procedures. The rest are small improvements, clean up, and fixes."

For the iwlwifi bits, Emmanuel says:

"I have here a whole bunch of various things. Trivial cleanups,
debugfs handlers and new stuff for the new generation of devices
along with new capabilities for monitor mode. We also have support
for power save for dual interface mode, but that is not supported by
the firmware currently available."

And for the Atheros bits, Kalle says:

"For ath10k Alexander did some cleanup to PCI error cases and switched
ath10k to use pci_enable_msi_range(). Michal implemented AP CSA support
and sta_rc_update() operation. I enabled firmware "STA quick kickout"
functionality for faster detection of disappeared clients.

Also there are lots of small fixes to everywhere from various people."

I pulled the wireless tree to avoid some merge conflicts, and I
reverted the staging patch that I had mistakenly merged previously.
Along with that, mwifiex, brcmfmac, wil6210, ath9k, and a few other
drivers get their usual round of updates. Also notable is the addition
of yet another driver in the rtlwifi family.

...

I have amended this commit request to correct the build problems in
staging, including a warning added to one of the staging drivers by
the wireless-next tree. I also included a fix from Larry Finger to
address an issue found in rtl8723be by Dan Carpenter and smatch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
Misc. fixes for cxgb4

This patch series provides miscelleneous fixes for Chelsio T4/T5 adapters
cxgb4 driver related to SGE and MTU.

Also fixes regression in LSO calcuation path.
("cxgb4: Calculate len properly for LSO path")

The patches series is created against David Miller's 'net-next' tree.
And includes patches on cxgb4 driver.

We would like to request this patch series to get merged via David Miller's
'net-next' tree.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Calculate len properly for LSO path

Commit 0034b29 ("cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()")
introduced a regression where-in length was calculated wrongly for LSO path,
causing chip hangs.
So, correct the calculation of len.

Fixes: 0034b29 ("cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()")
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Updates for T5 SGE's Egress Congestion Threshold

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add code to dump SGE registers when hitting idma hangs

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB

We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and the
extant logic would flag that as an error.

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem

Conflicts:
drivers/net/wireless/ath/ath9k/recv.c

rtlwifi: rtl8723be: Fix array dimension problems

Commit a619d1abe20c leads to the following static checker warning:

drivers/net/wireless/rtlwifi/rtl8723be/phy.c:667 _rtl8723be_store_tx_power_by_rate()
error: buffer overflow 'rtlphy->tx_power_by_rate_offset[band]' 4 <= 5

This warning arises because the code is testing the indices for the wrong maximum
values. In addition, the tests merely putput a warning, and then procedes to
corrupt memory. With this change, any such invalid memory access is avoided.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

wlan-ng: fixup staging driver for removal of ieee80211_dsss_chan_to_freq

Commit 3ebe8e257307 ("ieee80211: remove function
ieee80211_{dsss_chan_to_freq, freq_to_dsss_chan}") removed
ieee80211_dsss_chan_to_freq, but it neglected to account for this
staging driver...

Cc: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

rtl8821ae: fixup staging driver for revised ieee80211_is_robust_mgmt_frame

Commit d8ca16db6bb2 ("mac80211: add length check in
ieee80211_is_robust_mgmt_frame()") changed that API to take an skb,
and added "_ieee80211_is_robust_mgmt_frame" as a direct replacement
for the older API. This is the same fix that was applied to the other
rtlwifi drivers in that commit.

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

net/mlx4_en: Deregister multicast vxlan steering rules when going down

When mlx4_en_stop_port() is called, we need to deregister also the
tunnel steering rules that relate to multicast.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: remove NULL check in sctp_assoc_update_retran_path

This is basically just to let Coverity et al shut up. Remove an
unneeded NULL check in sctp_assoc_update_retran_path().

It is safe to remove it, because in sctp_assoc_update_retran_path()
we iterate over the list of transports, our own transport which is
asoc->peer.retran_path included. In the iteration, we skip the
list head element and transports in state SCTP_UNCONFIRMED.

Such transports came from peer addresses received in INIT/INIT-ACK
address parameters. They are not yet confirmed by a heartbeat and
not available for data transfers.

We know however that in the list of transports, even if it contains
such elements, it at least contains our asoc->peer.retran_path as
well, so even if next to that element, we only encounter
SCTP_UNCONFIRMED transports, we are always going to fall back to
asoc->peer.retran_path through sctp_trans_elect_best(), as that is
for sure not SCTP_UNCONFIRMED as per fbdf501c9374 ("sctp: Do no
select unconfirmed transports for retransmissions").

Whenever we call sctp_trans_elect_best() it will give us a non-NULL
element back, and therefore when we break out of the loop, we are
guaranteed to have a non-NULL transport pointer, and can remove
the NULL check.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: fix building without CONFIG_PCI_MSI

Since commit d25f06ea466e "vmxnet3: fix netpoll race condition",
the vmxnet3 driver fails to build when CONFIG_PCI_MSI is disabled,
because it unconditionally references the vmxnet3_msix_rx()
function.

To fix this, use the same #ifdef in the caller that exists around
the function definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add networking selftests to NETWORKING

Add it to NETWORKING [GENERAL] to make sure patches for selftests
go to the netdev list as well.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "Revert "Staging: rtl8812ae: remove modules field of rate_control_ops""

This reverts commit 161d7855543520cde5f49df788b0ea0553a9f83a.

Reversal of fortune -- I thought this was going to be resolved by other
means, but that hasn't materialized. Plus, apparently we now care more
than I realized about not breaking staging drivers...

Signed-off-by: John W. Linville <linville@tuxdriver.com>

netfilter: Convert uses of __constant_<foo> to <foo>

The use of __constant_<foo> has been unnecessary for quite awhile now.

Make these uses consistent with the rest of the kernel.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge tag 'ttm-fixes-3.14-2014-03-12' of git://people.freedesktop.org/~thomash/linux into drm-fixes

Second pull request of 2014-03-12. The first one was requested to be canceled.

Rob's fix for oops on invalidate_caches() and a fix for a
performance regression.

* tag 'ttm-fixes-3.14-2014-03-12' of git://people.freedesktop.org/~thomash/linux:
drm/ttm: don't oops if no invalidate_caches()
drm/ttm: Work around performance regression with VM_PFNMAP