git.proxmox.com Git - mirror_ubuntu-kernels.git/log

tcp: add generic netlink support for tcp_metrics

Add support for genl "tcp_metrics". No locking
is changed, only that now we can unlink and delete
entries after grace period. We implement get/del for
single entry and dump to support show/flush filtering
in user space. Del without address attribute causes
flush for all addresses, sadly under genl_mutex.

v2:
- remove rcu_assign_pointer as suggested by Eric Dumazet,
it is not needed because there are no other writes under lock
- move the flushing code in tcp_metrics_flush_all

v3:
- remove synchronize_rcu on flush as suggested by Eric Dumazet

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: remove old init remnant

Remove a for loop that does nothing in ixgbe_probe().
This is a remnant from when we had IO bars (compare to the ixgb code).

Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Supported and Advertised Pause Frame

This patch add ethtool supports for Supported and Advertised Pause Frame,
based on Adapter Flow Control settings.

Signed-off-by: Akeem G. Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: reduce Rx header size

Reduce skb truesize by 256 bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Add loopback test support for i210

Early release of i210 devices had the loopback test of the ethtool
self-test disabled. This patch enables the loopback test for i210 devices.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries

lsof reports some of socket descriptors as "can't identify protocol" like:

    [yamato@localhost]/tmp% sudo lsof | grep dbus | grep iden
    dbus-daem   652          dbus    6u     sock ... 17812 can't identify protocol
    dbus-daem   652          dbus   34u     sock ... 24689 can't identify protocol
    dbus-daem   652          dbus   42u     sock ... 24739 can't identify protocol
    dbus-daem   652          dbus   48u     sock ... 22329 can't identify protocol
    ...

lsof cannot resolve the protocol used in a socket because procfs
doesn't provide the map between inode number on sockfs and protocol
type of the socket.

For improving the situation this patch adds an extended attribute named
'system.sockprotoname' in which the protocol name for
/proc/PID/fd/SOCKET is stored. So lsof can know the protocol for a
given /proc/PID/fd/SOCKET with getxattr system call.

A few weeks ago I submitted a patch for the same purpose. The patch
was introduced /proc/net/sockfs which enumerates inodes and protocols
of all sockets alive on a system. However, it was rejected because (1)
a global lock was needed, and (2) the layout of struct socket was
changed with the patch.

This patch doesn't use any global lock; and doesn't change the layout
of any structs.

In this patch, a protocol name is stored to dentry->d_name of sockfs
when new socket is associated with a file descriptor. Before this
patch dentry->d_name was not used; it was just filled with empty
string. lsof may use an extended attribute named
'system.sockprotoname' to retrieve the value of dentry->d_name.

It is nice if we can see the protocol name with ls -l
/proc/PID/fd. However, "socket:[#INODE]", the name format returned
from sockfs_dname() was already defined. To keep the compatibility
between kernel and user land, the extended attribute is used to
prepare the value of dentry->d_name.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch

ieee802154: MRF24J40 driver

Driver for the Microchip MRF24J40 802.15.4 WPAN module.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add INET dependency on aes crypto for the sake of TCP fastopen.

Stephen Rothwell says:

====================
After merging the final tree, today's linux-next build (powerpc
ppc44x_defconfig) failed like this:

net/built-in.o: In function `tcp_fastopen_ctx_free':
tcp_fastopen.c:(.text+0x5cc5c): undefined reference to `crypto_destroy_tfm'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5cccc): undefined reference to `crypto_alloc_base'
net/built-in.o: In function `tcp_fastopen_reset_cipher':
(.text+0x5cd6c): undefined reference to `crypto_destroy_tfm'

Presumably caused by commit 104671636897 ("tcp: TCP Fast Open Server -
header & support functions") from the net-next tree. I assume that some
dependency on the CRYPTO infrastructure is missing.

I have reverted commit 1bed966cc3bd ("Merge branch
'tcp_fastopen_server'") for today.
====================

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: drop unneeded check for NULL

usbnet_start_xmit() is always called with a valid skb

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Increase maximum number of datapath ports.

Use hash table to store ports of datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

Merge branch 'master' of git://1984.lsi.us.es/nf-next

tcp: use PRR to reduce cwin in CWR state

Use proportional rate reduction (PRR) algorithm to reduce cwnd in CWR state,
in addition to Recovery state. Retire the current rate-halving in CWR.
When losses are detected via ACKs in CWR state, the sender enters Recovery
state but the cwnd reduction continues and does not restart.

Rename and refactor cwnd reduction functions since both CWR and Recovery
use the same algorithm:
tcp_init_cwnd_reduction() is new and initiates reduction state variables.
tcp_cwnd_reduction() is previously tcp_update_cwnd_in_recovery().
tcp_ends_cwnd_reduction() is previously tcp_complete_cwr().

The rate halving functions and logic such as tcp_cwnd_down(), tcp_min_cwnd(),
and the cwnd moderation inside tcp_enter_cwr() are removed. The unused
parameter, flag, in tcp_cwnd_reduction() is also removed.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move tcp_update_cwnd_in_recovery

To prepare replacing rate halving with PRR algorithm in CWR state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move tcp_enter_cwr()

To prepare replacing rate halving with PRR algorithm in CWR state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sierra_net: rx_urb_size is constant

The rx_urb_size is set to the same value for every device
supported by this driver. No need to keep a per-device
data structure to do that. Replacing with a macro constant.

This was the last device specific info, and removing it
allows us to delete the sierra_net_info_data struct.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sierra_net: make private symbols static

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cx82310_eth: use common match macro

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

This merges (3f509c6 netfilter: nf_nat_sip: fix incorrect handling
of EBUSY for RTCP expectation) to Patrick McHardy's IPv6 NAT changes.

netfilter: properly annotate ipv4_netfilter_{init,fini}()

Despite being just a few bytes of code, they should still have proper
annotations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()

Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_queue() and __nf_queue() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()

Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_iterate() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: remove xt_NOTRACK

It was scheduled to be removed for a long time.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: add nf_ct_timeout_lookup

This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_CT: refactorize xt_ct_tg_check

This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce
the size of xt_ct_tg_check.

This aims to improve code mantainability by splitting xt_ct_tg_check
in smaller chunks.

Suggested by Eric Dumazet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_socket: fix compilation warnings with gcc 4.7

This patch fixes compilation warnings in xt_socket with gcc-4.7.

In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:19: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_match.isra.4’:
include/net/netfilter/nf_tproxy_core.h:75:2: warning: ‘protocol’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:113:5: note: ‘protocol’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:45: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:106:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:9: note: ‘dport’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:16: note: ‘saddr’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:9: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:19: note: ‘daddr’ was declared here

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6lowpan: handle NETDEV_UNREGISTER event

Before, it was impossible to remove a wpan device which had lowpan
attached to it.

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@tempietto.lan>

6lowpan: Make a copy of skb's delivered to 6lowpan

Since lowpan_process_data() modifies the skb (by calling skb_pull()), we
need our own copy so that it doesn't affect the data received by other
protcols (in this case, af_ieee802154).

Signed-off-by: Alan Ott <alan@signal11.us>
Signed-off-by: David S. Miller <davem@tempietto.lan>

Merge branch 'tcp_fastopen_server'

Jerry Chu says:

====================
This patch series provides the server (passive open) side code
for TCP Fast Open. Together with the earlier client side patches
it completes the TCP Fast Open implementation.

The server side Fast Open code accepts data carried in the SYN
packet with a valid Fast Open cookie, and passes it to the
application right away, allowing application to send back response
data, all before TCP's 3-way handshake finishes.

A simple cookie scheme together with capping the number of
outstanding TFO requests (still in TCP_SYN_RECV state) to a limit
per listener forms the main line of defense against spoofed SYN
attacks.

For more details about TCP Fast Open see our IETF internet draft
at http://www.ietf.org/id/draft-ietf-tcpm-fastopen-01.txt
and a research paper at
http://conferences.sigcomm.org/co-next/2011/papers/1569470463.pdf

A prototype implementation was first developed by Sivasankar
Radhakrishnan (sivasankar@cs.ucsd.edu).

A patch based on an older version of Linux kernel has been
undergoing internal tests at Google for the past few months.

Jerry Chu (3):
  tcp: TCP Fast Open Server - header & support functions
  tcp: TCP Fast Open Server - support TFO listeners
  tcp: TCP Fast Open Server - main code path
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP Fast Open Server - main code path

This patch adds the main processing path to complete the TFO server
patches.

A TFO request (i.e., SYN+data packet with a TFO cookie option) first
gets processed in tcp_v4_conn_request(). If it passes the various TFO
checks by tcp_fastopen_check(), a child socket will be created right
away to be accepted by applications, rather than waiting for the 3WHS
to finish.

In additon to the use of TFO cookie, a simple max_qlen based scheme
is put in place to fend off spoofed TFO attack.

When a valid ACK comes back to tcp_rcv_state_process(), it will cause
the state of the child socket to switch from either TCP_SYN_RECV to
TCP_ESTABLISHED, or TCP_FIN_WAIT1 to TCP_FIN_WAIT2. At this time
retransmission will resume for any unack'ed (data, FIN,...) segments.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP Fast Open Server - support TFO listeners

This patch builds on top of the previous patch to add the support
for TFO listeners. This includes -

1. allocating, properly initializing, and managing the per listener
fastopen_queue structure when TFO is enabled

2. changes to the inet_csk_accept code to support TFO. E.g., the
request_sock can no longer be freed upon accept(), not until 3WHS
finishes

3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg()
if it's a TFO socket

4. properly closing a TFO listener, and a TFO socket before 3WHS
finishes

5. supporting TCP_FASTOPEN socket option

6. modifying tcp_check_req() to use to check a TFO socket as well
as request_sock

7. supporting TCP's TFO cookie option

8. adding a new SYN-ACK retransmit handler to use the timer directly
off the TFO socket rather than the listener socket. Note that TFO
server side will not retransmit anything other than SYN-ACK until
the 3WHS is completed.

The patch also contains an important function
"reqsk_fastopen_remove()" to manage the somewhat complex relation
between a listener, its request_sock, and the corresponding child
socket. See the comment above the function for the detail.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP Fast Open Server - header & support functions

This patch adds all the necessary data structure and support
functions to implement TFO server side. It also documents a number
of flags for the sysctl_tcp_fastopen knob, and adds a few Linux
extension MIBs.

In addition, it includes the following:

1. a new TCP_FASTOPEN socket option an application must call to
supply a max backlog allowed in order to enable TFO on its listener.

2. A number of key data structures:
"fastopen_rsk" in tcp_sock - for a big socket to access its
request_sock for retransmission and ack processing purpose. It is
non-NULL iff 3WHS not completed.

"fastopenq" in request_sock_queue - points to a per Fast Open
listener data structure "fastopen_queue" to keep track of qlen (# of
outstanding Fast Open requests) and max_qlen, among other things.

"listener" in tcp_request_sock - to point to the original listener
for book-keeping purpose, i.e., to maintain qlen against max_qlen
as part of defense against IP spoofing attack.

3. various data structure and functions, many in tcp_fastopen.c, to
support server side Fast Open cookie operations, including
/proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying.

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: add D-Link DGE-560T identifiers.

This one includes a 8168. Not to be confused with the sky2 driven
one whose PCI vendor and device ID are the same.

Reported-by: Neyuki Inaya <in@joblog.ru>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add some slack to arp monitoring time limits

Currently, all the time limits in the bonding ARP monitor are in
multiples of arp_interval -- the time interval at which the ARP
monitor is periodically scheduled.

With a fast network round-trip and a little scheduling latency
of the ARP monitor work, a limit of n*delta_in_ticks may
effectively mean (n-1)*delta_in_ticks.

This is fatal in case of n==1 (the link will stay down
forever) and makes the behaviour non-deterministic in all the
other cases.

Add a delta_in_ticks/2 time slack to all the time limits.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove some deadcode

__ipv6_regen_rndid no longer returns anything other than 0
so there's no point in verifying what it returns

Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix documentation of skb_needs_linearize().

skb_needs_linearize() does not check highmem DMA as it does not call
illegal_highdma() anymore, so there is no need to mention highmem DMA here.

(Indeed, ~NETIF_F_SG flag, which is checked in skb_needs_linearize(), can
be set when illegal_highdma() returns true, and we are assured that
illegal_highdma() is invoked prior to skb_needs_linearize() as
skb_needs_linearize() is a static method called only once.
But ~NETIF_F_SG can be set not only there in this same invocation path.
It can also be set when can_checksum_protocol() returns false).

see commit 02932ce9e2c136e6fab2571c8e0dd69ae8ec9853,
Convert skb_need_linearize() to use precomputed features.
Signed-off-by: Rami Rosen <rosenr@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Minor logic clean-up in ipv4_mtu

In ipv4_mtu there is some logic where we are testing for a non-zero value
and a timer expiration, then setting the value to zero, and then testing if
the value is zero we set it to a value based on the dst. Instead of
bothering with the extra steps it is easier to just cleanup the logic so
that we set it to the dst based value if it is zero or if the timer has
expired.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>

net:atm:fix up ENOIOCTLCMD error handling

At commit 07d106d0, Linus pointed out that ENOIOCTLCMD should be
translated as ENOTTY to user mode.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:stmmac: convert driver to use devm_request_and_ioremap.

This patch moves calls to ioremap and request_mem_region to
devm_request_and_ioremap call.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:stmmac: Remove bus_id from mdio platform data.

This patch removes bus_id from mdio platform data, The reason to remove
bus_id is, stmmac mdio bus_id is always same as stmmac bus-id, so there
is no point in passing this in different variable.
Also stmmac ethernet driver connects to phy with bus_id passed its
platform data.
So, having single bus-id is much simpler.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:stmmac: fix broken stmmac_pltfr_remove.

This patch fixes stmmac_pltfr_remove function, which is broken because,
it is accessing plat variable via freed memory priv pointer which gets
freed by free_netdev called from stmmac_dvr_remove.

In short this patch caches the plat pointer in local variable before
calling stmmac_dvr_remove to prevent code accessing freed memory.

Without this patch any attempt to remove the stmmac device will fail as
below:

Unregistering eth 0 ...
Unable to handle kernel paging request at virtual address 6b6b6bab
pgd = de5dc000
[6b6b6bab] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP
Modules linked in: cdev(O+)
CPU: 0    Tainted: G           O  (3.3.1_stm24_0210-b2000+ #25)
PC is at stmmac_pltfr_remove+0x2c/0xa0
LR is at stmmac_pltfr_remove+0x28/0xa0
pc : [<c01b8908>]    lr : [<c01b8904>]    psr: 60000013
sp : def6be78  ip : de6c5a00  fp : 00000000
r10: 00000028  r9 : c082d81d  r8 : 00000001
r7 : de65a600  r6 : df81b240  r5 : c0413fd8  r4 : 00000000
r3 : 6b6b6b6b  r2 : def6be6c  r1 : c0355e2b  r0 : 00000020
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 5e5dc04a  DAC: 00000015
Process insmod (pid: 738, stack limit = 0xdef6a2f0)
Stack: (0xdef6be78 to 0xdef6c000)
be60:                                                       c0413fe0
c0403658
be80: c0400bb0 c019270c c01926f8 c0191478 00000000 c0414014 c0413fe0
c01914d8
bea0: 00000000 c0413fe0 df8045d0 c019109c c0413fe0 c0400bf0 c0413fd8
c018f04c
bec0: 00000000 bf000000 c0413fd8 c01929a0 c0413fd8 bf000000 00000000
c0192bfc
bee0: bf00009c bf000014 def6a000 c000859c 00000000 00000001 bf00009c
bf00009c
bf00: 00000001 bf00009c 00000001 bf0000e4 de65a600 00000001 c082d81d
c0058cd0
bf20: bf0000a8 c004fbd8 c0056414 c082d815 c02aea20 bf0001f0 00b0b008
e0846208
bf40: c03ec8a0 e0846000 0000db0d e0850604 e08504de e0853a24 00000204
000002d4
bf60: 00000000 00000000 0000001c 0000001d 00000009 00000000 00000006
00000000
bf80: 00000003 f63d4e2e 0000db0d bef02ed8 00000080 c000d2e8 def6a000
00000000
bfa0: 00000000 c000d140 f63d4e2e 0000db0d 00b0b018 0000db0d 00b0b008
b6f4f298
bfc0: f63d4e2e 0000db0d bef02ed8 00000080 00000003 00000000 00010000
00000000
bfe0: 00b0b008 bef02c64 00008d20 b6ef3784 60000010 00b0b018 5a5a5a5a
5a5a5a5a
[<c01b8908>] (stmmac_pltfr_remove+0x2c/0xa0) from [<c019270c>]
(platform_drv_remove+0x14/0x18)
[<c019270c>] (platform_drv_remove+0x14/0x18) from [<c0191478>]
(__device_release_driver+0x64/0xa4)
[<c0191478>] (__device_release_driver+0x64/0xa4) from [<c01914d8>]
(device_release_driver+0x20/0x2c)
[<c01914d8>] (device_release_driver+0x20/0x2c) from [<c019109c>]
(bus_remove_device+0xcc/0xdc)
[<c019109c>] (bus_remove_device+0xcc/0xdc) from [<c018f04c>]
(device_del+0x104/0x160)
[<c018f04c>] (device_del+0x104/0x160) from [<c01929a0>]
(platform_device_del+0x18/0x58)
[<c01929a0>] (platform_device_del+0x18/0x58) from [<c0192bfc>]
(platform_device_unregister+0xc/0x18)
[<c0192bfc>] (platform_device_unregister+0xc/0x18) from [<bf000014>]
(r_init+0x14/0x2c [cdev])
[<bf000014>] (r_init+0x14/0x2c [cdev]) from [<c000859c>]
(do_one_initcall+0x90/0x160)
[<c000859c>] (do_one_initcall+0x90/0x160) from [<c0058cd0>]
(sys_init_module+0x15c4/0x1794)
[<c0058cd0>] (sys_init_module+0x15c4/0x1794) from [<c000d140>]
(ret_fast_syscall+0x0/0x30)
Code: e1a04000 e59f0070 eb039b65 e59636e4 (e5933040)

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:stmmac: Add check if mdiobus is registered in stmmac_mdio_unregister

This patch adds a basic check in stmmac_mdio_unregister to see if mdio
bus registeration for this driver was actually sucessfull or not.

Use case here is, if BSP considers using mdio-gpio bus along with stmmac
driver by passing mdio_bus_data as NULL in platform data.
Call to stmmac_mdio_register with mdio_bus_data as NULL returns 0, which
is a considered sucessfull call form stmmac. Then again when we unload
the driver we just call stmmac_mdio_unregister, this is were the actual
problem is stmmac-mdio code dont really know at this instance of calling
that stmmac_mdio_register was actually successful.

So Adding a check in stmmac_mdio_unregister is always safe.

Without this patch stmmac driver calls stmmac_mdio_register from
stmmac_release which Segfaults as mii bus was never registered at the
first point.

Originally the this bug was found when unloading an stmmac driver
instance which uses mdio-gpio for smi access.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

openvswitch: using kfree_rcu() to simplify the code

The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: fix shutdown parameter checking

Return -EINVAL rather than 0 given an invalid "mode" parameter.

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

decnet: fix shutdown parameter checking

The allowed value of "how" is SHUT_RD/SHUT_WR/SHUT_RDWR (0/1/2),
rather than SHUTDOWN_MASK (3).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Increase timeout for SYN segments

Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for
the passive open side") changed the initRTO from 3secs to 1sec in
accordance to RFC6298 (former RFC2988bis). This reduced the time till
the last SYN retransmission packet gets sent from 93secs to 31secs.

RFC1122 is stating that the retransmission should be done for at least 3
minutes, but this seems to be quite high.

  "However, the values of R1 and R2 may be different for SYN
  and data segments.  In particular, R2 for a SYN segment MUST
  be set large enough to provide retransmission of the segment
  for at least 3 minutes.  The application can close the
  connection (i.e., give up on the open attempt) sooner, of
  course."

This patch increases the value of TCP_SYN_RETRIES to the value of 6,
providing a retransmission window of 63secs.

The comments for SYN and SYNACK retries have also been updated to
describe the current settings. The same goes for the documentation file
"Documentation/networking/ip-sysctl.txt".

Signed-off-by: Alexander Bergmann <alex@linlab.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://1984.lsi.us.es/nf

netfilter: nf_conntrack: fix racy timer handling with reliable events

Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.

This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.

Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ixgbevf: Cleanup handling of configuration for jumbo frames

This change moves the code for notifying the PF of the VF maximum packet
size into the vf.c file. The main motivation behind this is that the vf.c
file is supposed to contain all of the messages used when communicating
with the PF.

In addition it creates a separate function for setting the Rx buffer size
so that we have on centralized area to review what buffer sizes will be
requested by the VF.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Add suspend and resume support to the VF

This change adds PCI suspend and resume support to ixgbevf.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: update driver version number

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup - remove unnecessary variable

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup - remove inapplicable comment

Early Receive has been disabled in the driver so this comment is no longer
applicable.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup strict checkpatch check

CHECK: multiple assignments should be avoided

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup strict checkpatch MEMORY_BARRIER checks

Add comments to memory barriers per strict checkpatch.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: use correct type for read of 32-bit register

The POEMB register is 32 bits, not 16.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

bnx2x: Correct the ndo_poll_controller call

This patch correct poll_bnx2x (ndo_poll_controller call) which was not
functioning well with MSI-X.

Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Move netif_napi_add to the open call

Move netif_napi_add for all queues from the probe call to the open call, to
avoid the case that napi objects are added for queues that may eventually not
be initialized and activated. With the former behavior, the driver could crash
when netpoll was calling ndo_poll_controller.

Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: must use rcu protection while calling fib_lookup

Following lockdep splat was reported by Pavel Roskin :

[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
[ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
[ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
[ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
[ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
[ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
[ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
[ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
[ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
[ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
[ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
[ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
[ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: add support for the Fman 1G MDIO controller

The MDIO controller on the Frame Manager (Fman) is compatible with the
QE and Gianfar MDIO controllers, but we don't care about the TBI because
the Ethernet drivers (FMD) take care of programming it.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl-pq-mdio: coalesce multiple memory allocations into one

Take advantage of the new mdiobus_alloc_size() function to combine three
different memory allocations into one. This also simplies the error
handling.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: streamline probing of MDIO nodes

Make the device tree probe function more data-driven, so that it no longer
searches the 'compatible' property more than once.  The of_device_id[] array
allows for per-entry private data, so we use that to store details about each
type of node that the driver supports.  This removes the need to check the
'compatible' property inside the probe function.

The driver supports four types on MDIO devices:

1) Gianfar MDIO nodes that only map the MII registers
2) Gianfar MDIO nodes that map the full MDIO register set
3) eTSEC2 MDIO nodes (which map the full MDIO register set)
4) QE MDIO nodes (which map only the MII registers)

Gianfar, eTSEC2, and QE have different mappings for the TBIPA register, which
is needed to initialize the TBI PHY.  In addition, the QE needs a special
hack because of the way the device tree is ordered.

All of this information is encapsulated in the fsl_pq_mdio_data structure,
so when an MDIO node is probed, per-device data and functions are used
to determine how to initialize the device.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: various small fixes

1) Replace printk with dev_err

2) Fix some whitespace mistakes

3) Rename "ofdev" to "pdev", since it's a platform_device now

4) Fix an inadvertent compound statement by replacing commas with semicolons

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: merge some functions together

A few small functions were called only by other functions in the same
file, so merge them together. One function, for example, was calculating
the device address even though the caller was doing the same thing.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fsl_pq_mdio: trim #include statements

Remove several unnecessary #include statements.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/freescale: do not export any functions from fsl_pq_mdio.c

None of the functions in fsl_pq_mdio.c are used by any other source file,
so there's no point in exporting them. Merge the header file into the
source file, make all the functions static, remove any EXPORT_SYMBOL
statements, and delete any #include "fsl_pq_mdio.h" statements.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: modify log msg for lack of privilege error

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fixup malloc/free of adapter->pmac_id

Free was missing and kcalloc() is better placed in be_ctrl_init()

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix FW default for VF tx-rate

BE3 FW initializes VF tx-rate to 100Mbps. Fix this to 10Gbps.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix max VFs reported by HW

BE3 FW allocates VF resources for upto 30 VFs per PF while a max value of 32
may be reported via PCI config space. Fix this in the driver.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: create RSS rings even in multi-channel configs

Changes from commit df505e were incorrectly over-written by commit 10ef9ab.
Fixing the same.

Change log of the original fix:
    Currently RSS rings are not created in a multi-channel config.
    RSS rings can be created on one (out of four) interfaces per port in a
    multi-channel config. Doing this insulates the driver from a FW bug wherin
    multi-channel config is wrongly reported even when not enabled. This also
    helps performance in a multi-channel config, as one interface per port gets
    RSS rings.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/ieee802154: move ieee802154 drivers to net folder

The IEEE 802.15.4 standard represents a networking protocol. I don't
exactly know why drivers for this protocol are stored into the root
'driver' folder, but better will be to store them with other
networking stuff. Currently there are only 3 drivers available for
IEEE 802.15.4 stack, so lets do it now with the smallest overhead.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/ieee802154/at86rf230: replace the code under _init and _exit by macro

The code under _init and _exit functions is similar to the code of
module_spi_driver macro, which is a wrapper to the module_driver macro,
so use it instead.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Devendra Naga <develkernel412222@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix 57840_MF pci id

Commit c3def943c7117d42caaed3478731ea7c3c87190e have added support for
new pci ids of the 57840 board, while failing to change the obsolete value
in 'pci_ids.h'.
This patch does so, allowing the probe of such devices.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: add minlen validation for the new signed types

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net/ethernet/tundra/tsi108_eth.c: delete double assignment

Delete successive assignments to the same location.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression i;
@@

*i = ...;
i = ...;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: prevent TX timeouts after reboot

This complements patch "net-forcedeth: fix TX timeout caused by TX
pause on down link" which ensures that a lock-up sequence is not sent
to the NIC. Present patch ensures that if a NIC is already locked-up,
the driver will recover from it when initializing the device.

It does the equivalent of the following recovery sequence:
- write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register
   NvRegTxPauseFrame
- write NVREG_XMITCTL_START to eth1's register
   NvRegTransmitterControl
- write 0 to eth1's register NvRegTransmitterControl
(this is at the heart of the "unbricking" sequence mentioned in patch
"net-forcedeth: fix TX timeout caused by TX pause on down link")

Tested:
- hardware is MCP55 device id 10de:0373 (rev a3), dual-port
- reboot a kernel without any of patches mentioned
- freeze the NIC (details on description for commit "net-forcedeth:
   fix TX timeout caused by TX pause on down link")
- wait 5mn until ping hangs & TX timeout in dmesg
- reboot on kernel with present patch
- host is immediatly operational, no TX timeout

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: fix TX timeout caused by TX pause on down link

On some dual-port forcedeth devices such as MCP55 10de:0373 (rev a3),
when autoneg & TX pause are enabled while port is connected but
interface is down, the NIC will eventually freeze (TX timeouts,
network unreachable).

This patch ensures that TX pause is not configured in hardware when
interface is down. The TX pause request will be honored when interface
is later configured.

Tested:
- hardware is MCP55 device id 10de:0373 (rev a3), dual-port
- eth0 connected and UP, eth1 connected but DOWN
- without this patch, following sequence would brick NIC:
      ifconfig eth0 down
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ethtool -A eth1 autoneg on rx on tx on
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
      sleep 120  # or longer
      ethtool eth1
   Just in case, sequence to un-brick:
      ifconfig eth0 down
      ethtool -A eth1 autoneg off rx on tx off
      ifconfig eth1 up
      ifconfig eth1 down
      ifup eth0
- with this patch: no TX timeout after "bricking" sequence above

Details:
- The following register accesses have been identified as the ones
   causing the NIC to freeze in "bricking" sequence above:
    - write NVREG_TX_PAUSEFRAME_ENABLE_V1 to eth1's register NvRegTxPauseFrame
    - write NVREG_MISC1_PAUSE_TX | NVREG_MISC1_FORCE to eth1's register NvRegMisc1
    - write 0 to eth1's register NvRegTransmitterControl
   This is what this patch avoids.

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: fix buffer overflow

Found by manual code inspection.

Tested: compile, reboot, ethtool -d ethX

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev/phy: add MDIO bus multiplexer driven by a memory-mapped device

Add support for an MDIO bus multiplexer controlled by a simple memory-mapped
device, like an FPGA. The device must be memory-mapped and contain only
8-bit registers (which keeps things simple).

Tested on a Freescale P5020DS board which uses the "PIXIS" FPGA attached
to the localbus.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: ipmr_expire_timer causes crash when removing net namespace

When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.

Tested in Linux 3.4.8.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: DoS while TSO enabled caused by link partner with small MSS

With a low enough MSS on the link partner and TSO enabled locally, the
networking stack can periodically send a very large (e.g.  64KB) TCP
message for which the driver will attempt to use more Tx descriptors than
are available by default in the Tx ring.  This is due to a workaround in
the code that imposes a limit of only 4 MSS-sized segments per descriptor
which appears to be a carry-over from the older e1000 driver and may be
applicable only to some older PCI or PCIx parts which are not supported in
e1000e.  When the driver gets a message that is too large to fit across the
configured number of Tx descriptors, it stops the upper stack from queueing
any more and gets stuck in this state.  After a timeout, the upper stack
assumes the adapter is hung and calls the driver to reset it.

Remove the unnecessary limitation of using up to only 4 MSS-sized segments
per Tx descriptor, and put in a hard failure test to catch when attempting
to check for message sizes larger than would fit in the whole Tx ring.
Refactor the remaining logic that limits the size of data per Tx descriptor
from a seemingly arbitrary 8KB to a limit based on the dynamic size of the
Tx packet buffer as described in the hardware specification.

Also, fix the logic in the check for space in the Tx ring for the next
largest possible packet after the current one has been successfully queued
for transmit, and use the appropriate defines for default ring sizes in
e1000_probe instead of magic values.

This issue goes back to the introduction of e1000e in 2.6.24 when it was
split off from e1000.

Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Cc: Stable <stable@vger.kernel.org> [2.6.24+]
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of/mdio-gpio: Simplify the way device tree support is implemented.

This patch cleans up the way device tree support is added in mdio-gpio
driver. I found lot of code duplication which is not necessary.
Also strangely a new platform driver was also introduced for device tree
support. All this forced me to do this cleanup patch.
After this patch, the driver probe checks the of_node pointer to get the
data from device tree.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of/mdio: Add dummy functions in of_mdio.h.

This patch adds dummy functions in of_mdio.h, so that driver need not
ifdef there code with CONFIG_OF.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: provide an IP ident in UDP frames

Let's fill IP header ident field with a meaningful value,
it might help some setups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: avoid to use synchronize_rcu in tunnel free function

Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
atomic.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: fix default tx vlan offload feature flag

Commit -
"b852b72 gianfar: fix bug caused by
87c288c6e9aa31720b72e2bc2d665e24e1653c3e"
disables by default (on mac init) the hw vlan tag insertion.
The "features" flags were not updated to reflect this, and
"ethtool -K" shows tx-vlan-offload to be "on" by default.

Cc: Sebastian Poehn <sebastian.poehn@belden.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation

We're hitting bug while trying to reinsert an already existing
expectation:

kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
[<ffffffff812d423a>] ? in4_pton+0x72/0x131
[<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
[<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
[<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
[<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
[<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]

We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX

I'm slightly concerned by the "only in exceptional circumstances"
comment on __pskb_pull_tail but the structure of an skb just created
by netfront shouldn't hit any of the especially slow cases.

This approach still does slightly more work than the old way, since if
we pull up the entire first frag we now have to shuffle everything
down where before we just received into the right place in the first
place.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: xen-devel@lists.xensource.com
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dev: fix the incorrect hold of net namespace's lo device

When moving a net device from one net namespace to another
net namespace,dev_change_net_namespace calls NETDEV_DOWN
event,so the original net namespace's dst entries which
beloned to this net device will be put into dst_garbage
list.

then dev_change_net_namespace will set this net device's
net to the new net namespace.

If we unregister this net device's driver, this will trigger
the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
and get this net device's dst entries from dst_garbage list,
put these entries' dev to the new net namespace's lo device.

It's not what we want,actually we need these dst entries hold
the original net namespace's lo device,this incorrect device
holding will trigger emg message like below.
unregister_netdevice: waiting for lo to become free. Usage count = 1

so we should call NETDEV_UNREGISTER_FINAL event in
dev_change_net_namespace too,in order to make sure dst entries
already in the dst_garbage list, we need rcu_barrier before we
call NETDEV_UNREGISTER_FINAL event.

With help form Eric Dumazet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nfnetlink_log: fix error return code in init path

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ctnetlink: fix error return code in init path

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: fix error return code

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target

Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_nat: support IPv6 in TFTP NAT helper

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_nat: support IPv6 in IRC NAT helper

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

netfilter: nf_nat: support IPv6 in SIP NAT helper

Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>