Richard Cochran [Fri, 21 Oct 2011 00:49:16 +0000 (00:49 +0000)]
dp83640: use proper function to free transmit time stamping packets
The previous commit enforces a new rule for handling the cloned packets
for transmit time stamping. These packets must not be freed using any other
function than skb_complete_tx_timestamp. This commit fixes the one and only
driver using this API.
The driver first appeared in v3.0.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Do not use routes from locally generated RAs
When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
generated locally. This is useful since it allows the kernel to auto-configure
its own interface addresses.
However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
locally generated RAs announce the default route and/or other route information,
the kernel happily inserts bogus routes with its own address as gateway.
With this patch, adding routes from an RA will be skiped when the RAs source
address matches any local address, just as if 'accept_ra_defrtr' and
'accept_ra_rtr_pref' were set to 0.
Signed-off-by: Andreas Hofmeister <andi@collax.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:03 +0000 (02:45 +0000)]
be2net: don't create multiple RX/TX rings in multi channel mode
When the HW is in multi-channel mode based on the skew/IPL, there are
4 functions per port and so not enough resources to create multiple
RX/TX rings for each function.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:02 +0000 (02:45 +0000)]
be2net: don't create multiple TXQs in BE2
Multiple TXQ support is partially broken in BE2. It is fully
supported BE3 onwards and in Lancer.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com> Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:01 +0000 (02:45 +0000)]
be2net: refactor VF setup/teardown code into be_vf_setup/clear()
Currently the code for VF setup/teardown done by a PF (if_create,
mac_add_config, link_status_query etc) is scattered; this patch
refactors this code into be_vf_setup() and be_vf_clear(). The
if_create/if_destroy/mac_addr_query cmds are now called after the MCCQ
is created; so these cmds are now modified to use the MCCQ instead of
MBOX.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 24 Oct 2011 02:45:00 +0000 (02:45 +0000)]
be2net: add vlan/rx-mode/flow-control config to be_setup()
When a card is reset due to EEH error recovery or due to a suspend,
rx-mode config (promisc/mc) is not being sent to the FW. be_setup() is
called in these flows and is the best place for such config/re-config
cmds. Hence include rx-mode, vlan and flow-control config in
be_setup().
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 23 Oct 2011 17:59:41 +0000 (17:59 +0000)]
net_sched: cls_flow: use skb_header_pointer()
Dan Siemon would like to add tunnelling support to cls_flow
This preliminary patch introduces use of skb_header_pointer() to help
this task, while avoiding skb head reallocation because of deep packet
inspection.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Wed, 19 Oct 2011 15:34:09 +0000 (15:34 +0000)]
ipv4: avoid useless call of the function check_peer_pmtu
In func ipv4_dst_check,check_peer_pmtu should be called only when peer is updated.
So,if the peer is not updated in ip_rt_frag_needed,we can not inc __rt_peer_genid.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dirk Eibach [Tue, 18 Oct 2011 03:04:11 +0000 (03:04 +0000)]
net: Fix driver name for mdio-gpio.c
Since commit
"7488876... dt/net: Eliminate users of of_platform_{,un}register_driver"
there are two platform drivers named "mdio-gpio" registered.
I renamed the of variant to "mdio-ofgpio".
Signed-off-by: Dirk Eibach <eibach@gdsys.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 24 Oct 2011 07:06:21 +0000 (03:06 -0400)]
ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.
In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.
Example of things that were broken :
- Routing using TOS as a selector
- Firewalls
- Trafic classification / shaping
We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.
Notes :
- We still reflect incoming TOS in RST messages.
- We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
- A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
Renato Westphal noticed that since commit a2835763e130c343ace5320c20d33c281e7097b7
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.
Fix this by adding the missing manual notification in dev_change_net_namespaces.
Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.
Cc: stable@kernel.org Tested-by: Renato Westphal <renatowestphal@gmail.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yan, Zheng [Sat, 22 Oct 2011 21:58:20 +0000 (21:58 +0000)]
ipv4: fix ipsec forward performance regression
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Reported-by: Kim Phillips <kim.phillips@freescale.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the device is down during suspend/resume, interrupts are enabled
without a registered interrupt handler, causing a storm of
unhandled interrupts until the IRQ is disabled because "nobody
cared".
Instead, check that the device is up before touching it in the
suspend/resume code.
Helped-by: Adrian Chadd <adrian@freebsd.org> Helped-by: Mohammed Shafi <shafi.wireless@gmail.com> Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Mon, 24 Oct 2011 06:56:38 +0000 (02:56 -0400)]
route: fix ICMP redirect validation
The commit f39925dbde7788cfb96419c0f092b086aa325c0f
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.
As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.
These functions first appeared in v2.6.36.
Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: <stable@vger.kernel.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Wed, 19 Oct 2011 08:10:59 +0000 (08:10 +0000)]
Add ethtool -g support to virtio_net
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 22 Oct 2011 07:29:53 +0000 (03:29 -0400)]
tg3: fix tigon3_dma_hwbug_workaround()
Ari got kernel panics using tg3 NIC, and bisected to 2669069aacc9 "tg3:
enable transmit time stamping."
This is because tigon3_dma_hwbug_workaround() might alloc a new skb and
free the original. We panic when skb_tx_timestamp() is called on freed
skb.
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Fri, 14 Oct 2011 00:13:49 +0000 (00:13 +0000)]
igb: VFTA Table Fix for i350 devices
Due to a hardware problem, writes to the VFTA register can
theoretically fail. Although the likelihood of this is very low.
This patch adds a shadow vfta in the adapter struct for reading
and adds new write functions for these devices to work around the problem.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 13 Oct 2011 17:28:39 +0000 (17:28 +0000)]
igb: Fix for Alt MAC Address feature on 82580 and later devices
In 82580 and later devices, the alternate MAC address feature is
completely handled by the option ROM and software does not handle
it anymore. This patch changes the check_alt_mac_addr function to
exit immediately if device is 82580 or later.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update adapter identification strings to properly indicate i350 VF devices
in the VF driver. Change the driver ID string to remove 82576-specific
wording. Update copyright date.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Fri, 21 Oct 2011 09:22:42 +0000 (05:22 -0400)]
tcp: add const qualifiers where possible
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.
For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mihai Maruseac [Thu, 20 Oct 2011 20:45:10 +0000 (20:45 +0000)]
dev: use name hash for dev_seq_ops
Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.
Tests revealed the following results for ifconfig > /dev/null
* 1000 interfaces:
* 0.114s without patch
* 0.089s with patch
* 3000 interfaces:
* 0.489s without patch
* 0.110s with patch
* 5000 interfaces:
* 1.363s without patch
* 0.250s with patch
* 128000 interfaces (other setup):
* ~100s without patch
* ~30s with patch
Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices. I have systems running
automated tests that that hit this condition in just a few days.
Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.
Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device. With macvtap
specific data structures it is impossible to find any other
kind of networking device.
Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers. It doesn't solve the
original problem but there is no penalty for a larger minor
device range.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
macvtap: Rewrite macvtap_newlink so the error handling works.
Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.
Move device_create into a netdevice creation notifier. The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
macvtap: Fix macvtap_open races in the zero copy enable code.
To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags. Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags. This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
macvtap: Close a race between macvtap_open and macvtap_dellink.
There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink. After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Oct 2011 23:14:46 +0000 (23:14 +0000)]
virtio_net: fix truesize underestimation
We must account in skb->truesize, the size of the fragments, not the
used part of them.
Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: "Michael S. Tsirkin" <mst@redhat.com> CC: virtualization@lists.linux-foundation.org CC: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:48 +0000 (23:01 +0000)]
cxgbi: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: James Bottomley <James.Bottomley@suse.de> Cc: Karen Xie <kxie@chelsio.com> Cc: linux-scsi@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:47 +0000 (23:01 +0000)]
cxgb4vf: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Casey Leedom <leedom@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Wed, 19 Oct 2011 23:01:46 +0000 (23:01 +0000)]
cxgb4: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Dimitris Michailidis <dm@chelsio.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT
Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.
- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
sockets already pretty much effectively allow you
to emulate socket transparency.
Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RongQing Li [Tue, 18 Oct 2011 22:52:35 +0000 (22:52 +0000)]
igb: fix a compile warning
control these three function declarations and
definitions with same macro CONFIG_PCI_IOV
drivers/net/ethernet/intel/igb/igb_main.c:165:
warning: ‘igb_vf_configure’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:166:
warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:167:
warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined
Signed-off-by: RongQing Li <roy.qing.li@gmail.com> Acked-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 10:10:03 +0000 (10:10 +0000)]
myri10ge: fix truesize underestimation
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jon Mason <mason@myri.com> Acked-by: Jon Mason <mason@myri.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 09:22:18 +0000 (09:22 +0000)]
igbvf: fix truesize underestimation
igbvf allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 21:00:21 +0000 (17:00 -0400)]
pktgen: remove ndelay() call
Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.
Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.
Reported-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2011 20:53:56 +0000 (16:53 -0400)]
tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()
Since commit 356f039822b (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.
Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)
Note: Since commit 356f039822b limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE()
-> we crash later because skb head is exhausted.
Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route
header_len in max_headroom calculation)
Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Timo Teräs <timo.teras@iki.fi> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
fib_rules: fix unresolved_rules counting
r8169: fix wrong eee setting for rlt8111evl
r8169: fix driver shutdown WoL regression.
ehea: Change maintainer to me
pptp: pptp_rcv_core() misses pskb_may_pull() call
tproxy: copy transparent flag when creating a time wait
pptp: fix skb leak in pptp_xmit()
bonding: use local function pointer of bond->recv_probe in bond_handle_frame
smsc911x: Add support for SMSC LAN89218
tg3: negate USE_PHYLIB flag check
netconsole: enable netconsole can make net_device refcnt incorrent
bluetooth: Properly clone LSM attributes to newly created child connections
l2tp: fix a potential skb leak in l2tp_xmit_skb()
bridge: fix hang on removal of bridge via netlink
x25: Prevent skb overreads when checking call user data
x25: Handle undersized/fragmented skbs
x25: Validate incoming call user data lengths
udplite: fast-path computation of checksum coverage
IPVS netns shutdown/startup dead-lock
netfilter: nf_conntrack: fix event flooding in GRE protocol tracker
Ian Campbell [Thu, 20 Oct 2011 08:58:32 +0000 (04:58 -0400)]
mm: add a "struct page_frag" type containing a page, offset and length
A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.
A structure for reference sub-page regions seems like a generally useful thing
so do so instead of adding a network subsystem specific structure.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Oct 2011 18:49:52 +0000 (18:49 +0000)]
mlx4_en: fix skb truesize underestimation
skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Krishna Kumar [Wed, 19 Oct 2011 22:17:27 +0000 (22:17 +0000)]
virtio_net: Clean up set_skb_frag()
Remove manual initialization in set_skb_frag, and instead
use __skb_fill_page_desc() to do the same. Patch tested
on net-next.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hugh Dickins [Wed, 19 Oct 2011 19:50:35 +0000 (12:50 -0700)]
mm: fix race between mremap and removing migration entry
I don't usually pay much attention to the stale "? " addresses in
stack backtraces, but this lucky report from Pawel Sikora hints that
mremap's move_ptes() has inadequate locking against page migration.
mremap's down_write of mmap_sem, together with i_mmap_mutex or lock,
and pagetable locks, were good enough before page migration (with its
requirement that every migration entry be found) came in, and enough
while migration always held mmap_sem; but not enough nowadays, when
there's memory hotremove and compaction.
The danger is that move_ptes() lets a migration entry dodge around
behind remove_migration_pte()'s back, so it's in the old location when
looking at the new, then in the new location when looking at the old.
Either mremap's move_ptes() must additionally take anon_vma lock(), or
migration's remove_migration_pte() must stop peeking for is_swap_entry()
before it takes pagetable lock.
Consensus chooses the latter: we prefer to add overhead to migration
than to mremapping, which gets used by JVMs and by exec stack setup.
Ian Campbell [Tue, 18 Oct 2011 22:55:11 +0000 (22:55 +0000)]
net: do not take an additional reference in skb_frag_set_page
I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.
In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com> Signed-off-by: RongQing <roy.qing.li@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 17 Oct 2011 21:04:20 +0000 (21:04 +0000)]
filter: use unsigned int to silence static checker warning
This is just a cleanup.
My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
warn: check 'flen' for negative values
flen comes from the user. We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative. This is a bug, but it's not exploitable.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)
Problem using big mtu around 4096 bytes is you end allocating (4096
+NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
8192 bytes : order-1 pages
It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
to have no more than one page per skb.
Also the patch changes the netdev_alloc_skb_ip_align() done in
init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
allowing the driver to load even in case of memory pressure.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enhances the STMMAC driver to support CHAINED mode of
descriptor.
STMMAC supports DMA descriptor to operate both in dual buffer(RING)
and linked-list(CHAINED) mode. In RING mode (default) each descriptor
points to two data buffer pointers whereas in CHAINED mode they point
to only one data buffer pointer.
In CHAINED mode each descriptor will have pointer to next descriptor in
the list, hence creating the explicit chaining in the descriptor itself,
whereas such explicit chaining is not possible in RING mode.
First version of this work has been done by Rayagond.
Then the patch has been reworked avoiding ifdef inside the C code.
A new header file has been added to define all the functions needed for
managing enhanced and normal descriptors.
In fact, these have to be specialized according to the ring/chain usage.
Two new C files have been also added to implement the helper routines
needed to manage: jumbo frames, chain and ring setup (i.e. desc3).
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: allow mmc usage only if feature actually available (V4)
Enable the MMC support if it is actually available from the
HW capability register.
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: use predefined macros for HW cap register fields (V4)
Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: allow mtu bigger than 1500 in case of normal desc (V4)
This patch allows to set the mtu bigger than 1500
in case of normal descriptors.
This is helping some SPEAr customers.
Signed-off-by: Deepak SIKRI <deepak.sikri@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem raised on Orly ARM SMP platform
where, in case of fragmented frames, the descriptors
in the TX ring resulted broken. This was due to a missing lock
protection in the tx process.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: Stop advertising 1000Base capabilties for non GMII iface (V4).
This patch stops advertising 1000Base capablities if GMAC is either
configured for MII or RMII mode and on board there is a GPHY plugged on.
Without this patch if an GBit switch is connected on MII interface,
Ethernet stops working at all.
Discovered as part of
https://bugzilla.stlinux.com/show_bug.cgi?id=14148 triage
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sysfs: Reject with a warning invalid uses of tagged directories.
sysfs is a core piece of ifrastructure that many people use and
few people have all of the rules in their head on how to use
it correctly. Add warnings for people using tagged directories
improperly to that any misuses can be caught and diagnosed quickly.
A single inexpensive test in sysfs_find_dirent is almost sufficient
to catch all possible misuses. An additional warning is needed
in sysfs_add_dirent so that we actually fail when attempting to
add an untagged dirent in a tagged directory.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
sysfs: Remove support for tagged directories with untagged members.
Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs
file we can remove support for untagged files in tagged directories.
This change removes any ambiguity of what a NULL namespace value
means. A NULL namespace parameter after this patch means
that we are talking about an untagged sysfs dirent.
This makes the sysfs code much less prone to mistakes when during
maintenance.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: Use a per netns implementation of /sys/class/net/bonding_masters.
This fixes a network namespace misfeature that bonding_masters looked at
current instead of the remembering the context where in which
/sys/class/net/bonding_masters was opened in to see which network
namespace to act upon.
This removes the need for sysfs to handle tagged directories with
untagged members allowing for a conceptually simpler sysfs
implementation.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
class: Implement support for class attrs in tagged sysfs directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
sysfs: Implement support for tagged files in sysfs.
Looking up files in sysfs is hard to understand and analyize because we
currently allow placing untagged files in tagged directories. In the
implementation of that we have two subtly different meanings of NULL.
NULL meaning there is no tag on a directory entry and NULL meaning
we don't care which namespace the lookup is performed for. This
multiple uses of NULL have resulted in subtle bugs (since fixed)
in the code.
Currently it is only the bonding driver that needs to have an untagged
file in a tagged directory.
To untagle this mess I am adding support for tagged files to sysfs.
Modifying the bonding driver to implement bonding_masters as a tagged
file. Registering bonding_masters once for each network namespace.
Then I am removing support for untagged entries in tagged sysfs
directories.
Resulting in code that is much easier to reason about.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Flavio Leitner [Thu, 13 Oct 2011 07:21:23 +0000 (07:21 +0000)]
bonding: fix wrong port enabling in 802.3ad
The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.
The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem. However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).
Signed-off-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kjetil Oftedal [Wed, 19 Oct 2011 23:20:50 +0000 (16:20 -0700)]
sparc: Add alignment flag to PCI expansion resources
Currently no type of alignment is specified for PCI expansion roms while
parsing the openfirmware tree. This causes calls to pci_map_rom() to fail.
IORESOURCE_SIZEALIGN is the default alignment used for rom resouces in
pci/probe.c, and has been verified to work with various cards on a ultra 10.
Signed-off-By: Kjetil Oftedal <oftedal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
françois romieu [Fri, 14 Oct 2011 00:57:45 +0000 (00:57 +0000)]
r8169: fix driver shutdown WoL regression.
Due to commit 92fc43b4159b518f5baae57301f26d770b0834c9 ("r8169: modify the
flow of the hw reset."), rtl8169_hw_reset stomps during driver shutdown on
RxConfig bits which are needed for WOL on some versions of the hardware.
As these bits were formerly set from the r81{0x, 68}_pll_power_down methods,
factor them out for use in the driver shutdown (rtl_shutdown) handler.
I favored __rtl8169_get_wol() -hardware state indication- over
RTL_FEATURE_WOL as the latter has become a good candidate for removal.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Hayes <hayeswang@realtek.com> Tested-by: Marc Ballarin <ballarin.marc@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Wed, 19 Oct 2011 21:00:35 +0000 (17:00 -0400)]
net: validate HWTSTAMP ioctl parameters
This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.
This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.
The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs(). Changing
the rcu_barrier back to a synchronize_net is therefore safe.
Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.
This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat. /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10. There was no observable
difference in the amount of time it takes to destroy a network device.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: Breno Leitao <leitao@linux.vnet.ibm.com> Acked-by: Breno Leitão <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Thu, 13 Oct 2011 04:33:55 +0000 (04:33 +0000)]
phylib: Modify Vitesse RGMII skew settings
The Vitesse driver was using the RGMII_ID interface type to determine if
skew was necessary. However, we want to move away from using that
interface type, as it's really a property of the board's PHY connection.
However, some boards depend on it, so we want to support it, while
allowing new boards to use the more flexible "fixups" approach. To do
this, we extract the code which adds skew into its own function, and
call that function when RGMII_ID has been selected.
Another side-effect of this change is that if your PHY has skew set
already, it doesn't clear it. This way, the fixup code can modify the
register without config_init then clearing it.
Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Thu, 13 Oct 2011 04:33:54 +0000 (04:33 +0000)]
net: Allow skb_recycle_check to be done in stages
skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility. We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.
Signed-off-by: Andy Fleming <afleming@freescale.com> Acked-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 19 Oct 2011 13:43:24 +0000 (06:43 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/kms/atom: fix handling of FB scratch indices
drm/radeon/kms/DCE4.1: fix Select_CrtcSource EncodeMode setting for DP bridges (v2)
drm/radeon/kms/DCE4.1: ss is not supported on the internal pplls
drm/radeon/kms/DCE4.1: fix dig encoder to transmitter mapping
ttm: Fix error-path using an uninitialized value
Antonio Ospite [Wed, 12 Oct 2011 20:59:26 +0000 (17:59 -0300)]
[media] videodev: fix a NULL pointer dereference in v4l2_device_release()
The change in 8280b66 does not cover the case when v4l2_dev is already
NULL, fix that.
With a Kinect sensor, seen as an USB camera using GSPCA in this context,
a NULL pointer dereference BUG can be triggered by just unplugging the
device after the camera driver has been loaded.
Signed-off-by: Antonio Ospite <ospite@studenti.unina.it> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Alex Deucher [Wed, 19 Oct 2011 00:10:05 +0000 (20:10 -0400)]
drm/radeon/kms/atom: fix handling of FB scratch indices
FB scratch indices are dword indices, but we were treating
them as byte indices. As such, we were getting the wrong
FB scratch data for non-0 indices. Fix the indices and
guard the indexing against indices larger than the scratch
allocation.
Fixes memory corruption on some boards if data was written
past the end of the FB scratch array.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Dave Airlie <airlied@redhat.com> Tested-by: Dave Airlie <airlied@redhat.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
pptp_rcv_core() is a nice example of a function assuming everything it
needs is available in skb head.
Reported-by: Bradley Peterson <despite@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>