David S. Miller [Wed, 17 Aug 2005 21:57:30 +0000 (14:57 -0700)]
[NET]: Implement SKB fast cloning.
Protocols that make extensive use of SKB cloning,
for example TCP, eat at least 2 allocations per
packet sent as a result.
To cut the kmalloc() count in half, we implement
a pre-allocation scheme wherein we allocate
2 sk_buff objects in advance, then use a simple
reference count to free up the memory at the
correct time.
Based upon an initial patch by Thomas Graf and
suggestions from Herbert Xu.
Signed-off-by: David S. Miller <davem@davemloft.net>
CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Aug 2005 19:32:15 +0000 (12:32 -0700)]
[NETLINK]: Add set/getsockopt options to support more than 32 groups
NETLINK_ADD_MEMBERSHIP/NETLINK_DROP_MEMBERSHIP are used to join/leave
groups, NETLINK_PKTINFO is used to enable nl_pktinfo control messages
for received packets to get the extended destination group number.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Aug 2005 02:31:36 +0000 (19:31 -0700)]
[NETLINK]: Return -EPROTONOSUPPORT in netlink_create() if no kernel socket is registered
This is necessary for dynamic number of netlink groups to make sure we know
the number of possible groups before bind() is called. With this change pure
userspace communication using unused netlink protocols becomes impossible.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Aug 2005 02:27:50 +0000 (19:27 -0700)]
[NETLINK]: Use group numbers instead of bitmasks internally
Using the group number allows increasing the number of groups without
beeing limited by the size of the bitmask. It introduces one limitation
for netlink users: messages can't be broadcasted to multiple groups anymore,
however this feature was never used inside the kernel.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Aug 2005 02:27:13 +0000 (19:27 -0700)]
[NETLINK]: Fix module refcounting problems
Use-after-free: the struct proto_ops containing the module pointer
is freed when a socket with pid=0 is released, which besides for kernel
sockets is true for all unbound sockets.
Module refcount leak: when the kernel socket is closed before all user
sockets have been closed the proto_ops struct for this family is
replaced by the generic one and the module refcount can't be dropped.
The second problem can't be solved cleanly using module refcounting in the
generic socket code, so this patch adds explicit refcounting to
netlink_create/netlink_release.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Evgeniy Polyakov [Mon, 15 Aug 2005 02:24:58 +0000 (19:24 -0700)]
[NETLINK]: w1_int.c: fix default netlink group
w1 does not need to multicast its state to several groups at once,
and upcoming netlink changes will not allow bitmask for groups anyway.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 15 Aug 2005 00:05:53 +0000 (21:05 -0300)]
[DCCP]: Fix compiler warnings
may be a false warning if there always is something on ccid3hcrx_hist:
net/dccp/ccids/ccid3.c: In function 'ccid3_hc_rx_packet_recv':
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_usec' may be used uninitialized in this function
net/dccp/ccids/ccid3.c:1634: warning: 'tstamp.tv_sec' may be used uninitialized in this function
const on inline functions doesn't have any effect:
net/dccp/dccp.h:64: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:70: warning: type qualifiers ignored on function return type
net/dccp/dccp.h:76: warning: type qualifiers ignored on function return type
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[PACKET_HISTORY]: Add dccphtx_rtt and rename the win_count fields
As requested by Ian.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: Ian McDonald <iam4@cs.waikato.ac.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gary Wayne Smith [Mon, 15 Aug 2005 00:33:24 +0000 (17:33 -0700)]
[NETFILTER]: Make NETMAP target usable in OUTPUT
Signed-off-by: Gary Wayne Smith <gary.w.smith@primeexalia.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Sat, 13 Aug 2005 20:56:26 +0000 (13:56 -0700)]
[NETFILTER]: Add new iptables "connbytes" match
This patch ads a new "connbytes" match that utilizes the CONFIG_NF_CT_ACCT
per-connection byte and packet counters. Using it you can do things like
packet classification on average packet size within a connection.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Sat, 13 Aug 2005 20:55:44 +0000 (13:55 -0700)]
[NETFILTER]: introduce and use aligned_u64 data type
As proposed by Andi Kleen, this is required esp. for x86_64 architecture,
where 64bit code needs 8byte aligned 64bit data types, but 32bit userspace
apps will only align to 4bytes.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
[INET_DIAG]: Move the tcp_diag interface to the proper place
With this the previous setup is back, i.e. tcp_diag can be built as a module,
as dccp_diag and both share the infrastructure available in inet_diag.
If one selects CONFIG_INET_DIAG as module CONFIG_INET_TCP_DIAG will also be
built as a module, as will CONFIG_INET_DCCP_DIAG, if CONFIG_IP_DCCP was
selected static or as a module, if CONFIG_INET_DIAG is y, being statically
linked CONFIG_INET_TCP_DIAG will follow suit and CONFIG_INET_DCCP_DIAG will be
built in the same manner as CONFIG_IP_DCCP.
Now to aim at UDP, converting it to use inet_hashinfo, so that we can use
iproute2 for UDP sockets as well.
Ah, just to show an example of this new infrastructure working for DCCP :-)
Next changeset will rename tcp_diag to inet_diag and move the tcp_diag code out
of it and into a new tcp_diag.c, similar to the net/dccp/diag.c introduced in
this changeset, completing the transition to a generic inet_diag
infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[INET6_HASHTABLES]: Move inet6_lookup functions to net/ipv6/inet6_hashtables.c
Doing this we allow tcp_diag to support IPV6 even if tcp_diag is compiled
statically and IPV6 is compiled as a module, removing the previous restriction
while not building any IPV6 code if it is not selected.
Now to work on the tcpdiag_register infrastructure and then to rename the whole
thing to inetdiag, reflecting its by then completely generic nature.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Hagervall [Wed, 10 Aug 2005 21:18:16 +0000 (14:18 -0700)]
[BNX2]: Possible sparse fixes, take two
This patch contains the following possible cleanups/fixes:
- use C99 struct initializers
- make a few arrays and structs static
- remove a few uses of literal 0 as NULL pointer
- use convenience function instead of cast+dereference in bnx2_ioctl()
- remove superfluous casts to u8 * in calls to readl/writel
Signed-off-by: Peter Hagervall <hager@cs.umu.se> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Benjamin LaHaise [Wed, 10 Aug 2005 21:16:04 +0000 (14:16 -0700)]
[NET]: Make use of ->private_data in sockfd_lookup
Please consider the patch below which makes use of file->private_data to
store the pointer to the socket, which avoids touching several unused
cachelines in the dentry and inode in sockfd_lookup.
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ICSK]: Move TCP congestion avoidance members to icsk
This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to inet_csk(),
minimal renaming/moving done in this changeset to ease review.
Most of it is just changes of struct tcp_sock * to struct sock * parameters.
With this we move to a state closer to two interesting goals:
1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, being used
for any INET transport protocol that has struct inet_hashinfo and are
derived from struct inet_connection_sock. Keeps the userspace API, that will
just not display DCCP sockets, while newer versions of tools can support
DCCP.
2. INET generic transport pluggable Congestion Avoidance infrastructure, using
the current TCP CA infrastructure with DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Using most of the infrastructure TCP uses, with a dccp_death_row,
etc. As per my current interpretation of the draft what we have with
this changeset seems to be all we need (or very close to it 8)).
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
That groups all of the tables and variables associated to the TCP timewait
schedulling/recycling/killing code, that now can be isolated from the TCP
specific code and used by other transport protocols, such as DCCP.
Next changeset will move this code to net/ipv4/inet_timewait_sock.c
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[DCCP]: Introduce dccp_write_xmit from code in dccp_sendmsg
This way it gets closer to the TCP flow, where congestion window
checks are done, it seems we can map ccid_hc_tx_send_packet in
dccp_write_xmit to tcp_snd_wnd_test in tcp_write_xmit, a CCID2
decision should just fit in here as well...
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcel Holtmann [Wed, 10 Aug 2005 03:30:28 +0000 (20:30 -0700)]
[Bluetooth]: Move packet type into the SKB control buffer
This patch moves the usage of packet type into the SKB control
buffer. After this patch it is now possible to shrink the sk_buff
structure and redefine its pkt_type.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Victor Fusco [Wed, 10 Aug 2005 03:29:11 +0000 (20:29 -0700)]
[Bluetooth]: Fix sparse warnings (__nocast type)
This patch fixes the sparse warnings "implicit cast to nocast type"
for the priority or gfp_mask parameters of the memory allocations.
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Timo Teräs [Wed, 10 Aug 2005 03:28:21 +0000 (20:28 -0700)]
[Bluetooth]: Call tty_hangup() when DCD is de-asserted
The RFCOMM layer does not handle properly the de-assertation
of CD signal. It should call tty_hangup() to work properly.
Signed-off-by: Timo Teräs <ext-timo.teras@nokia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The HCI page scan repetition mode change event contains the actual
page scan repetition mode for the remote device. It is the same
value that is received from an inquiry response and it can be used
to make further reconnections faster.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcel Holtmann [Wed, 10 Aug 2005 03:27:49 +0000 (20:27 -0700)]
[Bluetooth]: Workaround for inquiry results with RSSI and page scan mode
This patch implements a workaround for buggy Bluetooth 1.2 devices from
Silicon Wave. Their inquiry results with RSSI contain the page scan mode
field. This field was removed in the final Bluetooth 1.2 specification.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:26:55 +0000 (20:26 -0700)]
[NETFILTER]: New iptables DCCP protocol header match
Using this new iptables DCCP protocol header match, it is possible to
create simplistic stateless packet filtering rules for DCCP. It
permits matching of port numbers, packet type and options.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:26:03 +0000 (20:26 -0700)]
[DCCP]: make <linux/dccp.h> include-able from userspace
The protocol header files in <linux/foo.h> are usually structured in a
way to be included by userspace code. The top section consists of
general protocol structure definitions, typedefs, enums - followed by
an #ifdef __KERNEL__ section.
Currently <linux/dccp.h> doesn't follow that convention and can
therefore not be used from userspace. However, for example iptables'
libipt_dccp.c actually needs various definitions from there.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Use const where possible and get rid of EXTRACT() macro
that was never used.
Signed-off-by: Stephen Hemmigner <shemminger@osdl.org> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Olsson [Wed, 10 Aug 2005 03:25:06 +0000 (20:25 -0700)]
[IPV4]: fib_trie: Use ERR_PTR to handle errno return
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Olof Johansson [Wed, 10 Aug 2005 03:24:39 +0000 (20:24 -0700)]
[IPV4]: FIB Trie cleanups.
Below is a patch that cleans up some of this, supposedly without
changing any behaviour:
* Whitespace cleanups
* Introduce DBG()
* BUG_ON() instead of if () { BUG(); }
* Remove some of the deep nesting to make the code flow more
comprehensible
* Some mask operations were simplified
Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Yasuyuki Kozakai [Wed, 10 Aug 2005 03:24:15 +0000 (20:24 -0700)]
[NETFILTER]: return ENOMEM when ip_conntrack_alloc() fails.
This patch fixes the bug which doesn't return ERR_PTR(-ENOMEM) if it
failed to allocate memory space from slab cache. This bug leads to
erroneously not dropped packets under stress, and wrong statistic
counters ('invalid' is incremented instead of 'drop'). It was
introduced during the ctnetlink merge in the net-2.6.14 tree, so no
stable or mainline releases affected.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:23:36 +0000 (20:23 -0700)]
[NETFILTER]: more verbose return codes from nf_{log,queue}
This adds EEXIST to distinguish between the following return values:
0: nobody was registered, registration successful
EEXIST: the exact same handler was already registered, no registration
required
EBUSY: somebody else is registered, registration unsuccessful.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:23:11 +0000 (20:23 -0700)]
[NETFILTER]: add /proc/net/netfilter interface to nf_queue
This patch adds a /proc/net/netfilter/nf_queue file, similar to the
recently-added /proc/net/netfilter/nf_log. It indicates which queue
handler is registered to which protocol family. This is useful since
there are now multiple queue handlers in the treee (ip[6]_queue,
nfnetlink_queue).
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:21:49 +0000 (20:21 -0700)]
[NETFILTER]: split net/core/netfilter.c into net/netfilter/*.c
This patch doesn't introduce any code changes, but merely splits the
core netfilter code into four separate files. It also moves it from
it's old location in net/core/ to the recently-created net/netfilter/
directory.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:20:54 +0000 (20:20 -0700)]
[NETFILTER]: ip{6}_queue: prevent unregistration race with nfnetlink_queue
Since nfnetlink_queue can override ip{6}_queue as queue handlers, we
can no longer blindly unregister whoever is registered for PF_INET[6],
but only unregister ourselves.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Harald Welte [Wed, 10 Aug 2005 03:19:44 +0000 (20:19 -0700)]
[NETFILTER]: move conntrack helper buffers from BSS to kmalloc()ed memory
According to DaveM, it is preferrable to have large data structures be
allocated dynamically from the module init() function rather than
putting them as static global variables into BSS.
This patch moves the conntrack helper packet buffers into dynamically
allocated memory.
Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 10 Aug 2005 03:17:41 +0000 (20:17 -0700)]
[TG3]: Fix bug in setting a tg3_flag
Found a bug while reviewing the patches the second time.
The TG3_FLAG_TXD_MBOX_HWBUG flag is set after the register access
methods have been determined. This patch fixes it by moving it up before
the various access methods are assigned.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 10 Aug 2005 03:17:28 +0000 (20:17 -0700)]
[TG3]: Eliminate one register write in tg3_restart_ints()
The register write to register 0x68 to restart interrupts is unnecessary
as the interrupt wasn't masked in that register by the irq handler. This
will save one register write in the fast path.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 10 Aug 2005 03:17:14 +0000 (20:17 -0700)]
[TG3]: Add indirect register method for 5703 behind ICH
This patch adds the new workaround for 5703 A1/A2 if it is behind
certain ICH bridges. The workaround disables memory and uses config.
cycles only to access all registers. The 5702/03 chips can mistakenly
decode the special cycles from the ICH chipsets as memory write cycles,
causing corruption of register and memory space. Only certain ICH
bridges will drive special cycles with non-zero data during the address
phase which can fall within the 5703's address range. This is not an ICH
bug as the PCI spec allows non-zero address during special cycles.
However, only these ICH bridges are known to drive non-zero addresses
during special cycles.
The indirect_lock is also changed to spin_lock_irqsave from spin_lock_bh
because it is used in irq handler when using the indirect method to
disable interrupts.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 10 Aug 2005 03:16:46 +0000 (20:16 -0700)]
[TG3]: Add various register methods
This patch adds various dedicated register read/write methods for the
existing workarounds, including PCIX target workaround, write with read
flush, etc. The chips that require these workarounds will use these
dedicated access functions.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Wed, 10 Aug 2005 03:16:32 +0000 (20:16 -0700)]
[TG3]: Add basic register access function pointers
This patch adds the basic function pointers to do register accesses in
the fast path. This was suggested by David Miller. The idea is that
various register access methods for different hardware errata can easily
be implemented with these function pointers and performance will not be
degraded on chips that use normal register access methods.
The various register read write macros (e.g. tw32, tr32, tw32_mailbox)
are redefined to call the function pointers.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yoshifumi Nishida <nishida@csl.sony.co.jp> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[ICSK]: Move generalised functions from tcp to inet_connection_sock
This also improves reqsk_queue_prune and renames it to
inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
and inet_request_sock objects, not just with request_sock ones thus
belonging to inet_request_sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer
With this we're very close to getting all of the current TCP
refactorings in my dccp-2.6 tree merged, next changeset will export
some functions needed by the current DCCP code and then dccp-2.6.git
will be born!
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
This also moved inet_iif from tcp to inet_hashtables.h, as it is
needed by the inet_lookup callers, perhaps this needs a bit of
polishing, but for now seems fine.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>