TCP protocol
============

Last updated: 3 June 2017

Contents
========

- Congestion control
- How the new TCP output machine [nyi] works

Congestion control
==================

The following variables are used in the tcp_sock for congestion control:
snd_cwnd        The size of the congestion window.
snd_ssthresh    Slow start threshold. We are in slow start if
                snd_cwnd is less than this.
snd_cwnd_cnt    A counter used to slow down the rate of increase
                once we exceed the slow start threshold.
snd_cwnd_clamp  The maximum size that snd_cwnd can grow to.
snd_cwnd_stamp  Timestamp of when the congestion window was last validated.
snd_cwnd_used   Used as a high-water mark for how much of the
                congestion window is in use. It is used to adjust
                snd_cwnd down when the link is limited by the
                application rather than the network.

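How these fields interact can be illustrated with a small userspace model.
This is only a sketch of a Reno-style increase, assuming one ACK per packet;
struct cwnd_state, in_slow_start() and on_ack() are hypothetical names and are
not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; field names mirror the tcp_sock fields. */
struct cwnd_state {
	uint32_t snd_cwnd;       /* congestion window, in packets */
	uint32_t snd_ssthresh;   /* slow start threshold */
	uint32_t snd_cwnd_cnt;   /* ACKs counted towards the next increase */
	uint32_t snd_cwnd_clamp; /* upper bound for snd_cwnd */
};

/* We are in slow start while snd_cwnd is below snd_ssthresh. */
int in_slow_start(const struct cwnd_state *s)
{
	return s->snd_cwnd < s->snd_ssthresh;
}

/* Process one ACK that acknowledges one packet (assumption of this sketch). */
void on_ack(struct cwnd_state *s)
{
	assert(s->snd_cwnd > 0);

	if (in_slow_start(s)) {
		/* Slow start: grow by one packet per ACK. */
		s->snd_cwnd++;
	} else if (++s->snd_cwnd_cnt >= s->snd_cwnd) {
		/* Past the threshold: grow by one packet per window of ACKs. */
		s->snd_cwnd_cnt = 0;
		s->snd_cwnd++;
	}

	/* Never grow beyond the clamp. */
	if (s->snd_cwnd > s->snd_cwnd_clamp)
		s->snd_cwnd = s->snd_cwnd_clamp;
}
```

Calling on_ack() repeatedly on a state with snd_cwnd below snd_ssthresh grows
the window on every call; once the threshold is reached, snd_cwnd_cnt slows the
growth to one packet per full window of ACKs.
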
As of 2.6.13, Linux supports pluggable congestion control algorithms.
A congestion control mechanism can be registered through functions in
tcp_cong.c. The functions used by the congestion control mechanism are
registered by passing a tcp_congestion_ops struct to
tcp_register_congestion_control. At a minimum, the congestion control
mechanism must provide a valid name and must implement either the ssthresh,
cong_avoid and undo_cwnd hooks or the "omnipotent" cong_control hook.

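The minimum requirement described above can be modelled in userspace as
follows. This is a sketch only: struct sketch_cong_ops is a reduced,
hypothetical stand-in for tcp_congestion_ops with simplified hook signatures,
and sketch_validate_ops() implements the rule as worded in this document, not
the kernel's actual validation code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's tcp_congestion_ops. */
struct sketch_cong_ops {
	const char *name;
	unsigned int (*ssthresh)(void *sk);
	void (*cong_avoid)(void *sk, unsigned int ack, unsigned int acked);
	unsigned int (*undo_cwnd)(void *sk);
	void (*cong_control)(void *sk);
};

/* Returns 0 if ops meets the minimum stated in the text, -1 otherwise. */
int sketch_validate_ops(const struct sketch_cong_ops *ops)
{
	int classic, omnipotent;

	/* A valid, non-empty name is always required. */
	if (ops == NULL || ops->name == NULL || ops->name[0] == '\0')
		return -1;

	/* Either all three classic hooks, or the single cong_control hook. */
	classic = ops->ssthresh && ops->cong_avoid && ops->undo_cwnd;
	omnipotent = ops->cong_control != NULL;

	return (classic || omnipotent) ? 0 : -1;
}

/* Trivial hook bodies so that an ops table can be filled in. */
unsigned int sketch_ssthresh(void *sk)
{
	(void)sk;
	return 2;
}

void sketch_cong_avoid(void *sk, unsigned int ack, unsigned int acked)
{
	(void)sk;
	(void)ack;
	(void)acked;
}

unsigned int sketch_undo_cwnd(void *sk)
{
	(void)sk;
	return 10;
}
```

A table that sets name and only some of the three classic hooks, without
cong_control, is rejected by this check.
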
Private data for a congestion control mechanism is stored in tp->ca_priv.
tcp_ca(tp) returns a pointer to this space. This is preallocated space, so it
is important to check that your private data fits in it. Alternatively, space
can be allocated elsewhere and a pointer to it stored here.

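One way to enforce the size check is a compile-time assertion. The sketch
below is a userspace illustration; SKETCH_CA_PRIV_SIZE (and its value),
struct sketch_sock, struct sketch_ca_state and sketch_ca() are hypothetical
names standing in for the preallocated space and the tcp_ca(tp) accessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CA_PRIV_SIZE 64	/* assumed size, for illustration only */

/* Hypothetical stand-in for the structure holding the preallocated space. */
struct sketch_sock {
	uint64_t ca_priv[SKETCH_CA_PRIV_SIZE / sizeof(uint64_t)];
};

/* Private state of an imaginary congestion control algorithm. */
struct sketch_ca_state {
	uint32_t min_rtt_us;
	uint32_t last_max_cwnd;
};

/* Fail the build if the private state outgrows the preallocated space. */
_Static_assert(sizeof(struct sketch_ca_state) <= SKETCH_CA_PRIV_SIZE,
	       "private data does not fit in ca_priv");

/* Counterpart of tcp_ca(tp): hand back a pointer to the preallocated space. */
struct sketch_ca_state *sketch_ca(struct sketch_sock *sk)
{
	return (struct sketch_ca_state *)sk->ca_priv;
}

/* Initialise the private state in place. */
void sketch_ca_init(struct sketch_sock *sk)
{
	struct sketch_ca_state *ca = sketch_ca(sk);

	memset(ca, 0, sizeof(*ca));
	ca->min_rtt_us = UINT32_MAX;
}
```

If struct sketch_ca_state ever grows past the assumed size, the build stops
with the message above instead of silently overrunning the space.
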
There are currently three kinds of congestion control algorithms. The
simplest ones are derived from TCP Reno (highspeed, scalable) and just
provide an alternative congestion window calculation. More complex
ones like BIC try to look at other events to provide better
heuristics. There are also round-trip-time-based algorithms like
Vegas and Westwood+.

Good TCP congestion control is a complex problem because the algorithm
needs to maintain fairness and performance. Please review current
research and RFCs before developing new modules.

The default congestion control mechanism is chosen based on the
DEFAULT_TCP_CONG Kconfig parameter. If you really want a particular default
value, you can set it using the sysctl net.ipv4.tcp_congestion_control. The
module will be autoloaded if needed and you will get the expected protocol. If
you ask for an unknown congestion method, the sysctl attempt will fail.

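The current value of this sysctl can be read from userspace through procfs,
where net.ipv4.tcp_congestion_control appears as
/proc/sys/net/ipv4/tcp_congestion_control. The helper below is a minimal
sketch; read_sysctl_line() is a hypothetical name, not an existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a sysctl file into buf, without the trailing
 * newline. Returns 0 on success and -1 if the file cannot be read.
 */
int read_sysctl_line(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL && len > 0);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;

	if (fgets(buf, (int)len, f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Cut the string at the first newline, if there is one. */
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

For example, read_sysctl_line("/proc/sys/net/ipv4/tcp_congestion_control",
name, sizeof(name)) fills name with the currently selected mechanism on a
system where that file exists.
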
If you remove a TCP congestion control module, you will get the next
available one. Since reno cannot be built as a module and cannot be
removed, it will always be available.

How the new TCP output machine [nyi] works.
===========================================

Data is kept on a single queue. The skb->users flag tells us whether the frame
has already been queued. To add a frame we throw it on the end. Ack
walks down the list from the start.

We keep a set of control flags


	sk->tcp_pend_event

		TCP_PEND_ACK		Ack needed
		TCP_ACK_NOW		Needed now
		TCP_WINDOW		Window update check
		TCP_WINZERO		Zero probing


	sk->transmit_queue		The transmission frame begin
	sk->transmit_new		First new frame pointer
	sk->transmit_end		Where to add frames

	sk->tcp_last_tx_ack		Last ack seen
	sk->tcp_dup_ack			Dup ack count for fast retransmit


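The single queue and its three pointers can be sketched in userspace as a
singly linked list. Since this output machine is marked [nyi], everything here
is an assumption made for illustration: struct sketch_frame, struct sketch_txq
and the sketch_* functions are hypothetical, and frames are identified only by
the sequence number that follows their data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame; end_seq is the sequence number just past its data. */
struct sketch_frame {
	uint32_t end_seq;
	struct sketch_frame *next;
};

/* Hypothetical queue state, named after the sk-> fields listed above. */
struct sketch_txq {
	struct sketch_frame *transmit_queue; /* oldest unacked frame */
	struct sketch_frame *transmit_new;   /* first frame not yet sent */
	struct sketch_frame *transmit_end;   /* last frame; we append here */
};

/* To add a frame we throw it on the end. */
void sketch_enqueue(struct sketch_txq *q, struct sketch_frame *f)
{
	f->next = NULL;
	if (q->transmit_end)
		q->transmit_end->next = f;
	else
		q->transmit_queue = f;
	q->transmit_end = f;
	if (q->transmit_new == NULL)
		q->transmit_new = f;
}

/* Mark the next unsent frame as sent; returns it, or NULL if none is left. */
struct sketch_frame *sketch_send_one(struct sketch_txq *q)
{
	struct sketch_frame *f = q->transmit_new;

	if (f)
		q->transmit_new = f->next;
	return f;
}

/*
 * Ack walks down the list from the start and unlinks every sent frame
 * that ack_seq fully covers. Returns the number of frames unlinked.
 */
int sketch_ack(struct sketch_txq *q, uint32_t ack_seq)
{
	int freed = 0;

	while (q->transmit_queue != q->transmit_new &&
	       (int32_t)(ack_seq - q->transmit_queue->end_seq) >= 0) {
		q->transmit_queue = q->transmit_queue->next;
		freed++;
	}
	if (q->transmit_queue == NULL)
		q->transmit_end = NULL;
	return freed;
}
```

The walk stops at transmit_new, so a frame that has not been sent yet is never
treated as acknowledged, whatever ack_seq says.
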
Frames are queued for output by tcp_write. We do our best to send the frames
off immediately if possible, but otherwise queue them and compute the body
checksum in the copy.

When a write is done we try to clear any pending events and piggyback them.
If the window is full we queue full-sized frames. On the first timeout in
zero window we split this.

On a timer we walk the retransmit list to send any retransmits, update the
backoff timers etc. A change of route table stamp causes a change of header
and recompute. We add any new tcp level headers and refinish the checksum
before sending.
