..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

=======
Bonding
=======

Bonding allows two or more interfaces (the "slaves") to share network traffic.
From a high-level point of view, bonded interfaces act like a single port, but
they have the bandwidth of multiple network devices, e.g. two 1 Gbps physical
interfaces act like a single 2 Gbps interface. Bonds also increase robustness:
the bonded port does not go down as long as at least one of its slaves is up.

In vswitchd, a bond always has at least two slaves (and may have more). If a
configuration error, etc. would cause a bond to have only one slave, the port
becomes an ordinary port, not a bonded port, and none of the special features
of bonded ports described in this section apply.

There are many forms of bonding, of which ovs-vswitchd implements only a few.
The most complex bond ovs-vswitchd implements is called "source load balancing"
or SLB bonding. SLB bonding divides traffic among the slaves based on the
Ethernet source address. This is useful only if the traffic over the bond has
multiple Ethernet source addresses, for example if network traffic from
multiple VMs is multiplexed over the bond.

.. note::

   Most of the ovs-vswitchd implementation is in ``vswitchd/bridge.c``, so code
   references below should be assumed to refer to that file except as otherwise
   specified.


Enabling and Disabling Slaves
-----------------------------

When a bond is created, a slave is initially enabled or disabled based on
whether carrier is detected on the NIC (see ``iface_create()``). After that, a
slave is disabled if its carrier goes down for a period of time longer than the
downdelay, and it is enabled if carrier comes up for longer than the updelay
(see ``bond_link_status_update()``). There is one exception where the updelay
is skipped: if no slaves at all are currently enabled, then the first slave on
which carrier comes up is enabled immediately.

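The updelay/downdelay rule above can be sketched as a small state machine.
This is only an illustrative model, not the code in ``vswitchd/bridge.c``: the
class name ``SlaveLinkState``, its fields, and the millisecond clock passed in
by the caller are all hypothetical, and the sketch treats a delay as satisfied
once the elapsed time is at least the configured value.

```python
class SlaveLinkState:
    """Hypothetical model of one bond slave's enabled/disabled state."""

    def __init__(self, carrier_up, updelay_ms, downdelay_ms):
        # The initial state simply follows carrier at bond creation.
        self.enabled = carrier_up
        self.updelay_ms = updelay_ms
        self.downdelay_ms = downdelay_ms
        # Time at which carrier first disagreed with `enabled`, if it does.
        self.pending_since = None

    def update(self, now_ms, carrier_up, any_slave_enabled):
        """Feed one carrier observation; return the resulting enabled state.

        `any_slave_enabled` says whether any slave of the bond is currently
        enabled (needed for the "skip updelay" exception).
        """
        if carrier_up == self.enabled:
            # Carrier agrees with the current state: cancel any pending change.
            self.pending_since = None
            return self.enabled

        if self.pending_since is None:
            self.pending_since = now_ms

        delay = self.updelay_ms if carrier_up else self.downdelay_ms
        if carrier_up and not any_slave_enabled:
            # Exception: with no enabled slave at all, skip the updelay.
            delay = 0

        if now_ms - self.pending_since >= delay:
            self.enabled = carrier_up
            self.pending_since = None
        return self.enabled
```

In this model a carrier flap shorter than the downdelay never disables the
slave, because the pending change is cancelled as soon as carrier returns.
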
The updelay should be set to a time longer than the STP forwarding delay of the
physical switch to which the bond port is connected (if STP is enabled on that
switch). Otherwise, the slave will be enabled, and load may be shifted to it,
before the physical switch starts forwarding packets on that port, which can
cause some data to be "blackholed" for a time. The exception for a single
enabled slave does not cause any problem in this regard because when no slaves
are enabled all output packets are blackholed anyway.

When a slave becomes disabled, the vswitch immediately chooses a new output
port for traffic that was destined for that slave (see
``bond_enable_slave()``). It also sends a "gratuitous learning packet",
specifically a RARP, on the bond port (on the newly chosen slave) for each MAC
address that the vswitch has learned on a port other than the bond (see
``bundle_send_learning_packets()``), to teach the physical switch that the new
slave should be used in place of the one that is now disabled. (This behavior
probably makes sense only for a vswitch that has only one port (the bond)
connected to a physical switch; vswitchd should probably provide a way to
disable or configure it in other scenarios.)

Bond Packet Input
-----------------

Bonding accepts unicast packets on any bond slave. This can occasionally cause
packet duplication for the first few packets sent to a given MAC, if the
physical switch attached to the bond is flooding packets to that MAC because it
has not yet learned the correct slave for that MAC.

Bonding only accepts multicast (and broadcast) packets on a single bond slave
(the "active slave") at any given time. Multicast packets received on other
slaves are dropped. Otherwise, every multicast packet would be duplicated,
once for every bond slave, because the physical switch attached to the bond
will flood those packets.

Bonding also drops received packets when the vswitch has learned that the
packet's MAC is on a port other than the bond port itself. This is because it
is likely that the vswitch itself sent the packet out the bond port on a
different slave and is now receiving the packet back. This occurs when the
packet is multicast or the physical switch has not yet learned the MAC and is
flooding it. However, the vswitch makes an exception to this rule for
broadcast ARP replies, which indicate that the MAC has moved to another switch,
probably due to VM migration. (ARP replies are normally unicast, so this
exception does not match normal ARP replies. It will match the learning
packets sent on bond fail-over.)

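The three input rules above can be combined into a single admission check. The
following is a minimal sketch under stated assumptions, not the vswitchd
implementation: the function ``bond_admit()``, the ``Verdict`` enum, and all
parameter names are hypothetical, and the order in which the rules are tested
is a choice made for this example.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"
    DROP = "drop"


def bond_admit(dst_is_multicast, in_slave, active_slave, learned_port,
               bond_port="bond0", is_broadcast_arp_reply=False):
    """Decide whether a packet received on a bond slave is accepted.

    dst_is_multicast: destination is multicast or broadcast.
    learned_port: port on which the source MAC was learned, or None.
    """
    # Multicast and broadcast are accepted only on the active slave.
    if dst_is_multicast and in_slave != active_slave:
        return Verdict.DROP

    # Source MAC learned on a port other than the bond: probably our own
    # packet reflected back, unless it is a broadcast ARP reply.
    learned_elsewhere = learned_port is not None and learned_port != bond_port
    if learned_elsewhere and not is_broadcast_arp_reply:
        return Verdict.DROP

    # Unicast is accepted on any slave.
    return Verdict.ACCEPT
```

For example, ``bond_admit(True, "eth1", "eth0", None)`` drops a broadcast that
arrives on a slave other than the active one.
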
The active slave is simply the first slave to be enabled after the bond is
created (see ``bond_choose_active_slave()``). If the active slave is disabled,
then a new active slave is chosen among the slaves that remain enabled.
Currently, due to the way that configuration works, this tends to be the
remaining slave whose interface name is first alphabetically, but this is by no
means guaranteed.

Bond Packet Output
------------------

When a packet is sent out a bond port, the bond slave actually used is selected
based on the packet's source MAC and VLAN tag (see
``bond_choose_output_slave()``). In particular, the source MAC and VLAN tag are
hashed into one of 256 values, and that value is looked up in a hash table (the
"bond hash") kept in the ``bond_hash`` member of struct port. The hash table
entry identifies a bond slave. If no bond slave has yet been chosen for that
hash table entry, vswitchd chooses one arbitrarily.

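As an illustration of the lookup described above, the sketch below hashes a
source MAC and VLAN into one of 256 buckets and assigns a slave lazily. It makes
several assumptions that are not taken from vswitchd: CRC-32 stands in for the
real hash function, the "arbitrary" choice is modeled as bucket index modulo the
number of slaves, and the names ``Bond``, ``bond_hash()`` and
``choose_output_slave()`` are hypothetical.

```python
import zlib

BOND_HASH_SIZE = 256  # number of bond hash buckets, per the text above


def bond_hash(src_mac: bytes, vlan: int) -> int:
    """Map (source MAC, VLAN) to a bucket index in [0, 256).

    CRC-32 is only a stand-in for whatever hash vswitchd really uses.
    """
    data = src_mac + vlan.to_bytes(2, "big")
    return zlib.crc32(data) % BOND_HASH_SIZE


class Bond:
    def __init__(self, slaves):
        self.slaves = list(slaves)
        # One entry per bucket; None means no slave has been chosen yet.
        self.table = [None] * BOND_HASH_SIZE

    def choose_output_slave(self, src_mac: bytes, vlan: int) -> str:
        idx = bond_hash(src_mac, vlan)
        if self.table[idx] is None:
            # "Arbitrary" choice, modeled here as a simple modulo.
            self.table[idx] = self.slaves[idx % len(self.slaves)]
        return self.table[idx]
```

Because the bucket keeps its assignment once made, all packets from one
MAC+VLAN pair leave through the same slave until the entry is changed.
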
Every 10 seconds, vswitchd rebalances the bond slaves (see
``bond_rebalance()``). To rebalance, vswitchd examines the statistics for
the number of bytes transmitted by each slave over approximately the past
minute, with data sent more recently weighted more heavily than data sent less
recently. It considers each of the slaves in order from most-loaded to
least-loaded. If highly loaded slave H is significantly more heavily loaded
than the least-loaded slave L, and slave H carries at least two hashes, then
vswitchd shifts one of H's hashes to L. However, vswitchd will only shift a
hash from H to L if it will decrease the ratio of the load between H and L by
at least 0.1.

Currently, "significantly more heavily loaded" means that H must carry at least
1 Mbps more traffic than L, and that traffic must be at least 3% greater than
L's.

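The rebalancing criteria can be written down as two small predicates. This is a
sketch under assumptions rather than the logic of ``bond_rebalance()``: loads
are expressed in bits per second, the "ratio of the load" is assumed to be the
heavier load divided by the lighter one, hashes are tried from smallest to
largest, and the function names are hypothetical.

```python
def significantly_more_loaded(h_bps, l_bps):
    """H carries at least 1 Mbps more than L and at least 3% more than L."""
    return h_bps - l_bps >= 1_000_000 and h_bps >= 1.03 * l_bps


def load_ratio(a, b):
    """Assumed definition: heavier load divided by lighter load."""
    hi, lo = max(a, b), min(a, b)
    return float("inf") if lo == 0 else hi / lo


def pick_hash_to_shift(h_hashes, l_load):
    """Return the bucket to move from H to L, or None if no move qualifies.

    h_hashes maps bucket index -> load (bps) that the bucket contributes to H.
    """
    h_load = sum(h_hashes.values())
    if len(h_hashes) < 2 or not significantly_more_loaded(h_load, l_load):
        return None

    old_ratio = load_ratio(h_load, l_load)
    # Try the smallest hashes first (an arbitrary choice in this sketch).
    for bucket, load in sorted(h_hashes.items(), key=lambda kv: kv[1]):
        if load == 0:
            continue  # moving an idle hash changes nothing
        new_ratio = load_ratio(h_load - load, l_load + load)
        if old_ratio - new_ratio >= 0.1:
            return bucket
    return None
```

With H carrying hashes of 6, 1 and 3 Mbps and L carrying 2 Mbps, moving the
1 Mbps hash lowers the ratio from 5 to 3, so that hash is selected.
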
Bond Balance Modes
------------------

Each bond balancing mode has different considerations, described below.

LACP Bonding
~~~~~~~~~~~~

LACP bonding requires the remote switch to implement LACP, but it is otherwise
very simple in that, after LACP negotiation is complete, there is no need for
special handling of received packets.

Several of the physical switches that support LACP block all traffic for ports
that are configured to use LACP, until LACP is negotiated with the host. When
configuring an LACP bond on an OVS host (e.g. XenServer), this means that there
will be an interruption of the network connectivity between the time the ports
on the physical switch and the bond on the OVS host are configured. The
interruption may be relatively long if different people are responsible for
managing the switches and the OVS host.

Such a connectivity failure can be avoided by configuring LACP on the OVS host
before configuring the physical switch, and having the OVS host fall back to
another bond mode (active-backup) until the physical switch LACP configuration
is complete. The ``lacp-fallback-ab`` option provides this behavior in
Open vSwitch.

Active Backup Bonding
~~~~~~~~~~~~~~~~~~~~~

Active Backup bonds send all traffic out one "active" slave until that slave
becomes unavailable. Since they are significantly less complicated than SLB
bonds, they are preferred when LACP is not an option. Additionally, they are
the only bond mode which supports attaching each slave to a different upstream
switch.

SLB Bonding
~~~~~~~~~~~

SLB bonding allows a limited form of load balancing without the remote switch's
knowledge or cooperation. The basics of SLB are simple. SLB assigns each
source MAC+VLAN pair to a link and transmits all packets from that MAC+VLAN
through that link. Learning in the remote switch causes it to send packets to
that MAC+VLAN through the same link.

SLB bonding has the following complications:

0. When the remote switch has not learned the MAC for the destination of a
   unicast packet and hence floods the packet to all of the links on the SLB
   bond, Open vSwitch will forward duplicate packets, one per link, to each
   other switch port.

   Open vSwitch does not solve this problem.

1. When the remote switch receives a multicast or broadcast packet from a port
   not on the SLB bond, it will forward it to all of the links in the SLB bond.
   This would cause packet duplication if not handled specially.

   Open vSwitch avoids packet duplication by accepting multicast and broadcast
   packets on only the active slave, and dropping multicast and broadcast
   packets on all other slaves.

2. When Open vSwitch forwards a multicast or broadcast packet to a link in the
   SLB bond other than the active slave, the remote switch will forward it to
   all of the other links in the SLB bond, including the active slave. Without
   special handling, this would mean that Open vSwitch would forward a second
   copy of the packet to each switch port (other than the bond), including the
   port that originated the packet.

   Open vSwitch deals with this case by dropping packets received on any SLB
   bonded link that have a source MAC+VLAN that has been learned on any other
   port. (This means that SLB as implemented in Open vSwitch relies critically
   on MAC learning. Notably, SLB is incompatible with the "flood_vlans"
   feature.)

3. Suppose that a MAC+VLAN moves to an SLB bond from another port (e.g. when a
   VM is migrated from this hypervisor to a different one). Without additional
   special handling, Open vSwitch will not notice until the MAC learning entry
   expires, up to 60 seconds later as a consequence of rule #2.

   Open vSwitch avoids a 60-second delay by listening for gratuitous ARPs,
   which VMs commonly emit upon migration. As an exception to rule #2, a
   gratuitous ARP received on an SLB bond is not dropped and updates the MAC
   learning table in the usual way. (If a move does not trigger a gratuitous
   ARP, or if the gratuitous ARP is lost in the network, then a 60-second delay
   still occurs.)

4. Suppose that a MAC+VLAN moves from an SLB bond to another port (e.g. when a
   VM is migrated from a different hypervisor to this one), that the MAC+VLAN
   emits a gratuitous ARP, and that Open vSwitch forwards that gratuitous ARP
   to a link in the SLB bond other than the active slave. The remote switch
   will forward the gratuitous ARP to all of the other links in the SLB bond,
   including the active slave. Without additional special handling, this would
   mean that Open vSwitch would learn that the MAC+VLAN was located on the SLB
   bond, as a consequence of rule #3.

   Open vSwitch avoids this problem by "locking" the MAC learning table entry
   for a MAC+VLAN from which a gratuitous ARP was received on a non-SLB bond
   port. For 5 seconds, a locked MAC learning table entry will not be updated
   based on a gratuitous ARP received on an SLB bond.
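
The gratuitous ARP handling in rules #3 and #4 can be sketched as a MAC
learning table with a per-entry lock. This is a simplified model built on
assumptions, not the Open vSwitch MAC learning code: the ``MacTable`` and
``MacEntry`` classes, the method name, and the caller-supplied clock in seconds
are all hypothetical.

```python
GRAT_ARP_LOCK_SECONDS = 5  # lock duration given in rule #4


class MacEntry:
    def __init__(self, port):
        self.port = port
        self.grat_arp_lock_until = 0.0  # entry is locked while now < this


class MacTable:
    def __init__(self):
        self.entries = {}  # (mac, vlan) -> MacEntry

    def learn_gratuitous_arp(self, mac_vlan, in_port, in_port_is_slb_bond, now):
        """Process a gratuitous ARP; return True if the table was updated."""
        entry = self.entries.get(mac_vlan)

        # Rule #4: a locked entry ignores gratuitous ARPs seen on an SLB bond,
        # since they are probably reflections of our own transmission.
        if (in_port_is_slb_bond and entry is not None
                and now < entry.grat_arp_lock_until):
            return False

        # Rule #3: otherwise the gratuitous ARP updates learning as usual.
        if entry is None:
            entry = self.entries[mac_vlan] = MacEntry(in_port)
        entry.port = in_port

        # A gratuitous ARP from a non-SLB-bond port locks the entry.
        if not in_port_is_slb_bond:
            entry.grat_arp_lock_until = now + GRAT_ARP_LOCK_SECONDS
        return True
```

In this model, a gratuitous ARP reflected back over the bond within 5 seconds
of the original leaves the entry pointing at the port where the VM now lives.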