..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in Open vSwitch documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

=======
Bonding
=======

Bonding allows two or more interfaces (the "slaves") to share network traffic.
From a high-level point of view, bonded interfaces act like a single port, but
they have the bandwidth of multiple network devices, e.g. two 1 Gbps physical
interfaces act like a single 2 Gbps interface. Bonds also increase robustness:
the bonded port does not go down as long as at least one of its slaves is up.

In vswitchd, a bond always has at least two slaves (and may have more). If a
configuration error or similar problem would leave a bond with only one slave,
the port becomes an ordinary port, not a bonded port, and none of the special
features of bonded ports described in this section apply.

There are many forms of bonding, of which ovs-vswitchd implements only a few.
The most complex bond ovs-vswitchd implements is called "source load balancing"
or SLB bonding. SLB bonding divides traffic among the slaves based on the
Ethernet source address. This is useful only if the traffic over the bond has
multiple Ethernet source addresses, for example if network traffic from
multiple VMs is multiplexed over the bond.

.. note::

   Most of the ovs-vswitchd implementation is in ``vswitchd/bridge.c``, so code
   references below should be assumed to refer to that file except as otherwise
   specified.


Enabling and Disabling Slaves
-----------------------------

When a bond is created, a slave is initially enabled or disabled based on
whether carrier is detected on the NIC (see ``iface_create()``). After that, a
slave is disabled if its carrier goes down for a period of time longer than the
downdelay, and it is enabled if carrier comes up for longer than the updelay
(see ``bond_link_status_update()``). There is one exception where the updelay
is skipped: if no slaves at all are currently enabled, then the first slave on
which carrier comes up is enabled immediately.

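The enable and disable rules above can be sketched as a small state machine.
This is an illustration in Python under stated assumptions, not the vswitchd
code: the ``Slave`` class, the ``poll()`` function, the millisecond time base,
and the use of ``>=`` when comparing against the delays are all hypothetical
choices made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slave:
    """Hypothetical model of one bond slave (not an OVS structure)."""
    name: str
    carrier: bool           # last observed carrier state
    enabled: bool           # whether the bond may use this slave
    carrier_since: int = 0  # time (ms) of the last carrier change


def new_slave(name: str, carrier: bool, now: int = 0) -> Slave:
    """At bond creation a slave is enabled exactly when carrier is detected."""
    return Slave(name, carrier=carrier, enabled=carrier, carrier_since=now)


def poll(slaves: List[Slave], slave: Slave, carrier: bool, now: int,
         updelay: int, downdelay: int) -> None:
    """Apply one carrier observation for `slave` at time `now` (ms)."""
    if carrier != slave.carrier:
        # Carrier changed: restart the timer for the pending transition.
        slave.carrier = carrier
        slave.carrier_since = now
    if slave.carrier == slave.enabled:
        return  # nothing pending
    held = now - slave.carrier_since
    if slave.carrier:
        # Exception from the text: skip the updelay when no slave is enabled.
        none_enabled = not any(s.enabled for s in slaves)
        if none_enabled or held >= updelay:
            slave.enabled = True
    elif held >= downdelay:
        slave.enabled = False
```

Calling ``poll()`` periodically with the current carrier state of each slave
reproduces the behavior described: a flapping link never stays in one state
long enough to pass its delay, so its enabled state does not change.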
The updelay should be set to a time longer than the STP forwarding delay of the
physical switch to which the bond port is connected (if STP is enabled on that
switch). Otherwise, the slave will be enabled, and load may be shifted to it,
before the physical switch starts forwarding packets on that port, which can
cause some data to be "blackholed" for a time. The exception described above,
which applies only when no slaves are enabled, does not cause any problem in
this regard because in that situation all output packets are blackholed anyway.

When a slave becomes disabled, the vswitch immediately chooses a new output
port for traffic that was destined for that slave (see
``bond_enable_slave()``). It also sends a "gratuitous learning packet",
specifically a RARP, on the bond port (on the newly chosen slave) for each MAC
address that the vswitch has learned on a port other than the bond (see
``bundle_send_learning_packets()``), to teach the physical switch that the new
slave should be used in place of the one that is now disabled. (This behavior
probably makes sense only for a vswitch that has only one port (the bond)
connected to a physical switch; vswitchd should probably provide a way to
disable or configure it in other scenarios.)

Bond Packet Input
-----------------

Bonding accepts unicast packets on any bond slave. This can occasionally cause
packet duplication for the first few packets sent to a given MAC, if the
physical switch attached to the bond is flooding packets to that MAC because it
has not yet learned the correct slave for that MAC.

Bonding accepts multicast (and broadcast) packets only on a single bond slave
(the "active slave") at any given time. Multicast packets received on other
slaves are dropped. Otherwise, every multicast packet would be duplicated,
once for every bond slave, because the physical switch attached to the bond
will flood those packets.

Bonding also drops received packets when the vswitch has learned that the
packet's source MAC is on a port other than the bond port itself. This is
because it is likely that the vswitch itself sent the packet out the bond port
on a different slave and is now receiving the packet back. This occurs when
the packet is multicast or the physical switch has not yet learned the MAC and
is flooding it. However, the vswitch makes an exception to this rule for
broadcast ARP replies, which indicate that the MAC has moved to another switch,
probably due to VM migration. (ARP replies are normally unicast, so this
exception does not match normal ARP replies. It will match the learning
packets sent on bond fail-over.)

The active slave is simply the first slave to be enabled after the bond is
created (see ``bond_choose_active_slave()``). If the active slave is disabled,
then a new active slave is chosen among the slaves that remain enabled.
Currently, due to the way that configuration works, this tends to be the
remaining slave whose interface name is first alphabetically, but this is by no
means guaranteed.

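The input rules in this section can be summarized as a single accept-or-drop
decision. The following Python sketch is illustrative only. The names
``accept_on_bond()``, ``is_multicast()``, and ``BOND`` are hypothetical, the
MAC table is keyed by MAC alone (ignoring VLANs) for brevity, and the order in
which the two drop rules are checked is an assumption.

```python
from typing import Dict

BOND = "bond0"  # hypothetical name of the bond port


def is_multicast(mac: str) -> bool:
    """True for multicast and broadcast addresses.

    Both have the least significant bit of the first octet set.
    """
    return int(mac.split(":")[0], 16) & 1 == 1


def accept_on_bond(dst_mac: str, src_mac: str, in_slave: str,
                   active_slave: str, mac_table: Dict[str, str],
                   broadcast_arp_reply: bool = False) -> bool:
    """Decide whether a packet received on a bond slave is accepted.

    `mac_table` maps a learned source MAC to the port it was learned on.
    """
    # Multicast and broadcast are accepted on the active slave only.
    if is_multicast(dst_mac) and in_slave != active_slave:
        return False
    # A source MAC learned on a non-bond port suggests the packet is our
    # own transmission coming back, unless it is a broadcast ARP reply.
    learned = mac_table.get(src_mac)
    if learned is not None and learned != BOND and not broadcast_arp_reply:
        return False
    return True
```

Unicast packets from unknown sources pass both checks, which is why the
duplication described at the start of this section can occur while the
physical switch is still flooding.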
Bond Packet Output
------------------

When a packet is sent out a bond port, the bond slave actually used is selected
based on the packet's source MAC and VLAN tag (see
``bond_choose_output_slave()``). In particular, the source MAC and VLAN tag
are hashed into one of 256 values, and that value is looked up in a hash table
(the "bond hash") kept in the ``bond_hash`` member of ``struct port``. The
hash table entry identifies a bond slave. If no bond slave has yet been chosen
for that hash table entry, vswitchd chooses one arbitrarily.

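A minimal sketch of this lookup follows, in Python for brevity. It is not the
vswitchd implementation: the CRC-32 hash is a stand-in for whatever hash
function vswitchd actually uses, the ``Bond`` class and its method names are
hypothetical, and the "arbitrary" choice for an unassigned entry is modeled
here as a simple modulo over the slave list.

```python
import zlib
from typing import List, Optional

BOND_BUCKETS = 256  # number of hash values, as stated in the text


def bond_hash(src_mac: str, vlan: int) -> int:
    """Map a source MAC and VLAN to one of 256 values (stand-in hash)."""
    key = bytes.fromhex(src_mac.replace(":", "")) + vlan.to_bytes(2, "big")
    return zlib.crc32(key) % BOND_BUCKETS


class Bond:
    """Hypothetical model of the per-bond hash table."""

    def __init__(self, slaves: List[str]) -> None:
        self.slaves = list(slaves)
        # One entry per hash value; None means no slave chosen yet.
        self.table: List[Optional[str]] = [None] * BOND_BUCKETS

    def choose_output_slave(self, src_mac: str, vlan: int) -> str:
        bucket = bond_hash(src_mac, vlan)
        if self.table[bucket] is None:
            # Assumption: any enabled slave would do; pick one by index.
            self.table[bucket] = self.slaves[bucket % len(self.slaves)]
        return self.table[bucket]
```

Because the assignment is stored per hash value rather than per MAC, all
source MAC and VLAN pairs that collide on the same value share a slave and
move together when that entry is reassigned.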
Every 10 seconds, vswitchd rebalances the bond slaves (see
``bond_rebalance()``). To rebalance, vswitchd examines the statistics for
the number of bytes transmitted by each slave over approximately the past
minute, with data sent more recently weighted more heavily than data sent less
recently. It considers each of the slaves in order from most-loaded to
least-loaded. If a highly loaded slave H is significantly more heavily loaded
than the least-loaded slave L, and slave H carries at least two hashes, then
vswitchd shifts one of H's hashes to L. However, vswitchd will only shift a
hash from H to L if it will decrease the ratio of the load between H and L by
at least 0.1.

Currently, "significantly more loaded" means that H must carry at least 1 Mbps
more traffic, and that traffic must be at least 3% greater than L's.

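The decision for a single pair of slaves H and L can be sketched as follows.
This Python example is only one reading of the rules above, with several
assumptions: loads are already expressed in Mbps, the "ratio of the load" is
taken to be the larger load divided by the smaller one, the 3% condition is
read as H carrying at least 1.03 times L's load, and among the qualifying
hashes the one that leaves the pair most evenly balanced is chosen. All
function and constant names are hypothetical.

```python
import math
from typing import Dict, Optional

MIN_DELTA_MBPS = 1.0   # H must carry at least this much more than L
MIN_EXCESS = 0.03      # and at least 3% more than L
MIN_RATIO_GAIN = 0.1   # a shift must improve the load ratio by this much


def imbalance(a: float, b: float) -> float:
    """Ratio of the larger load to the smaller one (infinite if one is 0)."""
    lo, hi = min(a, b), max(a, b)
    return math.inf if lo == 0 else hi / lo


def significantly_more_loaded(h: float, l: float) -> bool:
    return h - l >= MIN_DELTA_MBPS and h >= l * (1 + MIN_EXCESS)


def pick_hash_to_shift(h_hashes: Dict[int, float],
                       l_load: float) -> Optional[int]:
    """Return the hash to move from H to L, or None to leave things alone.

    `h_hashes` maps each hash value assigned to H to its load in Mbps.
    """
    h_load = sum(h_hashes.values())
    if len(h_hashes) < 2 or not significantly_more_loaded(h_load, l_load):
        return None
    before = imbalance(h_load, l_load)
    best, best_after = None, before
    for bucket, load in h_hashes.items():
        after = imbalance(h_load - load, l_load + load)
        # Require the minimum gain and keep the most balanced candidate.
        if before - after >= MIN_RATIO_GAIN and after < best_after:
            best, best_after = bucket, after
    return best
```

For instance, if H carries two hashes with loads of 90 and 10 Mbps and L
carries 20 Mbps, moving the 90 Mbps hash would only reverse the imbalance,
so the sketch selects the 10 Mbps hash instead.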
Bond Balance Modes
------------------

Each bond balancing mode has different considerations, described below.

LACP Bonding
~~~~~~~~~~~~

LACP bonding requires the remote switch to implement LACP, but it is otherwise
very simple in that, after LACP negotiation is complete, there is no need for
special handling of received packets.

Several of the physical switches that support LACP block all traffic for ports
that are configured to use LACP, until LACP is negotiated with the host. When
configuring an LACP bond on an OVS host (e.g. XenServer), this means that there
will be an interruption of the network connectivity between the time the ports
on the physical switch and the bond on the OVS host are configured. The
interruption may be relatively long if different people are responsible for
managing the switches and the OVS host.

Such a network connectivity failure can be avoided if LACP is configured on
the OVS host before the physical switch is configured, and the OVS host falls
back to another bond mode (active-backup) until the physical switch LACP
configuration is complete. The ``lacp-fallback-ab`` option exists to provide
this behavior in Open vSwitch.

Active Backup Bonding
~~~~~~~~~~~~~~~~~~~~~

Active Backup bonds send all traffic out one "active" slave until that slave
becomes unavailable. Since they are significantly less complicated than SLB
bonds, they are preferred when LACP is not an option. Additionally, they are
the only bond mode that supports attaching each slave to a different upstream
switch.

SLB Bonding
~~~~~~~~~~~

SLB bonding allows a limited form of load balancing without the remote switch's
knowledge or cooperation. The basics of SLB are simple. SLB assigns each
source MAC+VLAN pair to a link and transmits all packets from that MAC+VLAN
through that link. Learning in the remote switch causes it to send packets to
that MAC+VLAN through the same link.

SLB bonding has the following complications:

0. When the remote switch has not learned the MAC for the destination of a
   unicast packet and hence floods the packet to all of the links on the SLB
   bond, Open vSwitch will forward duplicate packets, one per link, to each
   other switch port.

   Open vSwitch does not solve this problem.

1. When the remote switch receives a multicast or broadcast packet from a port
   not on the SLB bond, it will forward it to all of the links in the SLB bond.
   This would cause packet duplication if not handled specially.

   Open vSwitch avoids packet duplication by accepting multicast and broadcast
   packets on only the active slave, and dropping multicast and broadcast
   packets on all other slaves.

2. When Open vSwitch forwards a multicast or broadcast packet to a link in the
   SLB bond other than the active slave, the remote switch will forward it to
   all of the other links in the SLB bond, including the active slave. Without
   special handling, this would mean that Open vSwitch would forward a second
   copy of the packet to each switch port (other than the bond), including the
   port that originated the packet.

   Open vSwitch deals with this case by dropping packets received on any SLB
   bonded link that have a source MAC+VLAN that has been learned on any other
   port. (This means that SLB as implemented in Open vSwitch relies critically
   on MAC learning. Notably, SLB is incompatible with the "flood_vlans"
   feature.)

3. Suppose that a MAC+VLAN moves to an SLB bond from another port (e.g. when a
   VM is migrated from this hypervisor to a different one). Without additional
   special handling, Open vSwitch will not notice until the MAC learning entry
   expires, up to 60 seconds later, as a consequence of rule #2.

   Open vSwitch avoids a 60-second delay by listening for gratuitous ARPs,
   which VMs commonly emit upon migration. As an exception to rule #2, a
   gratuitous ARP received on an SLB bond is not dropped and updates the MAC
   learning table in the usual way. (If a move does not trigger a gratuitous
   ARP, or if the gratuitous ARP is lost in the network, then a 60-second delay
   still occurs.)

4. Suppose that a MAC+VLAN moves from an SLB bond to another port (e.g. when a
   VM is migrated from a different hypervisor to this one), that the MAC+VLAN
   emits a gratuitous ARP, and that Open vSwitch forwards that gratuitous ARP
   to a link in the SLB bond other than the active slave. The remote switch
   will forward the gratuitous ARP to all of the other links in the SLB bond,
   including the active slave. Without additional special handling, this would
   mean that Open vSwitch would learn that the MAC+VLAN was located on the SLB
   bond, as a consequence of rule #3.

   Open vSwitch avoids this problem by "locking" the MAC learning table entry
   for a MAC+VLAN from which a gratuitous ARP was received on a port other than
   the SLB bond. For 5 seconds, a locked MAC learning table entry will not be
   updated based on a gratuitous ARP received on an SLB bond.
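Rules #2 through #4 interact, so a small model may help. The Python sketch
below is an illustration under assumptions, not Open vSwitch code: the
``SlbMacTable`` class and its ``receive()`` method are hypothetical, time is
in milliseconds, the 60-second expiry of learning entries is not modeled, and
a gratuitous ARP that hits a locked entry is simply dropped.

```python
from typing import Dict, Tuple

GRAT_ARP_LOCK_MS = 5000  # lock duration from the text (5 seconds)

Key = Tuple[str, int]    # (source MAC, VLAN)


class SlbMacTable:
    """Hypothetical MAC learning table for a switch with one SLB bond."""

    def __init__(self, bond_port: str) -> None:
        self.bond_port = bond_port
        self.port: Dict[Key, str] = {}          # where each key was learned
        self.locked_until: Dict[Key, int] = {}  # lock expiry time per key

    def receive(self, src_mac: str, vlan: int, in_port: str, now: int,
                gratuitous_arp: bool = False) -> bool:
        """Process one packet; return True if it is accepted."""
        key = (src_mac, vlan)
        learned = self.port.get(key)
        if in_port == self.bond_port:
            if learned is not None and learned != self.bond_port:
                if not gratuitous_arp:
                    return False  # rule 2: likely our own packet reflected
                if now < self.locked_until.get(key, 0):
                    return False  # rule 4: entry is locked
            self.port[key] = in_port  # rule 3, or ordinary learning
            return True
        # Packet from a port other than the bond: learn in the usual way.
        self.port[key] = in_port
        if gratuitous_arp:
            # Rule 4: protect the entry from reflected gratuitous ARPs.
            self.locked_until[key] = now + GRAT_ARP_LOCK_MS
        return True
```

In this model a VM that migrates onto the local hypervisor sends a gratuitous
ARP on its own port, which locks the entry, so the copy of that ARP reflected
back through the bond within the next 5 seconds cannot move the entry to the
bond.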