========================
 ovs-vswitchd Internals
========================

This document describes some of the internals of the ovs-vswitchd
process. It is not complete. It tends to be updated on demand, so if
you have questions about the vswitchd implementation, ask them and
perhaps we'll add some appropriate documentation here.

Most of the ovs-vswitchd implementation is in vswitchd/bridge.c, so
code references below should be assumed to refer to that file except
as otherwise specified.

Bonding
=======

Bonding allows two or more interfaces (the "slaves") to share network
traffic. From a high-level point of view, bonded interfaces act like
a single port, but they have the bandwidth of multiple network
devices, e.g. two 1 Gbps physical interfaces act like a single 2 Gbps
interface. Bonds also increase robustness: the bonded port does not
go down as long as at least one of its slaves is up.

In vswitchd, a bond always has at least two slaves (and may have
more). If a configuration error or similar problem would leave a bond
with only one slave, the port becomes an ordinary port, not a bonded
port, and none of the special features of bonded ports described in
this section apply.

There are many forms of bonding, but ovs-vswitchd currently implements
only a single kind, called "source load balancing" or SLB bonding.
SLB bonding divides traffic among the slaves based on the Ethernet
source address. This is useful only if the traffic over the bond has
multiple Ethernet source addresses, for example if network traffic
from multiple VMs is multiplexed over the bond.

Enabling and Disabling Slaves
-----------------------------

When a bond is created, a slave is initially enabled or disabled based
on whether carrier is detected on the NIC (see iface_create()). After
that, a slave is disabled if its carrier goes down for a period of
time longer than the downdelay, and it is enabled if carrier comes up
for longer than the updelay (see bond_link_status_update()). There is
one exception where the updelay is skipped: if no slaves at all are
currently enabled, then the first slave on which carrier comes up is
enabled immediately.
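
The rules above can be sketched in C roughly as follows. This is an
illustration only, not the code in bridge.c: struct slave, its
members, slave_set_carrier(), and slave_update() are hypothetical
names, and times are assumed to be in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-slave state; not the structure used in bridge.c. */
struct slave {
    bool enabled;           /* Is the slave carrying bond traffic? */
    bool carrier;           /* Most recently observed carrier state. */
    long long changed_at;   /* Time (ms) at which carrier last changed. */
};

/* Records a carrier observation made at time 'now' (ms). */
static void
slave_set_carrier(struct slave *s, bool carrier, long long now)
{
    if (s->carrier != carrier) {
        s->carrier = carrier;
        s->changed_at = now;
    }
}

/* Applies the updelay and downdelay rules to 's' at time 'now' (ms).
 * 'n_enabled' is the number of slaves in the bond that are currently
 * enabled. */
static void
slave_update(struct slave *s, long long now, int updelay, int downdelay,
             int n_enabled)
{
    long long stable = now - s->changed_at;

    assert(updelay >= 0 && downdelay >= 0);
    if (s->enabled && !s->carrier && stable > downdelay) {
        /* Carrier has been down for longer than the downdelay. */
        s->enabled = false;
    } else if (!s->enabled && s->carrier
               && (n_enabled == 0 || stable > updelay)) {
        /* Carrier has been up for longer than the updelay, or no slave
         * at all is enabled, in which case the updelay is skipped. */
        s->enabled = true;
    }
}
```

A caller would invoke slave_set_carrier() whenever it polls the NIC
and slave_update() on every pass through its main loop.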

The updelay should be set to a time longer than the STP forwarding
delay of the physical switch to which the bond port is connected (if
STP is enabled on that switch). Otherwise, the slave will be enabled,
and load may be shifted to it, before the physical switch starts
forwarding packets on that port, which can cause some data to be
"blackholed" for a time. The exception that skips the updelay when no
slaves are enabled does not cause any problem in this regard, because
in that situation all output packets are blackholed anyway.

When a slave becomes disabled, the vswitch immediately chooses a new
output port for traffic that was destined for that slave (see
bond_enable_slave()). It also sends a "gratuitous learning packet" on
the bond port (on the newly chosen slave) for each MAC address that
the vswitch has learned on a port other than the bond (see
bond_send_learning_packets()), to teach the physical switch that the
new slave should be used in place of the one that is now disabled.
(This behavior probably makes sense only for a vswitch that has only
one port (the bond) connected to a physical switch; vswitchd should
probably provide a way to disable or configure it in other scenarios.)

Bond Packet Input
-----------------

Bond packet input processing takes place in process_flow().

Bonding accepts unicast packets on any bond slave. This can
occasionally cause packet duplication for the first few packets sent
to a given MAC, if the physical switch attached to the bond is
flooding packets to that MAC because it has not yet learned the
correct slave for that MAC.

Bonding accepts multicast (and broadcast) packets on only a single
bond slave (the "active slave") at any given time. Multicast packets
received on other slaves are dropped. Otherwise, every multicast
packet would be duplicated, once for every bond slave, because the
physical switch attached to the bond will flood those packets.

Bonding also drops received packets when the vswitch has learned that
the packet's MAC is on a port other than the bond port itself. This is
because it is likely that the vswitch itself sent the packet out the
bond port on a different slave and is now receiving the packet back.
This occurs when the packet is multicast or the physical switch has not
yet learned the MAC and is flooding it. However, the vswitch makes an
exception to this rule for broadcast ARP replies, which indicate that
the MAC has moved to another switch, probably due to VM migration.
(ARP replies are normally unicast, so this exception does not match
normal ARP replies. It will match the learning packets sent on bond
fail-over.)
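
Taken together, the three input rules above amount to a decision like
the one sketched below. This is a simplified illustration, not the
logic of process_flow() itself: struct rx_packet, NO_PORT,
bond_should_accept(), and the helper functions are hypothetical, and
the MAC learning lookup is assumed to have been done by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_PORT (-1)    /* Hypothetical "source MAC not learned" value. */

/* Hypothetical summary of a packet received on a bond slave. */
struct rx_packet {
    uint8_t dst[6];         /* Ethernet destination address. */
    bool is_arp_reply;      /* Is the packet an ARP reply? */
    int in_slave;           /* Index of the slave it arrived on. */
    int learned_src_port;   /* Port where the source MAC was learned,
                             * or NO_PORT. */
};

/* Returns true if 'ea' has the Ethernet group (multicast) bit set.
 * The broadcast address also has this bit set. */
static bool
eth_is_multicast(const uint8_t ea[6])
{
    return (ea[0] & 1) != 0;
}

/* Returns true if 'ea' is ff:ff:ff:ff:ff:ff. */
static bool
eth_is_broadcast(const uint8_t ea[6])
{
    for (int i = 0; i < 6; i++) {
        if (ea[i] != 0xff) {
            return false;
        }
    }
    return true;
}

/* Returns true if packet 'p', received on bond port 'bond_port' whose
 * active slave is 'active_slave', should be accepted. */
static bool
bond_should_accept(const struct rx_packet *p, int bond_port,
                   int active_slave)
{
    assert(p != NULL);

    /* Multicast and broadcast are accepted only on the active slave. */
    if (eth_is_multicast(p->dst) && p->in_slave != active_slave) {
        return false;
    }

    /* Drop packets whose source MAC was learned on another port,
     * except for broadcast ARP replies. */
    if (p->learned_src_port != NO_PORT
        && p->learned_src_port != bond_port
        && !(p->is_arp_reply && eth_is_broadcast(p->dst))) {
        return false;
    }

    return true;
}
```

Note that unicast packets never reach the first check, which is why
they are accepted on any slave.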

The active slave is simply the first slave to be enabled after the
bond is created (see bond_choose_active_iface()). If the active slave
is disabled, then a new active slave is chosen among the slaves that
remain enabled. Currently, due to the way that configuration works,
this tends to be the remaining slave whose interface name is first
alphabetically, but this is by no means guaranteed.
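
One way to picture the selection is the sketch below. It is not
bond_choose_active_iface(); update_active_slave() and NO_ACTIVE are
hypothetical, and scanning the slaves in array order is an assumption
that stands in for whatever order the configuration happens to
produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_ACTIVE (-1)  /* Hypothetical "no active slave" value. */

/* Returns the index of the active slave after a change in slave
 * state.  The current active slave is kept while it stays enabled;
 * otherwise the first enabled slave in array order is chosen
 * (an assumed order).  Returns NO_ACTIVE if no slave is enabled. */
static int
update_active_slave(int active, const bool enabled[], int n_slaves)
{
    assert(enabled != NULL && n_slaves >= 0);

    if (active >= 0 && active < n_slaves && enabled[active]) {
        return active;
    }
    for (int i = 0; i < n_slaves; i++) {
        if (enabled[i]) {
            return i;
        }
    }
    return NO_ACTIVE;
}
```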

Bond Packet Output
------------------

When a packet is sent out a bond port, the bond slave actually used is
selected based on the packet's source MAC (see choose_output_iface()).
In particular, the source MAC is hashed into one of 256 values, and
that value is looked up in a hash table (the "bond hash") kept in the
"bond_hash" member of struct port. The hash table entry identifies a
bond slave. If no bond slave has yet been chosen for that hash table
entry, vswitchd chooses one arbitrarily.
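
The lookup described above might look something like the following
sketch. It does not reproduce choose_output_iface() or struct port:
struct bond, hash_mac(), and bond_choose_slave() are hypothetical, the
XOR hash is only a stand-in for whatever hash vswitchd really uses,
and "arbitrarily" is modeled here as the hash value modulo the number
of slaves.

```c
#include <assert.h>
#include <stdint.h>

#define BOND_N_HASHES 256   /* Number of bond hash entries. */
#define NO_SLAVE (-1)       /* Hypothetical "not yet assigned" value. */

/* Hypothetical bond state; the real table lives in struct port. */
struct bond {
    int hash_to_slave[BOND_N_HASHES];   /* Slave index or NO_SLAVE. */
    int n_slaves;                       /* Number of slaves, >= 2. */
};

/* Initializes 'b' with 'n_slaves' slaves and no assigned hashes. */
static void
bond_init(struct bond *b, int n_slaves)
{
    assert(n_slaves >= 2);
    b->n_slaves = n_slaves;
    for (int i = 0; i < BOND_N_HASHES; i++) {
        b->hash_to_slave[i] = NO_SLAVE;
    }
}

/* Maps 'mac' to one of BOND_N_HASHES values.  XOR of the six bytes is
 * an illustrative choice, not necessarily vswitchd's hash. */
static unsigned int
hash_mac(const uint8_t mac[6])
{
    unsigned int h = 0;

    for (int i = 0; i < 6; i++) {
        h ^= mac[i];
    }
    return h & (BOND_N_HASHES - 1);
}

/* Returns the slave index to use for a packet with source 'src_mac',
 * assigning a slave to the hash entry on first use. */
static int
bond_choose_slave(struct bond *b, const uint8_t src_mac[6])
{
    unsigned int h = hash_mac(src_mac);

    if (b->hash_to_slave[h] == NO_SLAVE) {
        /* Assumed policy for the "arbitrary" choice. */
        b->hash_to_slave[h] = (int) (h % (unsigned int) b->n_slaves);
    }
    return b->hash_to_slave[h];
}
```

Because the table is indexed by hash rather than by MAC, rebalancing
can move a whole group of source addresses to another slave by
rewriting a single entry. The sketch ignores whether the chosen slave
is enabled.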

Every 10 seconds, vswitchd rebalances the bond slaves (see
bond_rebalance_port()). To rebalance, vswitchd examines the
statistics for the number of bytes transmitted by each slave over
approximately the past minute, with data sent more recently weighted
more heavily than data sent less recently. It considers each of the
slaves in order from most-loaded to least-loaded. If highly loaded
slave H is significantly more heavily loaded than the least-loaded
slave L, and slave H carries at least two hashes, then vswitchd shifts
one of H's hashes to L. However, vswitchd will only shift a hash from
H to L if it will decrease the ratio of the load between H and L by at
least 0.1.

Currently, "significantly more loaded" means that H must carry at
least 1 Mbps more traffic than L, and that H's traffic must be at
least 3% greater than L's.
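
The two thresholds can be written down as predicates, as in the
sketch below. This is one reading of the text, not the code in
bond_rebalance_port(): the function names are hypothetical, loads are
assumed to be expressed in bits per second, and the handling of an
idle L and of a shift that would overshoot (leaving L more loaded
than H) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if H is "significantly more loaded" than L: at least
 * 1 Mbps more traffic, and at least 3% more traffic than L.  Loads
 * are in bits per second (an assumed unit). */
static bool
significantly_more_loaded(uint64_t h_bps, uint64_t l_bps)
{
    const uint64_t min_gap_bps = 1000000;   /* 1 Mbps. */

    if (h_bps < l_bps || h_bps - l_bps < min_gap_bps) {
        return false;
    }
    /* h >= 1.03 * l, in integer arithmetic. */
    return h_bps * 100 >= l_bps * 103;
}

/* Returns true if moving a hash that carries 'delta' load from H to L
 * lowers the H:L load ratio by at least 0.1. */
static bool
shift_improves_ratio(double h_load, double l_load, double delta)
{
    double old_ratio, new_ratio;

    assert(h_load >= 0 && l_load >= 0);
    if (delta <= 0 || delta >= h_load) {
        return false;       /* Nothing to move, or H would be emptied. */
    }
    if (l_load == 0) {
        return true;        /* Assumption: moving to an idle slave helps. */
    }

    old_ratio = h_load / l_load;
    new_ratio = (h_load - delta) / (l_load + delta);
    if (new_ratio < 1) {
        /* Assumption: if the shift overshoots, compare against the
         * inverted ratio so that a worse imbalance is rejected. */
        new_ratio = 1 / new_ratio;
    }
    return old_ratio - new_ratio >= 0.1;
}
```

For example, with H at 8 Mbps and L at 2 Mbps, moving a 2 Mbps hash
changes the ratio from 4.0 to 1.5, so the shift qualifies; moving a
7 Mbps hash would leave L nine times as loaded as H, so it does not.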