From: Ben Pfaff
Date: Thu, 12 Feb 2015 07:34:50 +0000 (-0800)
Subject: mac-learning: Implement per-port MAC learning fairness.
X-Git-Tag: v2.12.3~7416
X-Git-Url: https://git.proxmox.com/?a=commitdiff_plain;h=9d078ec2f17adc7cceb2793687f8faa1f1c6f4f3;p=mirror_ovs.git

mac-learning: Implement per-port MAC learning fairness.

In "MAC flooding", an attacker transmits an overwhelming number of frames
with unique Ethernet source addresses on a switch port. The goal is to
force the switch to evict all useful MAC learning table entries, so that its
behavior degenerates to that of a hub, flooding all traffic. In turn, that
allows an attacker to eavesdrop on the traffic of other hosts attached to
the switch, with all the risks that that entails.

Before this commit, the Open vSwitch "normal" action that implements its
standalone switch behavior (and that can be used by OpenFlow controllers as
well) was vulnerable to MAC flooding attacks. This commit fixes the problem
by implementing per-port fairness for MAC table entries: when the MAC table
is at its maximum size, MAC table eviction always deletes an entry from the
port with the most entries. Thus, MAC entries will never be evicted from
ports with only a few entries if a port with a huge number of entries
exists.

Controllers could introduce their own MAC flooding vulnerabilities into OVS.
For a controller that adds destination-MAC-based flows to an OpenFlow flow
table as a reaction to "packet-in" events, such a bug, if it exists, would
be in the controller code itself and would need to be fixed in the
controller. For a controller that relies on the Open vSwitch "learn" action
to add destination-MAC-based flows, Open vSwitch already supports an
eviction policy similar to the one implemented in this commit, through the
"groups" column in the Flow_Table table documented in
ovs-vswitchd.conf.db(5). We recommend that users of "learn" who are not
already familiar with eviction groups read that documentation.
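To make the eviction rule concrete, here is a minimal, self-contained sketch of "evict the least recently used entry of the port with the most entries". It is not the OVS implementation. Where lib/mac-learning.c uses an hmap, a max-heap of ports, and per-port LRU lists, this sketch uses fixed-size arrays and linear scans. Every name in it (struct table, learn(), evict_fairly(), MAX_ENTRIES, MAX_PORTS) is hypothetical. Ties go to the lowest-numbered port here, whereas OVS breaks ties arbitrarily.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits, for illustration only. */
#define MAX_ENTRIES 4
#define MAX_PORTS 8

struct entry {
    bool in_use;
    int mac;          /* Stand-in for an Ethernet address. */
    int port;         /* Port on which 'mac' was last seen. */
    long last_used;   /* Logical time of last use; smaller is older. */
};

struct table {
    struct entry entries[MAX_ENTRIES];
    int per_port[MAX_PORTS];   /* Number of entries learned on each port. */
    long clock;
};

static struct entry *
find_mac(struct table *t, int mac)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (t->entries[i].in_use && t->entries[i].mac == mac) {
            return &t->entries[i];
        }
    }
    return NULL;
}

static struct entry *
find_free(struct table *t)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!t->entries[i].in_use) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Evicts the least recently used entry of the port that owns the most
 * entries.  Only called when the table is full, so the busiest port is
 * guaranteed to own at least one entry. */
static void
evict_fairly(struct table *t)
{
    int victim_port = 0;
    for (int p = 1; p < MAX_PORTS; p++) {
        if (t->per_port[p] > t->per_port[victim_port]) {
            victim_port = p;    /* Strict '>' keeps the lowest port on ties. */
        }
    }

    struct entry *victim = NULL;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        struct entry *e = &t->entries[i];
        if (e->in_use && e->port == victim_port
            && (!victim || e->last_used < victim->last_used)) {
            victim = e;
        }
    }
    assert(victim != NULL);
    victim->in_use = false;
    t->per_port[victim_port]--;
}

/* Records that 'mac' was seen on 'port', evicting fairly if the table is
 * full.  'port' must be less than MAX_PORTS. */
static void
learn(struct table *t, int mac, int port)
{
    struct entry *e = find_mac(t, mac);
    if (e) {
        t->per_port[e->port]--;         /* Existing entry may have moved. */
    } else {
        e = find_free(t);
        if (!e) {
            evict_fairly(t);
            e = find_free(t);
        }
        e->in_use = true;
        e->mac = mac;
    }
    e->port = port;
    t->per_port[port]++;
    e->last_used = ++t->clock;          /* Mark as most recently used. */
}
```

Consider a 4-entry table in which ports 1 and 2 each learn two MACs and then port 3 floods four new ones. Under this sketch, ports 1 and 2 keep one entry each and port 3 ends up with two. A plain global-LRU policy would instead have evicted all four of the original entries.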
In addition to implementing per-port MAC learning fairness, this commit
includes some closely related changes:

- Access to client-provided "port" data in struct mac_entry is now
  abstracted through helper functions, which makes it easier to ensure
  that the per-port data structures are maintained consistently.

- The mac_learning_changed() function, which had become trivial,
  vestigial, and confusing, was removed. Its functionality was folded
  into the new function mac_entry_set_port().

- Many comments were added and improved; there had been a lot of comment
  rot in previous versions.

CERT: VU#784996
Reported-by: "Ronny L. Bull - bullrl"
Reported-at: http://www.irongeek.com/i.php?page=videos/derbycon4/t314-exploring-layer-2-network-security-in-virtualized-environments-ronny-l-bull-dr-jeanna-n-matthews
Signed-off-by: Ben Pfaff
Acked-by: Ethan Jackson
---
diff --git a/AUTHORS b/AUTHORS index 1903423c1..8b833dd69 100644 --- a/AUTHORS +++ b/AUTHORS @@ -307,6 +307,7 @@ Roger Leigh rleigh@codelibre.net Rogério Vinhal Nunes Roman Sokolkov rsokolkov@gmail.com Ronaldo A. Ferreira ronaldof@CS.Princeton.EDU +Ronny L. Bull bullrl@clarkson.edu Sander Eikelenboom linux@eikelenboom.it Saul St. John sstjohn@cs.wisc.edu Scott Hendricks shendricks@nicira.com
diff --git a/NEWS b/NEWS index adfa1d1ac..2b0c347ae 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,7 @@ Post-v2.3.0 --------------------- + - The MAC learning feature now includes per-port fairness to mitigate + MAC flooding attacks. - New support for a "conjunctive match" OpenFlow extension, which allows constructing OpenFlow matches of the form "field1 in {a,b,c...} AND field2 in {d,e,f...}" and generalizations. For details,
diff --git a/lib/learning-switch.c b/lib/learning-switch.c index 1423ac46a..d03e52ed7 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc.
+ * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -105,6 +105,13 @@ static enum ofperr process_switch_features(struct lswitch *, static void process_packet_in(struct lswitch *, const struct ofp_header *); static void process_echo_request(struct lswitch *, const struct ofp_header *); +static ofp_port_t get_mac_entry_ofp_port(const struct mac_learning *ml, + const struct mac_entry *) + OVS_REQ_RDLOCK(ml->rwlock); +static void set_mac_entry_ofp_port(struct mac_learning *ml, + struct mac_entry *, ofp_port_t) + OVS_REQ_WRLOCK(ml->rwlock); + /* Creates and returns a new learning switch whose configuration is given by * 'cfg'. * @@ -526,14 +533,14 @@ lswitch_choose_destination(struct lswitch *sw, const struct flow *flow) if (mac_learning_may_learn(sw->ml, flow->dl_src, 0)) { struct mac_entry *mac = mac_learning_insert(sw->ml, flow->dl_src, 0); - if (mac->port.ofp_port != flow->in_port.ofp_port) { + if (get_mac_entry_ofp_port(sw->ml, mac) + != flow->in_port.ofp_port) { VLOG_DBG_RL(&rl, "%016llx: learned that "ETH_ADDR_FMT" is on " "port %"PRIu16, sw->datapath_id, ETH_ADDR_ARGS(flow->dl_src), flow->in_port.ofp_port); - mac->port.ofp_port = flow->in_port.ofp_port; - mac_learning_changed(sw->ml); + set_mac_entry_ofp_port(sw->ml, mac, flow->in_port.ofp_port); } } ovs_rwlock_unlock(&sw->ml->rwlock); @@ -551,7 +558,7 @@ lswitch_choose_destination(struct lswitch *sw, const struct flow *flow) ovs_rwlock_rdlock(&sw->ml->rwlock); mac = mac_learning_lookup(sw->ml, flow->dl_dst, 0); if (mac) { - out_port = mac->port.ofp_port; + out_port = get_mac_entry_ofp_port(sw->ml, mac); if (out_port == flow->in_port.ofp_port) { /* Don't send a packet back out its input port. 
*/ ovs_rwlock_unlock(&sw->ml->rwlock); @@ -691,3 +698,20 @@ process_echo_request(struct lswitch *sw, const struct ofp_header *rq) { queue_tx(sw, make_echo_reply(rq)); } + +static ofp_port_t +get_mac_entry_ofp_port(const struct mac_learning *ml, + const struct mac_entry *e) + OVS_REQ_RDLOCK(ml->rwlock) +{ + void *port = mac_entry_get_port(ml, e); + return (OVS_FORCE ofp_port_t) (uintptr_t) port; +} + +static void +set_mac_entry_ofp_port(struct mac_learning *ml, + struct mac_entry *e, ofp_port_t ofp_port) + OVS_REQ_WRLOCK(ml->rwlock) +{ + mac_entry_set_port(ml, e, (void *) (OVS_FORCE uintptr_t) ofp_port); +} diff --git a/lib/mac-learning.c b/lib/mac-learning.c index dbb457bab..190920b97 100644 --- a/lib/mac-learning.c +++ b/lib/mac-learning.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc. + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -69,6 +69,90 @@ mac_entry_lookup(const struct mac_learning *ml, return NULL; } +static struct mac_learning_port * +mac_learning_port_lookup(struct mac_learning *ml, void *port) +{ + struct mac_learning_port *mlport; + + HMAP_FOR_EACH_IN_BUCKET (mlport, hmap_node, hash_pointer(port, ml->secret), + &ml->ports_by_ptr) { + if (mlport->port == port) { + return mlport; + } + } + return NULL; +} + +/* Changes the client-owned pointer for entry 'e' in 'ml' to 'port'. The + * pointer can be retrieved with mac_entry_get_port(). + * + * The MAC-learning implementation treats the data that 'port' points to as + * opaque and never tries to dereference it. 
However, when a MAC learning + * table becomes overfull, so that eviction is required, the implementation + * does first evict MAC entries for the most common 'port's values in 'ml', so + * that there is a degree of fairness, that is, each port is entitled to its + * fair share of MAC entries. */ +void +mac_entry_set_port(struct mac_learning *ml, struct mac_entry *e, void *port) + OVS_REQ_WRLOCK(ml->rwlock) +{ + if (mac_entry_get_port(ml, e) != port) { + ml->need_revalidate = true; + + if (e->mlport) { + struct mac_learning_port *mlport = e->mlport; + list_remove(&e->port_lru_node); + + if (list_is_empty(&mlport->port_lrus)) { + ovs_assert(mlport->heap_node.priority == 1); + hmap_remove(&ml->ports_by_ptr, &mlport->hmap_node); + heap_remove(&ml->ports_by_usage, &mlport->heap_node); + free(mlport); + } else { + ovs_assert(mlport->heap_node.priority > 1); + heap_change(&ml->ports_by_usage, &mlport->heap_node, + mlport->heap_node.priority - 1); + } + e->mlport = NULL; + } + + if (port) { + struct mac_learning_port *mlport; + + mlport = mac_learning_port_lookup(ml, port); + if (!mlport) { + mlport = xzalloc(sizeof *mlport); + hmap_insert(&ml->ports_by_ptr, &mlport->hmap_node, + hash_pointer(port, ml->secret)); + heap_insert(&ml->ports_by_usage, &mlport->heap_node, 1); + mlport->port = port; + list_init(&mlport->port_lrus); + } else { + heap_change(&ml->ports_by_usage, &mlport->heap_node, + mlport->heap_node.priority + 1); + } + list_push_back(&mlport->port_lrus, &e->port_lru_node); + e->mlport = mlport; + } + } +} + +/* Finds one of the ports with the most MAC entries and evicts its least + * recently used entry. 
*/ +static void +evict_mac_entry_fairly(struct mac_learning *ml) + OVS_REQ_WRLOCK(ml->rwlock) +{ + struct mac_learning_port *mlport; + struct mac_entry *e; + + mlport = CONTAINER_OF(heap_max(&ml->ports_by_usage), + struct mac_learning_port, heap_node); + e = CONTAINER_OF(list_front(&mlport->port_lrus), + struct mac_entry, port_lru_node); + mac_learning_expire(ml, e); +} + /* If the LRU list is not empty, stores the least-recently-used entry in '*e' * and returns true. Otherwise, if the LRU list is empty, stores NULL in '*e' * and return false. */ @@ -109,6 +193,8 @@ mac_learning_create(unsigned int idle_time) ml->idle_time = normalize_idle_time(idle_time); ml->max_entries = MAC_DEFAULT_MAX; ml->need_revalidate = false; + hmap_init(&ml->ports_by_ptr); + heap_init(&ml->ports_by_usage); ovs_refcount_init(&ml->ref_cnt); ovs_rwlock_init(&ml->rwlock); return ml; @@ -131,13 +217,16 @@ mac_learning_unref(struct mac_learning *ml) if (ml && ovs_refcount_unref(&ml->ref_cnt) == 1) { struct mac_entry *e, *next; + ovs_rwlock_wrlock(&ml->rwlock); HMAP_FOR_EACH_SAFE (e, next, hmap_node, &ml->table) { - hmap_remove(&ml->table, &e->hmap_node); - free(e); + mac_learning_expire(ml, e); } hmap_destroy(&ml->table); + hmap_destroy(&ml->ports_by_ptr); + heap_destroy(&ml->ports_by_usage); bitmap_free(ml->flood_vlans); + ovs_rwlock_unlock(&ml->rwlock); ovs_rwlock_destroy(&ml->rwlock); free(ml); } @@ -207,11 +296,9 @@ mac_learning_may_learn(const struct mac_learning *ml, * by calling mac_learning_may_learn(), that 'src_mac' and 'vlan' are * learnable. * - * If the returned MAC entry is new (as may be determined by calling - * mac_entry_is_new()), then the caller must pass the new entry to - * mac_learning_changed(). The caller must also initialize the new entry's - * 'port' member. Otherwise calling those functions is at the caller's - * discretion. 
*/ + * If the returned MAC entry is new (that is, if it has a NULL client-provided + * port, as returned by mac_entry_get_port()), then the caller must initialize + * the new entry's port to a nonnull value with mac_entry_set_port(). */ struct mac_entry * mac_learning_insert(struct mac_learning *ml, const uint8_t src_mac[ETH_ADDR_LEN], uint16_t vlan) @@ -223,8 +310,7 @@ mac_learning_insert(struct mac_learning *ml, uint32_t hash = mac_table_hash(ml, src_mac, vlan); if (hmap_count(&ml->table) >= ml->max_entries) { - get_lru(ml, &e); - mac_learning_expire(ml, e); + evict_mac_entry_fairly(ml); } e = xmalloc(sizeof *e); @@ -232,37 +318,25 @@ mac_learning_insert(struct mac_learning *ml, memcpy(e->mac, src_mac, ETH_ADDR_LEN); e->vlan = vlan; e->grat_arp_lock = TIME_MIN; - e->port.p = NULL; + e->mlport = NULL; + COVERAGE_INC(mac_learning_learned); } else { list_remove(&e->lru_node); } /* Mark 'e' as recently used. */ list_push_back(&ml->lrus, &e->lru_node); + if (e->mlport) { + list_remove(&e->port_lru_node); + list_push_back(&e->mlport->port_lrus, &e->port_lru_node); + } e->expires = time_now() + ml->idle_time; return e; } -/* Changes 'e''s tag to a new, randomly selected one. Causes - * mac_learning_run() to flag for revalidation the tag that would have been - * previously used for this entry's MAC and VLAN (either before 'e' was - * inserted, if it is new, or otherwise before its port was updated.) - * - * The client should call this function after obtaining a MAC learning entry - * from mac_learning_insert(), if the entry is either new or if its learned - * port has changed. */ -void -mac_learning_changed(struct mac_learning *ml) -{ - COVERAGE_INC(mac_learning_learned); - ml->need_revalidate = true; -} - /* Looks up MAC 'dst' for VLAN 'vlan' in 'ml' and returns the associated MAC - * learning entry, if any. If 'tag' is nonnull, then the tag that associates - * 'dst' and 'vlan' with its currently learned port will be OR'd into - * '*tag'. */ + * learning entry, if any. 
*/ struct mac_entry * mac_learning_lookup(const struct mac_learning *ml, const uint8_t dst[ETH_ADDR_LEN], uint16_t vlan) @@ -278,7 +352,7 @@ mac_learning_lookup(const struct mac_learning *ml, } else { struct mac_entry *e = mac_entry_lookup(ml, dst, vlan); - ovs_assert(e == NULL || e->port.p != NULL); + ovs_assert(e == NULL || mac_entry_get_port(ml, e) != NULL); return e; } } @@ -287,21 +361,19 @@ mac_learning_lookup(const struct mac_learning *ml, void mac_learning_expire(struct mac_learning *ml, struct mac_entry *e) { + ml->need_revalidate = true; + mac_entry_set_port(ml, e, NULL); hmap_remove(&ml->table, &e->hmap_node); list_remove(&e->lru_node); free(e); } -/* Expires all the mac-learning entries in 'ml'. If not NULL, the tags in 'ml' - * are added to 'tags'. Otherwise the tags in 'ml' are discarded. The client - * is responsible for revalidating any flows that depend on 'ml', if - * necessary. */ +/* Expires all the mac-learning entries in 'ml'. */ void mac_learning_flush(struct mac_learning *ml) { struct mac_entry *e; while (get_lru(ml, &e)){ - ml->need_revalidate = true; mac_learning_expire(ml, e); } hmap_shrink(&ml->table); @@ -319,7 +391,6 @@ mac_learning_run(struct mac_learning *ml) && (hmap_count(&ml->table) > ml->max_entries || time_now() >= e->expires)) { COVERAGE_INC(mac_learning_expired); - ml->need_revalidate = true; mac_learning_expire(ml, e); } diff --git a/lib/mac-learning.h b/lib/mac-learning.h index 7f38339d5..079b04339 100644 --- a/lib/mac-learning.h +++ b/lib/mac-learning.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013 Nicira, Inc. + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2015 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
@@ -18,6 +18,7 @@ #define MAC_LEARNING_H 1 #include +#include "heap.h" #include "hmap.h" #include "list.h" #include "ovs-atomic.h" @@ -25,6 +26,66 @@ #include "packets.h" #include "timeval.h" +/* MAC learning table + * ================== + * + * A MAC learning table is a dictionary data structure that is specialized to + * map from an (Ethernet address, VLAN ID) pair to a user-provided pointer. In + * an Ethernet switch implementation, it is used to keep track of the port on + * which a packet from a given Ethernet address was last seen. This knowledge + * is useful when the switch receives a packet to such an Ethernet address, so + * that the switch can send the packet directly to the correct port instead of + * having to flood it to every port. + * + * A few complications make the implementation more than a simple wrapper + * around a hash table. First, and most simply, MAC learning can be disabled + * on a per-VLAN basis. (This is most useful for RSPAN; see the + * ovs-vswitchd.conf.db(5) documentation of the "output_vlan" column in the + * Mirror table for more information.) The data structure maintains a bitmap + * to track such VLANs. + * + * Second, the implementation has the ability to "lock" a MAC table entry + * updated by a gratuitous ARP. This is a simple feature, but the rationale for + * it is complicated. Please refer to the description of SLB bonding in + * vswitchd/INTERNALS for an explanation. + * + * Third, the implementation expires entries that are idle for longer than a + * configurable amount of time. This is implemented by keeping all of the + * current table entries on a list ordered from least recently used (LRU) to + * most recently used (MRU). Each time a MAC entry is used, it is moved to the + * MRU end of the list. Periodically mac_learning_run() sweeps through the + * list starting from the LRU end, deleting each entry that has been idle too + * long.
+ * + * Finally, the number of MAC learning table entries has a configurable maximum + * size to prevent memory exhaustion. When a new entry must be inserted but + * the table is already full, the implementation uses an eviction strategy + * based on fairness: it chooses the port that currently has the greatest number of + * learned MACs (choosing arbitrarily in case of a tie), and among that port's + * entries it evicts the least recently used. (This is a security feature + * because it prevents an attacker from forcing other ports' MACs out of the + * MAC learning table with a "MAC flooding attack" that causes the other ports' + * traffic to be flooded so that the attacker can easily sniff it.) The + * implementation of this feature is like a specialized form of the + * general-purpose "eviction groups" that OVS implements in OpenFlow (see the + * documentation of the "groups" column in the Flow_Table table in + * ovs-vswitchd.conf.db(5) for details). + * + * + * Thread-safety + * ============= + * + * Many operations require the caller to take the MAC learning table's rwlock + * for writing (please refer to the Clang thread safety annotations). The + * important exception to this is mac_learning_lookup(), which only needs a + * read lock. This is useful for the common case where a MAC learning entry + * being looked up already exists and does not need an update. However, + * there's no deadlock-free way to upgrade a read lock to a write lock, so in + * the case where the lookup result means that an update is required, the + * caller must drop the read lock, take the write lock, and then repeat the + * lookup (in case some other thread has already made a change). + */ + struct mac_learning; /* Default maximum size of a MAC learning table, in entries. */ @@ -38,7 +99,7 @@ struct mac_learning; #define MAC_GRAT_ARP_LOCK_TIME 5 /* A MAC learning table entry. - * Guarded by owning 'mac_learning''s rwlock */ + * Guarded by owning 'mac_learning''s rwlock.
*/ struct mac_entry { struct hmap_node hmap_node; /* Node in a mac_learning hmap. */ time_t expires; /* Expiration time. */ @@ -47,14 +108,30 @@ struct mac_entry { uint16_t vlan; /* VLAN tag. */ /* The following are marked guarded to prevent users from iterating over or - * accessing a mac_entry without hodling the parent mac_learning rwlock. */ + * accessing a mac_entry without holding the parent mac_learning rwlock. */ struct ovs_list lru_node OVS_GUARDED; /* Element in 'lrus' list. */ - /* Learned port. */ - union { - void *p; - ofp_port_t ofp_port; - } port OVS_GUARDED; + /* Learned port. + * + * The client-specified data is mlport->port. */ + struct mac_learning_port *mlport; + struct ovs_list port_lru_node; /* In mac_learning_port's "port_lru"s. */ +}; + +static inline void *mac_entry_get_port(const struct mac_learning *ml, + const struct mac_entry *); +void mac_entry_set_port(struct mac_learning *, struct mac_entry *, void *port); + +/* Information about client-provided port pointers (the 'port' member), to + * allow for per-port fairness. + * + * The client-provided pointer is opaque to the MAC-learning table, which never + * dereferences it. */ +struct mac_learning_port { + struct hmap_node hmap_node; /* In mac_learning's "ports_by_ptr". */ + struct heap_node heap_node; /* In mac_learning's "ports_by_usage". */ + void *port; /* Client-provided port pointer. */ + struct ovs_list port_lrus; /* Contains "struct mac_entry"s by port_lru. */ }; /* Sets a gratuitous ARP lock on 'mac' that will expire in @@ -74,8 +151,7 @@ static inline bool mac_entry_is_grat_arp_locked(const struct mac_entry *mac) /* MAC learning table. */ struct mac_learning { struct hmap table; /* Learning table. */ - struct ovs_list lrus OVS_GUARDED; /* In-use entries, least recently used at the - front, most recently used at the back. */ + struct ovs_list lrus OVS_GUARDED; /* In-use entries, LRU at front. */ uint32_t secret; /* Secret for randomizing hash table. 
*/ unsigned long *flood_vlans; /* Bitmap of learning disabled VLANs. */ unsigned int idle_time; /* Max age before deleting an entry. */ @@ -83,6 +159,21 @@ struct mac_learning { struct ovs_refcount ref_cnt; struct ovs_rwlock rwlock; bool need_revalidate; + + /* Fairness. + * + * Both of these data structures include the same "struct + * mac_learning_port" but indexed differently. + * + * ports_by_usage is a per-port max-heap, in which the priority is the + * number of MAC addresses for the port. When the MAC learning table + * overflows, this allows us to evict a MAC entry from one of the ports + * that have the largest number of MAC entries, achieving a form of + * fairness. + * + * ports_by_ptr is a hash table indexed by the client-provided pointer. */ + struct hmap ports_by_ptr; /* struct mac_learning_port hmap_nodes. */ + struct heap ports_by_usage; /* struct mac_learning_port heap_nodes. */ }; int mac_entry_age(const struct mac_learning *ml, const struct mac_entry *e) @@ -116,7 +207,6 @@ struct mac_entry *mac_learning_insert(struct mac_learning *ml, const uint8_t src[ETH_ADDR_LEN], uint16_t vlan) OVS_REQ_WRLOCK(ml->rwlock); -void mac_learning_changed(struct mac_learning *ml) OVS_REQ_WRLOCK(ml->rwlock); /* Lookup. */ struct mac_entry *mac_learning_lookup(const struct mac_learning *ml, @@ -128,5 +218,15 @@ struct mac_entry *mac_learning_lookup(const struct mac_learning *ml, void mac_learning_expire(struct mac_learning *ml, struct mac_entry *e) OVS_REQ_WRLOCK(ml->rwlock); void mac_learning_flush(struct mac_learning *ml) OVS_REQ_WRLOCK(ml->rwlock); + +/* Inlines. */ + +static inline void * +mac_entry_get_port(const struct mac_learning *ml OVS_UNUSED, + const struct mac_entry *e) + OVS_REQ_RDLOCK(ml->rwlock) +{ + return e->mlport ? 
e->mlport->port : NULL; +} #endif /* mac-learning.h */ diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 7bcecc8ef..a0a10b86b 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -1802,9 +1802,10 @@ is_admissible(struct xlate_ctx *ctx, struct xport *in_port, case BV_DROP_IF_MOVED: ovs_rwlock_rdlock(&xbridge->ml->rwlock); mac = mac_learning_lookup(xbridge->ml, flow->dl_src, vlan); - if (mac && mac->port.p != in_xbundle->ofbundle && - (!is_gratuitous_arp(flow, &ctx->xout->wc) - || mac_entry_is_grat_arp_locked(mac))) { + if (mac + && mac_entry_get_port(xbridge->ml, mac) != in_xbundle->ofbundle + && (!is_gratuitous_arp(flow, &ctx->xout->wc) + || mac_entry_is_grat_arp_locked(mac))) { ovs_rwlock_unlock(&xbridge->ml->rwlock); xlate_report(ctx, "SLB bond thinks this packet looped back, " "dropping"); @@ -1856,7 +1857,7 @@ OVS_REQ_RDLOCK(ml->rwlock) } } - return mac->port.p != in_xbundle->ofbundle; + return mac_entry_get_port(ml, mac) != in_xbundle->ofbundle; } @@ -1892,7 +1893,7 @@ OVS_REQ_WRLOCK(xbridge->ml->rwlock) } } - if (mac->port.p != in_xbundle->ofbundle) { + if (mac_entry_get_port(xbridge->ml, mac) != in_xbundle->ofbundle) { /* The log messages here could actually be useful in debugging, * so keep the rate limit relatively high. */ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(30, 300); @@ -1902,8 +1903,7 @@ OVS_REQ_WRLOCK(xbridge->ml->rwlock) xbridge->name, ETH_ADDR_ARGS(flow->dl_src), in_xbundle->name, vlan); - mac->port.p = in_xbundle->ofbundle; - mac_learning_changed(xbridge->ml); + mac_entry_set_port(xbridge->ml, mac, in_xbundle->ofbundle); } } @@ -2269,7 +2269,7 @@ xlate_normal(struct xlate_ctx *ctx) } else { ovs_rwlock_rdlock(&ctx->xbridge->ml->rwlock); mac = mac_learning_lookup(ctx->xbridge->ml, flow->dl_dst, vlan); - mac_port = mac ? mac->port.p : NULL; + mac_port = mac ? 
mac_entry_get_port(ctx->xbridge->ml, mac) : NULL; ovs_rwlock_unlock(&ctx->xbridge->ml->rwlock); if (mac_port) { diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 069f08725..d83f8872c 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014 Nicira, Inc. + * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -2564,7 +2564,7 @@ bundle_flush_macs(struct ofbundle *bundle, bool all_ofprotos) ofproto->backer->need_revalidate = REV_RECONFIGURE; ovs_rwlock_wrlock(&ml->rwlock); LIST_FOR_EACH_SAFE (mac, next_mac, lru_node, &ml->lrus) { - if (mac->port.p == bundle) { + if (mac_entry_get_port(ml, mac) == bundle) { if (all_ofprotos) { struct ofproto_dpif *o; @@ -2600,8 +2600,8 @@ bundle_move(struct ofbundle *old, struct ofbundle *new) ofproto->backer->need_revalidate = REV_RECONFIGURE; ovs_rwlock_wrlock(&ml->rwlock); LIST_FOR_EACH_SAFE (mac, next_mac, lru_node, &ml->lrus) { - if (mac->port.p == old) { - mac->port.p = new; + if (mac_entry_get_port(ml, mac) == old) { + mac_entry_set_port(ml, mac, new); } } ovs_rwlock_unlock(&ml->rwlock); @@ -2960,7 +2960,7 @@ bundle_send_learning_packets(struct ofbundle *bundle) list_init(&packets); ovs_rwlock_rdlock(&ofproto->ml->rwlock); LIST_FOR_EACH (e, lru_node, &ofproto->ml->lrus) { - if (e->port.p != bundle) { + if (mac_entry_get_port(ofproto->ml, e) != bundle) { void *port_void; learning_packet = bond_compose_learning_packet(bundle->bond, @@ -4352,7 +4352,7 @@ ofproto_unixctl_fdb_show(struct unixctl_conn *conn, int argc OVS_UNUSED, ds_put_cstr(&ds, " port VLAN MAC Age\n"); ovs_rwlock_rdlock(&ofproto->ml->rwlock); LIST_FOR_EACH (e, lru_node, &ofproto->ml->lrus) { - struct ofbundle *bundle = e->port.p; + struct ofbundle *bundle = mac_entry_get_port(ofproto->ml, e); char 
name[OFP_MAX_PORT_NAME_LEN]; ofputil_port_to_string(ofbundle_get_a_port(bundle)->up.ofp_port, diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index defb5fc92..cdbd7fae5 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -4531,6 +4531,72 @@ AT_CHECK_UNQUOTED([ovs-appctl fdb/show br0 | sed 's/[[0-9]]\{1,\}$/?/' | sort], OVS_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([ofproto-dpif - MAC table overflow fairness]) +OVS_VSWITCHD_START( + [set bridge br0 fail-mode=standalone other-config:mac-table-size=10]) +ADD_OF_PORTS([br0], 1, 2, 3, 4, 5, 6) + +arp='eth_type(0x0806),arp(sip=192.168.0.1,tip=192.168.0.2,op=1,sha=50:54:00:00:00:05,tha=00:00:00:00:00:00)' + +AT_CHECK([ovs-appctl time/stop]) + +# Trace packets with 2 different source MACs arriving on each of the 5 +# ports, filling up the 10-entry learning table. +for i in 0 1 2 3 4 5 6 7 8 9; do + p=`expr $i / 2 + 1` + ovs-appctl ofproto/trace ovs-dummy "in_port($p),eth(src=50:54:00:00:00:0$i,dst=ff:ff:ff:ff:ff:ff),$arp" -generate + ovs-appctl time/warp 1000 +done + +# Check for the MAC learning entries. +AT_CHECK_UNQUOTED([ovs-appctl fdb/show br0 | sed 's/ *[[0-9]]\{1,\}$//' | sort], + [0], [dnl + 1 0 50:54:00:00:00:00 + 1 0 50:54:00:00:00:01 + 2 0 50:54:00:00:00:02 + 2 0 50:54:00:00:00:03 + 3 0 50:54:00:00:00:04 + 3 0 50:54:00:00:00:05 + 4 0 50:54:00:00:00:06 + 4 0 50:54:00:00:00:07 + 5 0 50:54:00:00:00:08 + 5 0 50:54:00:00:00:09 + port VLAN MAC Age +]) + +# Now trace 16 new MACs on another port. +for i in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do + ovs-appctl ofproto/trace ovs-dummy "in_port(6),eth(src=50:54:00:00:0$i:ff,dst=ff:ff:ff:ff:ff:ff),$arp" -generate + ovs-appctl time/warp 1000 +done + +# Check the results. +# +# Our eviction algorithm on overflow is that an arbitrary (but deterministic) +# one of the ports with the most learned MACs loses the least recently used +# one. 
Thus, the new port will end up with 3 MACs, 3 of the old ports with 1 +# MAC each, and the other 2 of the old ports with 2 MACs each. +# +# (If someone changes lib/heap.c to do something different with equal-priority +# nodes, then the output below could change, but it would still follow the +# rules explained above.) +AT_CHECK_UNQUOTED([ovs-appctl fdb/show br0 | sed 's/ *[[0-9]]\{1,\}$//' | sort], + [0], [dnl + 1 0 50:54:00:00:00:01 + 2 0 50:54:00:00:00:03 + 3 0 50:54:00:00:00:04 + 3 0 50:54:00:00:00:05 + 4 0 50:54:00:00:00:07 + 5 0 50:54:00:00:00:08 + 5 0 50:54:00:00:00:09 + 6 0 50:54:00:00:0d:ff + 6 0 50:54:00:00:0e:ff + 6 0 50:54:00:00:0f:ff + port VLAN MAC Age +]) +OVS_VSWITCHD_STOP +AT_CLEANUP + # CHECK_SFLOW_SAMPLING_PACKET(LOOPBACK_ADDR, ADDR_WITHOUT_BRACKETS) # # Test that sFlow samples packets correctly using IPv4/IPv6 sFlow collector
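The per-port counts that the "MAC table overflow fairness" test expects can be cross-checked with a counts-only model of the eviction rule. This is a hypothetical sketch, not part of the test suite, and its names (busiest_port(), learn_new_mac(), replay_overflow_test()) are illustrative. It tracks only how many MACs each port owns. That suffices because the victim port depends only on these counts; the per-port LRU order decides which MAC goes. The model also assumes that no entry ages out during the test. It breaks ties toward the lowest port number, whereas lib/heap.c may pick a different port among equals. As a result, the model ends with ports 1 to 3 at one MAC and ports 4 and 5 at two, while the test output has ports 1, 2, and 4 at one and ports 3 and 5 at two. The distribution is the same.

```c
#include <assert.h>

/* Hypothetical model of the test: ports 1..6, index 0 unused. */
#define N_PORTS 7
#define MAC_TABLE_SIZE 10   /* other-config:mac-table-size=10 */

/* Returns the port with the most learned MACs; strict '>' breaks ties
 * toward the lowest port number. */
static int
busiest_port(const int counts[N_PORTS])
{
    int best = 1;
    for (int p = 2; p < N_PORTS; p++) {
        if (counts[p] > counts[best]) {
            best = p;
        }
    }
    return best;
}

/* Learns one new MAC on 'port', first evicting one entry from the busiest
 * port if the table is full. */
static void
learn_new_mac(int counts[N_PORTS], int *total, int port)
{
    if (*total >= MAC_TABLE_SIZE) {
        counts[busiest_port(counts)]--;
        --*total;
    }
    counts[port]++;
    ++*total;
}

/* Replays the traffic from the test and leaves per-port MAC counts in
 * 'counts'. */
static void
replay_overflow_test(int counts[N_PORTS])
{
    int total = 0;

    for (int p = 0; p < N_PORTS; p++) {
        counts[p] = 0;
    }
    /* 10 MACs, 2 on each of ports 1 to 5 (p = i / 2 + 1, as in the test). */
    for (int i = 0; i < 10; i++) {
        learn_new_mac(counts, &total, i / 2 + 1);
    }
    /* 16 new MACs on port 6. */
    for (int i = 0; i < 16; i++) {
        learn_new_mac(counts, &total, 6);
    }
}
```

Once port 6 reaches three MACs, it strictly owns the most entries. From then on, each new MAC on port 6 evicts port 6's own least recently used entry, which is why only its last three MACs (0d:ff, 0e:ff, 0f:ff) survive in the test output.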