src/libstd/collections/mod.rs

   1 // Copyright 2013-2014 The Rust Project Developers. See the COPYRIGHT
   2 // file at the top-level directory of this distribution and at
   3 // http://rust-lang.org/COPYRIGHT.
   4 //
   5 // Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
   6 // http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
   7 // <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
   8 // option. This file may not be copied, modified, or distributed
   9 // except according to those terms.
  10
  11 //! Collection types.
  12 //!
  13 //! Rust's standard collection library provides efficient implementations of the
  14 //! most common general purpose programming data structures. By using the
  15 //! standard implementations, it should be possible for two libraries to
  16 //! communicate without significant data conversion.
  17 //!
  18 //! To get this out of the way: you should probably just use [`Vec`] or [`HashMap`].
  19 //! These two collections cover most use cases for generic data storage and
  20 //! processing. They are exceptionally good at doing what they do. All the other
  21 //! collections in the standard library have specific use cases where they are
  22 //! the optimal choice, but these cases are borderline *niche* in comparison.
  23 //! Even when `Vec` and `HashMap` are technically suboptimal, they're probably a
  24 //! good enough choice to get started.
  25 //!
  26 //! Rust's collections can be grouped into four major categories:
  27 //!
  28 //! * Sequences: [`Vec`], [`VecDeque`], [`LinkedList`]
  29 //! * Maps: [`HashMap`], [`BTreeMap`]
  30 //! * Sets: [`HashSet`], [`BTreeSet`]
  31 //! * Misc: [`BinaryHeap`]
  32 //!
  33 //! # When Should You Use Which Collection?
  34 //!
  35 //! These are fairly high-level and quick break-downs of when each collection
  36 //! should be considered. Detailed discussions of strengths and weaknesses of
  37 //! individual collections can be found on their own documentation pages.
  38 //!
  39 //! ### Use a `Vec` when:
  40 //! * You want to collect items up to be processed or sent elsewhere later, and
  41 //!   don't care about any properties of the actual values being stored.
  42 //! * You want a sequence of elements in a particular order, and will only be
  43 //!   appending to (or near) the end.
  44 //! * You want a stack.
  45 //! * You want a resizable array.
  46 //! * You want a heap-allocated array.
  47 //!
  48 //! ### Use a `VecDeque` when:
  49 //! * You want a [`Vec`] that supports efficient insertion at both ends of the
  50 //!   sequence.
  51 //! * You want a queue.
  52 //! * You want a double-ended queue (deque).
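     //!
     //! As a rough sketch of the queue use case (the `serve_in_order` helper is a
     //! hypothetical name, not part of the standard library), items pushed onto
     //! the back of a `VecDeque` come off the front in first-in, first-out order:
     //!
     //! ```rust
     //! use std::collections::VecDeque;
     //!
     //! // Hypothetical helper: enqueue every job, then dequeue them all.
     //! fn serve_in_order(jobs: &[u32]) -> Vec<u32> {
     //!     let mut queue = VecDeque::new();
     //!     for &job in jobs {
     //!         queue.push_back(job); // enqueue at the back
     //!     }
     //!
     //!     let mut served = Vec::new();
     //!     while let Some(job) = queue.pop_front() {
     //!         served.push(job); // dequeue from the front
     //!     }
     //!     served
     //! }
     //! ```
     //!
     //! Under these assumptions, `serve_in_order(&[1, 2, 3])` yields `vec![1, 2, 3]`.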
  53 //!
  54 //! ### Use a `LinkedList` when:
  55 //! * You want a [`Vec`] or [`VecDeque`] of unknown size, and can't tolerate
  56 //!   amortization.
  57 //! * You want to efficiently split and append lists.
  58 //! * You are *absolutely* certain you *really*, *truly*, want a doubly linked
  59 //!   list.
  60 //!
  61 //! ### Use a `HashMap` when:
  62 //! * You want to associate arbitrary keys with an arbitrary value.
  63 //! * You want a cache.
  64 //! * You want a map, with no extra functionality.
  65 //!
  66 //! ### Use a `BTreeMap` when:
  67 //! * You're interested in what the smallest or largest key-value pair is.
  68 //! * You want to find the largest or smallest key that is smaller or larger
  69 //!   than something.
  70 //! * You want to be able to get all of the entries in order on-demand.
  71 //! * You want a map sorted by its keys.
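     //!
     //! The ordered queries above might look roughly like this. `key_bounds` and
     //! `predecessor` are hypothetical helpers written for illustration, and the
     //! sketch assumes a toolchain where `BTreeMap::range` accepts range syntax:
     //!
     //! ```rust
     //! use std::collections::BTreeMap;
     //!
     //! // Hypothetical helper: the smallest and largest keys, if the map is non-empty.
     //! fn key_bounds(map: &BTreeMap<u32, &'static str>) -> Option<(u32, u32)> {
     //!     let smallest = map.keys().next(); // keys are yielded in sorted order
     //!     let largest = map.keys().next_back(); // so the last key is the largest
     //!     match (smallest, largest) {
     //!         (Some(&lo), Some(&hi)) => Some((lo, hi)),
     //!         _ => None,
     //!     }
     //! }
     //!
     //! // Hypothetical helper: the largest key strictly smaller than `key`, if any.
     //! fn predecessor(map: &BTreeMap<u32, &'static str>, key: u32) -> Option<u32> {
     //!     map.range(..key).next_back().map(|(&k, _)| k)
     //! }
     //! ```
     //!
     //! With keys 10, 20 and 30 in the map, `key_bounds` gives `Some((10, 30))`,
     //! `predecessor(&map, 25)` gives `Some(20)`, and `predecessor(&map, 10)` gives
     //! `None`.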
  72 //!
  73 //! ### Use the `Set` variant of any of these `Map`s when:
  74 //! * You just want to remember which keys you've seen.
  75 //! * There is no meaningful value to associate with your keys.
  76 //! * You just want a set.
  77 //!
  78 //! ### Use a `BinaryHeap` when:
  79 //!
  80 //! * You want to store a bunch of elements, but only ever want to process the
  81 //!   "biggest" or "most important" one at any given time.
  82 //! * You want a priority queue.
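     //!
     //! A minimal sketch of the "biggest first" pattern, assuming plain integer
     //! priorities (the `top_k` helper is a hypothetical name used only here):
     //!
     //! ```rust
     //! use std::collections::BinaryHeap;
     //!
     //! // Hypothetical helper: return up to `k` of the largest values, biggest first.
     //! fn top_k(values: &[u32], k: usize) -> Vec<u32> {
     //!     let mut heap = BinaryHeap::new();
     //!     for &value in values {
     //!         heap.push(value);
     //!     }
     //!
     //!     let mut largest = Vec::new();
     //!     while largest.len() < k {
     //!         match heap.pop() {
     //!             // `pop()` always hands back the greatest remaining element.
     //!             Some(value) => largest.push(value),
     //!             None => break, // fewer than `k` elements were available
     //!         }
     //!     }
     //!     largest
     //! }
     //! ```
     //!
     //! For example, `top_k(&[3, 1, 4, 1, 5, 9, 2, 6], 3)` yields `vec![9, 6, 5]`.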
  83 //!
  84 //! # Performance
  85 //!
  86 //! Choosing the right collection for the job requires an understanding of what
  87 //! each collection is good at. Here we briefly summarize the performance of
  88 //! different collections for certain important operations. For further details,
  89 //! see each type's documentation, and note that the names of actual methods may
  90 //! differ from the tables below on certain collections.
  91 //!
  92 //! Throughout the documentation, we will follow a few conventions. For all
  93 //! operations, the collection's size is denoted by n. If another collection is
  94 //! involved in the operation, it contains m elements. Operations which have an
  95 //! *amortized* cost are suffixed with a `*`. Operations with an *expected*
  96 //! cost are suffixed with a `~`.
  97 //!
  98 //! All amortized costs are for the potential need to resize when capacity is
  99 //! exhausted. If a resize occurs it will take O(n) time. Our collections never
 100 //! automatically shrink, so removal operations aren't amortized. Over a
 101 //! sufficiently large series of operations, the average cost per operation will
 102 //! deterministically equal the given cost.
 103 //!
 104 //! Only [`HashMap`] has expected costs, due to the probabilistic nature of hashing.
 105 //! It is theoretically possible, though very unlikely, for [`HashMap`] to
 106 //! experience worse performance.
 107 //!
 108 //! ## Sequences
 109 //!
 110 //! |                | get(i)         | insert(i)       | remove(i)      | append | split_off(i)   |
 111 //! |----------------|----------------|-----------------|----------------|--------|----------------|
 112 //! | [`Vec`]        | O(1)           | O(n-i)*         | O(n-i)         | O(m)*  | O(n-i)         |
 113 //! | [`VecDeque`]   | O(1)           | O(min(i, n-i))* | O(min(i, n-i)) | O(m)*  | O(min(i, n-i)) |
 114 //! | [`LinkedList`] | O(min(i, n-i)) | O(min(i, n-i))  | O(min(i, n-i)) | O(1)   | O(min(i, n-i)) |
 115 //!
 116 //! Note that where ties occur, [`Vec`] is generally going to be faster than [`VecDeque`], and
 117 //! [`VecDeque`] is generally going to be faster than [`LinkedList`].
 118 //!
 119 //! ## Maps
 120 //!
 121 //! For Sets, all operations have the cost of the equivalent Map operation.
 122 //!
 123 //! |              | get       | insert   | remove   | predecessor | append |
 124 //! |--------------|-----------|----------|----------|-------------|--------|
 125 //! | [`HashMap`]  | O(1)~     | O(1)~*   | O(1)~    | N/A         | N/A    |
 126 //! | [`BTreeMap`] | O(log n)  | O(log n) | O(log n) | O(log n)    | O(n+m) |
 127 //!
 128 //! # Correct and Efficient Usage of Collections
 129 //!
 130 //! Of course, knowing which collection is the right one for the job doesn't
 131 //! instantly permit you to use it correctly. Here are some quick tips for
 132 //! efficient and correct usage of the standard collections in general. If
 133 //! you're interested in how to use a specific collection in particular, consult
 134 //! its documentation for detailed discussion and code examples.
 135 //!
 136 //! ## Capacity Management
 137 //!
 138 //! Many collections provide several constructors and methods that refer to
 139 //! "capacity". These collections are generally built on top of an array.
 140 //! Optimally, this array would be exactly the right size to fit only the
 141 //! elements stored in the collection, but for the collection to do this would
 142 //! be very inefficient. If the backing array was exactly the right size at all
 143 //! times, then every time an element is inserted, the collection would have to
 144 //! grow the array to fit it. Due to the way memory is allocated and managed on
 145 //! most computers, this would almost surely require allocating an entirely new
 146 //! array and copying every single element from the old one into the new one.
 147 //! Hopefully you can see that this wouldn't be very efficient to do on every
 148 //! operation.
 149 //!
 150 //! Most collections therefore use an *amortized* allocation strategy. They
 151 //! generally let themselves have a fair amount of unoccupied space so that they
 152 //! only have to grow on occasion. When they do grow, they allocate a
 153 //! substantially larger array to move the elements into so that it will take a
 154 //! while for another grow to be required. While this strategy is great in
 155 //! general, it would be even better if the collection *never* had to resize its
 156 //! backing array. Unfortunately, the collection doesn't have enough
 157 //! information to do this on its own. Therefore, it is up to us programmers to
 158 //! give it hints.
 159 //!
 160 //! Any `with_capacity()` constructor will instruct the collection to allocate
 161 //! enough space for the specified number of elements. Ideally this will be for
 162 //! exactly that many elements, but some implementation details may prevent
 163 //! this. [`Vec`] and [`VecDeque`] can be relied on to allocate at least the
 164 //! requested amount, though. Use `with_capacity()` when you know exactly how many
 165 //! elements will be inserted, or at least have a reasonable upper-bound on that
 166 //! number.
 167 //!
 168 //! When anticipating a large influx of elements, the `reserve()` family of
 169 //! methods can be used to hint to the collection how much room it should make
 170 //! for the coming items. As with `with_capacity()`, the precise behavior of
 171 //! these methods will be specific to the collection of interest.
 172 //!
 173 //! For optimal performance, collections will generally avoid shrinking
 174 //! themselves. If you believe that a collection will not soon contain any more
 175 //! elements, or just really need the memory, the `shrink_to_fit()` method prompts
 176 //! the collection to shrink the backing array to the minimum size capable of
 177 //! holding its elements.
 178 //!
 179 //! Finally, if ever you're interested in what the actual capacity of the
 180 //! collection is, most collections provide a `capacity()` method to query this
 181 //! information on demand. This can be useful for debugging purposes, or for
 182 //! use with the `reserve()` methods.
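     //!
     //! The capacity methods described above can be combined roughly as follows.
     //! This is only a sketch using [`Vec`]; the `collect_squares` helper is a
     //! hypothetical name, and other collections may round capacities differently:
     //!
     //! ```rust
     //! // Hypothetical helper: build the squares of `0..2 * n` in two batches.
     //! fn collect_squares(n: usize) -> Vec<usize> {
     //!     // Allocate room for the first `n` elements up front.
     //!     let mut squares = Vec::with_capacity(n);
     //!     let initial_capacity = squares.capacity();
     //!     for i in 0..n {
     //!         squares.push(i * i);
     //!     }
     //!     // Pushing within the existing capacity never reallocates.
     //!     assert_eq!(squares.capacity(), initial_capacity);
     //!
     //!     // Hint that another batch of `n` elements is coming, then add it.
     //!     squares.reserve(n);
     //!     assert!(squares.capacity() >= squares.len() + n);
     //!     for i in n..2 * n {
     //!         squares.push(i * i);
     //!     }
     //!
     //!     // No more growth is expected, so release any unused space.
     //!     squares.shrink_to_fit();
     //!     squares
     //! }
     //! ```
     //!
     //! For example, `collect_squares(2)` yields `vec![0, 1, 4, 9]`.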
 183 //!
 184 //! ## Iterators
 185 //!
 186 //! Iterators are a powerful and robust mechanism used throughout Rust's
 187 //! standard libraries. Iterators provide a sequence of values in a generic,
 188 //! safe, efficient and convenient way. The contents of an iterator are usually
 189 //! *lazily* evaluated, so that only the values that are actually needed are
 190 //! ever actually produced, and no allocation need be done to temporarily store
 191 //! them. Iterators are primarily consumed using a `for` loop, although many
 192 //! functions also take iterators where a collection or sequence of values is
 193 //! desired.
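     //!
     //! To make the laziness concrete, here is a small sketch (the `first_doubled`
     //! helper is a hypothetical name) that counts how often the closure passed to
     //! `map()` actually runs when only the first `k` results are requested:
     //!
     //! ```rust
     //! // Hypothetical helper: double the first `k` values and report how many
     //! // times the mapping closure was invoked.
     //! fn first_doubled(values: &[i32], k: usize) -> (Vec<i32>, usize) {
     //!     let mut calls = 0;
     //!     let doubled: Vec<i32> = values
     //!         .iter()
     //!         .map(|x| {
     //!             calls += 1; // runs only for values that are actually pulled
     //!             x * 2
     //!         })
     //!         .take(k)
     //!         .collect();
     //!     (doubled, calls)
     //! }
     //! ```
     //!
     //! Here `first_doubled(&[1, 2, 3, 4, 5], 2)` yields `(vec![2, 4], 2)`: the
     //! remaining three values are never doubled.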
 194 //!
 195 //! All of the standard collections provide several iterators for performing
 196 //! bulk manipulation of their contents. The three primary iterators almost
 197 //! every collection should provide are `iter()`, `iter_mut()`, and `into_iter()`.
 198 //! Some of these are not provided on collections where it would be unsound or
 199 //! unreasonable to provide them.
 200 //!
 201 //! `iter()` provides an iterator of immutable references to all the contents of a
 202 //! collection in the most "natural" order. For sequence collections like [`Vec`],
 203 //! this means the items will be yielded in increasing order of index starting
 204 //! at 0. For ordered collections like [`BTreeMap`], this means that the items
 205 //! will be yielded in sorted order. For unordered collections like [`HashMap`],
 206 //! the items will be yielded in whatever order the internal representation made
 207 //! most convenient. This is great for reading through all the contents of the
 208 //! collection.
 209 //!
 210 //! ```
 211 //! let vec = vec![1, 2, 3, 4];
 212 //! for x in vec.iter() {
 213 //!    println!("vec contained {}", x);
 214 //! }
 215 //! ```
 216 //!
 217 //! `iter_mut()` provides an iterator of *mutable* references in the same order as
 218 //! `iter()`. This is great for mutating all the contents of the collection.
 219 //!
 220 //! ```
 221 //! let mut vec = vec![1, 2, 3, 4];
 222 //! for x in vec.iter_mut() {
 223 //!    *x += 1;
 224 //! }
 225 //! ```
 226 //!
 227 //! `into_iter()` transforms the actual collection into an iterator over its
 228 //! contents by-value. This is great when the collection itself is no longer
 229 //! needed, and the values are needed elsewhere. Using `extend()` with `into_iter()`
 230 //! is the main way that contents of one collection are moved into another.
 231 //! `extend()` automatically calls `into_iter()`, and takes any `T: `[`IntoIterator`].
 232 //! Calling `collect()` on an iterator itself is also a great way to convert one
 233 //! collection into another. Both of these methods should internally use the
 234 //! capacity management tools discussed in the previous section to do this as
 235 //! efficiently as possible.
 236 //!
 237 //! ```
 238 //! let mut vec1 = vec![1, 2, 3, 4];
 239 //! let vec2 = vec![10, 20, 30, 40];
 240 //! vec1.extend(vec2);
 241 //! ```
 242 //!
 243 //! ```
 244 //! use std::collections::VecDeque;
 245 //!
 246 //! let vec = vec![1, 2, 3, 4];
 247 //! let buf: VecDeque<_> = vec.into_iter().collect();
 248 //! ```
 249 //!
 250 //! Iterators also provide a series of *adapter* methods for performing common
 251 //! operations on sequences. Among the adapters are functional favorites like `map()`,
 252 //! `fold()`, `skip()` and `take()`. Of particular interest to collections is the
 253 //! `rev()` adapter, which reverses any iterator that supports this operation. Most
 254 //! collections provide reversible iterators as the way to iterate over them in
 255 //! reverse order.
 256 //!
 257 //! ```
 258 //! let vec = vec![1, 2, 3, 4];
 259 //! for x in vec.iter().rev() {
 260 //!    println!("vec contained {}", x);
 261 //! }
 262 //! ```
 263 //!
 264 //! Several other collection methods also return iterators to yield a sequence
 265 //! of results but avoid allocating an entire collection to store the result in.
 266 //! This provides maximum flexibility as `collect()` or `extend()` can be called to
 267 //! "pipe" the sequence into any collection if desired. Otherwise, the sequence
 268 //! can be looped over with a `for` loop. The iterator can also be discarded
 269 //! after partial use, preventing the computation of the unused items.
 270 //!
 271 //! ## Entries
 272 //!
 273 //! The `entry()` API is intended to provide an efficient mechanism for
 274 //! manipulating the contents of a map conditionally on the presence of a key or
 275 //! not. The primary motivating use case for this is to provide efficient
 276 //! accumulator maps. For instance, if one wishes to maintain a count of the
 277 //! number of times each key has been seen, they will have to perform some
 278 //! conditional logic on whether this is the first time the key has been seen or
 279 //! not. Normally, this would require a `get()` followed by an `insert()`,
 280 //! effectively duplicating the search effort on each insertion.
 281 //!
 282 //! When a user calls `map.entry(key)`, the map will search for the key and
 283 //! then yield a variant of the `Entry` enum.
 284 //!
 285 //! If a `Vacant(entry)` is yielded, then the key *was not* found. In this case
 286 //! the only valid operation is to `insert()` a value into the entry. When this is
 287 //! done, the vacant entry is consumed and converted into a mutable reference to
 288 //! the value that was inserted. This allows for further manipulation of the
 289 //! value beyond the lifetime of the search itself. This is useful if complex
 290 //! logic needs to be performed on the value regardless of whether the value was
 291 //! just inserted.
 292 //!
 293 //! If an `Occupied(entry)` is yielded, then the key *was* found. In this case,
 294 //! the user has several options: they can `get()`, `insert()` or `remove()` the
 295 //! value of the occupied entry. Additionally, they can convert the occupied
 296 //! entry into a mutable reference to its value, providing symmetry to the
 297 //! vacant `insert()` case.
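     //!
     //! Matching on the two variants directly might look like the sketch below.
     //! The `adjust` helper and its stock-keeping rules are hypothetical, chosen
     //! only to exercise `insert()`, `get()` and `remove()` on an entry:
     //!
     //! ```rust
     //! use std::collections::btree_map::Entry;
     //! use std::collections::BTreeMap;
     //!
     //! // Hypothetical helper: add `amount` to the stock of `item`, dropping the
     //! // record entirely once the stock is no longer positive.
     //! fn adjust(stock: &mut BTreeMap<&'static str, i32>, item: &'static str, amount: i32) {
     //!     match stock.entry(item) {
     //!         // The key was not found: the only option is to insert a value.
     //!         Entry::Vacant(entry) => {
     //!             if amount > 0 {
     //!                 entry.insert(amount);
     //!             }
     //!         }
     //!         // The key was found: read the value, then overwrite or remove it.
     //!         Entry::Occupied(mut entry) => {
     //!             let total = *entry.get() + amount;
     //!             if total > 0 {
     //!                 entry.insert(total);
     //!             } else {
     //!                 entry.remove();
     //!             }
     //!         }
     //!     }
     //! }
     //! ```
     //!
     //! Each call searches the map once, whichever branch is taken. Calling
     //! `adjust` with `("apple", 3)` and then `("apple", 2)` leaves a stock of 5,
     //! and a further `("apple", -5)` removes the record.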
 298 //!
 299 //! ### Examples
 300 //!
 301 //! Here are the two primary ways in which `entry()` is used. First, a simple
 302 //! example where the logic performed on the values is trivial.
 303 //!
 304 //! #### Counting the number of times each character in a string occurs
 305 //!
 306 //! ```
 307 //! use std::collections::btree_map::BTreeMap;
 308 //!
 309 //! let mut count = BTreeMap::new();
 310 //! let message = "she sells sea shells by the sea shore";
 311 //!
 312 //! for c in message.chars() {
 313 //!     *count.entry(c).or_insert(0) += 1;
 314 //! }
 315 //!
 316 //! assert_eq!(count.get(&'s'), Some(&8));
 317 //!
 318 //! println!("Number of occurrences of each character");
 319 //! for (char, count) in &count {
 320 //!     println!("{}: {}", char, count);
 321 //! }
 322 //! ```
 323 //!
 324 //! When the logic to be performed on the value is more complex, we may simply
 325 //! use the `entry()` API to ensure that the value is initialized and perform the
 326 //! logic afterwards.
 327 //!
 328 //! #### Tracking the inebriation of customers at a bar
 329 //!
 330 //! ```
 331 //! use std::collections::btree_map::BTreeMap;
 332 //!
 333 //! // A client of the bar. They have a blood alcohol level.
 334 //! struct Person { blood_alcohol: f32 }
 335 //!
 336 //! // All the orders made to the bar, by client id.
 337 //! let orders = vec![1,2,1,2,3,4,1,2,2,3,4,1,1,1];
 338 //!
 339 //! // Our clients.
 340 //! let mut blood_alcohol = BTreeMap::new();
 341 //!
 342 //! for id in orders {
 343 //!     // If this is the first time we've seen this customer, initialize them
 344 //!     // with no blood alcohol. Otherwise, just retrieve them.
 345 //!     let person = blood_alcohol.entry(id).or_insert(Person { blood_alcohol: 0.0 });
 346 //!
 347 //!     // Reduce their blood alcohol level. It takes time to order and drink a beer!
 348 //!     person.blood_alcohol *= 0.9;
 349 //!
 350 //!     // Check if they're sober enough to have another beer.
 351 //!     if person.blood_alcohol > 0.3 {
 352 //!         // Too drunk... for now.
 353 //!         println!("Sorry {}, I have to cut you off", id);
 354 //!     } else {
 355 //!         // Have another!
 356 //!         person.blood_alcohol += 0.1;
 357 //!     }
 358 //! }
 359 //! ```
 360 //!
 361 //! # Insert and complex keys
 362 //!
 363 //! If we have a more complex key, calls to `insert()` will update the value
 364 //! associated with an existing key but will not replace the key itself. For example:
 365 //!
 366 //! ```
 367 //! use std::cmp::Ordering;
 368 //! use std::collections::BTreeMap;
 369 //! use std::hash::{Hash, Hasher};
 370 //!
 371 //! #[derive(Debug)]
 372 //! struct Foo {
 373 //!     a: u32,
 374 //!     b: &'static str,
 375 //! }
 376 //!
 377 //! // we will compare `Foo`s by their `a` value only.
 378 //! impl PartialEq for Foo {
 379 //!     fn eq(&self, other: &Self) -> bool { self.a == other.a }
 380 //! }
 381 //!
 382 //! impl Eq for Foo {}
 383 //!
 384 //! // we will hash `Foo`s by their `a` value only.
 385 //! impl Hash for Foo {
 386 //!     fn hash<H: Hasher>(&self, h: &mut H) { self.a.hash(h); }
 387 //! }
 388 //!
 389 //! impl PartialOrd for Foo {
 390 //!     fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.a.partial_cmp(&other.a) }
 391 //! }
 392 //!
 393 //! impl Ord for Foo {
 394 //!     fn cmp(&self, other: &Self) -> Ordering { self.a.cmp(&other.a) }
 395 //! }
 396 //!
 397 //! let mut map = BTreeMap::new();
 398 //! map.insert(Foo { a: 1, b: "baz" }, 99);
 399 //!
 400 //! // We already have a Foo with an a of 1, so this will be updating the value.
 401 //! map.insert(Foo { a: 1, b: "xyz" }, 100);
 402 //!
 403 //! // The value has been updated...
 404 //! assert_eq!(map.values().next().unwrap(), &100);
 405 //!
 406 //! // ...but the key hasn't changed. b is still "baz", not "xyz".
 407 //! assert_eq!(map.keys().next().unwrap().b, "baz");
 408 //! ```
 409 //!
 410 //! [`Vec`]: ../../std/vec/struct.Vec.html
 411 //! [`HashMap`]: ../../std/collections/struct.HashMap.html
 412 //! [`VecDeque`]: ../../std/collections/struct.VecDeque.html
 413 //! [`LinkedList`]: ../../std/collections/struct.LinkedList.html
 414 //! [`BTreeMap`]: ../../std/collections/struct.BTreeMap.html
 415 //! [`HashSet`]: ../../std/collections/struct.HashSet.html
 416 //! [`BTreeSet`]: ../../std/collections/struct.BTreeSet.html
 417 //! [`BinaryHeap`]: ../../std/collections/struct.BinaryHeap.html
 418 //! [`IntoIterator`]: ../../std/iter/trait.IntoIterator.html
 419
 420 #![stable(feature = "rust1", since = "1.0.0")]
 421
 422 #[stable(feature = "rust1", since = "1.0.0")]
 423 pub use core_collections::Bound;
 424 #[stable(feature = "rust1", since = "1.0.0")]
 425 pub use core_collections::{BinaryHeap, BTreeMap, BTreeSet};
 426 #[stable(feature = "rust1", since = "1.0.0")]
 427 pub use core_collections::{LinkedList, VecDeque};
 428 #[stable(feature = "rust1", since = "1.0.0")]
 429 pub use core_collections::{binary_heap, btree_map, btree_set};
 430 #[stable(feature = "rust1", since = "1.0.0")]
 431 pub use core_collections::{linked_list, vec_deque};
 432
 433 #[stable(feature = "rust1", since = "1.0.0")]
 434 pub use self::hash_map::HashMap;
 435 #[stable(feature = "rust1", since = "1.0.0")]
 436 pub use self::hash_set::HashSet;
 437
 438 #[stable(feature = "rust1", since = "1.0.0")]
 439 pub use core_collections::range;
 440
 441 mod hash;
 442
 443 #[stable(feature = "rust1", since = "1.0.0")]
 444 pub mod hash_map {
 445     //! A hash map implementation which uses linear probing with Robin
 446     //! Hood bucket stealing.
 447     #[stable(feature = "rust1", since = "1.0.0")]
 448     pub use super::hash::map::*;
 449 }
 450
 451 #[stable(feature = "rust1", since = "1.0.0")]
 452 pub mod hash_set {
 453     //! An implementation of a hash set using the underlying representation of a
 454     //! HashMap where the value is ().
 455     #[stable(feature = "rust1", since = "1.0.0")]
 456     pub use super::hash::set::*;
 457 }