A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![Crates.io](https://img.shields.io/crates/v/regex.svg)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.41.1%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
supported syntax.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

To bring this crate into your repository, either add `regex` to your
`Cargo.toml`, or run `cargo add regex`.

Here's a simple example that matches a date in YYYY-MM-DD format and extracts
the year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();
    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once. For example:

```rust
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. It also permits disabling Unicode mode with `(?-u)`
even when the resulting pattern can match invalid UTF-8. For example,
`(?-u:.)` matches any *byte* except `\n` and is only allowed in
`regex::bytes::Regex`, while a plain `.` matches the UTF-8 encoding of any
*Unicode scalar value* except `\n` in both APIs.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

fn main() {
    // `(?-u)` disables Unicode mode so `[^\x00]` matches any byte except NUL.
    let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
    let text = b"foo\x00bar\x00baz\x00";

    // Extract all of the strings without the null terminator from each match.
    // The unwrap is OK here since a match requires the `cstr` capture to match.
    let cstrs: Vec<&[u8]> =
        re.captures_iter(text)
          .map(|c| c.name("cstr").unwrap().as_bytes())
          .collect();
    assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
}
```

With Unicode mode disabled via `(?-u)`, `[^\x00]+` matches any *byte* except
for `NUL`, including bytes such as `\xFF` that are not valid UTF-8. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

fn main() {
    let set = RegexSet::new(&[
        r"\w+", r"\d+", r"\pL+", r"foo", r"bar", r"barfoo", r"foobar",
    ]).unwrap();

    // Iterate over and collect all of the matches.
    let matches: Vec<_> = set.matches("foobar").into_iter().collect();
    assert_eq!(matches, vec![0, 2, 3, 4, 6]);

    // You can also test whether a particular regex matched:
    let matches = set.matches("foobar");
    assert!(!matches.matched(5));
    assert!(matches.matched(6));
}
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.

### Usage: a regular expression parser

This repository contains a crate that provides a well tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)

### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose which of the
optimizations performed by this crate to disable.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).

### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.41.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.

### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).