regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build Status](https://travis-ci.org/rust-lang/regex.svg?branch=master)](https://travis-ci.org/rust-lang/regex)
[![Build status](https://ci.appveyor.com/api/projects/status/github/rust-lang/regex?svg=true)](https://ci.appveyor.com/project/rust-lang-libs/regex)
[![Coverage Status](https://coveralls.io/repos/github/rust-lang/regex/badge.svg?branch=master)](https://coveralls.io/github/rust-lang/regex?branch=master)
[![](http://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.20%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root:

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and checks the
captured year, month and day:

```rust
extern crate regex;

use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
extern crate regex;

use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust
#[macro_use] extern crate lazy_static;
extern crate regex;

use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

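The same compile-once behavior can be sketched with the standard library
alone. This is a minimal illustration and not part of this crate: it assumes
Rust 1.70 or newer for `std::sync::OnceLock` (which is newer than this crate's
minimum supported version), and `Pattern`, `Pattern::compile` and
`COMPILE_COUNT` are hypothetical stand-ins for a matcher that is expensive to
build, such as `Regex`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts how many times the (hypothetical) expensive compile step runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a compiled regular expression.
struct Pattern {
    needle: String,
}

impl Pattern {
    // Pretend this is expensive, like compiling a real regex.
    fn compile(needle: &str) -> Pattern {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        Pattern { needle: needle.to_string() }
    }

    // Plain substring search; a real matcher would do much more.
    fn is_match(&self, text: &str) -> bool {
        text.contains(self.needle.as_str())
    }
}

fn some_helper_function(text: &str) -> bool {
    // Initialized on the first call; later calls reuse the same value.
    static PATTERN: OnceLock<Pattern> = OnceLock::new();
    PATTERN.get_or_init(|| Pattern::compile("foo")).is_match(text)
}
```

However many times `some_helper_function` is called, `Pattern::compile` runs
only once, which is the property the `lazy_static` example relies on.
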
### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

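To see why arbitrary bytes cannot always be handed to an `&str` API, the
sketch below uses only the standard library's `std::str::from_utf8` to check
whether a byte slice is valid UTF-8. The helper name `as_searchable_str` is
hypothetical and exists only for illustration.

```rust
use std::str;

// Hypothetical helper: returns the bytes as a `&str` if they are valid
// UTF-8 (and so could be passed to an API that takes `&str`), or `None`
// if they are not.
fn as_searchable_str(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}
```

A `NUL` byte is valid UTF-8, so `b"foo\x00bar"` converts successfully, whereas
a slice containing `\xFF` does not, because `\xFF` never appears in valid
UTF-8.
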
To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. It also permits disabling Unicode mode even when the
pattern could then match invalid UTF-8. For example, `(?-u:.)` will match any
*byte* except `\n` using `regex::bytes::Regex`, while `.` will match any
*UTF-8 encoded Unicode scalar value* except `\n`.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode so that `[^\x00]` matches any byte but NUL.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled by `(?-u)`, `[^\x00]+` will match
any *byte* except for `NUL`, including bytes like `\xFF` that are not valid
UTF-8. When using the main API, `[^\x00]+` would instead match any valid UTF-8
sequence except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.

### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)

### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.24.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.

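Under this policy, a project that must keep building on an older compiler
could restrict itself to patch releases of a single minor version. The
fragment below is a sketch using Cargo's tilde requirement syntax. The version
`1.0` is only an assumed example, so substitute the minor version whose
minimum Rust version fits your toolchain.

```toml
[dependencies]
# Tilde requirement: accepts 1.0.z patch releases only (>= 1.0.0, < 1.1.0).
regex = "~1.0"
```
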
### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   http://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](http://www.unicode.org/copyright.html#License)).