regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build Status](https://travis-ci.com/rust-lang/regex.svg?branch=master)](https://travis-ci.com/rust-lang/regex)
[![Build status](https://ci.appveyor.com/api/projects/status/github/rust-lang/regex?svg=true)](https://ci.appveyor.com/project/rust-lang-libs/regex)
[![Coverage Status](https://coveralls.io/repos/github/rust-lang/regex/badge.svg?branch=master)](https://coveralls.io/github/rust-lang/regex?branch=master)
[![](https://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.24.1%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

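The linear-time guarantee mentioned in the introduction comes from avoiding
backtracking: instead of trying one alternative at a time, a matcher can track
every pattern position that is still possible and advance all of them together
for each input character. The std-only sketch below illustrates that idea for a
toy syntax (literals, `.`, and postfix `*`). It is not this crate's
implementation, and `Token`, `parse`, `add_state` and `toy_full_match` are names
invented for the illustration.

```rust
/// One element of the toy pattern: a literal or `.`, optionally starred.
struct Token {
    ch: Option<char>, // `None` stands for `.` (any character)
    star: bool,
}

/// Parses the toy syntax. A `*` with nothing before it is ignored.
fn parse(pattern: &str) -> Vec<Token> {
    let mut tokens: Vec<Token> = Vec::new();
    for c in pattern.chars() {
        match c {
            '*' => {
                if let Some(last) = tokens.last_mut() {
                    last.star = true;
                }
            }
            '.' => tokens.push(Token { ch: None, star: false }),
            _ => tokens.push(Token { ch: Some(c), star: false }),
        }
    }
    tokens
}

/// Marks `state` as active, plus every state reachable by skipping
/// starred tokens (which may match zero characters).
fn add_state(tokens: &[Token], active: &mut [bool], mut state: usize) {
    loop {
        active[state] = true;
        if state < tokens.len() && tokens[state].star {
            state += 1;
        } else {
            break;
        }
    }
}

/// Returns true if the toy pattern matches all of `text`.
/// Work per input character is bounded by the number of tokens,
/// so the total is O(pattern length * text length).
fn toy_full_match(pattern: &str, text: &str) -> bool {
    let tokens = parse(pattern);
    let mut active = vec![false; tokens.len() + 1];
    add_state(&tokens, &mut active, 0);
    for c in text.chars() {
        let mut next = vec![false; tokens.len() + 1];
        for (i, tok) in tokens.iter().enumerate() {
            if active[i] && tok.ch.map_or(true, |want| want == c) {
                // A starred token may repeat (stay at i); otherwise advance.
                let target = if tok.star { i } else { i + 1 };
                add_state(&tokens, &mut next, target);
            }
        }
        active = next;
    }
    active[tokens.len()]
}
```

A pattern such as `a*a*a*a*a*b` against a long run of `a` characters is a
classic worst case for backtracking matchers. Here it is rejected in a single
pass over the text, because each character only updates a fixed-size set of
states.
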
### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root (if you're using Rust 2015):

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust
// Brings the `lazy_static!` macro into scope (place this in the crate root).
#[macro_use]
extern crate lazy_static;

use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        // Compiled on first access, then reused by every later call.
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

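The same compile-once behavior can be sketched with the standard library
alone. The example below assumes a toolchain that provides
`std::sync::OnceLock` (Rust 1.70 or newer, which is well above this crate's
stated minimum). `CompiledPattern` and `COMPILATIONS` are hypothetical
stand-ins so the sketch runs without external crates; in real code the cell
would hold a `Regex`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the "expensive" constructor runs (demonstration only).
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex; it only does a substring search.
struct CompiledPattern {
    needle: String,
}

impl CompiledPattern {
    /// Stand-in for the expensive compilation step.
    fn compile(pattern: &str) -> CompiledPattern {
        COMPILATIONS.fetch_add(1, Ordering::SeqCst);
        CompiledPattern { needle: pattern.to_string() }
    }

    fn is_match(&self, text: &str) -> bool {
        text.contains(self.needle.as_str())
    }
}

fn some_helper_function(text: &str) -> bool {
    // Initialized by the first caller, then shared by every later call.
    static RE: OnceLock<CompiledPattern> = OnceLock::new();
    RE.get_or_init(|| CompiledPattern::compile("foo")).is_match(text)
}
```

Calling `some_helper_function` any number of times leaves the counter at 1,
which is the property the `lazy_static` version relies on as well.
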
### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. By default, `.` will match any *byte* using
`regex::bytes::Regex`, while `.` will match any *UTF-8 encoded Unicode scalar
value* using the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

let re = Regex::new(r"(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that the `[^\x00]+` will match any *byte* except for `NUL`. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

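To make the UTF-8 restriction concrete, the std-only sketch below checks
whether a byte slice could be handed to a `&str`-based API at all. The helper
names `as_searchable_str` and `first_invalid_offset` are invented for this
illustration; the underlying calls (`std::str::from_utf8` and
`Utf8Error::valid_up_to`) are standard library functions.

```rust
/// Returns the bytes as `&str` when they are valid UTF-8, which is the
/// precondition for searching them through a `&str`-based API.
fn as_searchable_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Returns the offset of the first byte that breaks UTF-8 validity, if any.
fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(err) => Some(err.valid_up_to()),
    }
}
```

Input such as `b"foo\xFFbar"` is rejected (the first invalid byte is at
offset 3), so it can only be searched through a byte-oriented API. Note that
`NUL` itself is valid UTF-8, so `b"foo\x00bar\x00"` passes this check.
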
### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.24.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


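If a project has to stay on an older compiler, one way to act on this policy
is to restrict the dependency to patch updates within a single minor release.
The fragment below is a sketch using a Cargo tilde requirement. The version
`1.0` is only borrowed from the example above for illustration and is not a
recommendation.

```toml
[dependencies]
# Tilde requirement: accepts 1.0.z patch releases but never 1.1 or newer.
# The version number is illustrative; pick the minor release your toolchain supports.
regex = "~1.0"
```

The default requirement (`regex = "1"`) instead allows any 1.y release, which
under the policy above may raise the minimum Rust version.
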
### License

This project is licensed under either of

 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
   http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   http://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](http://www.unicode.org/copyright.html#License)).