regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build Status](https://travis-ci.org/rust-lang/regex.svg?branch=master)](https://travis-ci.org/rust-lang/regex)
[![Build status](https://ci.appveyor.com/api/projects/status/github/rust-lang/regex?svg=true)](https://ci.appveyor.com/project/rust-lang-libs/regex)
[![Coverage Status](https://coveralls.io/repos/github/rust-lang/regex/badge.svg?branch=master)](https://coveralls.io/github/rust-lang/regex?branch=master)
[![](http://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.20%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root:

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and extracts
the year, month and day:

```rust
extern crate regex;

use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
extern crate regex;

use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust
#[macro_use] extern crate lazy_static;
extern crate regex;

use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

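The compile-on-first-use behaviour can also be sketched with the standard
library alone. This is an illustration of the pattern, not how `lazy_static`
is implemented. It assumes Rust 1.70 or newer for `std::sync::OnceLock`, and
`compile_pattern`, `COMPILE_COUNT` and `cached_pattern` are hypothetical names
standing in for an expensive call such as `Regex::new`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical counter recording how many times the "compilation" runs.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive call such as `Regex::new`.
fn compile_pattern(pattern: &str) -> String {
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    pattern.to_uppercase()
}

// Returns the cached value, running `compile_pattern` only on first use.
fn cached_pattern() -> &'static str {
    static CACHE: OnceLock<String> = OnceLock::new();
    CACHE.get_or_init(|| compile_pattern("abc"))
}

fn main() {
    for _ in 0..3 {
        assert_eq!(cached_pattern(), "ABC");
    }
    // The initializer ran once even though the helper was called three times.
    assert_eq!(COMPILE_COUNT.load(Ordering::SeqCst), 1);
}
```

Keeping the cache in a function-local `static` mirrors the `lazy_static!`
block above: callers never pass the compiled value around, yet the expensive
step runs a single time.
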
### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. The `&[u8]` API also permits disabling Unicode mode
with `(?-u)` even when the pattern could then match invalid UTF-8. For
example, `(?-u:.)` matches any *byte* except `\n` and is only allowed in
`regex::bytes::Regex`, while `.` matches any *UTF-8 encoded Unicode scalar
value* except `\n`.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode so the class below matches raw bytes.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled, `[^\x00]+` will match any *byte*
except for `NUL`, including bytes like `\xFF` that are not valid UTF-8. When
using the main API, `[^\x00]+` would instead match any valid UTF-8 sequence
except for `NUL`.

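To see why arbitrary bytes cannot simply be handed to the `&str` API, here is
a small sketch using only the standard library. The helper name
`is_searchable_as_str` is hypothetical and not part of this crate. It checks
whether a byte slice could be converted to `&str` at all:

```rust
use std::str;

// Hypothetical helper: true if `bytes` is valid UTF-8 and could therefore
// be passed to an API that takes `&str`.
fn is_searchable_as_str(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // ASCII with embedded NUL bytes is still valid UTF-8.
    assert!(is_searchable_as_str(b"foo\x00bar\x00"));
    // 0xFF never appears in valid UTF-8, so this slice cannot become a `&str`.
    assert!(!is_searchable_as_str(b"foo\xFFbar"));
}
```

Input like the second slice is the case where searching on `&[u8]` is needed.
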
### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.

### Usage: a regular expression parser

This repository contains a crate that provides a well tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation `regex-syntax`.](https://docs.rs/regex-syntax)

### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.24.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.

### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   http://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](http://www.unicode.org/copyright.html#License)).