regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired by
[RE2](https://github.com/google/re2).

[![Build Status](https://travis-ci.org/rust-lang/regex.svg?branch=master)](https://travis-ci.org/rust-lang/regex)
[![Build status](https://ci.appveyor.com/api/projects/status/github/rust-lang/regex?svg=true)](https://ci.appveyor.com/project/rust-lang-libs/regex)
[![Coverage Status](https://coveralls.io/repos/github/rust-lang/regex/badge.svg?branch=master)](https://coveralls.io/github/rust-lang/regex?branch=master)
[![](http://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.20%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root:

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and checks the
captured year, month and day:

```rust
extern crate regex;

use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
extern crate regex;

use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust
#[macro_use] extern crate lazy_static;
extern crate regex;

use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

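To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. The `&[u8]` API also permits disabling Unicode mode
even when the pattern could then match invalid UTF-8. For example, `(?-u:.)`
matches any *byte* except for `\n`; it is allowed in `regex::bytes::Regex` but
rejected by `regex::Regex`. With Unicode mode enabled (the default in both
APIs), `.` matches the UTF-8 encoding of any *Unicode scalar value* except for
`\n`.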

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode so that the class below matches raw bytes.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled by `(?-u)`, `[^\x00]+` will match
any *byte* except for `NUL`, including bytes like `\xFF` that are not valid
UTF-8. When using the main API, `[^\x00]+` would instead match any valid UTF-8
sequence except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate, `regex-syntax`, that provides a well-tested
regular expression parser, an abstract syntax and a high-level intermediate
representation for convenient analysis. It provides no facilities for
compilation or execution. This may be useful if you're implementing your own
regex engine or otherwise need to do analysis on the syntax of a regular
expression. It is otherwise not recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.24.1`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.
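
If you need to stay on an older compiler, one option under this policy is to
restrict the dependency to a single minor series with a Cargo tilde
requirement. This is a sketch of a possible `Cargo.toml` entry, not a
recommendation from the crate's authors, and the `1.0` series is only the
example used above:

```toml
[dependencies]
# Accepts any 1.0.z release, but never 1.1.0 or newer.
regex = "~1.0"
```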


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   http://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](http://www.unicode.org/copyright.html#License)).