# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)

## Whitespace


### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
  IResult,
  error::ParseError,
  sequence::delimited,
  character::complete::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and 
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F: 'a, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
  where
  F: Fn(&'a str) -> IResult<&'a str, O, E>,
{
  delimited(
    multispace0,
    inner,
    multispace0
  )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(&inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0,
&inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set
of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` to start a comment, does not consume the newline character, and returns an
output of `()`.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::pair,
  bytes::complete::is_not,
  character::complete::char,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
  value(
    (), // Output is thrown away.
    pair(char('%'), is_not("\n\r"))
  )(i)
}
```

### `/* C-style comments */`

Inline comments surrounded with sentinel tags `(*` and `*)`. This version returns an output of `()`
and does not handle nested comments.

```rust
use nom::{
  IResult,
  error::ParseError,
  combinator::value,
  sequence::tuple,
  bytes::complete::{tag, take_until},
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
  value(
    (), // Output is thrown away.
    tuple((
      tag("(*"),
      take_until("*)"),
      tag("*)")
    ))
  )(i)
}
```

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter or underscore and otherwise contain only underscores,
letters and numbers can be parsed like this:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::many0_count,
  combinator::recognize,
  sequence::pair,
  character::complete::{alpha1, alphanumeric1},
  bytes::complete::tag,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
  recognize(
    pair(
      alt((alpha1, tag("_"))),
      many0_count(alt((alphanumeric1, tag("_"))))
    )
  )(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser recognizes
`hello`, because `alpha1` consumes as many letters as it can. The `pair` combinator then passes the
remainder, `_world123abc`, to `many0_count`, which applies `alt((alphanumeric1, tag("_")))`
repeatedly until every remaining character has been consumed. On its own, `pair` would return a
tuple of the results of its sub-parsers. Wrapping it in `recognize` instead produces a `&str` of the
input text that was parsed, which in this case is the entire `&str` `hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/main/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. How to obtain an
integer value instead is demonstrated for hexadecimal integers. The others are similar.

The parsers allow the grouping character `_`, which allows one to group the digits by byte, for
example: `0xA4_3F_11_28`. If you prefer to exclude the `_` character, the lambda to convert from a
string slice to an integer value is slightly simpler. You can also strip the `_` from the string
slice that is returned, which is demonstrated in the second hexadecimal number parser.

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> { // <'a, E: ParseError<&'a str>>
  preceded(
    alt((tag("0x"), tag("0X"))),
    recognize(
      many1(
        terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
      )
    )
  )(input)
}
```

If you want it to return the integer value instead, use `map_res`:

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{map_res, recognize},
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
  map_res(
    preceded(
      alt((tag("0x"), tag("0X"))),
      recognize(
        many1(
          terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
        )
      )
    ),
    |out: &str| i64::from_str_radix(&str::replace(&out, "_", ""), 16)
  )(input)
}
```
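
The conversion step is the same for the other bases. As a minimal sketch that uses only the standard library, the helper below strips the `_` grouping characters and parses the rest in a given radix. The name `digits_to_i64` and the choice of `i64` are assumptions made for illustration and are not part of nom.

```rust
use std::num::ParseIntError;

/// Hypothetical helper (not part of nom): converts a digit slice, such as the
/// output of the integer recipes in this section, into an `i64`.
/// `radix` must match the parser that produced the slice (16, 8, 2 or 10).
fn digits_to_i64(digits: &str, radix: u32) -> Result<i64, ParseIntError> {
  // `from_str_radix` rejects `_`, so drop the grouping characters first.
  let cleaned: String = digits.chars().filter(|c| *c != '_').collect();
  // Fails on an empty slice or on a value that does not fit in an `i64`.
  i64::from_str_radix(&cleaned, radix)
}
```

Because the helper returns a `Result`, it should be usable as the closure passed to `map_res`, for example `map_res(octal, |s: &str| digits_to_i64(s, 8))`.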

#### Octal

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn octal(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0o"), tag("0O"))),
    recognize(
      many1(
        terminated(one_of("01234567"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Binary

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::recognize,
  sequence::{preceded, terminated},
  character::complete::{char, one_of},
  bytes::complete::tag,
};

fn binary(input: &str) -> IResult<&str, &str> {
  preceded(
    alt((tag("0b"), tag("0B"))),
    recognize(
      many1(
        terminated(one_of("01"), many0(char('_')))
      )
    )
  )(input)
}
```

#### Decimal

```rust
use nom::{
  IResult,
  multi::{many0, many1},
  combinator::recognize,
  sequence::terminated,
  character::complete::{char, one_of},
};

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).

```rust
use nom::{
  IResult,
  branch::alt,
  multi::{many0, many1},
  combinator::{opt, recognize},
  sequence::{preceded, terminated, tuple},
  character::complete::{char, one_of},
};

fn float(input: &str) -> IResult<&str, &str> {
  alt((
    // Case one: .42
    recognize(
      tuple((
        char('.'),
        decimal,
        opt(tuple((
          one_of("eE"),
          opt(one_of("+-")),
          decimal
        )))
      ))
    )
    , // Case two: 42e42 and 42.42e42
    recognize(
      tuple((
        decimal,
        opt(preceded(
          char('.'),
          decimal,
        )),
        one_of("eE"),
        opt(one_of("+-")),
        decimal
      ))
    )
    , // Case three: 42. and 42.42
    recognize(
      tuple((
        decimal,
        char('.'),
        opt(decimal)
      ))
    )
  ))(input)
}

fn decimal(input: &str) -> IResult<&str, &str> {
  recognize(
    many1(
      terminated(one_of("0123456789"), many0(char('_')))
    )
  )(input)
}
```
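
Like the integer recipes, `float` returns a string slice rather than a number. One possible way to obtain an `f64`, sketched here with the standard library only, is to remove the `_` grouping characters and hand the rest to `str::parse`. The helper name `float_to_f64` is hypothetical, and the sketch assumes the slice came from the `float` recipe above.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper (not part of nom): converts a slice recognized by the
/// `float` recipe into an `f64`.
fn float_to_f64(text: &str) -> Result<f64, ParseFloatError> {
  // `f64::from_str` does not accept `_`, so remove the grouping characters.
  let cleaned: String = text.chars().filter(|c| *c != '_').collect();
  // The standard parser accepts forms such as `.42`, `42.` and `42e42`.
  cleaned.parse::<f64>()
}
```

For instance, `float_to_f64("1_000.5")` yields `Ok(1000.5)`.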

## Implementing FromStr

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides
a common interface to parse from a string.

```rust
use nom::{
  IResult, Finish, error::Error,
  bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// will recognize the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
  let (i, _) = tag("Hello, ")(input)?;
  let (i, name) = take_while(|c:char| c.is_alphabetic())(i)?;
  let (i, _) = tag("!")(i)?;

  Ok((i, name))
}

// with FromStr, the result cannot be a reference to the input, it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
  // the error must be owned as well
  type Err = Error<String>;

  fn from_str(s: &str) -> Result<Self, Self::Err> {
      match parse_name(s).finish() {
          Ok((_remaining, name)) => Ok(Name(name.to_string())),
          Err(Error { input, code }) => Err(Error {
              input: input.to_string(),
              code,
          })
      }
  }
}

fn main() {
  // parsed: Ok(Name("nom"))
  println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

  // parsed: Err(Error { input: "123!", code: Tag })
  println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```