# Nom Recipes

These are short recipes for accomplishing common tasks with nom.

* [Whitespace](#whitespace)
  + [Wrapper combinators that eat whitespace before and after a parser](#wrapper-combinators-that-eat-whitespace-before-and-after-a-parser)
* [Comments](#comments)
  + [`// C++/EOL-style comments`](#-ceol-style-comments)
  + [`/* C-style comments */`](#-c-style-comments-)
* [Identifiers](#identifiers)
  + [`Rust-Style Identifiers`](#rust-style-identifiers)
* [Literal Values](#literal-values)
  + [Escaped Strings](#escaped-strings)
  + [Integers](#integers)
    - [Hexadecimal](#hexadecimal)
    - [Octal](#octal)
    - [Binary](#binary)
    - [Decimal](#decimal)
  + [Floating Point Numbers](#floating-point-numbers)
* [Implementing FromStr](#implementing-fromstr)

## Whitespace

### Wrapper combinators that eat whitespace before and after a parser

```rust
use nom::{
    IResult,
    error::ParseError,
    sequence::delimited,
    character::complete::multispace0,
};

/// A combinator that takes a parser `inner` and produces a parser that also consumes both leading and
/// trailing whitespace, returning the output of `inner`.
fn ws<'a, F: 'a, O, E: ParseError<&'a str>>(inner: F) -> impl FnMut(&'a str) -> IResult<&'a str, O, E>
where
    F: Fn(&'a str) -> IResult<&'a str, O, E>,
{
    delimited(
        multispace0,
        inner,
        multispace0
    )
}
```

To eat only trailing whitespace, replace `delimited(...)` with `terminated(&inner, multispace0)`.
Likewise, to eat only leading whitespace, replace `delimited(...)` with `preceded(multispace0,
&inner)`. You can use your own parser instead of `multispace0` if you want to skip a different set
of lexemes.

## Comments

### `// C++/EOL-style comments`

This version uses `%` rather than `//` to start a comment, does not consume the newline character,
and returns an output of `()`. For C++-style comments, replace `char('%')` with `tag("//")`
(imported from `nom::bytes::complete`).

```rust
use nom::{
    IResult,
    error::ParseError,
    combinator::value,
    sequence::pair,
    bytes::complete::is_not,
    character::complete::char,
};

pub fn peol_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E>
{
    value(
        (), // Output is thrown away.
        pair(char('%'), is_not("\n\r"))
    )(i)
}
```

### `/* C-style comments */`

This recipe parses inline comments delimited by the sentinel tags `(*` and `*)`, as used in Pascal
and OCaml; swap in `/*` and `*/` for C-style comments. It returns an output of `()` and does not
handle nested comments.

```rust
use nom::{
    IResult,
    error::ParseError,
    combinator::value,
    sequence::tuple,
    bytes::complete::{tag, take_until},
};

pub fn pinline_comment<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, (), E> {
    value(
        (), // Output is thrown away.
        tuple((
            tag("(*"),
            take_until("*)"),
            tag("*)")
        ))
    )(i)
}
```

## Identifiers

### `Rust-Style Identifiers`

Identifiers that start with a letter or an underscore, followed by any mix of letters, digits and
underscores, can be parsed like this:

```rust
use nom::{
    IResult,
    branch::alt,
    multi::many0_count,
    combinator::recognize,
    sequence::pair,
    character::complete::{alpha1, alphanumeric1},
    bytes::complete::tag,
};

pub fn identifier(input: &str) -> IResult<&str, &str> {
    recognize(
        pair(
            alt((alpha1, tag("_"))),
            many0_count(alt((alphanumeric1, tag("_"))))
        )
    )(input)
}
```

Let's say we apply this to the identifier `hello_world123abc`. The first `alt` parser recognizes
`hello`, since `alpha1` consumes as many letters as it can. The `pair` combinator then passes the
remaining `_world123abc` to `many0_count`, which applies `alt((alphanumeric1, tag("_")))` repeatedly:
first to `_`, then to `world123abc`, until neither alternative matches. The `pair` combinator itself
returns a tuple of the results of its sub-parsers, but `recognize` discards that tuple and instead
produces the `&str` of input text that was consumed, which in this case is the entire `&str`
`hello_world123abc`.

## Literal Values

### Escaped Strings

This is [one of the examples](https://github.com/Geal/nom/blob/main/examples/string.rs) in the
examples directory.

### Integers

The following recipes all return string slices rather than integer values. How to obtain an
integer value instead is demonstrated for hexadecimal integers. The others are similar.

The parsers allow the grouping character `_`, which lets you group the digits, for example by byte:
`0xA4_3F_11_28`. If you prefer to exclude the `_` character, the closure that converts a string
slice to an integer value is slightly simpler. You can also strip the `_` from the returned string
slice, as demonstrated in the second hexadecimal number parser.
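
The conversion step is plain Rust and works for any base. The sketch below uses a hypothetical helper named `digits_to_i64`. It strips the `_` separators from the digit slice returned by one of the parsers and converts the result with `i64::from_str_radix`, which reports overflow as an error.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: converts a digit slice (prefix already stripped,
/// `_` separators allowed) into an `i64` in the given radix.
fn digits_to_i64(digits: &str, radix: u32) -> Result<i64, ParseIntError> {
    // `from_str_radix` does not accept `_`, so drop the separators first.
    let cleaned: String = digits.chars().filter(|&c| c != '_').collect();
    i64::from_str_radix(&cleaned, radix)
}
```

For example, `digits_to_i64("A4_3F_11_28", 16)` yields `Ok(0xA43F1128)`, and `digits_to_i64("7_55", 8)` yields `Ok(493)`. Wrapped in a closure such as `|d: &str| digits_to_i64(d, 8)`, it can serve as the second argument to `map_res`.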

If you wish to limit the number of digits in a valid integer literal, replace `many1` with
`many_m_n` in the recipes.

#### Hexadecimal

The parser outputs the string slice of the digits without the leading `0x`/`0X`.

```rust
use nom::{
    IResult,
    branch::alt,
    multi::{many0, many1},
    combinator::recognize,
    sequence::{preceded, terminated},
    character::complete::{char, one_of},
    bytes::complete::tag,
};

fn hexadecimal(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0x"), tag("0X"))),
        recognize(
            many1(
                // Each hex digit may be followed by any number of `_` separators.
                terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
            )
        )
    )(input)
}
```

If you want it to return the integer value instead, use `map_res`:

```rust
use nom::{
    IResult,
    branch::alt,
    multi::{many0, many1},
    combinator::{map_res, recognize},
    sequence::{preceded, terminated},
    character::complete::{char, one_of},
    bytes::complete::tag,
};

fn hexadecimal_value(input: &str) -> IResult<&str, i64> {
    map_res(
        preceded(
            alt((tag("0x"), tag("0X"))),
            recognize(
                many1(
                    terminated(one_of("0123456789abcdefABCDEF"), many0(char('_')))
                )
            )
        ),
        // `from_str_radix` rejects `_`, so strip the separators first; values that
        // overflow `i64` make `map_res` return an error.
        |out: &str| i64::from_str_radix(&out.replace('_', ""), 16)
    )(input)
}
```

#### Octal

```rust
use nom::{
    IResult,
    branch::alt,
    multi::{many0, many1},
    combinator::recognize,
    sequence::{preceded, terminated},
    character::complete::{char, one_of},
    bytes::complete::tag,
};

fn octal(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0o"), tag("0O"))),
        recognize(
            many1(
                terminated(one_of("01234567"), many0(char('_')))
            )
        )
    )(input)
}
```

#### Binary

```rust
use nom::{
    IResult,
    branch::alt,
    multi::{many0, many1},
    combinator::recognize,
    sequence::{preceded, terminated},
    character::complete::{char, one_of},
    bytes::complete::tag,
};

fn binary(input: &str) -> IResult<&str, &str> {
    preceded(
        alt((tag("0b"), tag("0B"))),
        recognize(
            many1(
                terminated(one_of("01"), many0(char('_')))
            )
        )
    )(input)
}
```

#### Decimal

```rust
use nom::{
    IResult,
    multi::{many0, many1},
    combinator::recognize,
    sequence::terminated,
    character::complete::{char, one_of},
};

fn decimal(input: &str) -> IResult<&str, &str> {
    recognize(
        many1(
            terminated(one_of("0123456789"), many0(char('_')))
        )
    )(input)
}
```

### Floating Point Numbers

The following is adapted from [the Python parser by Valentin Lorentz (ProgVal)](https://github.com/ProgVal/rust-python-parser/blob/master/src/numbers.rs).

```rust
use nom::{
    IResult,
    branch::alt,
    multi::{many0, many1},
    combinator::{opt, recognize},
    sequence::{preceded, terminated, tuple},
    character::complete::{char, one_of},
};

fn float(input: &str) -> IResult<&str, &str> {
    alt((
        // Case one: .42
        recognize(
            tuple((
                char('.'),
                decimal,
                opt(tuple((
                    one_of("eE"),
                    opt(one_of("+-")),
                    decimal
                )))
            ))
        ),
        // Case two: 42e42 and 42.42e42
        recognize(
            tuple((
                decimal,
                opt(preceded(
                    char('.'),
                    decimal,
                )),
                one_of("eE"),
                opt(one_of("+-")),
                decimal
            ))
        ),
        // Case three: 42. and 42.42
        recognize(
            tuple((
                decimal,
                char('.'),
                opt(decimal)
            ))
        )
    ))(input)
}

fn decimal(input: &str) -> IResult<&str, &str> {
    recognize(
        many1(
            terminated(one_of("0123456789"), many0(char('_')))
        )
    )(input)
}
```

## Implementing FromStr

The [FromStr trait](https://doc.rust-lang.org/std/str/trait.FromStr.html) provides
a common interface to parse from a string.

```rust
use nom::{
    IResult, Finish, error::Error,
    bytes::complete::{tag, take_while},
};
use std::str::FromStr;

// Recognizes the name in "Hello, name!"
fn parse_name(input: &str) -> IResult<&str, &str> {
    let (i, _) = tag("Hello, ")(input)?;
    let (i, name) = take_while(|c: char| c.is_alphabetic())(i)?;
    let (i, _) = tag("!")(i)?;

    Ok((i, name))
}

// With FromStr, the result cannot be a reference to the input; it must be owned
#[derive(Debug)]
pub struct Name(pub String);

impl FromStr for Name {
    // The error must be owned as well
    type Err = Error<String>;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match parse_name(s).finish() {
            Ok((_remaining, name)) => Ok(Name(name.to_string())),
            Err(Error { input, code }) => Err(Error {
                input: input.to_string(),
                code,
            })
        }
    }
}

fn main() {
    // parsed: Ok(Name("nom"))
    println!("parsed: {:?}", "Hello, nom!".parse::<Name>());

    // parsed: Err(Error { input: "123!", code: Tag })
    println!("parsed: {:?}", "Hello, 123!".parse::<Name>());
}
```