nom is a great parser-combinator crate for Rust. It is consistently among my top three reasons to choose Rust for a project. One reason is that it makes it easy to "grow" parsers. It is natural to start a project with a certain over-the-wire or at-rest format and later find a need to evolve it: add extra fields, introduce alternative notation, rename certain concepts. nom makes it easy to keep supporting legacy formats without any loss of code readability, thanks to its alt combinator:
let (input, commands) = many1(alt((legacy_command, command_v1, command_v2)))(input)?;
But letting nom try out every wrong format when the input actually contains a mistake can lead to two problems:
- It will hurt parsing performance
- It will produce less coherent errors (even when using ErrorTree from nom-supreme)
If you ever see the Alt combinator featured prominently in a flamegraph (cargo flamegraph is awesome), it may indicate this particular situation.
If you see a fairly misleading expecting EOF error on invalid input instead of a meaningful error, that is another indicator. And props to you for not forgetting the all_consuming combinator.
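To make the misleading end-of-input error concrete, here is a dependency-free sketch that imitates a repeated alt wrapped in all_consuming. It is a toy model and not nom's implementation: the old:/new: command formats, the ParseError struct, and every function name are made up for illustration.

```rust
// Toy model of all_consuming(many1(alt((legacy, v1)))).
// Not nom's code: all names and the `old:`/`new:` formats are hypothetical.

#[derive(Debug, PartialEq)]
struct ParseError<'a> {
    at: &'a str,            // where the parser gave up
    expected: &'static str, // what it was looking for
}

// On success: (remaining input, parsed command body).
type PResult<'a> = Result<(&'a str, &'a str), ParseError<'a>>;

// Parses `<prefix><body>;` and returns the body.
fn command<'a>(prefix: &'static str, input: &'a str) -> PResult<'a> {
    let body = input
        .strip_prefix(prefix)
        .ok_or(ParseError { at: input, expected: prefix })?;
    let end = body
        .find(';')
        .ok_or(ParseError { at: body, expected: "';' after the command body" })?;
    Ok((&body[end + 1..], &body[..end]))
}

// Like alt: try each format in turn; only the last branch's error survives.
fn any_command(input: &str) -> PResult<'_> {
    command("old:", input).or_else(|_| command("new:", input))
}

// Like all_consuming(many1(...)): collect commands, then demand end of input.
fn parse_all(mut input: &str) -> Result<Vec<&str>, ParseError<'_>> {
    let mut commands = Vec::new();
    loop {
        match any_command(input) {
            Ok((rest, cmd)) => {
                commands.push(cmd);
                input = rest;
            }
            // The first command is mandatory, so its error is reported as is.
            Err(e) if commands.is_empty() => return Err(e),
            // Later errors merely end the repetition, and their detail is lost.
            Err(_) => break,
        }
    }
    if input.is_empty() {
        Ok(commands)
    } else {
        Err(ParseError { at: input, expected: "end of input" })
    }
}

fn main() {
    // The second command lacks its terminating ';'.
    println!("{:?}", parse_all("new:a;new:b"));
}
```

In this model the report points at new:b and asks for the end of input, while the real defect, the missing semicolon, was found and then thrown away when the repetition stopped.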
The simple pattern to avoid this situation is to wrap the parsers where no alternatives are possible in a cut-context pair.
For example, consider a parser for an assignment string:
let (input, identifier) = ws(identifier)(input)?;
let (input, _) = tag("=")(input)?;
let (input, operator) = ws(parse_operator)(input)?;
If the first two parsers succeeded, but the third failed, there is no reason to try anything else. We already know it is an assignment line that was somehow malformed. Let's tell the end user that in a nice error while also saving ourselves some precious CPU cycles:
let (input, identifier) = ws(identifier)(input)?;
let (input, _) = tag("=")(input)?;
let (input, operator) = ws(cut(context("Expecting operator in the `name(args)` format",
parse_operator)))(input)?;
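The reason this works is that nom separates a recoverable Err::Error, after which alt moves on to the next branch, from a fatal Err::Failure, which alt returns immediately; cut turns the former into the latter. The sketch below imitates that mechanism without depending on nom. The ParseErr enum, the cut_with_context helper, the commit flag, and the two line formats are simplified stand-ins, not nom's API.

```rust
// Toy model of nom's Err::Error / Err::Failure split. Not nom's code:
// the types, helpers, and line formats below are hypothetical stand-ins.

#[derive(Debug, PartialEq)]
enum ParseErr {
    Error(&'static str),   // recoverable: alt may try the next branch
    Failure(&'static str), // fatal: alt must stop and report it
}

// Ok carries the parsed value.
type PResult<'a> = Result<&'a str, ParseErr>;

// Like cut(context(msg, parser)): a recoverable error becomes fatal and labelled.
fn cut_with_context<'a>(msg: &'static str, result: PResult<'a>) -> PResult<'a> {
    result.map_err(|e| match e {
        ParseErr::Error(_) => ParseErr::Failure(msg),
        failure => failure,
    })
}

// Operator in `name(args)` form, e.g. `f(x)`.
fn operator(input: &str) -> PResult<'_> {
    match input.find('(') {
        Some(open) if open > 0 && input.ends_with(')') => Ok(input),
        _ => Err(ParseErr::Error("operator")),
    }
}

// `name = operator`; once `=` is seen, errors are fatal when `commit` is true.
fn assignment(input: &str, commit: bool) -> PResult<'_> {
    let (_, rhs) = input
        .split_once('=')
        .ok_or(ParseErr::Error("assignment"))?;
    let parsed = operator(rhs.trim());
    if commit {
        cut_with_context("Expecting operator in the `name(args)` format", parsed)
    } else {
        parsed
    }
}

fn comment(input: &str) -> PResult<'_> {
    input.strip_prefix('#').ok_or(ParseErr::Error("comment"))
}

// Like alt((assignment, comment)); also reports how many branches it ran.
fn line(input: &str, commit: bool) -> (PResult<'_>, usize) {
    match assignment(input, commit) {
        Err(ParseErr::Error(_)) => (comment(input), 2),
        done => (done, 1),
    }
}

fn main() {
    println!("{:?}", line("x = 42", false)); // no commit: falls through to `comment`
    println!("{:?}", line("x = 42", true)); // commit: stops at the assignment
}
```

Without the commit, the malformed assignment is retried as a comment and the final complaint is about a missing comment marker. With it, the model runs one branch and returns the operator message.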
While we are talking about malformed input, you do have unit tests for the invalid strings as well as for the valid ones, right?