The cut-context pattern with nom

nom is a great parser-combinator crate for Rust. It is consistently among my top three reasons to choose Rust for a project. One of the reasons is that it makes it easy to "grow" parsers. It is natural to start a project with a certain over-the-wire or at-rest format and later find a need to evolve it: add extra fields, support an alternative notation, rename certain concepts. nom makes it easy to keep supporting a legacy format without any loss of code readability, thanks to its alt combinator:

let (input, commands) = many1(alt((legacy_command, command_v1, command_v2)))(input)?;

But letting nom try the remaining formats when the input actually contains a mistake can lead to two problems:

  1. It will hurt parsing performance

  2. It will produce less coherent errors (even when using ErrorTree from nom-supreme)

If you ever see the alt combinator featured prominently in a flamegraph (cargo flamegraph is awesome), it may indicate this particular situation.

If you see a fairly misleading "expecting EOF" error on invalid input instead of a meaningful one, that is another indicator. And props to you for not forgetting the all_consuming combinator.

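The simple pattern to avoid this situation is to wrap the parsers where no alternatives are possible in a cut-context pair. cut turns a recoverable error into a failure, so alt stops trying the remaining branches, and context attaches a human-readable label to that error.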

For example, consider a parser for an assignment string:

let (input, identifier) = ws(identifier)(input)?;
let (input, _) = tag("=")(input)?;
let (input, operator) = ws(parse_operator)(input)?;

If the first two parsers succeeded, but the third failed, there is no reason to try anything else. We already know it is an assignment line that was somehow malformed. Let's tell the end user that in a nice error while also saving ourselves some precious CPU cycles:

let (input, identifier) = ws(identifier)(input)?;
let (input, _) = tag("=")(input)?;
let (input, operator) = ws(cut(context("Expecting operator in the `name(args)` format",
                           parse_operator)))(input)?;
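To make the difference concrete, here is a rough sketch that models the behaviour using only the standard library. It is not nom's implementation: the ParseError type, the tiny `name=digits` grammar, and the function names (take_while1, assignment, legacy_command, parse_line) are all made up for illustration, and this cut simply replaces the error message with the label instead of building a proper context chain.

```rust
/// Simplified stand-in for nom's error split (hypothetical type, not nom's).
#[derive(Debug, PartialEq)]
enum ParseError {
    /// Recoverable: an `alt`-style combinator may try the next branch.
    Error(String),
    /// Fatal: produced by `cut`; no further alternatives are tried.
    Failure(String),
}

type PResult<'a> = Result<(&'a str, String), ParseError>;

/// Splits off the longest non-empty prefix whose chars satisfy `pred`.
fn take_while1<'a>(
    input: &'a str,
    pred: fn(char) -> bool,
    what: &str,
) -> Result<(&'a str, &'a str), ParseError> {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        Err(ParseError::Error(format!("expected {}", what)))
    } else {
        Ok((&input[end..], &input[..end]))
    }
}

/// Models `cut(context(label, parser))`: a recoverable error becomes a
/// labelled failure. Simplification: the original message is dropped.
fn cut<T>(label: &str, result: Result<T, ParseError>) -> Result<T, ParseError> {
    result.map_err(|e| match e {
        ParseError::Error(_) => ParseError::Failure(label.to_string()),
        failure => failure,
    })
}

/// Parses `name=digits`. Once `=` has matched, nothing else can apply,
/// so the value parser is the natural place for `cut`.
fn assignment(input: &str, use_cut: bool) -> PResult<'_> {
    let (rest, name) = take_while1(input, |c: char| c.is_ascii_alphabetic(), "identifier")?;
    let rest = rest
        .strip_prefix('=')
        .ok_or_else(|| ParseError::Error("expected `=`".to_string()))?;
    let value = take_while1(rest, |c: char| c.is_ascii_digit(), "digits");
    let (rest, value) = if use_cut {
        cut("expecting a numeric value after `=`", value)?
    } else {
        value?
    };
    Ok((rest, format!("{}={}", name, value)))
}

/// A legacy format: a bare command name such as `reset`.
fn legacy_command(input: &str) -> PResult<'_> {
    let (rest, name) = take_while1(input, |c: char| c.is_ascii_alphabetic(), "command name")?;
    Ok((rest, name.to_string()))
}

/// Models `all_consuming(alt((assignment, legacy_command)))`.
fn parse_line(input: &str, use_cut: bool) -> Result<String, ParseError> {
    // alt: fall back to the next branch only on a recoverable error.
    let (rest, parsed) = match assignment(input, use_cut) {
        Err(ParseError::Error(_)) => legacy_command(input)?,
        other => other?,
    };
    // all_consuming: any leftover input is an error.
    if rest.is_empty() {
        Ok(parsed)
    } else {
        Err(ParseError::Error(format!("expected end of input, found `{}`", rest)))
    }
}
```

In this model, parse_line("x=?", false) falls through to the legacy branch, which happily consumes `x` and then trips over the leftover `=?`, so the user is told that the end of input was expected. With parse_line("x=?", true) the failure surfaces straight from the assignment branch with the label about the missing numeric value, and the legacy parser never runs.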

While we are talking about malformed input, you do have unit tests for the invalid strings as well as for the valid ones, right?
