Basic Usage: Patterns

A pent pattern is a series of whitespace-delimited tokens that represents all non-whitespace content on a given line of text.

A blank line—one that is empty, or contains only whitespace—can be matched with an empty pattern string:

>>> check_pattern(pattern="", text="")
MATCH

>>> check_pattern(pattern="", text="          ")
MATCH

>>> check_pattern(pattern="", text="     \t   ")
MATCH

If a line contains one piece of non-whitespace text, a single token will suffice to match the whole line:

>>> check_pattern(pattern="&.", text="foo")
MATCH

>>> check_pattern(pattern="&.", text="     foo")
MATCH

>>> check_pattern(pattern="#..i", text="-5")
MATCH

>>> check_pattern(pattern="#..i", text="    50000   ")
MATCH

>>> check_pattern(pattern="#..f", text="2")  # Wrong number type
NO MATCH

>>> check_pattern(pattern="#.-i", text="2")  # Wrong number sign
NO MATCH

>>> check_pattern(pattern="", text="42")  # Line is not blank
NO MATCH

If a line contains more than one piece of non-whitespace text, every piece must be matched by some token in the pattern:

>>> check_pattern(pattern="&+", text="foo bar baz")  # One-or-more gets all three
MATCH

>>> check_pattern(pattern="&. &.", text="foo bar baz")  # Only 2/3 words matched
NO MATCH

>>> check_pattern(pattern="&. #..i", text="foo 42")
MATCH

>>> check_pattern(pattern="&+ #..i", text="foo bar baz 42")
MATCH

>>> check_pattern(pattern="#+.i", text="-2 -1 0 1 2")
MATCH

>>> check_pattern(pattern="#+.i", text="-2 -1 foo 1 2")  # 'foo' is not an int
NO MATCH

>>> check_pattern(pattern="#+.i &. #+.i", text="-2 -1 foo 1 2")
MATCH

Be careful when using “~” and “&+”, as they may match more aggressively than expected:

>>> check_pattern(pattern="~ #+.i", text="foo bar 42 34")
MATCH

>>> show_capture(pattern="~! #+.i", text="foo bar 42 34")
[[['foo', 'bar']]]

>>> check_pattern(pattern="&+ #+.i", text="foo bar 42 34")
MATCH

>>> show_capture(pattern="&!+ #+.i", text="foo bar 42 34")
[[['foo', 'bar', '42']]]

>>> check_pattern(pattern="&+ #+.i", text="foo 42 bar 34")
MATCH

>>> show_capture(pattern="&!+ #+.i", text="foo 42 bar 34")
[[['foo', '42', 'bar']]]
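
One way to see why “&+” swallows the first integer is a plain regular-expression analogy. The sketch below is not pent's generated regex; GREEDY_SKETCH and split_capture are hypothetical names used only for illustration. It models "one or more whitespace-delimited pieces, then one or more integers" and shows that the greedy first group leaves only the final integer for the second group:

```python
import re

# Assumed stand-in, NOT pent's real regex:
# group 1 mimics "&+", group 2 mimics "#+.i".
GREEDY_SKETCH = re.compile(r"^\s*((?:\S+\s+)+)((?:[-+]?\d+\s*)+)$")


def split_capture(text):
    """Return (pieces, integers) as lists of strings, or None on no match."""
    m = GREEDY_SKETCH.match(text)
    if m is None:
        return None
    return m.group(1).split(), m.group(2).split()


print(split_capture("foo bar 42 34"))
print(split_capture("foo 42 bar 34"))
```

Because the first group is greedy, it gives back only as much as the second group needs in order to match at all, which here is a single integer.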

Punctuation will foul matches unless explicitly accounted for:

>>> check_pattern(pattern="#+.i", text="1 2 ---- 3 4")
NO MATCH

>>> check_pattern(pattern="#+.i &. #+.i", text="1 2 ---- 3 4")
MATCH

In situations where punctuation is directly adjacent to the content to be captured, the space-after flags must be used to modify pent’s expectations for whitespace:

>>> check_pattern(pattern="~ #..d @..", text="The value is 3.1415.")  # No space between number and '.'
NO MATCH

>>> check_pattern(pattern="~ #x..d @..", text="The value is 3.1415.")
MATCH
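
The effect of the whitespace expectation can be imitated with two hand-written regular expressions. These are assumed approximations for illustration, not the regexes pent actually builds, and the names SPACE_REQUIRED and NO_SPACE are hypothetical:

```python
import re

# Assumed approximations, NOT pent's real regexes.
# Default behavior: whitespace is required after the number,
# so a '.' directly after the digits cannot match.
SPACE_REQUIRED = re.compile(r"^.*?[ \t]+(\d+\.\d+)[ \t]+\.[ \t]*$")
# No-space variant: the literal '.' must follow the number directly.
NO_SPACE = re.compile(r"^.*?[ \t]+(\d+\.\d+)\.[ \t]*$")

text = "The value is 3.1415."

print(SPACE_REQUIRED.match(text) is not None)
m = NO_SPACE.match(text)
print(m.group(1) if m else None)
```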

When some initial content will always appear on a line, but additional trailing content may or may not follow it, use one of the space-after modifier flags so that pent can still find a match when the trailing content is absent. By default, a token requires whitespace after the content it matches. If the end of the line immediately follows that content, the required whitespace is missing and the pattern fails to match:

>>> check_pattern(pattern="&. #.+i ~", text="always 42 sometimes")
MATCH

>>> check_pattern(pattern="&. #.+i ~", text="always 42")
NO MATCH

>>> check_pattern(pattern="&. #.+i ~", text="always 42   ")
MATCH

>>> check_pattern(pattern="&. #x.+i ~", text="always 42")
MATCH

>>> check_pattern(pattern="&. #x.+i ~", text="always 42 sometimes")
MATCH
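
The same end-of-line behavior can be reproduced with plain regular expressions. The sketch below is an assumed approximation rather than pent's own output; REQUIRED_WS and RELAXED_WS are hypothetical names, and the relaxed form simply makes the whitespace after the integer optional:

```python
import re

# Assumed approximations, NOT pent's real regexes.
# Whitespace after the integer is mandatory here...
REQUIRED_WS = re.compile(r"^[ \t]*\S+[ \t]+\+?\d+[ \t]+.*$")
# ...and optional here, so end-of-line may follow the integer directly.
RELAXED_WS = re.compile(r"^[ \t]*\S+[ \t]+\+?\d+[ \t]*.*$")

for text in ("always 42 sometimes", "always 42", "always 42   "):
    required = REQUIRED_WS.match(text) is not None
    relaxed = RELAXED_WS.match(text) is not None
    print(repr(text), required, relaxed)
```

Only the middle sample, where nothing follows the integer, distinguishes the two forms.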

Optional Line Flag: ?

In some cases, an entire line of text will be present in some of the blocks of text that a Parser should match, but absent in others. To accommodate such situations, pent recognizes an ‘optional-line flag’ in a pattern: a lone “?” occurring as the first “token” of the pattern. Including this flag causes the pattern to match in the following three cases:

  1. A line is present that completely matches the optional pattern (per usual behavior).

  2. A blank line (no non-whitespace content) is present where the optional pattern would match.

  3. NO line is present where the optional pattern would match.

It is difficult to construct meaningful examples of this behavior without using a full Parser construction; as such, see this tutorial page for more details.
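
As a rough stand-in, the three cases can be mimicked with a plain regular expression spanning several lines. This is an analogy only and does not use pent or reflect its implementation; OPTIONAL_LINE and the HEADER/FOOTER lines are hypothetical, chosen just to give the optional line something to sit between:

```python
import re

# Assumed analogy, NOT pent's real regex: a required header line, an
# optional line holding one integer, and a required footer line.
OPTIONAL_LINE = re.compile(
    r"HEADER\n"
    r"(?:[ \t]*\d+[ \t]*\n"  # case 1: the line is present and matches
    r"|[ \t]*\n"             # case 2: a blank line sits in its place
    r")?"                    # case 3: no line at all
    r"FOOTER\n"
)

samples = [
    "HEADER\n42\nFOOTER\n",   # optional line present
    "HEADER\n   \nFOOTER\n",  # blank line instead
    "HEADER\nFOOTER\n",       # line missing entirely
    "HEADER\nfoo\nFOOTER\n",  # non-blank line that does not match
]

for block in samples:
    print(repr(block), OPTIONAL_LINE.fullmatch(block) is not None)
```

The last sample fails because a non-blank line that does not fit the optional pattern is none of the three accepted cases.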