Required/Optional/Prohibited Trailing Whitespace

By default, number (#), misc (&), and literal (@) tokens require trailing whitespace to be present in the text in order to match:

>>> from textwrap import dedent
>>> import pent
>>> text_space = dedent("""\
... foo: 5
... bar: 8
... """)
>>> text_nospace = dedent("""\
... foo:5
... bar:8
... """)
>>> prs_req = pent.Parser(body="&. #!.+i")
>>> prs_req.capture_body(text_space)
[[['5'], ['8']]]
>>> prs_req.capture_body(text_nospace)
[]

Where needed, a token-level flag makes this trailing whitespace either optional or prohibited.

Optional trailing whitespace is indicated with an “o” flag in the token:

>>> prs_opt = pent.Parser(body="&o. #!.+i")
>>> prs_opt.capture_body(text_space)
[[['5'], ['8']]]
>>> prs_opt.capture_body(text_nospace)
[[['5'], ['8']]]

Similarly, prohibited trailing whitespace is indicated with an “x” flag in the token:

>>> prs_prohib = pent.Parser(body="&x. #!.+i")
>>> prs_prohib.capture_body(text_space)
[]
>>> prs_prohib.capture_body(text_nospace)
[[['5'], ['8']]]
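
As a rough mental model, the three modes can be imitated with the standard-library re module. This is only an analogy under stated assumptions: the TRAILING table and the build() helper are hypothetical names introduced here, and the patterns are not the ones pent generates internally.

```python
import re

# Hypothetical stand-ins for the three trailing-whitespace modes.
# These are NOT the patterns pent generates; they only mimic the behavior.
TRAILING = {
    "required": r"[ \t]+",   # default: whitespace must follow
    "optional": r"[ \t]*",   # like the "o" flag
    "prohibited": r"",       # like the "x" flag
}


def build(mode):
    """Non-whitespace run, the chosen trailing whitespace, then a captured integer."""
    return re.compile(r"^\S+?" + TRAILING[mode] + r"(\d+)$", re.MULTILINE)


text_space = "foo: 5\nbar: 8\n"
text_nospace = "foo:5\nbar:8\n"

# Print the captures for each mode against both sample texts.
for mode in TRAILING:
    print(mode, build(mode).findall(text_space), build(mode).findall(text_nospace))
```

In this sketch, the "required" pattern captures values only from text_space, the "prohibited" pattern only from text_nospace, and the "optional" pattern from both, which mirrors the Parser results shown above.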

When combined with the capturing “!” flag, the trailing-whitespace flag is placed before the capturing flag; e.g., “&x!.”.
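
If token strings are assembled programmatically, the flag ordering can be kept in one place. The with_flags() function below is a hypothetical helper, not part of pent; it only concatenates strings and assumes the caller supplies a valid marker and token remainder.

```python
def with_flags(marker, rest, space="", capture=False):
    """Hypothetical helper: assemble a token string with flags in order.

    marker  -- token type character, e.g. "&" or "#"
    rest    -- remainder of the token after the flags, e.g. "." or "..d"
    space   -- "" (required), "o" (optional), or "x" (prohibited)
    capture -- whether to add the capturing "!" flag
    """
    if space not in ("", "o", "x"):
        raise ValueError(f"unknown trailing-whitespace flag: {space!r}")
    # The trailing-whitespace flag always precedes the capturing flag.
    return marker + space + ("!" if capture else "") + rest


print(with_flags("&", ".", space="x", capture=True))    # &x!.
print(with_flags("#", "..d", space="x", capture=True))  # #x!..d
```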

One common situation where this capability is needed is when a number of interest is contained in prose text and falls at the end of a sentence:

>>> text_prose = dedent("""\
... pi is approximately 3.14159.
... """)
>>> pent.Parser(body="~ #!..d &.").capture_body(text_prose)
[]
>>> pent.Parser(body="~ #x!..d &.").capture_body(text_prose)
[[['3.14159']]]

Don’t forget to include a token for that trailing period! Otherwise, the Parser won’t find a match:

>>> pent.Parser(body="~ #x!..d").capture_body(text_prose)
[]
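
The same three outcomes can be reproduced with plain regular expressions, which may help in seeing why each pattern behaves as it does. The patterns below are assumed analogies written for illustration with the standard-library re module; they are not what pent builds internally.

```python
import re

text_prose = "pi is approximately 3.14159.\n"

# Rough regex analogies (assumed for illustration, not pent's real patterns).
# Decimal followed by REQUIRED whitespace, then a non-whitespace run.
required = re.compile(r"(\d+\.\d+)[ \t]+\S+$", re.MULTILINE)
# Decimal with NO whitespace allowed before the non-whitespace run.
prohibited = re.compile(r"(\d+\.\d+)\S+$", re.MULTILINE)
# Whitespace prohibited, but nothing left to consume the final period.
no_period = re.compile(r"(\d+\.\d+)$", re.MULTILINE)

print(required.findall(text_prose))
print(prohibited.findall(text_prose))
print(no_period.findall(text_prose))
```

Only the second pattern captures the value: the first demands whitespace that is not there, and the third leaves the sentence-ending period unmatched.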

Limitations of the “Any” Token

Note that, as currently implemented, the ‘any’ token (~) does not allow specification of optional or prohibited trailing whitespace; any content that it matches must be followed by whitespace for the Parser to work:

>>> text_sandwich = dedent("""\
... This number3.14159is sandwiched in text.
... """)
>>> pent.Parser(body="~ #x!..d ~").capture_body(text_sandwich)
[]

In order to match this value, the text immediately preceding it must be matched by either a literal or a misc token carrying the “x” flag:

>>> pent.Parser(body="~ @x.number #x!..d ~").capture_body(text_sandwich)
[[['3.14159']]]
>>> pent.Parser(body="~ &x. #x!..d ~").capture_body(text_sandwich)
[[['3.14159']]]

This deficiency will be addressed in #78.