pent Parser Tutorial
There is almost always more than one way to construct a pent Parser
to capture a given dataset. If the data format is complex, or if
irrelevant content is interspersed with the data of interest,
significant pre- or post-processing may be required. It’s also
important to inspect your starting data carefully, often by
loading it into a Python string, to be sure there aren’t, say, a bunch of
unprintable characters floating around and fouling the regex matches.
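As a rough illustration of that kind of inspection, the sketch below scans a string for control and format characters using only the Python standard library. It does not use pent at all; the helper name find_suspect_chars and the sample text are hypothetical, made up for this example, and the decision to treat newline and tab as harmless is an assumption that may not suit every dataset.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (index, repr, Unicode category) for each control/format
    character in text, ignoring ordinary newlines and tabs.

    Hypothetical helper for illustration; not part of pent.
    """
    allowed = {"\n", "\t"}  # assumption: these are expected in the data
    hits = []
    for idx, ch in enumerate(text):
        if ch in allowed:
            continue
        category = unicodedata.category(ch)
        # Categories starting with "C" include control (Cc) and format (Cf)
        if category.startswith("C"):
            hits.append((idx, repr(ch), category))
    return hits


# Made-up sample: a NUL byte, a carriage return, and a zero-width space
sample = "1.0  2.0\x00 3.0\r\n4.0\u200b 5.0\n"

# repr() makes otherwise invisible characters visible as escapes
print(repr(sample))

for hit in find_suspect_chars(sample):
    print(hit)
```

Characters such as a non-breaking space fall in a separator category rather than a "C" category, so this check would not flag them; whether they matter depends on the data at hand.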
This tutorial starts by describing the basic structure of
the semantic components of pent’s parsing model:
tokens, patterns, and Parsers.
It then lays out some approaches to constructing Parsers
for realistic datasets, with the goal of enabling new users
to get quickly up to speed building their own Parsers.
For a formal description of the grammar of the tokens used herein, see the pent Mini-Language Grammar.
- Basic Usage
- Examples
- Capturing with a Single Parser
- Capturing with Nested Parsers
- The Misc Token
- *Post-Processing of Captured Data
- *Internal Spaces in One-Or-More Matches
- The Optional-Line Token
- Required/Optional/Prohibited Trailing Whitespace
- *’Any’ Tokens at EOL
- *Pre-Processing/Data Cleanup Example
- *Examples of Parser-Generated Regex
- Capturing with a Single