pent Parser Tutorial¶
There is almost always more than one way to construct a pent Parser
to capture a given dataset. Sometimes, if the data
format is complex or contains irrelevant content interspersed with the
data of interest, significant pre- or post-processing may be required. As well,
it’s important to inspect your starting data carefully, often by
loading it into a Python string, to be sure there aren’t, say, a bunch of
unprintable characters floating around and fouling the regex matches.
This tutorial starts by describing the basic structure of
the semantic components of pent’s parsing model:
tokens, patterns, and Parsers.
It then lays out some approaches to constructing Parsers
for realistic datasets, with the goal of enabling new users
to get quickly up to speed
building their own Parsers.
For a formal description of the grammar of the tokens used herein, see the pent Mini-Language Grammar.
- Basic Usage
- Examples
- Capturing with a Single
Parser - Capturing with Nested
Parsers - The Misc Token
- *Post-Processing of Captured Data
- *Internal Spaces in One-Or-More Matches
- The Optional-Line Token
- Required/Optional/Prohibited Trailing Whitespace
- *’Any’ Tokens at EOL
- *Pre-Processing/Data Cleanup Example
- *Examples of
Parser-Generated Regex
- Capturing with a Single