API (draft page)¶
Unstructured API dump, to provide cross-reference targets for other portions of the docs.
Any of the objects/attributes/methods documented here may become private implementation details in future versions of pent.
Mini-language parser for pent.
pent
Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
8 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
-
class
pent.parser.
Parser
(head=None, body=None, tail=None)¶ Mini-language parser for structured numerical data.
-
capture_body
(text)¶ Capture all values from the pattern body, recursing if needed.
-
classmethod
capture_parser
(prs, text)¶ Perform capture of a Parser pattern.
-
classmethod
capture_section
(sec, text)¶ Perform capture of a str, iterable, or Parser section.
-
classmethod
capture_str_pattern
(pat_str, text)¶ Perform capture of string/iterable-of-str pattern.
-
capture_struct
(text)¶ Perform capture of marked groups to nested dict(s).
-
classmethod
convert_line
(line, *, capture_groups=True, group_id=0)¶ Convert line of tokens to regex.
The constructed regex is required to match the entirety of a line of text, using lookbehind and lookahead at the start and end of the pattern, respectively.
group_id indicates the starting value of the index for any capture groups added.
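The whole-line requirement described above can be illustrated with plain `re`. This is a minimal sketch, not the regex that `convert_line` actually emits: the helper name `whole_line` and the exact lookbehind/lookahead forms are assumptions chosen for illustration.

```python
import re


def whole_line(pattern: str) -> str:
    """Wrap `pattern` so it only matches an entire line (hypothetical helper)."""
    # Start: beginning of the text, or immediately after a newline.
    # End: immediately before a newline, or the end of the text.
    return r"(?:^|(?<=\n))" + pattern + r"(?=\n|$)"


text = "x 1 2\n1 2 3\n4 5"
rx = re.compile(whole_line(r"\d+ \d+ \d+"))

# Only the middle line consists of exactly three integers.
matches = [m.group(0) for m in rx.finditer(text)]
```

The lookarounds consume no characters, so the newline separating two lines remains available to a pattern for the following line.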
-
classmethod
convert_section
(sec, capture_groups=False, capture_sections=True)¶ Convert the head, body or tail to regex.
-
static
generate_captures
(m)¶ Generate captures from a regex match.
-
pattern
(capture_sections=True)¶ Return the regex pattern for the entire parser.
The individual capture groups are NEVER inserted when regex is generated this way.
Instead, head/body/tail capture groups are inserted, in order to subdivide matched text by these subsets. These ‘section’ capture groups are ONLY inserted for the top-level Parser, though – they are suppressed for inner nested Parsers.
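To picture what section-level capture groups provide, the sketch below uses a hand-written regex with one group per section. The group names `head`, `body`, and `tail`, the sample text, and the regex itself are assumptions for illustration; this is not the output of `Parser.pattern()`.

```python
import re

# Hand-written stand-in for a three-section pattern (assumed group names).
rx = re.compile(
    r"(?P<head>Energies:\n)"
    r"(?P<body>(?:\d+ [+-]?\d+\.\d+\n)+)"
    r"(?P<tail>Done)"
)

text = "Energies:\n1 -0.50\n2 0.25\nDone"
m = rx.search(text)

# The matched text is subdivided by section, not by individual value.
sections = {name: m.group(name) for name in ("head", "body", "tail")}
```

Individual values are not captured here; extracting them would be a second pass over the `body` text.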
-
Token handling for mini-language parser for pent.
- File Created
20 Sep 2018
Members
-
class
pent.token.
Token
(token, do_capture=True)¶ Encapsulates transforming mini-language pattern tokens into regex.
-
property
capture
¶ Return flag for whether a regex capture group should be created.
-
do_capture
¶ Whether group capture should be added or not
-
property
is_any
¶ Return flag for whether the token is an “any content” token.
-
property
is_misc
¶ Return flag for whether the token is a misc token.
-
property
is_num
¶ Return flag for whether the token matches a number.
-
property
is_optional_line
¶ Return flag for whether the token flags an optional line.
-
property
is_str
¶ Return flag for whether the token matches a literal string.
-
property
match_quantity
¶ Return match quantity.
None for pent.enums.Content.Any or pent.enums.Content.OptionalLine.
-
needs_group_id
¶ Flag for whether group ID substitution needs to be done
-
property
space_after
¶ Return Enum value for handling of post-match whitespace.
-
token
¶ Mini-language token string to be parsed
-
property
Regex patterns for pent.
- File Created
2 Sep 2018
Members
-
pent.patterns.
number_patterns
= {(<Number.Decimal: 'd'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.\\d*|\\d*\\.\\d+)', (<Number.Decimal: 'd'>, <Sign.Negative: '-'>): '-(\\d+\\.\\d*|\\d*\\.\\d+)', (<Number.Decimal: 'd'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.\\d*|\\d*\\.\\d+)', (<Number.Float: 'f'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))', (<Number.Float: 'f'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))', (<Number.Float: 'f'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))', (<Number.General: 'g'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)', (<Number.General: 'g'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)', (<Number.General: 'g'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)', (<Number.Integer: 'i'>, <Sign.Positive: '+'>): '[+]?\\d+', (<Number.Integer: 'i'>, <Sign.Negative: '-'>): '-\\d+', (<Number.Integer: 'i'>, <Sign.Any: '.'>): '[+-]?\\d+', (<Number.SciNot: 's'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)', (<Number.SciNot: 's'>, <Sign.Negative: '-'>): '-(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)', (<Number.SciNot: 's'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)'}¶ dict
of pyparsing patterns matching single numbers.
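The values shown in the dict are ordinary regex strings, so they can be tried directly with the standard `re` module. The sketch below copies three of them verbatim (with the doubled backslashes of the repr unescaped); the constant names and the helper `fully_matches` are illustrative and not part of pent.

```python
import re

# Copied from the (Integer, Any), (SciNot, Any) and (Decimal, Any) entries above.
INT_ANY = r"[+-]?\d+"
SCINOT_ANY = r"[+-]?(\d+\.?\d*[deDE][+-]?\d+|\d*\.\d+[deDE][+-]?\d+)"
DECIMAL_ANY = r"[+-]?(\d+\.\d*|\d*\.\d+)"


def fully_matches(pattern: str, s: str) -> bool:
    """Return True if `s` in its entirety matches `pattern`."""
    return re.fullmatch(pattern, s) is not None


# Integers reject a decimal point; scientific notation requires an exponent;
# decimals require a decimal point and reject an exponent.
checks = {
    "-42": fully_matches(INT_ANY, "-42"),
    "4.2": fully_matches(INT_ANY, "4.2"),
    "1.5e-3": fully_matches(SCINOT_ANY, "1.5e-3"),
    "2D+10": fully_matches(SCINOT_ANY, "2D+10"),
    "1.5": fully_matches(SCINOT_ANY, "1.5"),
    ".5": fully_matches(DECIMAL_ANY, ".5"),
}
```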
-
pent.patterns.
std_num_punct
= 'deDE+.-'¶ str
with the standard numerical punctuation that is treated as not marking word boundaries. The characters deDE are included to account for scientific notation.
-
pent.patterns.
std_scinot_markers
= 'deDE'¶ str
with the standard allowed scientific notation exponent marker characters
-
pent.patterns.
std_word_chars
= 'a-zA-Z0-9deDE+.-'¶ Standard word marker characters for pent
-
pent.patterns.
std_wordify
(p)¶ Wrap a token in the pent standard word start/end markers.
-
pent.patterns.
std_wordify_close
(p)¶ Append the standard word end markers.
-
pent.patterns.
std_wordify_open
(p)¶ Prepend the standard word start markers.
-
pent.patterns.
wordify_close
(p, word_chars)¶ Append the word end markers.
-
pent.patterns.
wordify_open
(p, word_chars)¶ Prepend the word start markers.
-
pent.patterns.
wordify_pattern
(p, word_chars)¶ Wrap pattern with word start/end markers using arbitrary word chars.
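One plausible way to realize word start/end markers is a negative lookbehind and a negative lookahead on the word-character set. The sketch below assumes that approach; the function name `wordify_sketch` is hypothetical, and the actual implementation in `pent.patterns` may differ.

```python
import re

# Value of std_word_chars as listed above.
STD_WORD_CHARS = "a-zA-Z0-9deDE+.-"


def wordify_sketch(p: str, word_chars: str = STD_WORD_CHARS) -> str:
    """Wrap `p` so that it cannot start or end in the middle of a 'word'."""
    # Assumed marker form: no word character directly before or after the match.
    return "(?<![{0}]){1}(?![{0}])".format(word_chars, p)


rx = re.compile(wordify_sketch(r"[+-]?\d+"))

# "12" is glued to "x" and "3"/"5" are parts of "3.5", so none of them count
# as standalone integers; "-7" and "42" do.
found = [m.group(0) for m in rx.finditer("x12 -7 3.5 42")]
```

Including `+`, `.`, and `-` in the word characters is what keeps the integer pattern from matching pieces of `3.5`.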
Enums for pent.
- File Created
3 Sep 2018
Members
-
class
pent.enums.
Content
¶ Enumeration for the possible types of content.
-
Any
= '~'¶ Arbitrary match, including whitespace
-
Misc
= '&'¶ Arbitrary single-“word” match, no whitespace
-
Number
= '#'¶ Number
-
OptionalLine
= '?'¶ Flag to mark pattern line as optional
-
String
= '@'¶ Literal string
-
-
class
pent.enums.
Number
¶ Enumeration for the different kinds of recognized number primitives.
-
Decimal
= 'd'¶ Decimal floating-point value; no scientific/exponential notation
-
Float
= 'f'¶ Floating-point value with or without an exponent
-
General
= 'g'¶ “General” value; integer, float, or scientific notation
-
Integer
= 'i'¶ Integer value; no decimal or scientific/exponential notation
-
SciNot
= 's'¶ Scientific/exponential notation, where exponent is required
-
-
class
pent.enums.
ParserField
¶ Enumeration for the fields/subsections of a Parser pattern.
-
Body
= 'body'¶ Body
-
Head
= 'head'¶ Header
-
Tail
= 'tail'¶ Tail/footer
-
-
class
pent.enums.
Quantity
¶ Enumeration for the various match quantities.
-
OneOrMore
= '+'¶ One-or-more match
-
Single
= '.'¶ Single value match
-
-
class
pent.enums.
Sign
¶ Enumeration for the different kinds of recognized numerical signs.
-
Any
= '.'¶ Any sign
-
Negative
= '-'¶ Negative value only (leading ‘-’ required; includes negative zero)
-
Positive
= '+'¶ Positive value only (leading ‘+’ optional; includes zero)
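The Number and Sign values listed here are the same objects that appear as `(Number, Sign)` keys in `pent.patterns.number_patterns`. The sketch below re-declares minimal stand-in enums with the documented values and looks up a pattern from single-character codes. The stand-in classes, the three-entry `patterns_excerpt` dict, and the `lookup` helper are illustrative assumptions and do not import pent.

```python
from enum import Enum


class Number(Enum):
    """Stand-in mirroring the documented pent.enums.Number values."""

    Decimal = "d"
    Float = "f"
    General = "g"
    Integer = "i"
    SciNot = "s"


class Sign(Enum):
    """Stand-in mirroring the documented pent.enums.Sign values."""

    Any = "."
    Negative = "-"
    Positive = "+"


# Excerpt of the integer entries from the number_patterns listing.
patterns_excerpt = {
    (Number.Integer, Sign.Positive): r"[+]?\d+",
    (Number.Integer, Sign.Negative): r"-\d+",
    (Number.Integer, Sign.Any): r"[+-]?\d+",
}


def lookup(number_char: str, sign_char: str) -> str:
    """Map single-character codes to a regex via Enum value lookup."""
    return patterns_excerpt[(Number(number_char), Sign(sign_char))]


negative_int = lookup("i", "-")
```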
-
-
class
pent.enums.
SpaceAfter
¶ Enumeration for the various constraints on space after tokens.
-
Optional
= 'o'¶ Optional following space
-
Prohibited
= 'x'¶ Following space prohibited
-
Required
= ''¶ Default is required following space; no explicit enum value
-
-
class
pent.enums.
TokenField
¶ Enumeration for fields within a mini-language number token.
-
Capture
= 'capture'¶ Flag to ignore matched content when collecting into regex groups
-
Number
= 'number'¶ Format of the numerical value (int, float, scinot, decimal, general)
-
Quantity
= 'quantity'¶ Match quantity of the field (single value, optional, one-or-more, zero-or-more, etc.)
-
Sign
= 'sign'¶ Sign of acceptable values (any, positive, negative)
-
SignNumber
= 'sign_number'¶ Combined sign and number, for initial pattern group retrieval
-
SpaceAfter
= 'space_after'¶ Flag to change the space-after behavior of a token
-
Str
= 'str'¶ Literal content, for a string match
-
Type
= 'type'¶ Content type (any, string, number)
-
Custom exceptions for pent.
- File Created
10 Sep 2018
Members
-
exception
pent.errors.
LineError
(line)¶ Raised during attempts to parse invalid token sequences.
-
exception
pent.errors.
PentError
¶ Superclass for all custom pent errors.
-
exception
pent.errors.
SectionError
(msg='')¶ Raised from failed attempts to parse a Parser section.
-
exception
pent.errors.
ThruListError
(msg='')¶ Raised from failed ThruList indexing attempts.
-
exception
pent.errors.
TokenError
(token)¶ Raised during attempts to parse an invalid token.
Custom list object for pent.
- File Created
3 Oct 2018
Members
-
class
pent.thrulist.
ThruList
¶ List that passes through key if len == 1.
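The one-line description can be read as: when the list holds exactly one item, a key that is not a valid list index is applied to that item instead. The sketch below implements that reading. The class name `ThruListSketch` is hypothetical, and the real ThruList may differ in detail (for example, the errors module above suggests it raises ThruListError, whereas this sketch re-raises the built-in TypeError).

```python
class ThruListSketch(list):
    """List that forwards non-index keys to its sole element (assumed behavior)."""

    def __getitem__(self, key):
        try:
            # Integer and slice keys behave as for any list.
            return super().__getitem__(key)
        except TypeError:
            # Any other key is passed through only if there is exactly one item.
            if len(self) == 1:
                return super().__getitem__(0)[key]
            raise


single = ThruListSketch([{"energy": -1.5}])
value = single["energy"]   # forwarded to the only element
first = single[0]          # ordinary list indexing still works
```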
Utility functions for pent.
- File Created
14 Oct 2018
Members
-
pent.utils.
column_stack_2d
(data)¶ Perform column-stacking on a list of 2d data blocks.
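Column-stacking here presumably means joining the rows of several 2-D blocks side by side. A minimal pure-Python sketch of that operation follows; the function name `column_stack_2d_sketch` is hypothetical, and it assumes every block has the same number of rows.

```python
from itertools import chain


def column_stack_2d_sketch(data):
    """Join row i of every block into one long row i (assumes equal row counts)."""
    # zip(*data) groups the i-th rows of all blocks; chain concatenates them.
    return [list(chain.from_iterable(rows)) for rows in zip(*data)]


blocks = [
    [[1, 2], [3, 4]],   # 2 rows x 2 columns
    [[5], [6]],         # 2 rows x 1 column
]
stacked = column_stack_2d_sketch(blocks)
```

Note that `zip` silently truncates to the shortest block, so blocks with differing row counts would lose rows in this sketch.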