API (draft page)
Unstructured API dump, to provide cross-reference targets for other portions of the docs.
Any of the objects, attributes, and methods documented here may become private implementation details in future versions of pent.
Mini-language parser for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
8 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- class pent.parser.Parser(head=None, body=None, tail=None)
Mini-language parser for structured numerical data.
- capture_body(text)
Capture all values from the pattern body, recursing if needed.
- classmethod capture_parser(prs, text)
Perform capture of a Parser pattern.
- classmethod capture_section(sec, text)
Perform capture of a str, iterable, or Parser section.
- classmethod capture_str_pattern(pat_str, text)
Perform capture of a string/iterable-of-str pattern.
- capture_struct(text)
Perform capture of marked groups to nested dict(s).
- classmethod convert_line(line, *, capture_groups=True, group_id=0)
Convert a line of tokens to regex.
The constructed regex must match an entire line of text; this is enforced with a lookbehind at the start of the pattern and a lookahead at the end.
group_id sets the starting index for any capture groups added.
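To illustrate the whole-line requirement, here is a stdlib `re` sketch of one way to anchor a line regex with a lookbehind and a lookahead. The helper `line_regex`, its `g<n>` group names, and the `[ \t]+` token separator are hypothetical choices for illustration, not pent's actual generated regex.

```python
import re


def line_regex(token_patterns, group_id=0):
    """Hypothetical helper: join per-token regexes into a whole-line regex.

    Each token regex is wrapped in a named group g<n>, numbered from
    group_id, loosely mirroring the role of convert_line's group_id.
    """
    groups = [
        f"(?P<g{group_id + i}>{pat})" for i, pat in enumerate(token_patterns)
    ]
    body = r"[ \t]+".join(groups)
    # (?<![^\n]) succeeds only at the start of the text or right after a
    # newline; (?![^\n]) succeeds only at the end of the text or right before
    # a newline. Together they force the pattern to cover a full line.
    return rf"(?<![^\n]){body}(?![^\n])"


# Integer/Any-sign entry of pent.patterns.number_patterns
INT = r"[+-]?\d+"

text = "a 1\n1 2\n3 4 5\n"
pat = line_regex([INT, INT], group_id=3)
matches = [m.groupdict() for m in re.finditer(pat, text)]
print(matches)
```

Only the line `1 2` matches: `a 1` fails because the first token is not a number, and `3 4 5` fails because the lookahead rejects the trailing ` 5`.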
- classmethod convert_section(sec, capture_groups=False, capture_sections=True)
Convert the head, body, or tail to regex.
- static generate_captures(m)
Generate captures from a regex match.
- pattern(capture_sections=True)
Return the regex pattern for the entire parser.
The individual capture groups are NEVER inserted when the regex is generated this way.
Instead, head/body/tail capture groups are inserted to subdivide the matched text into these sections. These ‘section’ capture groups are inserted ONLY for the top-level Parser; they are suppressed for inner nested Parsers.
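The effect of section capture groups can be sketched with the stdlib `re` module: wrapping the head, body, and tail regexes in one group each lets the matched text be split by section. The three section regexes and the `full_pattern` helper below are hand-written, hypothetical stand-ins; they are not what pent generates.

```python
import re

# Hypothetical stand-ins for regexes generated from each Parser section
HEAD = r"^[ \t]*DATA[ \t]*\n"                                # one header line
BODY = r"(?:^[ \t]*[+-]?\d+(?:[ \t]+[+-]?\d+)*[ \t]*\n)+"   # rows of integers
TAIL = r"^[ \t]*END[ \t]*$"                                  # one footer line


def full_pattern(head, body, tail, capture_sections=True):
    """Concatenate section regexes, optionally wrapping each in a group."""
    if capture_sections:
        return f"({head})({body})({tail})"
    return head + body + tail


text = "junk\nDATA\n1 2 3\n4 5 6\nEND\n"
m = re.search(full_pattern(HEAD, BODY, TAIL), text, re.MULTILINE)
head_txt, body_txt, tail_txt = m.groups()
print(repr(body_txt))
```

Because the body regex uses only non-capturing groups, the three section groups are the only groups in the match, in the spirit of the statement that individual capture groups are not inserted in this mode.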
Token handling for mini-language parser for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
20 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- class pent.token.Token(token, do_capture=True)
Encapsulates transforming mini-language pattern tokens into regex.
- property capture
Return flag for whether a regex capture group should be created.
- do_capture
Whether group capture should be added or not.
- property is_any
Return flag for whether the token is an “any content” token.
- property is_misc
Return flag for whether the token is a misc token.
- property is_num
Return flag for whether the token matches a number.
- property is_optional_line
Return flag for whether the token flags an optional line.
- property is_str
Return flag for whether the token matches a literal string.
- property match_quantity
Return match quantity.
None for pent.enums.Content.Any or pent.enums.Content.OptionalLine.
- needs_group_id
Flag for whether group ID substitution needs to be done.
- property space_after
Return Enum value for handling of post-match whitespace.
- token
Mini-language token string to be parsed.
- property
Regex patterns for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
2 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- pent.patterns.number_patterns = {
    (<Number.Decimal: 'd'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Negative: '-'>): '-(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Float: 'f'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.General: 'g'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.Integer: 'i'>, <Sign.Positive: '+'>): '[+]?\\d+',
    (<Number.Integer: 'i'>, <Sign.Negative: '-'>): '-\\d+',
    (<Number.Integer: 'i'>, <Sign.Any: '.'>): '[+-]?\\d+',
    (<Number.SciNot: 's'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Negative: '-'>): '-(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)'
}
dict of regex pattern strings matching single numbers, keyed by (Number, Sign) pairs.
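The pattern strings can be exercised directly with the stdlib `re` module. The sketch below copies the `Sign.Any` entries from the dict above (written as raw strings, so the repr's doubled backslashes become single ones) and keys them by the single-letter `Number` values instead of the enum members, so it runs without pent installed. The helper `kinds_matching` is hypothetical.

```python
import re

# Sign.Any entries copied from pent.patterns.number_patterns,
# keyed here by the Number enum's value ('d', 'f', 'g', 'i', 's')
ANY_SIGN = {
    "d": r"[+-]?(\d+\.\d*|\d*\.\d+)",
    "f": r"[+-]?((\d+\.\d*|\d*\.\d+)|(\d+\.?\d*[deDE][+-]?\d+|\d*\.\d+[deDE][+-]?\d+))",
    "g": r"[+-]?((\d+\.\d*|\d*\.\d+)|(\d+\.?\d*[deDE][+-]?\d+|\d*\.\d+[deDE][+-]?\d+)|\d+)",
    "i": r"[+-]?\d+",
    "s": r"[+-]?(\d+\.?\d*[deDE][+-]?\d+|\d*\.\d+[deDE][+-]?\d+)",
}


def kinds_matching(s):
    """Return the Number kinds whose pattern matches the whole of s."""
    return sorted(k for k, pat in ANY_SIGN.items() if re.fullmatch(pat, s))


for sample in ["42", "-3.5", ".5", "1e-3", "2.0D+05"]:
    print(sample, kinds_matching(sample))
```

With these patterns, `42` is accepted only by General and Integer, `-3.5` and `.5` by Decimal, Float, and General, and `1e-3` and `2.0D+05` by Float, General, and SciNot.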
- pent.patterns.std_num_punct = 'deDE+.-'
str with the standard numerical punctuation to treat as not marking word boundaries. The letters d and e (in both cases) are included to account for scientific notation.
- pent.patterns.std_scinot_markers = 'deDE'
str with the standard allowed scientific-notation exponent marker characters.
- pent.patterns.std_word_chars = 'a-zA-Z0-9deDE+.-'
Standard word marker characters for pent.
- pent.patterns.std_wordify(p)
Wrap a token in the pent standard word start/end markers.
- pent.patterns.std_wordify_close(p)
Append the standard word end markers.
- pent.patterns.std_wordify_open(p)
Prepend the standard word start markers.
- pent.patterns.wordify_close(p, word_chars)
Append the word end markers.
- pent.patterns.wordify_open(p, word_chars)
Prepend the word start markers.
- pent.patterns.wordify_pattern(p, word_chars)
Wrap pattern with word start/end markers using arbitrary word chars.
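A minimal sketch of what wrapping with word markers can look like, assuming the markers are a negative lookbehind and a negative lookahead on a character class built from `word_chars`. This is an illustrative re-implementation with the same function names, not pent's code; its actual marker regexes may differ.

```python
import re

STD_WORD_CHARS = "a-zA-Z0-9deDE+.-"  # value of pent.patterns.std_word_chars


def wordify_open(p, word_chars):
    """Sketch: require that no word char immediately precedes p."""
    return rf"(?<![{word_chars}]){p}"


def wordify_close(p, word_chars):
    """Sketch: require that no word char immediately follows p."""
    return rf"{p}(?![{word_chars}])"


def wordify_pattern(p, word_chars):
    """Sketch: apply both the start and the end markers."""
    return wordify_close(wordify_open(p, word_chars), word_chars)


INT = r"[+-]?\d+"  # Integer/Any-sign entry of pent.patterns.number_patterns
pat = wordify_pattern(INT, STD_WORD_CHARS)
print(re.findall(pat, "x=12 34 5e3 -7."))
```

Because the standard word chars include the numeric punctuation, `5e3` and `-7.` are not split into smaller integer matches; only `12` and `34` are found.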
Enums for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
3 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- class pent.enums.Content
Enumeration for the possible types of content.
- Any = '~'
Arbitrary match, including whitespace
- Misc = '&'
Arbitrary single-“word” match, no whitespace
- Number = '#'
Number
- OptionalLine = '?'
Flag to mark pattern line as optional
- String = '@'
Literal string
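Since these are standard Python `Enum` members, a member can be recovered from its marker character by value lookup. The mirror class below copies the values listed above so the sketch runs without pent; the helper `content_for_marker` is hypothetical.

```python
from enum import Enum


class Content(Enum):
    """Local mirror of pent.enums.Content; values copied from the docs."""

    Any = "~"
    Misc = "&"
    Number = "#"
    OptionalLine = "?"
    String = "@"


def content_for_marker(ch):
    """Hypothetical helper: map a marker character to its Content member."""
    try:
        return Content(ch)  # Enum lookup by value
    except ValueError:
        return None  # not one of the documented marker characters


print(content_for_marker("#"))
print(content_for_marker("!"))
```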
- class pent.enums.Number
Enumeration for the different kinds of recognized number primitives.
- Decimal = 'd'
Decimal floating-point value; no scientific/exponential notation
- Float = 'f'
Floating-point value, with or without an exponent
- General = 'g'
“General” value; integer, float, or scientific notation
- Integer = 'i'
Integer value; no decimal or scientific/exponential notation
- SciNot = 's'
Scientific/exponential notation, where an exponent is required
- class pent.enums.ParserField
Enumeration for the fields/subsections of a Parser pattern.
- Body = 'body'
Body
- Head = 'head'
Header
- Tail = 'tail'
Tail/footer
- class pent.enums.Quantity
Enumeration for the various match quantities.
- OneOrMore = '+'
One-or-more match
- Single = '.'
Single value match
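One way the two quantities could translate into regex, shown as a hypothetical sketch: `Single` uses a number pattern once, while `OneOrMore` repeats it with whitespace separators. The helper `apply_quantity` and the `[ \t]+` separator are assumptions for illustration, not pent's code.

```python
import re
from enum import Enum


class Quantity(Enum):
    """Local mirror of pent.enums.Quantity."""

    OneOrMore = "+"
    Single = "."


INT = r"[+-]?\d+"  # Integer/Any-sign entry of pent.patterns.number_patterns


def apply_quantity(p, quantity):
    """Hypothetical: expand a single-value regex per the match quantity."""
    if quantity is Quantity.Single:
        return p
    # One match, then zero or more further whitespace-separated matches
    return rf"{p}(?:[ \t]+{p})*"


print(bool(re.fullmatch(apply_quantity(INT, Quantity.Single), "1 2 3")))
print(bool(re.fullmatch(apply_quantity(INT, Quantity.OneOrMore), "1 2 3")))
```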
- class pent.enums.Sign
Enumeration for the different kinds of recognized numerical signs.
- Any = '.'
Any sign
- Negative = '-'
Negative value only (leading ‘-’ required; includes negative zero)
- Positive = '+'
Positive value only (leading ‘+’ optional; includes zero)
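The three `Integer` entries of `pent.patterns.number_patterns`, copied below as raw strings, show how these sign constraints play out. The dict keyed by the `Sign` values and the `accepts` helper are local, hypothetical stand-ins so the sketch runs without pent.

```python
import re

# Integer entries from pent.patterns.number_patterns, keyed by Sign value
INT_BY_SIGN = {
    "+": r"[+]?\d+",   # Sign.Positive: leading '+' optional
    "-": r"-\d+",      # Sign.Negative: leading '-' required
    ".": r"[+-]?\d+",  # Sign.Any
}


def accepts(sign, s):
    """Return True if the pattern for this sign matches all of s."""
    return re.fullmatch(INT_BY_SIGN[sign], s) is not None


for s in ["5", "+5", "-5", "0", "-0"]:
    print(s, [sign for sign in "+-." if accepts(sign, s)])
```

Note that `0` is accepted as Positive and `-0` as Negative, matching the "includes zero" and "includes negative zero" remarks above.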
- class pent.enums.SpaceAfter
Enumeration for the various constraints on space after tokens.
- Optional = 'o'
Optional following space
- Prohibited = 'x'
Following space prohibited
- Required = ''
Following space required; this is the default, so it has no explicit marker value
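A hypothetical sketch of how the three constraints might map onto the regex placed between two tokens. The separator strings and the `join_two`/`fits` helpers are assumptions chosen to match the descriptions above (space required, space optional, no space); pent's actual regex may differ.

```python
import re
from enum import Enum


class SpaceAfter(Enum):
    """Local mirror of pent.enums.SpaceAfter."""

    Optional = "o"
    Prohibited = "x"
    Required = ""


# Hypothetical separator regex placed after a token for each constraint
SEPARATOR = {
    SpaceAfter.Required: r"[ \t]+",
    SpaceAfter.Optional: r"[ \t]*",
    SpaceAfter.Prohibited: "",
}


def join_two(p1, p2, space_after):
    """Join two token regexes using the separator for space_after."""
    return p1 + SEPARATOR[space_after] + p2


def fits(space_after, s):
    """Check whether 'digits then letters' matches s under the constraint."""
    return re.fullmatch(join_two(r"\d+", r"[a-z]+", space_after), s) is not None


for sa in SpaceAfter:
    print(sa.name, fits(sa, "12 ab"), fits(sa, "12ab"))
```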
- class pent.enums.TokenField
Enumeration for fields within a mini-language number token.
- Capture = 'capture'
Flag to ignore matched content when collecting into regex groups
- Number = 'number'
Format of the numerical value (int, float, scinot, decimal, general)
- Quantity = 'quantity'
Match quantity of the field (single value, optional, one-or-more, zero-or-more, etc.)
- Sign = 'sign'
Sign of acceptable values (any, positive, negative)
- SignNumber = 'sign_number'
Combined sign and number, for initial pattern group retrieval
- SpaceAfter = 'space_after'
Flag to change the space-after behavior of a token
- Str = 'str'
Literal content, for a string match
- Type = 'type'
Content type (any, string, number)
Custom exceptions for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
10 Sep 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- exception pent.errors.LineError(line)
Raised during attempts to parse invalid token sequences.
- exception pent.errors.PentError
Superclass for all custom pent errors.
- exception pent.errors.SectionError(msg='')
Raised from failed attempts to parse a Parser section.
- exception pent.errors.ThruListError(msg='')
Raised from failed ThruList indexing attempts.
- exception pent.errors.TokenError(token)
Raised during attempts to parse an invalid token.
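Because PentError is documented as the superclass of all custom pent errors, calling code can catch any of them with a single `except` clause. The sketch uses local stand-in classes with the documented constructor signatures; the message formatting, the hypothetical `try_parse` function, and the assumption that PentError derives directly from `Exception` are illustrative, not taken from pent. LineError and ThruListError would follow the same pattern.

```python
class PentError(Exception):
    """Stand-in for pent.errors.PentError."""


class TokenError(PentError):
    """Stand-in for pent.errors.TokenError(token)."""

    def __init__(self, token):
        self.token = token
        super().__init__(f"Invalid token: {token!r}")


class SectionError(PentError):
    """Stand-in for pent.errors.SectionError(msg='')."""

    def __init__(self, msg=""):
        self.msg = msg
        super().__init__(msg)


def try_parse(token):
    # Hypothetical parse step that rejects every token, for demonstration
    raise TokenError(token)


try:
    try_parse("#?z")
except PentError as err:  # one clause covers every custom pent error
    caught = err

print(type(caught).__name__, caught)
```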
Custom list object for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
3 Oct 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- class pent.thrulist.ThruList
List that passes through key if len == 1.
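One plausible reading of "passes through key if len == 1", sketched as a hypothetical `list` subclass: when the list holds exactly one item, any key the list itself cannot resolve is forwarded to that item, so `tl["x"]` behaves like `tl[0]["x"]`. The fallback rule is an assumption; the real class documents ThruListError for failed indexing attempts, which this sketch does not model.

```python
class ThruList(list):
    """Hypothetical sketch of a pass-through list (not pent's code)."""

    def __getitem__(self, key):
        if len(self) != 1:
            # Ordinary list behaviour for empty or multi-item lists
            return super().__getitem__(key)
        try:
            # Keys valid for a one-item list (0, -1, slices) resolve normally
            return super().__getitem__(key)
        except (IndexError, TypeError):
            # Anything else is passed through to the single item
            return super().__getitem__(0)[key]


tl = ThruList([{"head": "DATA", "body": [[1, 2], [3, 4]]}])
print(tl["body"])
print(tl[0]["head"])
```

This lets a caller drop one level of indexing when a capture produced exactly one match, while multi-item lists keep standard semantics.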
Utility functions for pent.
pent Extracts Numerical Text.
- Author
Brian Skinn (bskinn@alum.mit.edu)
- File Created
14 Oct 2018
- Copyright
(c) Brian Skinn 2018-2019
- Source Repository
- Documentation
- License
The MIT License; see LICENSE.txt for full license terms
Members
- pent.utils.column_stack_2d(data)
Perform column-stacking on a list of 2D data blocks.
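A minimal sketch of column-stacking, assuming each block is a list of rows and all blocks have the same number of rows: corresponding rows are concatenated left to right. This follows the description above but is not necessarily pent's implementation.

```python
from itertools import chain


def column_stack_2d(data):
    """Concatenate row i of every block into output row i."""
    return [list(chain.from_iterable(rows)) for rows in zip(*data)]


block_a = [[1, 2], [3, 4]]
block_b = [[5], [6]]
print(column_stack_2d([block_a, block_b]))
```

`zip` stops at the shortest block, so mismatched row counts are silently truncated; a stricter variant could pass `strict=True` to `zip` (Python 3.10+) to raise instead.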