API (draft page)¶

Unstructured API dump, to provide cross-reference targets for other portions of the docs.

Any of the objects/attributes/methods documented here may become private implementation details in future versions of pent.

Mini-language parser for pent.

pent Extracts Numerical Text.

Author: Brian Skinn (bskinn@alum.mit.edu)
File Created: 8 Sep 2018
Copyright: (c) Brian Skinn 2018-2019
Source Repository: http://www.github.com/bskinn/pent
Documentation: http://pent.readthedocs.io
License: The MIT License; see LICENSE.txt for full license terms

Members

class pent.parser.Parser(head=None, body=None, tail=None)¶

Mini-language parser for structured numerical data.

capture_body(text)¶: Capture all values from the pattern body, recursing if needed.

classmethod capture_parser(prs, text)¶: Perform capture of a Parser pattern.

classmethod capture_section(sec, text)¶: Perform capture of a str, iterable, or Parser section.

classmethod capture_str_pattern(pat_str, text)¶: Perform capture of string/iterable-of-str pattern.

capture_struct(text)¶: Perform capture of marked groups to nested dict(s).

classmethod convert_line(line, *, capture_groups=True, group_id=0)¶

Convert line of tokens to regex.

The constructed regex is required to match the entirety of a line of text, using lookbehind and lookahead at the start and end of the pattern, respectively.

group_id indicates the starting value of the index for any capture groups added.
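
The whole-line requirement can be illustrated with plain `re`. The following is a minimal sketch, not the regex that pent actually generates: the helper name `whole_line` and the sample pattern are hypothetical, and only the idea of bracketing a pattern with a lookbehind and a lookahead is taken from the description above.

```python
import re


def whole_line(pattern):
    """Hypothetical helper: make `pattern` match only an entire line."""
    # (?<![^\n]) succeeds at the start of the text or right after "\n".
    # (?![^\n])  succeeds at the end of the text or right before "\n".
    return r"(?<![^\n])" + pattern + r"(?![^\n])"


# Two whitespace-separated signed integers, and nothing else on the line.
line_re = re.compile(whole_line(r"[+-]?\d+[ \t]+[+-]?\d+"))

text = "1 2\nx 3 4\n5 6 y\n-7 +8"
matches = [m.group(0) for m in line_re.finditer(text)]
print(matches)
```

Lines carrying extra content before or after the two integers are skipped, because one of the two lookarounds fails for them.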

classmethod convert_section(sec, capture_groups=False, capture_sections=True)¶: Convert the head, body or tail to regex.

static generate_captures(m)¶: Generate captures from a regex match.

pattern(capture_sections=True)¶

Return the regex pattern for the entire parser.

The individual capture groups are NEVER inserted when regex is generated this way.

Instead, head/body/tail capture groups are inserted, in order to subdivide matched text by these subsets. These ‘section’ capture groups are ONLY inserted for the top-level Parser, though – they are suppressed for inner nested Parsers.
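
As a rough illustration of subdividing matched text by section, the sketch below wraps three hand-written sub-patterns in named groups. The group names and the sub-patterns are assumptions made for the example; the regex that pent builds from its tokens may be organized differently.

```python
import re

# Hypothetical section patterns; pent would derive these from its tokens.
head = r"Values:\n"
body = r"(?:\d+ \d+\n)+"
tail = r"End\n"

# One capture group per section, and no capture groups for individual values.
full = "(?P<head>{})(?P<body>{})(?P<tail>{})".format(head, body, tail)

text = "junk\nValues:\n1 2\n3 4\nEnd\nmore junk\n"
m = re.search(full, text)
sections = m.groupdict()
print(sections)
```

Each section's text can then be processed separately, for example by applying a second regex with per-value capture groups to the body alone.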

Token handling for mini-language parser for pent.

File Created: 20 Sep 2018
Copyright: (c) Brian Skinn 2018-2019

Members

class pent.token.Token(token, do_capture=True)¶

Encapsulates the transformation of mini-language pattern tokens into regex.

property capture¶: Return flag for whether a regex capture group should be created.

do_capture¶: Whether group capture should be added or not

property is_any¶: Return flag for whether the token is an “any content” token.

property is_misc¶: Return flag for whether the token is a misc token.

property is_num¶: Return flag for whether the token matches a number.

property is_optional_line¶: Return flag for whether the token flags an optional line.

property is_str¶: Return flag for whether the token matches a literal string.

property match_quantity¶

Return match quantity.

Returns None if the token type is pent.enums.Content.Any or pent.enums.Content.OptionalLine.

needs_group_id¶: Flag for whether group ID substitution needs to be done

property number¶: Return number format; None if token doesn’t match a number.

property pattern¶: Return assembled regex pattern from the token, as str.

property sign¶: Return number sign; None if token doesn’t match a number.

property space_after¶: Return Enum value for handling of post-match whitespace.

token¶: Mini-language token string to be parsed

Regex patterns for pent.

File Created: 2 Sep 2018
Copyright: (c) Brian Skinn 2018-2019

Members

pent.patterns.number_patterns = {
    (<Number.Decimal: 'd'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Negative: '-'>): '-(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Float: 'f'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.General: 'g'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.Integer: 'i'>, <Sign.Positive: '+'>): '[+]?\\d+',
    (<Number.Integer: 'i'>, <Sign.Negative: '-'>): '-\\d+',
    (<Number.Integer: 'i'>, <Sign.Any: '.'>): '[+-]?\\d+',
    (<Number.SciNot: 's'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Negative: '-'>): '-(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
}¶: dict of regex pattern strings matching single numbers, keyed by (Number, Sign) pairs.
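
To see what a few of these patterns accept, the sketch below copies three of the pattern strings from the listing above and tests them with `re.fullmatch`. The helper `matches` and the sample inputs are introduced only for this illustration; pent itself combines these patterns with word markers and whitespace handling that are not reproduced here.

```python
import re

# Pattern strings copied from the number_patterns listing.
integer_any = '[+-]?\\d+'
scinot_any = '[+-]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)'
decimal_neg = '-(\\d+\\.\\d*|\\d*\\.\\d+)'


def matches(pattern, s):
    """Return True if `pattern` matches the whole of `s`."""
    return re.fullmatch(pattern, s) is not None


print(matches(integer_any, "-42"), matches(integer_any, "4.2"))
print(matches(scinot_any, "2D+05"), matches(scinot_any, "1.5"))
print(matches(decimal_neg, "-.5"), matches(decimal_neg, "-5"))
```

The scientific-notation pattern requires an exponent marker, and the decimal pattern requires a decimal point, which is why "1.5" and "-5" are rejected by those two patterns respectively.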

pent.patterns.std_num_punct = 'deDE+.-'¶: str with the standard numerical punctuation characters that are treated as not marking word boundaries. The letters d, e, D and E are included to account for scientific notation.

pent.patterns.std_scinot_markers = 'deDE'¶: str with the standard allowed scientific notation exponent marker characters

pent.patterns.std_word_chars = 'a-zA-Z0-9deDE+.-'¶: Standard word marker characters for pent

pent.patterns.std_wordify(p)¶: Wrap a token in the pent standard word start/end markers.

pent.patterns.std_wordify_close(p)¶: Append the standard word end markers.

pent.patterns.std_wordify_open(p)¶: Prepend the standard word start markers.

pent.patterns.wordify_close(p, word_chars)¶: Append the word end markers.

pent.patterns.wordify_open(p, word_chars)¶: Prepend the word start markers.

pent.patterns.wordify_pattern(p, word_chars)¶: Wrap pattern with word start/end markers using arbitrary word chars.
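
One way word start/end markers of this kind can be built is with negative lookarounds over a character class. The sketch below is an assumption about the general technique, not a copy of pent's implementation: `sketch_wordify` is a hypothetical stand-in for wordify_pattern, and only the value of std_word_chars is taken from the listing.

```python
import re

# Value of std_word_chars as given in the listing.
WORD_CHARS = "a-zA-Z0-9deDE+.-"


def sketch_wordify(p, word_chars=WORD_CHARS):
    """Hypothetical stand-in: wrap `p` in word start/end markers."""
    open_marker = "(?<![{}])".format(word_chars)   # no word char before
    close_marker = "(?![{}])".format(word_chars)   # no word char after
    return open_marker + p + close_marker


int_word = re.compile(sketch_wordify(r"\d+"))

text = "12 x34 5.6 78"
found = int_word.findall(text)
print(found)
```

Because '.' counts as a word character here, the digits inside "5.6" are not reported as standalone integers, and the digits in "x34" are rejected because a letter precedes them.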

Enums for pent.

File Created: 3 Sep 2018
Copyright: (c) Brian Skinn 2018-2019

Members

class pent.enums.Content¶

Enumeration for the possible types of content.

Any = '~'¶: Arbitrary match, including whitespace

Misc = '&'¶: Arbitrary single-“word” match, no whitespace

Number = '#'¶: Number

OptionalLine = '?'¶: Flag to mark pattern line as optional

String = '@'¶: Literal string

class pent.enums.Number¶

Enumeration for the different kinds of recognized number primitives.

Decimal = 'd'¶: Decimal floating-point value; no scientific/exponential notation

Float = 'f'¶: Floating-point value with or without an exponent

General = 'g'¶: “General” value; integer, float, or scientific notation

Integer = 'i'¶: Integer value; no decimal or scientific/exponential notation

SciNot = 's'¶: Scientific/exponential notation, where exponent is required

class pent.enums.ParserField¶

Enumeration for the fields/subsections of a Parser pattern.

Body = 'body'¶: Body

Head = 'head'¶: Header

Tail = 'tail'¶: Tail/footer

class pent.enums.Quantity¶

Enumeration for the various match quantities.

OneOrMore = '+'¶: One-or-more match

Single = '.'¶: Single value match

class pent.enums.Sign¶

Enumeration for the different kinds of recognized numerical signs.

Any = '.'¶: Any sign

Negative = '-'¶: Negative value only (leading ‘-’ required; includes negative zero)

Positive = '+'¶: Positive value only (leading ‘+’ optional; includes zero)
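
Since each member's value is a single flag character, value lookup on the enum converts a character from a token into the corresponding member. The sketch below re-creates Number and Sign locally, with member names and values copied from this page, so that it runs without pent installed; whether pent's own classes add mixins or extra behavior is not shown here.

```python
from enum import Enum


# Local re-creations of two enums from this page (names and values copied).
class Number(Enum):
    Decimal = "d"
    Float = "f"
    General = "g"
    Integer = "i"
    SciNot = "s"


class Sign(Enum):
    Any = "."
    Negative = "-"
    Positive = "+"


# Lookup by value turns a flag character into an enum member.
sign = Sign("+")
number = Number("i")

# Enum members are hashable, so a (Number, Sign) tuple can key a dict.
patterns = {(Number.Integer, Sign.Positive): "[+]?\\d+"}
pattern = patterns[(number, sign)]
print(sign, number, pattern)
```

An unrecognized character, such as `Sign("?")`, raises ValueError, which gives a natural place to report an invalid token.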

class pent.enums.SpaceAfter¶

Enumeration for the various constraints on space after tokens.

Optional = 'o'¶: Optional following space

Prohibited = 'x'¶: Following space prohibited

Required = ''¶: Following space required; this is the default, indicated by the absence of a flag character (empty enum value)

class pent.enums.TokenField¶

Enumeration for fields within a mini-language number token.

Capture = 'capture'¶: Flag to ignore matched content when collecting into regex groups

Number = 'number'¶: Format of the numerical value (int, float, scinot, decimal, general)

Quantity = 'quantity'¶: Match quantity of the field (single value, optional, one-or-more, zero-or-more, etc.)

Sign = 'sign'¶: Sign of acceptable values (any, positive, negative)

SignNumber = 'sign_number'¶: Combined sign and number, for initial pattern group retrieval

SpaceAfter = 'space_after'¶: Flag to change the space-after behavior of a token

Str = 'str'¶: Literal content, for a string match

Type = 'type'¶: Content type (any, string, number)

Custom exceptions for pent.

File Created: 10 Sep 2018

Members

exception pent.errors.LineError(line)¶: Raised during attempts to parse invalid token sequences.

exception pent.errors.PentError¶: Superclass for all custom pent errors.

exception pent.errors.SectionError(msg='')¶: Raised from failed attempts to parse a Parser section.

exception pent.errors.ThruListError(msg='')¶: Raised from failed ThruList indexing attempts.

exception pent.errors.TokenError(token)¶: Raised during attempts to parse an invalid token.
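
Because PentError is described as the superclass of all custom pent errors, calling code can catch it to handle any of them at once. The sketch below uses local stand-in classes so that it runs without pent; the stored `token` attribute and the `parse_token` validator are assumptions made for the example, while the flag characters come from the Content enum on this page.

```python
class PentError(Exception):
    """Local stand-in for pent.errors.PentError."""


class TokenError(PentError):
    """Local stand-in for pent.errors.TokenError(token)."""

    def __init__(self, token):
        super().__init__(token)
        self.token = token  # assumption: the offending token is kept


def parse_token(token):
    """Hypothetical validator: accept only known content-type flags."""
    if not token or token[0] not in "~&#?@":
        raise TokenError(token)
    return token[0]


try:
    parse_token("%bad")
except PentError as exc:  # the base class catches the subclass
    caught = type(exc).__name__
    bad_token = exc.token

print(caught, bad_token)
```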

Custom list object for pent.

File Created: 3 Oct 2018

Members

class pent.thrulist.ThruList¶: List that passes through key if len == 1.
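
The one-line description leaves the exact behavior open. One plausible reading, sketched below under that assumption, is a list that forwards an otherwise unusable key to its sole element when it holds exactly one item. The class name `ThruListSketch` and its error handling are hypothetical; the real ThruList may differ, for instance in the exception it raises.

```python
class ThruListSketch(list):
    """Hypothetical illustration of a pass-through list."""

    def __getitem__(self, key):
        try:
            # Ordinary list indexing is tried first.
            return super().__getitem__(key)
        except (TypeError, IndexError):
            # With exactly one element, hand the key to that element.
            if len(self) == 1:
                return super().__getitem__(0)[key]
            raise


single = ThruListSketch([{"a": 1, "b": 2}])
print(single["a"])  # key is passed through to the only element
print(single[0])    # normal indexing still works
```

With more than one element the sketch re-raises the original error, since there is no single element to pass the key to.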

Utility functions for pent.

File Created: 14 Oct 2018

Members

pent.utils.column_stack_2d(data)¶: Perform column-stacking on a list of 2d data blocks.
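
Column-stacking in general means joining blocks side by side, so that row i of the result is the concatenation of row i from every block. The sketch below shows that operation on plain nested lists; the function name `column_stack_sketch` and the assumed input layout (each block a list of rows) are illustrative and are not taken from pent's source.

```python
def column_stack_sketch(blocks):
    """Join 2-D blocks (lists of rows) side by side.

    Assumes every block has the same number of rows; zip() would
    silently truncate to the shortest block otherwise.
    """
    return [
        [value for row in rows for value in row]
        for rows in zip(*blocks)
    ]


blocks = [
    [[1, 2], [3, 4]],  # 2 rows x 2 columns
    [[5], [6]],        # 2 rows x 1 column
]
stacked = column_stack_sketch(blocks)
print(stacked)
```

This kind of step is useful when a wide table is printed in several column-wise chunks and the chunks need to be reassembled into full rows.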