API (draft page)

Unstructured API dump, to provide cross-reference targets for other portions of the docs.

Any of the objects/attributes/methods documented here may become private implementation details in future versions of pent.

Mini-language parser for pent.

pent Extracts Numerical Text.

Author

Brian Skinn (bskinn@alum.mit.edu)

File Created

8 Sep 2018

Copyright

(c) Brian Skinn 2018-2019

Source Repository

http://www.github.com/bskinn/pent

Documentation

http://pent.readthedocs.io

License

The MIT License; see LICENSE.txt for full license terms

Members

class pent.parser.Parser(head=None, body=None, tail=None)

Mini-language parser for structured numerical data.

capture_body(text)

Capture all values from the pattern body, recursing if needed.

classmethod capture_parser(prs, text)

Perform capture of a Parser pattern.

classmethod capture_section(sec, text)

Perform capture of a str, iterable, or Parser section.

classmethod capture_str_pattern(pat_str, text)

Perform capture of string/iterable-of-str pattern.

capture_struct(text)

Perform capture of marked groups to nested dict(s).

classmethod convert_line(line, *, capture_groups=True, group_id=0)

Convert line of tokens to regex.

The constructed regex is required to match the entirety of a line of text, using lookbehind and lookahead at the start and end of the pattern, respectively.

group_id indicates the starting value of the index for any capture groups added.
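As a rough illustration of the whole-line requirement described above, the sketch below wraps an inner pattern in start and end assertions so that it can only match a complete line. The helper name `whole_line`, the exact assertions, and the group name `g0` are assumptions made for this example; the regex that pent actually generates may differ.

```python
import re


def whole_line(inner):
    """Wrap `inner` so that it can only match one complete line of text.

    Hypothetical helper, not part of pent.
    """
    # Start of string, or a position immediately after a newline
    start = r"(?:^|(?<=\n))[ \t]*"
    # End of string, or a position immediately before a newline
    end = r"[ \t]*(?:$|(?=\n))"
    return start + inner + end


# One named capture group; "g0" stands in for a group index starting at 0
pat = whole_line(r"(?P<g0>[+-]?\d+)")

text = "x 12\n 34 \n56 y\n"
matches = [m.group("g0") for m in re.finditer(pat, text)]
print(matches)
```

Only the second line consists solely of an integer plus surrounding whitespace, so it is the only line that matches.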

classmethod convert_section(sec, capture_groups=False, capture_sections=True)

Convert the head, body or tail to regex.

static generate_captures(m)

Generate captures from a regex match.

pattern(capture_sections=True)

Return the regex pattern for the entire parser.

The individual capture groups are NEVER inserted when regex is generated this way.

Instead, head/body/tail capture groups are inserted, in order to subdivide matched text by these subsets. These ‘section’ capture groups are ONLY inserted for the top-level Parser, though – they are suppressed for inner nested Parsers.
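The section-level grouping can be pictured with a small standalone sketch. The group names below are borrowed from the ParserField values ('head', 'body', 'tail'); the three sample sub-patterns and the input text are invented for illustration, and the regex pent really produces is likely more elaborate.

```python
import re

# Invented sub-patterns standing in for converted head/body/tail sections
head = r"Values:\n"
body = r"(?:[ \t]*\d+(?:[ \t]+\d+)*\n)+"
tail = r"End\n"

# Only the sections are captured; the individual numbers are not
pattern = "(?P<head>{})(?P<body>{})(?P<tail>{})".format(head, body, tail)

text = "junk\nValues:\n1 2 3\n4 5 6\nEnd\nmore\n"
m = re.search(pattern, text)

print(repr(m.group("head")))
print(repr(m.group("body")))
print(repr(m.group("tail")))
```

Splitting the match this way lets each section's text be re-scanned separately for individual values in a later pass.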

Token handling for mini-language parser for pent.

File Created

20 Sep 2018

Members

class pent.token.Token(token, do_capture=True)

Encapsulates transforming mini-language pattern tokens into regex.

property capture

Return flag for whether a regex capture group should be created.

do_capture

Whether group capture should be added or not

property is_any

Return flag for whether the token is an “any content” token.

property is_misc

Return flag for whether the token is a misc token.

property is_num

Return flag for whether the token matches a number.

property is_optional_line

Return flag for whether the token flags an optional line.

property is_str

Return flag for whether the token matches a literal string.

property match_quantity

Return match quantity.

None for pent.enums.Content.Any or pent.enums.Content.OptionalLine

needs_group_id

Flag for whether group ID substitution needs to be done

property number

Return number format; None if token doesn’t match a number.

property pattern

Return assembled regex pattern from the token, as str.

property sign

Return number sign; None if token doesn’t match a number.

property space_after

Return Enum value for handling of post-match whitespace.

token

Mini-language token string to be parsed

Regex patterns for pent.

File Created

2 Sep 2018

Members

pent.patterns.number_patterns = {
    (<Number.Decimal: 'd'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Negative: '-'>): '-(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Decimal: 'd'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.\\d*|\\d*\\.\\d+)',
    (<Number.Float: 'f'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.Float: 'f'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+))',
    (<Number.General: 'g'>, <Sign.Positive: '+'>): '[+]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Negative: '-'>): '-((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.General: 'g'>, <Sign.Any: '.'>): '[+-]?((\\d+\\.\\d*|\\d*\\.\\d+)|(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)|\\d+)',
    (<Number.Integer: 'i'>, <Sign.Positive: '+'>): '[+]?\\d+',
    (<Number.Integer: 'i'>, <Sign.Negative: '-'>): '-\\d+',
    (<Number.Integer: 'i'>, <Sign.Any: '.'>): '[+-]?\\d+',
    (<Number.SciNot: 's'>, <Sign.Positive: '+'>): '[+]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Negative: '-'>): '-(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
    (<Number.SciNot: 's'>, <Sign.Any: '.'>): '[+-]?(\\d+\\.?\\d*[deDE][+-]?\\d+|\\d*\\.\\d+[deDE][+-]?\\d+)',
}

dict of regex patterns matching single numbers, keyed by (Number, Sign) tuples.
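To show how such a (Number, Sign) keyed dict might be used, the sketch below rebuilds a three-entry subset with local stand-in enums and checks a few strings against it. The regex strings are copied from the listing above; the stand-in classes and the `is_number` helper are assumptions for this example and are not part of pent.

```python
import re
from enum import Enum


class Number(Enum):
    """Local stand-in mirroring two members of pent.enums.Number."""

    Integer = "i"
    SciNot = "s"


class Sign(Enum):
    """Local stand-in mirroring pent.enums.Sign."""

    Positive = "+"
    Negative = "-"
    Any = "."


# Subset of the patterns listed above, keyed the same way
number_patterns = {
    (Number.Integer, Sign.Any): r"[+-]?\d+",
    (Number.Integer, Sign.Negative): r"-\d+",
    (Number.SciNot, Sign.Any): (
        r"[+-]?(\d+\.?\d*[deDE][+-]?\d+|\d*\.\d+[deDE][+-]?\d+)"
    ),
}


def is_number(text, number, sign):
    """Return True if `text` is, in full, a number of the given kind (hypothetical helper)."""
    return re.fullmatch(number_patterns[(number, sign)], text) is not None


# Enum members can be recovered from their one-character values
key = (Number("i"), Sign("."))

print(is_number("-42", *key))
print(is_number("1.0D+02", Number.SciNot, Sign.Any))
print(is_number("1.5", Number.SciNot, Sign.Any))
```

The SciNot pattern accepts Fortran-style `D` exponent markers as well as `e`/`E`, and rejects values that lack an exponent.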

pent.patterns.std_num_punct = 'deDE+.-'

str of the standard numerical punctuation characters that are treated as not marking word boundaries. The characters deDE are included to account for scientific notation.

pent.patterns.std_scinot_markers = 'deDE'

str with the standard allowed scientific notation exponent marker characters

pent.patterns.std_word_chars = 'a-zA-Z0-9deDE+.-'

Standard word marker characters for pent

pent.patterns.std_wordify(p)

Wrap a token in the pent standard word start/end markers.

pent.patterns.std_wordify_close(p)

Append the standard word end markers.

pent.patterns.std_wordify_open(p)

Prepend the standard word start markers.

pent.patterns.wordify_close(p, word_chars)

Append the word end markers.

pent.patterns.wordify_open(p, word_chars)

Prepend the word start markers.

pent.patterns.wordify_pattern(p, word_chars)

Wrap pattern with word start/end markers using arbitrary word chars.
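One plausible way to build such word markers is with negative lookbehind and lookahead over the word-character set, as sketched below. The `demo_` functions are local stand-ins written for this example; whether pent builds its markers this way is an assumption. The word-character string is the std_word_chars value shown above.

```python
import re

# Value of pent.patterns.std_word_chars as documented above
STD_WORD_CHARS = "a-zA-Z0-9deDE+.-"


def demo_wordify_open(p, word_chars):
    """Prepend a marker: no word character may precede the match."""
    return r"(?<![{}])".format(word_chars) + p


def demo_wordify_close(p, word_chars):
    """Append a marker: no word character may follow the match."""
    return p + r"(?![{}])".format(word_chars)


def demo_wordify_pattern(p, word_chars):
    """Wrap a pattern in both the start and end markers."""
    return demo_wordify_open(demo_wordify_close(p, word_chars), word_chars)


pattern = demo_wordify_pattern(r"\d+", STD_WORD_CHARS)
found = re.findall(pattern, "abc12 34 5.6 x7 8")
print(found)
```

Because `.` counts as a word character here, the `5` and `6` in `5.6` are not picked up as separate integers, which is the point of including numerical punctuation in the set.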

Enums for pent.

File Created

3 Sep 2018

Members

class pent.enums.Content

Enumeration for the possible types of content.

Any = '~'

Arbitrary match, including whitespace

Misc = '&'

Arbitrary single-“word” match, no whitespace

Number = '#'

Number

OptionalLine = '?'

Flag to mark pattern line as optional

String = '@'

Literal string

class pent.enums.Number

Enumeration for the different kinds of recognized number primitives.

Decimal = 'd'

Decimal floating-point value; no scientific/exponential notation

Float = 'f'

Floating-point value with or without an exponent

General = 'g'

“General” value; integer, float, or scientific notation

Integer = 'i'

Integer value; no decimal or scientific/exponential notation

SciNot = 's'

Scientific/exponential notation, where exponent is required

class pent.enums.ParserField

Enumeration for the fields/subsections of a Parser pattern.

Body = 'body'

Body

Head = 'head'

Header

Tail = 'tail'

Tail/footer

class pent.enums.Quantity

Enumeration for the various match quantities.

OneOrMore = '+'

One-or-more match

Single = '.'

Single value match
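The difference between the two quantities can be sketched as a regex transformation. The stand-in enum mirrors the values documented above; the `apply_quantity` helper and the choice of spaces or tabs as the separator between repeated values are assumptions for illustration only.

```python
import re
from enum import Enum


class Quantity(Enum):
    """Local stand-in mirroring pent.enums.Quantity."""

    Single = "."
    OneOrMore = "+"


def apply_quantity(value_pattern, quantity):
    """Return a regex matching one value, or a whitespace-separated run of values."""
    if quantity is Quantity.Single:
        return "(?:{})".format(value_pattern)
    # One value, followed by zero or more (whitespace + value) repeats
    return r"(?:{0})(?:[ \t]+(?:{0}))*".format(value_pattern)


INT = r"[+-]?\d+"
single = apply_quantity(INT, Quantity.Single)
many = apply_quantity(INT, Quantity.OneOrMore)

print(bool(re.fullmatch(single, "7")))
print(bool(re.fullmatch(single, "7 8 9")))
print(bool(re.fullmatch(many, "7 8 9")))
```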

class pent.enums.Sign

Enumeration for the different kinds of recognized numerical signs.

Any = '.'

Any sign

Negative = '-'

Negative value only (leading ‘-’ required; includes negative zero)

Positive = '+'

Positive value only (leading ‘+’ optional; includes zero)

class pent.enums.SpaceAfter

Enumeration for the various constraints on space after tokens.

Optional = 'o'

Optional following space

Prohibited = 'x'

Following space prohibited

Required = ''

Default is required following space; no explicit enum value
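A simple way to picture the three space-after modes is as a regex suffix appended to each token's pattern. The stand-in enum uses the values documented above; the suffix mapping and the `with_space` helper are assumptions for this sketch, not pent's implementation.

```python
import re
from enum import Enum


class SpaceAfter(Enum):
    """Local stand-in mirroring pent.enums.SpaceAfter."""

    Required = ""
    Optional = "o"
    Prohibited = "x"


# Assumed mapping from mode to the whitespace regex that follows a token
SPACE_REGEX = {
    SpaceAfter.Required: r"[ \t]+",
    SpaceAfter.Optional: r"[ \t]*",
    SpaceAfter.Prohibited: "",
}


def with_space(pattern, space_after):
    """Append the whitespace rule for `space_after` to a token pattern."""
    return pattern + SPACE_REGEX[space_after]


# An integer followed by the literal text "kg", under each mode
required = with_space(r"\d+", SpaceAfter.Required) + "kg"
optional = with_space(r"\d+", SpaceAfter.Optional) + "kg"
prohibited = with_space(r"\d+", SpaceAfter.Prohibited) + "kg"

for name, pat in [("required", required), ("optional", optional), ("prohibited", prohibited)]:
    print(name, bool(re.fullmatch(pat, "12 kg")), bool(re.fullmatch(pat, "12kg")))
```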

class pent.enums.TokenField

Enumeration for fields within a mini-language number token.

Capture = 'capture'

Flag to ignore matched content when collecting into regex groups

Number = 'number'

Format of the numerical value (int, float, scinot, decimal, general)

Quantity = 'quantity'

Match quantity of the field (single value, optional, one-or-more, zero-or-more, etc.)

Sign = 'sign'

Sign of acceptable values (any, positive, negative)

SignNumber = 'sign_number'

Combined sign and number, for initial pattern group retrieval

SpaceAfter = 'space_after'

Flag to change the space-after behavior of a token

Str = 'str'

Literal content, for a string match

Type = 'type'

Content type (any, string, number)

Custom exceptions for pent.

File Created

10 Sep 2018

Members

exception pent.errors.LineError(line)

Raised during attempts to parse invalid token sequences.

exception pent.errors.PentError

Superclass for all custom pent errors.

exception pent.errors.SectionError(msg='')

Raised from failed attempts to parse a Parser section.

exception pent.errors.ThruListError(msg='')

Raised from failed ThruList indexing attempts.

exception pent.errors.TokenError(token)

Raised during attempts to parse an invalid token.

Custom list object for pent.

File Created

3 Oct 2018

Members

class pent.thrulist.ThruList

List that passes through key if len == 1.
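The one-line description leaves the exact behavior open. One reading is sketched below: a key that the list itself cannot handle is forwarded to the sole element when the list has length one. The class name `PassThroughList` is hypothetical, and re-raising the original exception is a simplification; per the errors listed above, pent raises ThruListError for failed ThruList indexing.

```python
class PassThroughList(list):
    """Hypothetical list that forwards unusable keys to its only element."""

    def __getitem__(self, key):
        try:
            # Normal list indexing (int or slice) is tried first
            return super().__getitem__(key)
        except (TypeError, IndexError):
            if len(self) == 1:
                # Exactly one element: apply the key to that element instead
                return super().__getitem__(0)[key]
            raise


single = PassThroughList([{"a": 1}])
print(single[0])    # ordinary indexing still works
print(single["a"])  # the str key is passed through to the dict

double = PassThroughList([{"a": 1}, {"a": 2}])
try:
    double["a"]
    passed_through = True
except TypeError:
    passed_through = False
print(passed_through)
```

This kind of pass-through spares callers an explicit `[0]` when a capture yields a single nested result.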

Utility functions for pent.

File Created

14 Oct 2018

Members

pent.utils.column_stack_2d(data)

Perform column-stacking on a list of 2d data blocks.
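As a sketch of what column-stacking 2d blocks could look like, the function below joins corresponding rows of each block side by side. The name `column_stack_blocks`, the list-of-lists data layout, and the equal-row-count check are assumptions for this example; the behavior of pent.utils.column_stack_2d may differ in its details.

```python
def column_stack_blocks(blocks):
    """Join 2d blocks (lists of rows) side by side, row by row.

    Hypothetical stand-in, not pent's implementation.
    """
    row_counts = {len(block) for block in blocks}
    if len(row_counts) > 1:
        raise ValueError("all blocks need the same number of rows")

    # zip(*blocks) groups the i-th row of every block together
    return [
        [value for row in rows for value in row]
        for rows in zip(*blocks)
    ]


blocks = [
    [[1, 2], [3, 4]],  # 2 rows x 2 columns
    [[5], [6]],        # 2 rows x 1 column
]
stacked = column_stack_blocks(blocks)
print(stacked)
```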