Register forum user name Search FAQ

Lua LPEG library

LPeg is a pattern-matching library for Lua, based on Parsing Expression Grammars (PEGs).

For a more formal treatment of LPeg, as well as some discussion about its implementation, see:

http://www.inf.puc-rio.br/~roberto/docs/peg.pdf (A Text Pattern-Matching Tool based on Parsing Expression Grammars.)

Following the Snobol tradition, LPeg defines patterns as first-class objects. That is, patterns are regular Lua values (represented by userdata). The library offers several functions to create and compose patterns. With the use of metamethods, several of these functions are provided as infix or prefix operators. On the one hand, the result is usually much more verbose than the typical encoding of patterns using the so called regular expressions (which typically are not regular expressions in the formal sense). On the other hand, first-class patterns allow much better documentation (as it is easy to comment the code, to use auxiliary variables to break complex definitions, etc.) and are extensible, as we can define new functions to create and compose patterns.

For a quick glance of the library, the following table summarizes its basic operations for creating patterns:




Operator Description


  • lpeg.P(string) : Matches string literally
  • lpeg.P(number) : Matches exactly number characters
  • lpeg.S(string) : Matches any character in string (Set)
  • lpeg.R("xy") : Matches any character between x and y (Range)
  • patt^n : Matches at least n repetitions of patt
  • patt^-n : Matches at most n repetitions of patt
  • patt1 * patt2 : Matches patt1 followed by patt2
  • patt1 + patt2 : Matches patt1 or patt2 (ordered choice)
  • patt1 - patt2 : Matches patt1 if patt2 does not match
  • -patt : Equivalent to ("" - patt)
  • #patt : Matches patt but consumes no input





As a very simple example, lpeg.R("09")^1 creates a pattern that matches a non-empty sequence of digits. As a not so simple example, -lpeg.P(1) (which can be written as lpeg.P(-1) or simply -1 for operations expecting a pattern) matches an empty string only if it cannot match a single character; so, it succeeds only at the subject's end.

In addition to the functions documented here, some arithmetic operators have special effects on patterns:




#patt

Returns a pattern that matches only if the input string matches patt, but without consuming any input, independently of success or failure. (This pattern is equivalent to &patt in the original PEG notation.)

When it succeeds, #patt produces all captures produced by patt.


-patt

Returns a pattern that matches only if the input string does not match patt. It does not consume any input, independently of success or failure. (This pattern is equivalent to !patt in the original PEG notation.)

As an example, the pattern -lpeg.P(1) matches only the end of string.

This pattern never produces any captures, because either patt fails or -patt fails. (A failing pattern never produces captures.)


patt1 + patt2

Returns a pattern equivalent to an ordered choice of patt1 and patt2. (This is denoted by patt1 / patt2 in the original PEG notation, not to be confused with the / operation in LPeg.) It matches either patt1 or patt2, with no backtracking once one of them succeeds. The identity element for this operation is the pattern lpeg.P(false), which always fails.

If both patt1 and patt2 are character sets, this operation is equivalent to set union.


patt1 - patt2

Returns a pattern equivalent to !patt2 patt1. This pattern asserts that the input does not match patt2 and then matches patt1.

If both patt1 and patt2 are character sets, this operation is equivalent to set difference. Note that -patt is equivalent to "" - patt (or 0 - patt). If patt is a character set, 1 - patt is its complement.


patt1 * patt2

Returns a pattern that matches patt1 and then matches patt2, starting where patt1 finished. The identity element for this operation is the pattern lpeg.P(true), which always succeeds.

(LPeg uses the * operator [instead of the more obvious ..] both because it has the right priority and because in formal languages it is common to use a dot for denoting concatenation.)


patt^n

If n is nonnegative, this pattern is equivalent to pattn patt*. It matches at least n occurrences of patt.

Otherwise, when n is negative, this pattern is equivalent to (patt?)-n. That is, it matches at most -n occurrences of patt.

In particular, patt^0 is equivalent to patt*, patt^1 is equivalent to patt+, and patt^-1 is equivalent to patt? in the original PEG notation.

In all cases, the resulting pattern is greedy with no backtracking (also called a possessive repetition). That is, it matches only the longest possible sequence of matches for patt.


patt / string

Creates a string capture. It creates a capture string based on string. The captured value is a copy of string, except that the character % works as an escape character: any sequence in string of the form %n, with n between 1 and 9, stands for the match of the n-th capture in patt. The sequence %0 stands for the whole match. The sequence %% stands for a single %.


patt / table

Creates a query capture. It indexes the given table using as key the first value captured by patt, or the whole match if patt produced no value. The value at that index is the final value of the capture. If the table does not have that key, there is no captured value.


patt / function

Creates a function capture. It calls the given function passing all captures made by patt as arguments, or the whole match if patt made no capture. The values returned by the function are the final values of the capture. In particular, if function returns no value, there is no captured value.





Grammars

With the use of Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars.

LPeg represents grammars with tables, where each entry is a rule.

The call lpeg.V(v) creates a pattern that represents the nonterminal (or variable) with index v in a grammar. Because the grammar still does not exist when this function is evaluated, the result is an open reference to the respective rule.

A table is fixed when it is converted to a pattern (either by calling lpeg.P or by using it wherein a pattern is expected). Then every open reference created by lpeg.V(v) is corrected to refer to the rule indexed by v in the table.

When a table is fixed, the result is a pattern that matches its initial rule. The entry with index 1 in the table defines its initial rule. If that entry is a string, it is assumed to be the name of the initial rule. Otherwise, LPeg assumes that the entry 1 itself is the initial rule.

As an example, the following grammar matches strings of a's and b's that have the same number of a's and b's:


equalcount = lpeg.P{
"S"; -- initial rule name
S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "",
A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A",
B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B",
} * -1

Lua functions

lpeg.B - Matches patt n characters behind the current position, consuming no input
lpeg.C - Creates a simple capture
lpeg.Carg - Creates an argument capture
lpeg.Cb - Creates a back capture
lpeg.Cc - Creates a constant capture
lpeg.Cf - Creates a fold capture
lpeg.Cg - Creates a group capture
lpeg.Cmt - Creates a match-time capture
lpeg.Cp - Creates a position capture
lpeg.Cs - Creates a substitution capture
lpeg.Ct - Creates a table capture
lpeg.locale - Returns a table of patterns matching the current locale
lpeg.match - Matches a pattern against a string
lpeg.P - Converts a value into a pattern
lpeg.print - Outputs debugging information to stdout
lpeg.R - Returns a pattern that matches a range of characters
lpeg.S - Returns a pattern that matches a set of characters
lpeg.setmaxstack - Sets the maximum size for the backtrack stack
lpeg.type - Tests if a value is a pattern
lpeg.V - Creates a non-terminal variable for a grammar
lpeg.version - Returns the LPeg version

Topics

Lua bc (big number) functions
Lua bit manipulation functions
Lua package functions
Lua PCRE regular expression functions
Lua script extensions
Lua string functions
Lua syntax
Lua table functions
Lua utilities
Regular Expressions
Scripting
Scripting callbacks - plugins

(Help topic: general=lua_lpeg)

Documentation contents page


Search ...

Enter a search string to find matching documentation.

Search for:   

Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.