
Gammon Forum


lpeg functions

It is now over 60 days since the last post. This thread is closed.



Posted by Albert Chan   (55 posts)  Bio
Date Reply #15 on Wed 31 Jan 2018 04:27 AM (UTC)

Amended on Wed 31 Jan 2018 04:58 AM (UTC) by Albert Chan

Message
Reading the source, it does not do any of the analysis you did.
When it sees the -, len(patt) = 0, so it generates behind 0
(which will be optimized away in the generated pcode)

In other words, it never expected a minus in the first place.

To understand it better, it helps to break up the logic:

let behind(n) = move back n bytes
let len(patt) = bytes the pattern needs for a match

B(patt) = behind(len(patt)) * P(patt)

So B is an assertion; it does not consume any characters.

With the above formula, B(-patt) = P(-patt), which is what we observed.
But it is not based on any logical reasoning.

Logically, the sign should be pulled out, B(-patt) = -B(patt)
(preceded by not this pattern == not preceded by this pattern)

P.S.
I just improved my lpeg.B patch; now lpeg.B(-patt) = behind(len(patt)).
In other words, it moves back without matching.
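
A minimal sketch of the assertion behaviour described above, using only the stock lpeg.B from the LPeg manual (not the patched version mentioned in the P.S.). The pattern names are made up for illustration. It shows that B checks the bytes before the current position without consuming anything, and that the negation is written outside B, as -B(patt).

```lua
local lpeg = require "lpeg"
local P, B = lpeg.P, lpeg.B

-- Consume any three bytes, then assert that those three bytes were "and".
-- B"and" moves back 3 bytes, matches "and", and ends up where it started.
local after_and = P(3) * B"and"

-- The logical negation: pull the sign out, -B(patt) rather than B(-patt).
local not_after_and = P(3) * -B"and"

print(after_and:match("and"))      -- position after the match, or nil
print(after_and:match("ant"))
print(not_after_and:match("ant"))
print(not_after_and:match("and"))
```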

Posted by Albert Chan   (55 posts)  Bio
Date Reply #16 on Wed 31 Jan 2018 01:59 PM (UTC)

Amended on Wed 31 Jan 2018 09:00 PM (UTC) by Albert Chan

Message
Parsing B(-patt) is confusing.

Example: patt = -P('and') * P(1). What is B(patt)?

I thought it asserted "not preceded by 'and' and not at end-of-line" ... I was wrong.

lpeg currently sets the length of a pattern assertion to 0
-> len(patt) = 0 + 1 = 1
-> it adds a behind 1

The problem is that this might not be what the user expected.
Since B(patt) = behind(len) * P(patt) and len == 1, the P(1) is always optimized away.

So write it without assertions inside patt; it is easier to read: -(B'a' * 'nd')

Posted by Nick Gammon   Australia  (23,166 posts)  Bio   Forum Administrator
Date Reply #17 on Wed 31 Jan 2018 11:15 PM (UTC)
Message
My impression is that it simply moves the current position back a fixed number of characters and then matches on what you provide. I can't off-hand think of good uses for it, but perhaps if you were parsing a source file, and were looking for quotes, you would want to accept:


"


But not:


\"


So you might look for something like:


P'"' * B('\\')


That is a quote, provided it does not have a backslash before it.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,166 posts)  Bio   Forum Administrator
Date Reply #18 on Wed 31 Jan 2018 11:16 PM (UTC)
Message
Of course that is wrong, that would give only quotes with a backslash before them. Hmmm.
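
One way to express what reply #17 was after, as a sketch against stock LPeg: negate the look-behind and put it in front of the quote. The name bare_quote is only for illustration.

```lua
local lpeg = require "lpeg"
local P, B = lpeg.P, lpeg.B

-- A quote that is NOT preceded by a backslash.
-- -B"\\" succeeds when the previous byte is not a backslash
-- (including at the very start of the subject, where B cannot move back).
local bare_quote = -B"\\" * P'"'

print(bare_quote:match('"'))          -- quote at start of string
print(bare_quote:match('a"', 2))      -- quote after an ordinary character
print(bare_quote:match('a\\"', 3))    -- quote after a backslash
```

The third argument to match is the start position, so each call tests the quote character itself.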


Posted by Albert Chan   (55 posts)  Bio
Date Reply #19 on Thu 01 Feb 2018 (UTC)

Amended on Thu 01 Feb 2018 12:43 PM (UTC) by Albert Chan

Message
Besides validation like the above, it may be used to optimize a search.

Note: I added an '@' prefix in re.lua to call lpeg.B

t = 'everytime everywhere everything'

pat1 = re.compile "g <- 'everything' / . [^e]* g"
pat2 = re.compile "g <- 'g' @'everything' / . [^g]* g"

Since 'g' is rarer than 'e', pat2 is more efficient than pat1.
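
The '@' prefix exists only in the patched re.lua, so here is a rough equivalent of the two patterns written against the stock lpeg API (P, B, V and a one-rule grammar). This is a sketch of the same idea, not the patched code.

```lua
local lpeg = require "lpeg"
local P, B, V = lpeg.P, lpeg.B, lpeg.V

local t = "everytime everywhere everything"

-- pat1: try the full word at every 'e'.
local pat1 = P{ P"everything" + P(1) * (P(1) - "e")^0 * V(1) }

-- pat2: skip ahead to each 'g', then look behind for the full word.
local pat2 = P{ P"g" * B"everything" + P(1) * (P(1) - "g")^0 * V(1) }

-- Both report the position just past "everything".
print(pat1:match(t), pat2:match(t))
```

On this subject pat1 stops at every 'e' to try the word, while pat2 only stops once, at the final 'g'.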

Posted by Albert Chan   (55 posts)  Bio
Date Reply #20 on Fri 02 Feb 2018 12:15 AM (UTC)

Amended on Fri 02 Feb 2018 03:32 AM (UTC) by Albert Chan

Message
Just learned lpeg can start a match at a position counted from the end of the string (a negative init).

-- Example I got from lpeg tutorial

target = "You see 666 dogs and a cat"

= lpeg.match(P'cat', target, -3)
27

Posted by Nick Gammon   Australia  (23,166 posts)  Bio   Forum Administrator
Date Reply #21 on Fri 02 Feb 2018 03:03 AM (UTC)
Message
That is mentioned here: http://gammon.com.au/lpeg

Search for "anywhere".


Posted by Albert Chan   (55 posts)  Bio
Date Reply #22 on Fri 02 Feb 2018 06:07 PM (UTC)

Amended on Sun 04 Feb 2018 01:26 AM (UTC) by Albert Chan

Message
This example is from the lpeg re reference page. How does it work?

Both pairs of parentheses are needed ... why?

rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1' ]]
print(rev:match"0123456789")   --> 9876543210

The above causes a backtrack stack overflow for long strings (which means the pattern is not tail recursive).

Below is a tail-recursive pattern using my B-patched version; {~ ~} simply concatenates all captures.

rev = re.compile ".* {~ (@{.} %b)* ~}"
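
A rough way to read the reference pattern is as an ordinary recursive function. The sketch below is plain Lua, not LPeg, and rev is a made-up name; it only mirrors the two alternatives. The first alternative, (!.) -> '', is the base case and has to produce a capture (the empty string) so that the caller always has a %2. The second, ({.} R) -> '%2%1', captures one character as %1, recurses for %2, and only then builds the result, which is why the recursion is not a tail call. As I understand the re grammar, -> attaches to the item immediately before it, so without the parentheses it would apply to R alone (or to . alone), not to the whole group.

```lua
-- Plain-Lua mirror of: R <- (!.) -> '' / ({.} R) -> '%2%1'
local function rev(s)
  if s == "" then
    return ""                 -- (!.) -> ''  : end of input yields an empty capture
  end
  local first = s:sub(1, 1)   -- {.}         : %1, the first character
  local rest = rev(s:sub(2))  -- R           : %2, the reversed remainder
  return rest .. first        -- '%2%1'      : work done AFTER the recursive call
end

print(rev("0123456789"))
```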

Posted by Albert Chan   (55 posts)  Bio
Date Reply #23 on Sat 03 Feb 2018 05:30 PM (UTC)

Amended on Sat 03 Feb 2018 09:36 PM (UTC) by Albert Chan

Message
Finally figured out why (!.) cannot be written as plain !.

I had expected the lpeg (Lua) precedence, with unary minus above divide:

(-P(1)) / func == -P(1) / func

But re.lua does not put the precedence of ! above ->

re pattern "!. -> func" == "! (. -> func)" == "!." (likely a bug, func optimized away)

= lpeg.pcode( re.compile("!. -> func", {func=func}) )
[1 = function ]
00 testany -> 3
02 fail
03 end
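
For comparison, a small sketch of the two groupings written directly with the stock lpeg API. func here is a stand-in function, and lua_style / re_style are names made up for this example.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

local function func() return "at end" end

-- In Lua, unary minus binds tighter than '/', so this is (-P(1)) / func:
-- "not any character" (end of input), then the function capture.
local lua_style = -P(1) / func

-- What the re string "!. -> func" amounts to: the capture is inside the
-- negation, and a not-predicate never produces captures.
local re_style = -(P(1) / func)

print(lua_style:match(""))   -- value returned by func
print(re_style:match(""))    -- plain position, no capture
```

With the second grouping the function can never be called, which is consistent with the pcode above containing no capture instructions.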

Posted by Nick Gammon   Australia  (23,166 posts)  Bio   Forum Administrator
Date Reply #24 on Sat 03 Feb 2018 08:50 PM (UTC)

Amended on Sat 03 Feb 2018 09:51 PM (UTC) by Nick Gammon

Message
The documentation at http://www.inf.puc-rio.br/~roberto/lpeg/re.html shows that -> is higher in precedence than !


Posted by Albert Chan   (55 posts)  Bio
Date Reply #25 on Sat 03 Feb 2018 09:11 PM (UTC)

Amended on Sat 03 Feb 2018 09:17 PM (UTC) by Albert Chan

Message
I do not understand what " (!.) -> '' " means, so I tried without " -> '' "

=re.match("123456789", " R <- (!.) / ({.} R) -> '%2%1' ")
invalid capture index (2)

-- what if '' is 1 ?
=re.match("123456789", " R <- (!.) -> 1 / ({.} R) -> '%2%1' ")
987654321
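
A sketch of what seems to be going on, using the stock re module: without a capture on the base case, the innermost R returns nothing, so the enclosing group has only one capture and %2 is out of range. Giving the base case any capture (-> '' or -> 1) supplies the missing value.

```lua
local re = require "re"

-- Base case without a capture: the last ({.} R) sees only one capture,
-- so '%2' cannot be resolved and the match raises an error.
local ok, err = pcall(re.match, "123456789",
                      " R <- (!.) / ({.} R) -> '%2%1' ")
print(ok, err)

-- Base case with -> '' contributes an empty-string capture as %2.
local reversed = re.match("123456789",
                          " R <- (!.) -> '' / ({.} R) -> '%2%1' ")
print(reversed)
```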

Posted by Albert Chan   (55 posts)  Bio
Date Reply #26 on Mon 19 Feb 2018 07:54 PM (UTC)

Amended on Tue 20 Feb 2018 02:05 AM (UTC) by Albert Chan

Message
I mentioned re.lua mult(p, n) in reply 6.
It turns out this function is unnecessary, since lpeg has the mult functionality built in (n >= 0):
p^n = mult(p,n) * p^0

A patch of just a few lines to lp_star() in lptree.c simplifies everything.
https://github.com/achan001/LPeg-anywhere/blob/master/lptree.c

Removed the mult function!
--> new function re.pow, which just supplies the third argument true, for true power
--> re.pow(p,n) = p * p ... * p (n times)
function re.pow(p, n) return mt.__pow(p, n, true) end
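
The three-argument __pow above belongs to the patched lptree.c. For stock LPeg, where p^n means "n or more", a minimal sketch of an exact-count helper could look like this; exactly is a hypothetical name, not part of lpeg or re.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- Hypothetical helper: p repeated exactly n times (n >= 0), built by
-- plain concatenation. Stock p^n means "n or more".
local function exactly(p, n)
  local out = P(true)
  for _ = 1, n do
    out = out * p
  end
  return out
end

local a = P"a"
print((a^3):match("aaaa"))          -- at least 3: consumes all four
print(exactly(a, 3):match("aaaa"))  -- exactly 3: stops after the third
print(exactly(a, 3):match("aa"))    -- too few: no match
```

This matches the identity in the post: a^3 behaves like exactly(a, 3) * a^0.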

The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.