lpeg functions
It is now over 60 days since the last post. This thread is closed.
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #15 on Wed 31 Jan 2018 04:27 AM (UTC) Amended on Wed 31 Jan 2018 04:58 AM (UTC) by Albert Chan
|
| Message
| Reading the source, it does not do any of the analysis you did.
When it sees the -, len(patt) = 0, so it generates behind 0
(which is optimized away in the generated pcode).
In other words, it never expected a minus in the first place.
To understand it better, it helps to break up the logic:
let behind(n) = move back n bytes
let len(patt) = number of bytes the pattern needs for a match
B(patt) = behind(len(patt)) * P(patt)
So B is an assertion; it does not consume any characters.
With the above formula, B(-patt) = P(-patt), which is what we observed.
But it is not based on any logical reasoning.
Logically, the sign should be pulled out: B(-patt) = -B(patt)
("preceded by not this pattern" == "not preceded by this pattern")
P.S.
I just improved my lpeg.B patch; now lpeg.B(-patt) = behind(len(patt)).
In other words, it moves back without matching. | | Top |
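As a rough illustration of reading B as "move back, then match", here is a minimal sketch against stock lpeg (not the patched version mentioned in the P.S.). The names after_ab and not_after_ab are made up for the example.

```lua
local lpeg = require "lpeg"
local P, B = lpeg.P, lpeg.B

-- Consume two characters, then assert that the text just consumed is "ab".
-- B consumes nothing, so a successful match still ends at position 3.
local after_ab = P(2) * B(P"ab")
-- With the sign outside B, the assertion reads "not preceded by ab".
local not_after_ab = P(2) * -B(P"ab")

print(after_ab:match("abc"))      --> 3
print(after_ab:match("xyc"))      --> nil
print(not_after_ab:match("abc"))  --> nil
print(not_after_ab:match("xyc"))  --> 3
```

Writing the minus outside B keeps the meaning independent of how lpeg computes the length of a negated pattern.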
|
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #16 on Wed 31 Jan 2018 01:59 PM (UTC) Amended on Wed 31 Jan 2018 09:00 PM (UTC) by Albert Chan
|
| Message
| Parsing B(-patt) is confusing.
Example: patt = -P('and') * P(1). What is B(patt)?
I thought it asserted "not preceded by 'and' and not at end-of-line" ... I was wrong.
lpeg currently sets the length of an assertion pattern to 0
-> len(patt) = 0 + 1 = 1
-> it adds a behind 1
The problem is that this might not be what the user expected.
Since B(patt) = behind(len) * P(patt), and len == 1, the P(1) is always optimized away.
So write it without assertions inside patt; it is easier to read: -(B'a' * 'nd') | | Top |
|
| Posted by
| Nick Gammon
Australia (23,166 posts) Bio
Forum Administrator |
| Date
| Reply #17 on Wed 31 Jan 2018 11:15 PM (UTC) |
| Message
| My impression is that it simply moves the current position back a fixed number of characters and then matches on what you provide. I can't off-hand think of good uses for it, but perhaps if you were parsing a source file, and were looking for quotes, you would want to accept:
But not:
So you might look for something like:
That is a quote, provided it does not have a backslash before it. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
|
| Posted by
| Nick Gammon
Australia (23,166 posts) Bio
Forum Administrator |
| Date
| Reply #18 on Wed 31 Jan 2018 11:16 PM (UTC) |
| Message
| | Of course that is wrong, that would give only quotes with a backslash before them. Hmmm. |
- Nick Gammon
www.gammon.com.au, www.mushclient.com | | Top |
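One way to express "a quote that is not preceded by a backslash" is to negate the look-behind. This is only a sketch against stock lpeg, not necessarily the pattern either poster had in mind; bare_quote is a name chosen for the example.

```lua
local lpeg = require "lpeg"
local P, B = lpeg.P, lpeg.B

-- A double quote, provided the previous character is not a backslash.
-- At the start of the subject B cannot move back, so B fails and -B succeeds.
local bare_quote = -B(P"\\") * P'"'

print(bare_quote:match('"abc'))       --> 2
print(bare_quote:match('\\"abc', 2))  --> nil (the quote follows a backslash)
print(bare_quote:match('a"bc', 2))    --> 3
```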
|
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #19 on Thu 01 Feb 2018 (UTC) Amended on Thu 01 Feb 2018 12:43 PM (UTC) by Albert Chan
|
| Message
| Besides validation like the above, it may be used to optimize a search.
Note: I added an '@' prefix in re.lua to call lpeg.B
t = 'everytime everywhere everything'
pat1 = re.compile "g <- 'everything' / . [^e]* g"
pat2 = re.compile "g <- 'g' @'everything' / . [^g]* g"
Since 'g' is rarer than 'e', pat2 is more efficient than pat1. | | Top |
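The same two searches can be sketched in stock lpeg, where no '@' prefix is needed because lpeg.B can be called directly. The grammars below are my transcription of pat1 and pat2, so treat them as an approximation of the patched re version rather than its exact output.

```lua
local lpeg = require "lpeg"
local P, B, V = lpeg.P, lpeg.B, lpeg.V

local t = 'everytime everywhere everything'

-- pat1: try the word here, else skip one char plus any run of non-'e' and retry.
local pat1 = P{ P'everything' + 1 * (1 - P'e')^0 * V(1) }
-- pat2: find a 'g' first, then look behind to check it ends 'everything'.
local pat2 = P{ P'g' * B(P'everything') + 1 * (1 - P'g')^0 * V(1) }

print(pat1:match(t))  --> 32
print(pat2:match(t))  --> 32
```

Both return the position just past the word. pat2 retries only where a 'g' occurs, which here is once, while pat1 retries at every 'e'.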
|
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #20 on Fri 02 Feb 2018 12:15 AM (UTC) Amended on Fri 02 Feb 2018 03:32 AM (UTC) by Albert Chan
|
| Message
| Just learned that lpeg.match can start matching at a position counted from the end of the string.
-- Example I got from the lpeg tutorial
target = "You see 666 dogs and a cat"
= lpeg.match(P'cat', target, -3)
27 | | Top |
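The third argument of lpeg.match is a start position, and a negative value counts back from the end of the subject. The match is still anchored at that position; no search takes place. A small sketch with a made-up subject:

```lua
local lpeg = require "lpeg"
local P = lpeg.P

local cat = P"cat"
print(lpeg.match(cat, "a cat", -3))  --> 6   (starts at the 'c')
print(lpeg.match(cat, "a cat", -4))  --> nil (starts at the space)
print(lpeg.match(cat, "a cat", 3))   --> 6   (same start, counted from the front)
```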
|
| Posted by
| Nick Gammon
Australia (23,166 posts) Bio
Forum Administrator |
| Date
| Reply #21 on Fri 02 Feb 2018 03:03 AM (UTC) |
| Message
| |
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #22 on Fri 02 Feb 2018 06:07 PM (UTC) Amended on Sun 04 Feb 2018 01:26 AM (UTC) by Albert Chan
|
| Message
| This example is from the lpeg re reference page. How does it work?
Both pairs of parentheses are needed ... why?
rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1' ]]
print(rev:match"0123456789") --> 9876543210
The above causes a backtrack stack overflow for long strings (which means the pattern is not tail recursive).
Below is a tail-recursive pattern using my B-patched version; {~ ~} simply concatenates all captures.
rev = re.compile ".* {~ (@{.} %b)* ~}"
| | Top |
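It may help to see the reference-page pattern written directly in lpeg. This is my own transcription, so the grammar layout is an assumption, but it uses only stock lpeg calls.

```lua
local lpeg = require "lpeg"
local P, C, V = lpeg.P, lpeg.C, lpeg.V

local rev = P{
  "R",
  R = (-P(1)) / ""               -- end of subject: produce an empty string
    + (C(1) * V"R") / "%2%1",    -- else: reversed rest (%2), then this char (%1)
}

print(rev:match("0123456789"))  --> 9876543210
```

Each level captures one character and the already reversed remainder, then swaps them. The second pair of parentheses makes the string capture apply to the whole sequence, so that both %1 and %2 exist.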
|
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #23 on Sat 03 Feb 2018 05:30 PM (UTC) Amended on Sat 03 Feb 2018 09:36 PM (UTC) by Albert Chan
|
| Message
| Finally figured out why (!.) cannot be written as plain !.
I had expected the same precedence as in lpeg, where unary minus binds tighter than divide:
(-P(1)) / func == -P(1) / func
But re.lua does not give ! higher precedence than ->
re pattern "!. -> func" == "! (. -> func)" == "!." (likely a bug; func is optimized away)
= lpeg.pcode( re.compile("!. -> func", {func=func}) )
[1 = function ]
00 testany -> 3
02 fail
03 end
| | Top |
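The precedence difference can be observed with the stock re module, using a string capture instead of func so that no definitions table is needed. A minimal sketch:

```lua
local re = require "re"

-- "!. -> 'x'" is read as "!(. -> 'x')": the capture sits inside the
-- not-predicate, which produces no captures, so only a position comes back.
local without_parens = re.match("", "!. -> 'x'")
-- With parentheses the capture applies to the predicate itself.
local with_parens = re.match("", "(!.) -> 'x'")

print(without_parens)  --> 1
print(with_parens)     --> x
```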
|
| Posted by
| Nick Gammon
Australia (23,166 posts) Bio
Forum Administrator |
| Date
| Reply #24 on Sat 03 Feb 2018 08:50 PM (UTC) Amended on Sat 03 Feb 2018 09:51 PM (UTC) by Nick Gammon
|
| Message
| |
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #25 on Sat 03 Feb 2018 09:11 PM (UTC) Amended on Sat 03 Feb 2018 09:17 PM (UTC) by Albert Chan
|
| Message
| I do not understand what " (!.) -> '' " means, so I tried it without the " -> '' ":
=re.match("123456789", " R <- (!.) / ({.} R) -> '%2%1' ")
invalid capture index (2)
-- what if '' is replaced by 1?
=re.match("123456789", " R <- (!.) -> 1 / ({.} R) -> '%2%1' ")
987654321 | | Top |
|
| Posted by
| Albert Chan
(55 posts) Bio
|
| Date
| Reply #26 on Mon 19 Feb 2018 07:54 PM (UTC) Amended on Tue 20 Feb 2018 02:05 AM (UTC) by Albert Chan
|
| Message
| I mentioned re.lua mult(p, n) in reply 6.
It turns out this function is unnecessary, since lpeg has the mult functionality built in (n >= 0).
A patch of just a few lines to lp_star() in lptree.c simplifies everything.
https://github.com/achan001/LPeg-anywhere/blob/master/lptree.c
Removed the mult function!
--> new function re.pow, which just supplies true as the third argument, for a true power
--> re.pow(p,n) = p * p ... * p (n times)
function re.pow(p, n) return mt.__pow(p, n, true) end
| | Top |
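For readers on stock lpeg, where ^ takes no third argument, a "true power" can be approximated by repeated multiplication. The pow helper below is hypothetical and is not the patched re.pow.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- Hypothetical helper: exactly n copies of p, built by repeated multiplication.
local function pow(p, n)
  local result = P(true)        -- matches the empty string
  for _ = 1, n do
    result = result * P(p)
  end
  return result
end

local three = pow("ab", 3)
print(three:match("ababab"))        --> 7
print(three:match("abab"))          --> nil
print(three:match("abababab"))      --> 7  (only three copies are consumed)
print((P"ab"^3):match("abababab"))  --> 9  (stock ^3 means "at least three")
```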
|
The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).