
Can you explain how to use the lexer stand-alone? #25

Open
BenHanson opened this issue Dec 27, 2021 · 5 comments

Comments

@BenHanson

i.e. I want to use it much the same way I use http://benhanson.net/lexertl.html (ignore all Unicode etc. for now; see the Examples)

@peter-winter
Owner

There is no official standalone lexer feature. There is a way, but I'm not convinced it should be official in its current state.
If you want, you can look under the hood and see how the regex parser implements its lexer.

@peter-winter
Owner

Or did I completely misunderstand you, and you want to use just the regex lexer without the actual parser?
In that case you are out of luck. You could theoretically create one huge regex containing all the lexical tokens (just separate them with | ), like this:
(token_1_regex)|(token_2_regex)
but unfortunately there is no support for grouping captures, so you won't know which sub-regex was actually matched.
I'm open to implementing this feature, though it would take some time.
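For illustration only, here is roughly what grouping captures would buy you, sketched with std::regex rather than ctpg's own regex machinery (the token enum and names are made up for the example):

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical token ids for the illustration; not part of ctpg.
enum class tok { number, identifier, plus };

// One capture group per token; the first group that matched tells us
// which alternative (i.e. which token) the engine picked.
static const std::regex combined(R"(([0-9]+)|([A-Za-z_][A-Za-z0-9_]*)|(\+))");

// Returns the next token and consumes it from `input`, or nullopt on no match.
std::optional<std::pair<tok, std::string>> next_token(std::string& input)
{
    std::smatch m;
    if (!std::regex_search(input, m, combined,
                           std::regex_constants::match_continuous))
        return std::nullopt;
    for (std::size_t i = 1; i < m.size(); ++i)
        if (m[i].matched)
        {
            std::pair<tok, std::string> result{static_cast<tok>(i - 1), m[i].str()};
            input = m.suffix();
            return result;
        }
    return std::nullopt;
}

int main()
{
    std::string src = "foo+42";
    while (auto t = next_token(src))
        std::cout << static_cast<int>(t->first) << ": " << t->second << '\n';
}
```

With group captures, the index of the first matched group identifies the token; without them, the combined regex only tells you that something matched.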

@BenHanson
Author

BenHanson commented Dec 27, 2021

Yes, I was hoping to be able to use the lexer without the parser.
If your regexes can carry numeric ids, you just record that id in the end state of each regex. You resolve ambiguity by only setting the id in an end state if one has not already been set.
(As you hinted, your lexer generator should OR all the regexes together.)
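A minimal sketch of that idea in C++ (the state struct and names are illustrative, not ctpg's actual internals):

```cpp
#include <cstddef>

// Illustrative only: a DFA state that records which token's regex
// it accepts, if any.
constexpr std::size_t no_token = static_cast<std::size_t>(~0);

struct dfa_state
{
    std::size_t accepted_token = no_token;  // token id if this is an end state
    // ... transitions omitted ...
};

// When merging the per-token automata, only record a token id in an
// end state if none has been set yet; earlier-declared tokens win,
// which resolves ambiguities such as keywords vs. identifiers.
void mark_accepting(dfa_state& s, std::size_t token_id)
{
    if (s.accepted_token == no_token)
        s.accepted_token = token_id;
}
```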

@peter-winter
Owner

I guess I could do a standalone lexer feature, it shouldn't be too hard to expose some interface for that. Like you said I already am creating a DFA from all the terminal symbols.
It would be a lexer class that resemble a parser interface but simpler, just the terms(...) call is enough.
And then a 'match' method accepting same kinds of arguments (buffer, options, error stream) and returning an index and a string view of a matched text.
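Purely as a sketch of the shape such an interface might take; the class, method names, and signatures below are hypothetical, not an existing ctpg API:

```cpp
#include <cstddef>
#include <ostream>
#include <string_view>

// Hypothetical result of a single match attempt.
struct match_result
{
    std::size_t term_idx;    // index of the matched terminal
    std::string_view text;   // the matched characters
};

// Hypothetical standalone lexer, mirroring the parser's constructor style:
// constructed from the same terminal definitions, with a single match() call.
template<typename... Terms>
class lexer
{
public:
    constexpr lexer(Terms... terms);

    // Same kinds of arguments the parser takes:
    // an input buffer, options, and an error stream.
    template<typename Buffer, typename Options>
    match_result match(const Buffer& buf, Options opts, std::ostream& err) const;
};
```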

@BenHanson
Author

There are a couple of conventions (see the sketch after this list):

  • If there is no match for the lexer at the current position, you usually return a single character (I also return an id of ~0 in that case). This allows lexing to continue.
  • It is customary to return 0 for End of Input.
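A minimal sketch of a lexing loop following those two conventions; the next_token function and the id values are illustrative, not taken from lexertl or ctpg:

```cpp
#include <cstddef>
#include <iostream>
#include <string_view>

// Conventional ids: 0 for End of Input, ~0 for "no rule matched".
constexpr std::size_t eoi_id = 0;
constexpr std::size_t npos_id = static_cast<std::size_t>(~0);

struct token
{
    std::size_t id;
    std::string_view text;
};

// Hypothetical next_token(): returns eoi_id at end of input, and on
// failure returns a single character with id ~0 so lexing can continue.
token next_token(std::string_view& input)
{
    if (input.empty())
        return { eoi_id, {} };

    // ... run the DFA / regex match here; on success return its id ...

    // No rule matched: emit one character and advance past it.
    token t{ npos_id, input.substr(0, 1) };
    input.remove_prefix(1);
    return t;
}

int main()
{
    std::string_view src = "@@";
    for (token t = next_token(src); t.id != eoi_id; t = next_token(src))
        if (t.id == npos_id)
            std::cerr << "unrecognized character: " << t.text << '\n';
}
```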
