
Can you explain how to use the lexer stand-alone? #25

Open
BenHanson opened this issue Dec 27, 2021 · 5 comments

Comments

@BenHanson

i.e. I want to use it much the same way I use http://benhanson.net/lexertl.html (ignore all Unicode etc. for now; see the Examples)

@peter-winter
Owner

There is no official standalone lexer feature. There is a way, but I'm not convinced it should be official in its current state.
If you want, you can look under the hood and see how the regex parser implements its lexer.

@peter-winter
Owner

Or did I completely misunderstand you, and you want to use just the regex lexer without the actual parser?
In that case you are out of luck. You could theoretically create one huge regex containing all the lexical tokens (just separate them with | ), like this:
(token_1_regex)|(token_2_regex)
but unfortunately there is no support for grouping captures, so you won't know which sub-regex was actually matched.
I'm open to implementing this feature, though it would take some time.
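For illustration only, here is roughly what grouping captures would buy you, sketched with std::regex rather than ctpg's own regex machinery (the token enum and names are made up for the example):

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical token ids for the illustration; not part of ctpg.
enum class tok { number, identifier, plus };

// One capture group per token; the first group that matched tells us
// which alternative (i.e. which token) the engine picked.
static const std::regex combined(R"(([0-9]+)|([A-Za-z_][A-Za-z0-9_]*)|(\+))");

// Returns the next token and consumes it from `input`, or nullopt on no match.
std::optional<std::pair<tok, std::string>> next_token(std::string& input)
{
    std::smatch m;
    if (!std::regex_search(input, m, combined,
                           std::regex_constants::match_continuous))
        return std::nullopt;
    for (std::size_t i = 1; i < m.size(); ++i)
        if (m[i].matched)
        {
            std::pair<tok, std::string> result{static_cast<tok>(i - 1), m[i].str()};
            input = m.suffix();
            return result;
        }
    return std::nullopt;
}

int main()
{
    std::string src = "foo+42";
    while (auto t = next_token(src))
        std::cout << static_cast<int>(t->first) << ": " << t->second << '\n';
}
```

With group captures, the index of the first matched group identifies the token; without them, the combined regex only tells you that something matched.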

@BenHanson
Author

BenHanson commented Dec 27, 2021

Yes, I was hoping to be able to use the lexer without the parser.
If your regexes can carry numeric ids, you just record that id in the end state of each regex. You resolve ambiguity by only setting the id in an end state if one has not already been set.
(As you hinted, your lexer generator should OR all the regexes together.)
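A minimal sketch of that idea in C++ (the state struct and names are illustrative, not ctpg's actual internals):

```cpp
#include <cstddef>

// Illustrative only: a DFA state that records which token's regex
// it accepts, if any.
constexpr std::size_t no_token = static_cast<std::size_t>(~0);

struct dfa_state
{
    std::size_t accepted_token = no_token;  // token id if this is an end state
    // ... transitions omitted ...
};

// When merging the per-token automata, only record a token id in an
// end state if none has been set yet; earlier-declared tokens win,
// which resolves ambiguities such as keywords vs. identifiers.
void mark_accepting(dfa_state& s, std::size_t token_id)
{
    if (s.accepted_token == no_token)
        s.accepted_token = token_id;
}
```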

@peter-winter
Owner

I guess I could do a standalone lexer feature, it shouldn't be too hard to expose some interface for that. Like you said I already am creating a DFA from all the terminal symbols.
It would be a lexer class that resemble a parser interface but simpler, just the terms(...) call is enough.
And then a 'match' method accepting same kinds of arguments (buffer, options, error stream) and returning an index and a string view of a matched text.
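Purely as a sketch of the shape such an interface might take; the class, method names, and signatures below are hypothetical, not an existing ctpg API:

```cpp
#include <cstddef>
#include <ostream>
#include <string_view>

// Hypothetical result of a single match attempt.
struct match_result
{
    std::size_t term_idx;    // index of the matched terminal
    std::string_view text;   // the matched characters
};

// Hypothetical standalone lexer, mirroring the parser's constructor style:
// constructed from the same terminal definitions, with a single match() call.
template<typename... Terms>
class lexer
{
public:
    constexpr lexer(Terms... terms);

    // Same kinds of arguments the parser takes:
    // an input buffer, options, and an error stream.
    template<typename Buffer, typename Options>
    match_result match(const Buffer& buf, Options opts, std::ostream& err) const;
};
```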

@BenHanson
Author

There are a couple of conventions (see the sketch after this list):

  • If there is no match for the lexer at the current position, you usually return a single character (I also return an id of ~0 in that case). This allows lexing to continue.
  • It is customary to return 0 for End of Input.
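A minimal sketch of a lexing loop following those two conventions; the next_token function and the id values are illustrative, not taken from lexertl or ctpg:

```cpp
#include <cstddef>
#include <iostream>
#include <string_view>

// Conventional ids: 0 for End of Input, ~0 for "no rule matched".
constexpr std::size_t eoi_id = 0;
constexpr std::size_t npos_id = static_cast<std::size_t>(~0);

struct token
{
    std::size_t id;
    std::string_view text;
};

// Hypothetical next_token(): returns eoi_id at end of input, and on
// failure returns a single character with id ~0 so lexing can continue.
token next_token(std::string_view& input)
{
    if (input.empty())
        return { eoi_id, {} };

    // ... run the DFA / regex match here; on success return its id ...

    // No rule matched: emit one character and advance past it.
    token t{ npos_id, input.substr(0, 1) };
    input.remove_prefix(1);
    return t;
}

int main()
{
    std::string_view src = "@@";
    for (token t = next_token(src); t.id != eoi_id; t = next_token(src))
        if (t.id == npos_id)
            std::cerr << "unrecognized character: " << t.text << '\n';
}
```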
