Add a wasm browser based playground #41

Open
wants to merge 40 commits into
base: main
Changes from 1 commit
1633608
Add 'std::' in several places as suggested by clang
mingodad Jun 22, 2023
4007497
Fix to detect identifiers referenced in rules but not defined
mingodad Jun 22, 2023
01e753c
Add preprocessor guards to allow build without threads/threadpool.
mingodad Jun 22, 2023
fe8da35
Make possible to accept associativity/precedence syntax like bison/byacc
mingodad Jun 22, 2023
98444cb
Add an error message for empty literal/regex declarations, also fix t…
mingodad Jun 22, 2023
82bfdf5
Check if '%whitespace' directive is present in the grammar and if not…
mingodad Jun 22, 2023
0e05c13
Add code to allow generate an EBNF for railroad diagram generation
mingodad Jun 22, 2023
061bf24
Add method to dump the input from the lexer.
mingodad Jun 22, 2023
7a668bb
Reuse result of already called function.
mingodad Jun 22, 2023
9243595
Add a method to show grammar compilation stats.
mingodad Jun 22, 2023
a64710d
Reorder class member for better memory usage/alignment.
mingodad Jun 22, 2023
642de0b
Rename write output function to prevent clash with C lib ::write
mingodad Jun 22, 2023
a5387f3
Use a typedef and macros to allow easy experimenting with different t…
mingodad Jun 22, 2023
d7cacbc
Simplify GrammaSymbolSet
mingodad Jun 23, 2023
06dfb8e
Only check for '%whitespace' directive if the grammar has no other er…
mingodad Jun 23, 2023
deb8e3e
Fix examples/test that were missing '%whitespace' directive.
mingodad Jun 23, 2023
49af659
Check if we are at the end and then stop
mingodad Jun 23, 2023
4e5acce
First working version of an wasm browser based playground
mingodad Jun 23, 2023
41860c1
Add a naive implementation of "%case_insensitive" directive, right no…
mingodad Jun 23, 2023
0aec408
Add column info to error messages in ErrorPolicy
mingodad Jun 23, 2023
d9e8e15
Add special regex character escape for the naive case insensitive imp…
mingodad Jun 25, 2023
9b72871
Make trivial methods inline.
mingodad Jun 26, 2023
3aa9a4a
Add column info to GrammarSymbol and error messages
mingodad Jun 26, 2023
f59b70c
First implementation for outptut an parse tree. The MissingHeaders te…
mingodad Jun 28, 2023
02178b9
Check if the input is accepted && full before print the parse tree
mingodad Jun 28, 2023
eb7ff4c
Now I've got closer to a good parse tree dump
mingodad Jul 16, 2023
ccdca1a
Missing fixes for a better parse tree output
mingodad Jul 16, 2023
9f907f6
Show an error message when associativity is assigned to a non-terminal.
mingodad Jul 16, 2023
1d09221
Undo a mistaken removing code for a better parse tree output.
mingodad Jul 16, 2023
8aa81d2
Fix generation of empty productions for genEBNF
mingodad Jul 16, 2023
51ab70e
Add YACC generation from LALR grammars.
mingodad Jul 17, 2023
208eda6
Fix to only match fully words, because before it was matching a subst…
mingodad Jul 18, 2023
fc40770
Add a missing case when generating a YACC file
mingodad Jul 18, 2023
08939ba
Add the reducing transition as a parameter to action handlers, this w…
mingodad Jul 18, 2023
ee28902
Update examples to use the extra action handler parameter recently in…
mingodad Jul 18, 2023
f5c1810
Add 2 new grammar options to easy debug, also fix line counting when …
mingodad Jul 19, 2023
6f73f32
Add code to detect user content changes and alert him/her
mingodad Nov 14, 2023
02946cc
Added grammar examples
mingodad Nov 14, 2023
d5231dc
Update code
mingodad Nov 14, 2023
2e1ef74
Create static.yml
mingodad Nov 14, 2023
1 change: 1 addition & 0 deletions src/lalr/Parser.ipp
@@ -387,6 +387,7 @@ void Parser<Iterator, UserData, Char, Traits, Allocator>::parse( Iterator start,
const ParserSymbol* symbol = reinterpret_cast<const ParserSymbol*>( lexer_.symbol() );
while ( parse(symbol, lexer_.lexeme(), lexer_.line(), lexer_.column()) )
{
if(lexer_.full()) break;
Owner

Is this fixing a bug?

Contributor Author

Yes, there are some grammars that enter an endless loop because the lexer doesn't advance.
I don't know exactly which ones trigger the bug, but you can try it with this script:

#!/bin/sh

basep=playground
checkGrammar() {
	echo "Now testing $1 $2"
	/usr/bin/time ./grammar_test-clang -g "$basep/$1" -i "$basep/$2"
}

checkGrammar json3.g test.json.txt
checkGrammar lua.g test.lua
checkGrammar carbon-lang.g prelude.carbon
checkGrammar postgresql-16.g test.sql
#checkGrammar cxx-parser.g test.cpp
checkGrammar lsl_ext.g test.lsl
checkGrammar bison.g carbon-lang.y
checkGrammar bison-bug.g carbon-lang.y
checkGrammar dparser.g test.dparser
checkGrammar parse_gen.g test.parse_gen
checkGrammar tameparser.g test.tameparser
checkGrammar javascript.g test.js
checkGrammar javascript-core.g test.js
checkGrammar cparser.g test.c
checkGrammar java11.g test.java
checkGrammar rust.g test.rs
checkGrammar go.g test.go
checkGrammar php-8.2.g test.php
checkGrammar gringo-ng.g test.clingo
checkGrammar ada-adayacc.g test.adb

Build script:

#!/bin/sh

umask 022

myflags="-O2 -g"
#myflags="-O2 -g -m32"
#myflags="-g"

clang-16-env clang++ \
	-std=c++17 $myflags -Wall -Wextra -Wno-unused-function -pedantic \
	-Isrc -DLALR_NO_THREADS \
	src/lalr/ErrorPolicy.cpp \
	src/lalr/Grammar.cpp \
	src/lalr/GrammarCompiler.cpp \
	src/lalr/GrammarGenerator.cpp \
	src/lalr/GrammarParser.cpp \
	src/lalr/GrammarState.cpp \
	src/lalr/GrammarSymbol.cpp \
	src/lalr/GrammarSymbolSet.cpp \
	src/lalr/GrammarTransition.cpp \
	src/lalr/RegexCompiler.cpp \
	src/lalr/RegexGenerator.cpp \
	src/lalr/RegexItem.cpp \
	src/lalr/RegexNode.cpp \
	src/lalr/RegexParser.cpp \
	src/lalr/RegexState.cpp \
	src/lalr/RegexSyntaxTree.cpp \
	src/lalr/RegexToken.cpp \
	src/lalr/lalr_examples/grammar_test.cpp \
	-o grammar_test-clang

grammar_test.cpp:


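#include <stdio.h>
#include <stdlib.h>   // EXIT_SUCCESS / EXIT_FAILURE
#include <stdarg.h>
#include <string>     // std::basic_string
#include <vector>     // std::vector
#include <lalr/GrammarCompiler.hpp>
#include <lalr/Parser.hpp>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>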

static int errors_ = 0;

typedef unsigned char mychar_t;

static void show_error( const char* format, ... )
{
    ++errors_;
    va_list args;
    va_start( args, format );
    vfprintf( stderr, format, args );
    va_end( args );
}

int read_file(const char *fname, std::vector<mychar_t> &content)
{
        struct stat stat;
        int result = ::stat( fname, &stat );
        if ( result != 0 )
        {
            show_error( "Stat failed on '%s' - result=%d\n", fname, result );
            return EXIT_FAILURE;
        }

        FILE* file = fopen( fname, "rb" );
        if ( !file )
        {
            show_error( "Opening '%s' to read failed - errno=%d\n", fname, errno );
            return EXIT_FAILURE;
        }

        int size = stat.st_size;
        content.resize( size+1 );
        int read = int( fread(&content[0], sizeof(mychar_t), size, file) );
        fclose( file );
        file = nullptr;
        if ( read != size )
        {
            show_error( "Reading grammar from '%s' failed - read=%d\n", fname, int(read) );
            return EXIT_FAILURE;
        }
        content[size] = '\0';
	return EXIT_SUCCESS;
}

static clock_t start_time;
clock_t myShowDiffTime(const char *title)
{
    clock_t now = clock();
    clock_t diff = now - start_time;

    int msec = diff * 1000 / CLOCKS_PER_SEC;
    printf("%s: Time taken %d seconds %d milliseconds\n", title, msec/1000, msec%1000);
    start_time = now;
    return now;
}

struct C_MultLineCommentLexer
{
	static lalr::PositionIterator<const mychar_t*> string_lexer( const lalr::PositionIterator<const mychar_t*>& begin,
							const lalr::PositionIterator<const mychar_t*>& end,
							std::basic_string<mychar_t>* lexeme,
							const void** /*symbol*/ )
	{
		LALR_ASSERT( lexeme );

		lexeme->clear();
                //printf("C_MultLineCommentLexer : %s\n", lexeme->c_str());

		bool done = false;
		lalr::PositionIterator<const mychar_t*> i = begin;
		while ( i != end && !done)
		{
			switch( *i )
			{
				case '*':
					++i;
					if(i != end && *i == '/') done = true;
					continue; // skip the ++i below; i already points past the '*'
			}
			++i;
		}
		if ( i != end )
		{
			LALR_ASSERT( *i == '/' );
			++i;
		}
		return i;
	}
};

struct AstUserDataDbg {
    int index;
    int stack_index;
    static int next_index;
    static int total;
    AstUserDataDbg():index(total++), stack_index(next_index++) {}
};
int AstUserDataDbg::next_index = 0;
int AstUserDataDbg::total = 0;


static bool astMakerDbg( AstUserDataDbg& result, const AstUserDataDbg* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
//    //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
//    const char *lexstr = (length > 0 ? (const char *)nodes[0].lexeme().c_str() : "::lnull");
//    const char *idstr = (length > 0 ? nodes[0].symbol()->identifier : "::inull");
//    int line = (length > 0 ? nodes[0].line() : 0);
//    int column = (length > 0 ? nodes[0].column() : 0);
//    //const char *stateLabel = (length > 0 ? nodes[0].state()->label : "::inull");
//    printf("astMaker: %p\t%zd:%d:%d\t%p\t%zd\t->\t%s : %s :%d:%d\n", start, length,
//                length ? start->index : -1, length ? start->stack_index : -1,
//                nodes, length, idstr, lexstr, line, column);
    printf("----\n");
    for(size_t i=0; i< length; ++i)
        printf("%zu:%d\t%p\t%d:%d\t%p <:> %s <:> %s <:> %s <:> %d:%d\n", i, nodes[i].symbol()->type,
                start+i, start[i].index, start[i].stack_index, nodes+i,
                nodes[i].symbol()->identifier, nodes[i].symbol()->lexeme,
                nodes[i].lexeme().c_str(), nodes[i].line(), nodes[i].column());
    return true;
}

struct ParseTreeUserData {
    std::vector<ParseTreeUserData> children;
    const lalr::ParserSymbol *symbol;
    std::basic_string<mychar_t> lexeme; ///< The lexeme at this node (empty if this node's symbol is non-terminal).
    ParseTreeUserData():children(0),symbol(nullptr) {}
};


static bool parsetreeMaker( ParseTreeUserData& result, const ParseTreeUserData* start, const lalr::ParserNode<mychar_t>* nodes, size_t length )
{
    if(length == 0) return false;
    result.symbol = nodes[length-1].state()->transitions->reduced_symbol;
    for(size_t i_node = 0; i_node < length; ++i_node)
    {
        const lalr::ParserNode<mychar_t>& the_node = nodes[i_node];
        switch(the_node.symbol()->type)
        {
            case lalr::SymbolType::SYMBOL_TERMINAL:
            {
                ParseTreeUserData& udt = result.children.emplace_back();
                udt.symbol = the_node.symbol();
                udt.lexeme = the_node.lexeme();
                //printf("TERMINAL: %s : %s\n", udt.symbol->identifier, udt.lexeme.c_str());
            }
            break;
            case lalr::SymbolType::SYMBOL_NON_TERMINAL:
            {
                if(the_node.symbol() == result.symbol)
                {
                    const ParseTreeUserData& startx = start[i_node];
                    for (std::vector<ParseTreeUserData>::const_iterator child = startx.children.begin(); child != startx.children.end(); ++child)
                    {
                        // 'start' is const, so this is a copy; std::move would not move here.
                        result.children.push_back( *child );
                    }
                }
                else
                {
                    ParseTreeUserData& udt = result.children.emplace_back();
                    udt.symbol = the_node.symbol();
                    if(udt.symbol == start[i_node].symbol)
                    {
                        udt.children = start[i_node].children;
                    }
                    else
                        udt.children.push_back(start[i_node]); // copy: start[i_node] is const
                }
                //printf("NON_TERMINAL: %s\n", result.symbol->identifier);
            }
            break;
            default:
                //LALR_ASSERT( ?? );
                printf("Unexpected symbol %p\n", the_node.symbol());
        }
    }
    return true;
}

static void indent( int level )
{
    for ( int i = 0; i < level; ++i )
    {
        printf( " |" );
    }
}

static void print_parsetree( const ParseTreeUserData& ast, int level )
{
    if(ast.symbol)
    {
        indent( level );
        switch(ast.symbol->type)
        {
            case lalr::SymbolType::SYMBOL_TERMINAL:
                if(ast.lexeme.size())
                {
                    //indent( level -1);
                    printf("%s -> %s\n", ast.symbol->identifier, ast.lexeme.c_str());
                }
                break;
            case lalr::SymbolType::SYMBOL_NON_TERMINAL:
                //indent( level );
                printf("%s\n", ast.symbol->lexeme);
                break;
        }
    }

    for (std::vector<ParseTreeUserData>::const_iterator child = ast.children.begin(); child != ast.children.end(); ++child)
    {
        print_parsetree( *child, ast.symbol ? (level + 1) : level );
    }
}

#include <locale.h>

int main(int argc, char *argv[])
{
	const char *grammar_fn = nullptr;
	const char *input_fn = nullptr;
        bool dumpLexer = false;
        start_time = clock();

        setlocale(LC_NUMERIC, "");

	std::vector<char> grammar_txt;
	std::vector<mychar_t> input_txt;

	if ( argc < 2 )
	{
		printf( "%s -g|--grammar grammar_fname -i|--input input_fname -d|--dumpLex\n", argv[0] );
		printf( "\n" );
		return EXIT_FAILURE;
	}

	int argi = 1;
	while ( argi < argc )
	{
		if ( strcmp(argv[argi], "-g") == 0 || strcmp(argv[argi], "--grammar") == 0 )
		{
		    grammar_fn = argv[argi + 1];
		    argi += 2;
		}
		else if ( strcmp(argv[argi], "-i") == 0 || strcmp(argv[argi], "--input") == 0 )
		{
		    input_fn = argv[argi + 1];
		    argi += 2;
		}
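		else if ( strcmp(argv[argi], "-d") == 0 || strcmp(argv[argi], "--dumpLex") == 0 )
		{
		    dumpLexer = true;
		    argi += 1;
		}
		else
		{
		    // Unknown option: without this branch argi never advances and the loop never ends.
		    show_error( "Unknown option '%s'\n", argv[argi] );
		    return EXIT_FAILURE;
		}
	}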

	if(grammar_fn != nullptr)
	{
		int rc = read_file(grammar_fn, (std::vector<mychar_t>&)grammar_txt);
		if(rc != EXIT_SUCCESS) return rc;
                size_t grammar_txt_size = grammar_txt.size()-1; //-1 to account for the '\0' terminator
                myShowDiffTime("read grammar");
		printf("Grammar size = %d\n", (int)grammar_txt_size);
		lalr::GrammarCompiler compiler;
		lalr::ErrorPolicy error_policy;
		int errors = compiler.compile( &grammar_txt[0], &grammar_txt[0] + grammar_txt_size, &error_policy );
                myShowDiffTime("compile grammar");
		if(errors != 0)
		{
			printf("Error count = %d\n", errors);
			return EXIT_FAILURE;
		}
                compiler.showStats();
		if(input_fn != nullptr)
		{
			rc = read_file(input_fn, input_txt);
			if(rc != EXIT_SUCCESS) return rc;
                        size_t input_txt_size = input_txt.size()-1; //-1 to account for the '\0' terminator
                        myShowDiffTime("read input");
			printf("Input size = %d\n", (int)input_txt_size);
			lalr::ErrorPolicy error_policy_input;
                        lalr::Parser<const mychar_t*, ParseTreeUserData> parser( compiler.parser_state_machine(), &error_policy_input );
                        parser.set_default_action_handler(parsetreeMaker);
                        //lalr::Parser<const mychar_t*, AstUserDataDbg> parser( compiler.parser_state_machine(), &error_policy_input );
                        //parser.set_default_action_handler(astMakerDbg);
			//lalr::Parser<const mychar_t*, int> parser( compiler.parser_state_machine(), &error_policy_input );
                        parser.lexer_action_handlers()
                            ( "C_MultilineComment", &C_MultLineCommentLexer::string_lexer )
                            ;
                        if(dumpLexer) parser.dumpLex( &input_txt[0], &input_txt[0] + input_txt_size );
			else parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );
                        myShowDiffTime("parse input");
			printf( "accepted = %d, full = %d\n", parser.accepted(),  parser.full());
                        if(parser.accepted() &&  parser.full())
                        {
                            print_parsetree( parser.user_data(), 0 );
                        }
		}
	}
	return EXIT_SUCCESS;
}
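
For illustration only, here is a minimal sketch of how a loop of this shape can spin forever once the lexer stops advancing, and how a `full()` check ends it. `ToyLexer`, `toy_parse_step` and `run_loop` are hypothetical stand-ins, not the lalr API, and the sketch assumes the parse step keeps returning true at end of input; I have not confirmed that this is the exact condition the grammars above hit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a lexer; NOT the lalr lexer.
struct ToyLexer
{
    std::string input;
    std::size_t pos;

    bool full() const { return pos >= input.size(); }  // all input consumed
    void advance() { if ( !full() ) ++pos; }            // no-op at end of input
};

// Hypothetical parse step that never asks the loop to stop (assumption for the demo).
static bool toy_parse_step( const ToyLexer& ) { return true; }

// Runs the parse loop and returns the number of iterations.
// max_iterations only exists to keep the unguarded case finite.
std::size_t run_loop( ToyLexer lexer, bool guard, std::size_t max_iterations )
{
    assert( max_iterations > 0 );
    std::size_t iterations = 0;
    while ( toy_parse_step(lexer) && iterations < max_iterations )
    {
        ++iterations;
        if ( guard && lexer.full() ) break;  // same idea as the line added in this change
        lexer.advance();                     // does nothing once the input is exhausted
    }
    return iterations;
}
```

With the guard, `run_loop(ToyLexer{"abc", 0}, true, 1000)` stops on the first iteration that sees `full()`; without it, the loop only ends because of the iteration cap.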

lexer_.advance();
symbol = reinterpret_cast<const ParserSymbol*>( lexer_.symbol() );
}