Online playground ? #11

Closed
mingodad opened this issue Jan 26, 2023 · 78 comments
@mingodad (Contributor)

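I'm looking at an online playground like:

* https://yhirose.github.io/cpp-peglib/
* https://chrishixon.github.io/chpeg/playground/
* https://peggyjs.org/online.html
* https://mingodad.github.io/CocoR-Typescript/playground
* https://gerhobbelt.github.io/jison/try/

I'm testing the project with a SQL grammar before trying to implement it, and I didn't find any way to define case-insensitive tokens/literals.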

@mingodad (Contributor, Author)

Shown below is a starting point for translating https://github.com/facebookincubator/CG-SQL/blob/main/sources/cql.y and https://github.com/facebookincubator/CG-SQL/blob/main/sources/cql.l to the format understood by this project.

I've noticed that this project does not seem to accept a dummy token name to be used only for precedence purposes.
This grammar seems to be a good one for profiling this project, because it takes a while to process.

cql {

%left UNION_ALL UNION INTERSECT EXCEPT ;
%right ASSIGN ;
%left OR ;
%left AND ;
%left NOT ;
%left BETWEEN NOT_BETWEEN NE NE_ '=' EQEQ LIKE NOT_LIKE GLOB NOT_GLOB MATCH NOT_MATCH REGEXP NOT_REGEXP IN NOT_IN IS_NOT IS IS_TRUE IS_FALSE IS_NOT_TRUE IS_NOT_FALSE ;
%right ISNULL NOTNULL  ;
%left '<' '>' GE LE ;
%left LS RS '&' '|' ;
%left '+' '-' ;
%left '*' '/' '%' ;
%left CONCAT ;
%left COLLATE ;
%right UMINUS '~' ;

%whitespace "[ \t\r\n]*";

program :
	opt_stmt_list
	;

opt_stmt_list :
	/* empty */
	| stmt_list
	;

stmt_list :
	stmt ';'
	| stmt_list stmt ';'
	;

stmt :
	misc_attrs any_stmt
	;

any_stmt :
	alter_table_add_column_stmt
	| begin_schema_region_stmt
	| begin_trans_stmt
	| call_stmt
	| close_stmt
	| commit_return_stmt
	| commit_trans_stmt
	| continue_stmt
	| create_index_stmt
	| create_proc_stmt
	| create_table_stmt
	| create_trigger_stmt
	| create_view_stmt
	| create_virtual_table_stmt
	| declare_deployable_region_stmt
	| declare_enum_stmt
	| declare_func_stmt
	| declare_out_call_stmt
	| declare_proc_no_check_stmt
	| declare_proc_stmt
	| declare_schema_region_stmt
	| declare_stmt
	| delete_stmt
	| drop_index_stmt
	| drop_table_stmt
	| drop_trigger_stmt
	| drop_view_stmt
	| echo_stmt
	| emit_enums_stmt
	| end_schema_region_stmt
	| enforce_normal_stmt
	| enforce_pop_stmt
	| enforce_push_stmt
	| enforce_reset_stmt
	| enforce_strict_stmt
	| explain_stmt
	| fetch_call_stmt
	| fetch_stmt
	| fetch_values_stmt
	| guard_stmt
	| if_stmt
	| insert_stmt
	| leave_stmt
	| let_stmt
	| loop_stmt
	| open_stmt
	| out_stmt
	| out_union_stmt
	| previous_schema_stmt
	| proc_savepoint_stmt
	| release_savepoint_stmt
	| return_stmt
	| rollback_return_stmt
	| rollback_trans_stmt
	| savepoint_stmt
	| select_stmt
	| schema_ad_hoc_migration_stmt
	| schema_upgrade_script_stmt
	| schema_upgrade_version_stmt
	| set_stmt
	| switch_stmt
	| throw_stmt
	| trycatch_stmt
	| update_cursor_stmt
	| update_stmt
	| upsert_stmt
	| while_stmt
	| with_delete_stmt
	| with_insert_stmt
	| with_update_stmt
	| with_upsert_stmt
	;

explain_stmt :
	EXPLAIN opt_query_plan explain_target
	;

opt_query_plan :
	/* empty */
	| QUERY_PLAN
	;

explain_target :
	select_stmt
	| update_stmt
	| delete_stmt
	| with_delete_stmt
	| with_insert_stmt
	| insert_stmt
	| upsert_stmt
	| drop_table_stmt
	| drop_view_stmt
	| drop_index_stmt
	| drop_trigger_stmt
	| begin_trans_stmt
	| commit_trans_stmt
	;

previous_schema_stmt :
	AT_PREVIOUS_SCHEMA
	;

schema_upgrade_script_stmt :
	AT_SCHEMA_UPGRADE_SCRIPT
	;

schema_upgrade_version_stmt :
	AT_SCHEMA_UPGRADE_VERSION '(' INTLIT ')'
	;

set_stmt :
	SET name ASSIGN expr
	| SET name FROM CURSOR name
	;

let_stmt :
	LET name ASSIGN expr
	;

version_attrs_opt_recreate :
	/* empty */
	| AT_RECREATE
	| AT_RECREATE '(' name ')'
	| version_attrs
	;

opt_version_attrs :
	/* empty */
	| version_attrs
	;

version_attrs :
	AT_CREATE version_annotation opt_version_attrs
	| AT_DELETE version_annotation opt_version_attrs
	;

opt_delete_version_attr :
	/* empty */
	| AT_DELETE version_annotation
	;

drop_table_stmt :
	DROP TABLE IF EXISTS name
	| DROP TABLE name
	;

drop_view_stmt :
	DROP VIEW IF EXISTS name
	| DROP VIEW name
	;

drop_index_stmt :
	DROP INDEX IF EXISTS name
	| DROP INDEX name
	;

drop_trigger_stmt :
	DROP TRIGGER IF EXISTS name
	| DROP TRIGGER name
	;

create_virtual_table_stmt :
	CREATE VIRTUAL TABLE opt_if_not_exists name USING name opt_module_args AS '(' col_key_list ')' opt_delete_version_attr
	;

opt_module_args :
	/* empty */
	| '(' misc_attr_value_list ')'
	| '(' ARGUMENTS FOLLOWING ')'
	;

create_table_stmt :
	CREATE opt_temp TABLE opt_if_not_exists name '(' col_key_list ')' opt_no_rowid version_attrs_opt_recreate
	;

opt_temp :
	/* empty */
	| TEMP
	;

opt_if_not_exists :
	/* empty */
	| IF NOT EXISTS
	;

opt_no_rowid :
	/* empty */
	| WITHOUT ROWID
	;

col_key_list :
	col_key_def
	| col_key_def ',' col_key_list
	;

col_key_def :
	col_def
	| pk_def
	| fk_def
	| unq_def
	| check_def
	| shape_def
	;

check_def :
	CONSTRAINT name CHECK '(' expr ')'
	| CHECK '(' expr ')'
	;

shape_def :
	LIKE name
	| LIKE name ARGUMENTS
	;

col_name :
	name
	;

misc_attr_key :
	name
	| name ':' name
	;

misc_attr_value_list :
	misc_attr_value
	| misc_attr_value ',' misc_attr_value_list
	;

misc_attr_value :
	name
	| any_literal
	| const_expr
	| '(' misc_attr_value_list ')'
	| '-' num_literal
	;

misc_attr :
	AT_ATTRIBUTE '(' misc_attr_key ')'
	| AT_ATTRIBUTE '(' misc_attr_key '=' misc_attr_value ')'
	;

misc_attrs :
	/* empty */
	| misc_attr misc_attrs
	;

col_def :
	misc_attrs col_name data_type_any col_attrs
	;

pk_def :
	CONSTRAINT name PRIMARY KEY '(' indexed_columns ')' opt_conflict_clause
	| PRIMARY KEY '(' indexed_columns ')' opt_conflict_clause
	;

opt_conflict_clause :
	/* empty */
	| conflict_clause
	;

conflict_clause :
	ON_CONFLICT ROLLBACK
	| ON_CONFLICT ABORT
	| ON_CONFLICT FAIL
	| ON_CONFLICT IGNORE
	| ON_CONFLICT REPLACE
	;

opt_fk_options :
	/* empty */
	| fk_options
	;

fk_options :
	fk_on_options
	| fk_deferred_options
	| fk_on_options fk_deferred_options
	;

fk_on_options :
	ON DELETE fk_action
	| ON UPDATE fk_action
	| ON UPDATE fk_action ON DELETE fk_action
	| ON DELETE fk_action ON UPDATE fk_action
	;

fk_action :
	SET NULL_
	| SET DEFAULT
	| CASCADE
	| RESTRICT
	| NO ACTION
	;

fk_deferred_options :
	DEFERRABLE fk_initial_state
	| NOT_DEFERRABLE fk_initial_state
	;

fk_initial_state :
	/* empty */
	| INITIALLY DEFERRED
	| INITIALLY IMMEDIATE
	;

fk_def :
	CONSTRAINT name FOREIGN KEY '(' name_list ')' fk_target_options
	| FOREIGN KEY '(' name_list ')' fk_target_options
	;

fk_target_options :
	REFERENCES name '(' name_list ')' opt_fk_options
	;

unq_def :
	CONSTRAINT name UNIQUE '(' indexed_columns ')' opt_conflict_clause
	| UNIQUE '(' indexed_columns ')' opt_conflict_clause
	;

opt_unique :
	/* empty */
	| UNIQUE
	;

indexed_column :
	expr opt_asc_desc
	;

indexed_columns :
	indexed_column
	| indexed_column ',' indexed_columns
	;

create_index_stmt :
	CREATE opt_unique INDEX opt_if_not_exists name ON name '(' indexed_columns ')' opt_where opt_delete_version_attr
	;

name :
	ID
	| TEXT
	| TRIGGER
	| ROWID
	| REPLACE
	| KEY
	| VIRTUAL
	| TYPE
	| HIDDEN
	| PRIVATE
	;

opt_name :
	/* empty */
	| name
	;

name_list :
	name
	| name ',' name_list
	;

opt_name_list :
	/* empty */
	| name_list
	;

col_attrs :
	/* empty */
	| NOT NULL_ opt_conflict_clause col_attrs
	| PRIMARY KEY opt_conflict_clause col_attrs
	| PRIMARY KEY opt_conflict_clause AUTOINCREMENT col_attrs
	| DEFAULT '-' num_literal col_attrs
	| DEFAULT num_literal col_attrs
	| DEFAULT const_expr col_attrs
	| DEFAULT str_literal col_attrs
	| COLLATE name col_attrs
	| CHECK '(' expr ')' col_attrs
	| UNIQUE opt_conflict_clause col_attrs
	| HIDDEN col_attrs
	| AT_SENSITIVE col_attrs
	| AT_CREATE version_annotation col_attrs
	| AT_DELETE version_annotation col_attrs
	| fk_target_options col_attrs
	;

version_annotation :
	'(' INTLIT ',' name ')'
	| '(' INTLIT ',' name ':' name ')'
	| '(' INTLIT ')'
	;

opt_kind :
	/* empty */
	| '<' name '>'
	;

data_type_numeric :
	INT_ opt_kind
	| INTEGER opt_kind
	| REAL opt_kind
	| LONG_ opt_kind
	| BOOL_ opt_kind
	| LONG_ INTEGER opt_kind
	| LONG_ INT_ opt_kind
	| LONG_INT opt_kind
	| LONG_INTEGER opt_kind
	;

data_type_any :
	data_type_numeric
	| TEXT opt_kind
	| BLOB opt_kind
	| OBJECT opt_kind
	| OBJECT '<' name CURSOR '>'
	| ID
	;

data_type_with_options :
	data_type_any
	| data_type_any NOT NULL_
	| data_type_any AT_SENSITIVE
	| data_type_any AT_SENSITIVE NOT NULL_
	| data_type_any NOT NULL_ AT_SENSITIVE
	;

str_literal :
	STRLIT
	| CSTRLIT
	;

num_literal :
	INTLIT
	| LONGLIT
	| REALLIT
	| TRUE_
	| FALSE_
	;

const_expr :
	CONST '(' expr ')'
	;

any_literal :
	str_literal
	| num_literal
	| NULL_
	| AT_FILE '(' str_literal ')'
	| AT_PROC
	| BLOBLIT
	;

raise_expr :
	RAISE '(' IGNORE ')'
	| RAISE '(' ROLLBACK ',' expr ')'
	| RAISE '(' ABORT ',' expr ')'
	| RAISE '(' FAIL ',' expr ')'
	;

call :
	name '(' arg_list ')' opt_filter_clause
	| name '(' DISTINCT arg_list ')' opt_filter_clause
	;

basic_expr :
	name
	| AT_RC
	| name '.' name
	| any_literal
	| const_expr
	| '(' expr ')'
	| call
	| window_func_inv
	| raise_expr
	| '(' select_stmt ')'
	| '(' select_stmt IF NOTHING expr ')'
	| '(' select_stmt IF NOTHING OR NULL_ expr ')'
	| '(' select_stmt IF NOTHING THROW ')'
	| EXISTS '(' select_stmt ')'
	| CASE expr case_list END
	| CASE expr case_list ELSE expr END
	| CASE case_list END
	| CASE case_list ELSE expr END
	| CAST '(' expr AS data_type_any ')'
	;

math_expr :
	basic_expr
	| math_expr '&' math_expr
	| math_expr '|' math_expr
	| math_expr LS math_expr
	| math_expr RS math_expr
	| math_expr '+' math_expr
	| math_expr '-' math_expr
	| math_expr '*' math_expr
	| math_expr '/' math_expr
	| math_expr '%' math_expr
	| math_expr IS_NOT_TRUE
	| math_expr IS_NOT_FALSE
	| math_expr ISNULL
	| math_expr NOTNULL
	| math_expr IS_TRUE
	| math_expr IS_FALSE
	| '-' math_expr  %prec UMINUS
	| '~' math_expr  %prec UMINUS
	| NOT math_expr
	| math_expr '=' math_expr
	| math_expr EQEQ math_expr
	| math_expr '<' math_expr
	| math_expr '>' math_expr
	| math_expr NE math_expr
	| math_expr NE_ math_expr
	| math_expr GE math_expr
	| math_expr LE math_expr
	| math_expr NOT_IN '(' expr_list ')'
	| math_expr NOT_IN '(' select_stmt ')'
	| math_expr IN '(' expr_list ')'
	| math_expr IN '(' select_stmt ')'
	| math_expr LIKE math_expr
	| math_expr NOT_LIKE math_expr
	| math_expr MATCH math_expr
	| math_expr NOT_MATCH math_expr
	| math_expr REGEXP math_expr
	| math_expr NOT_REGEXP math_expr
	| math_expr GLOB math_expr
	| math_expr NOT_GLOB math_expr
	| math_expr BETWEEN math_expr AND math_expr %prec BETWEEN
	| math_expr NOT_BETWEEN math_expr AND math_expr %prec BETWEEN
	| math_expr IS_NOT math_expr
	| math_expr IS math_expr
	| math_expr CONCAT math_expr
	| math_expr COLLATE name
	;

expr :
	math_expr
	| expr AND expr
	| expr OR expr
	;

case_list :
	WHEN expr THEN expr
	| WHEN expr THEN expr case_list
	;

arg_expr :
	'*'
	| expr
	| shape_arguments
	;

arg_list :
	/* empty */
	| arg_expr
	| arg_expr ',' arg_list
	;

expr_list :
	expr
	| expr ',' expr_list
	;

shape_arguments :
	FROM name
	| FROM name shape_def
	| FROM ARGUMENTS
	| FROM ARGUMENTS shape_def
	;

call_expr :
	expr
	| shape_arguments
	;

call_expr_list :
	call_expr
	| call_expr ',' call_expr_list
	;

cte_tables :
	cte_table
	| cte_table ',' cte_tables
	;

cte_table :
	name '(' name_list ')' AS '(' select_stmt_no_with ')'
	| name '(' '*' ')' AS '(' select_stmt_no_with ')'
	;

with_prefix :
	WITH cte_tables
	| WITH RECURSIVE cte_tables
	;

with_select_stmt :
	with_prefix select_stmt_no_with
	;

select_stmt :
	with_select_stmt
	| select_stmt_no_with
	;

select_stmt_no_with :
	select_core_list opt_orderby opt_limit opt_offset
	;

select_core_list :
	select_core
	| select_core compound_operator select_core_list
	;

values :
	'(' insert_list ')'
	| '(' insert_list ')' ',' values
	;

select_core :
	SELECT select_opts select_expr_list opt_from_query_parts opt_where opt_groupby opt_having opt_select_window
	| VALUES values
	;

compound_operator :
	UNION
	| UNION_ALL
	| INTERSECT
	| EXCEPT
	;

window_func_inv :
	name '(' arg_list ')' opt_filter_clause OVER window_name_or_defn
	;

opt_filter_clause :
	/* empty */
	| FILTER '(' opt_where ')'
	;

window_name_or_defn :
	window_defn
	| name
	;

window_defn :
	'(' opt_partition_by opt_orderby opt_frame_spec ')'
	;

opt_frame_spec :
	/* empty */
	| frame_type frame_boundary_opts frame_exclude
	;

frame_type :
	RANGE
	| ROWS
	| GROUPS
	;

frame_exclude :
	/* empty */
	| EXCLUDE_NO_OTHERS
	| EXCLUDE_CURRENT_ROW
	| EXCLUDE_GROUP
	| EXCLUDE_TIES
	;

frame_boundary_opts :
	frame_boundary
	| BETWEEN frame_boundary_start AND frame_boundary_end
	;

frame_boundary_start :
	UNBOUNDED PRECEDING
	| expr PRECEDING
	| CURRENT_ROW
	| expr FOLLOWING
	;

frame_boundary_end :
	expr PRECEDING
	| CURRENT_ROW
	| expr FOLLOWING
	| UNBOUNDED FOLLOWING
	;

frame_boundary :
	UNBOUNDED PRECEDING
	| expr PRECEDING
	| CURRENT_ROW
	;

opt_partition_by :
	/* empty */
	| PARTITION BY expr_list
	;

opt_select_window :
	/* empty */
	| window_clause
	;

window_clause :
	WINDOW window_name_defn_list
	;

window_name_defn_list :
	window_name_defn
	| window_name_defn ',' window_name_defn_list
	;

window_name_defn :
	name AS window_defn
	;

region_spec :
	name
	| name PRIVATE
	;

region_list :
	region_spec ',' region_list
	| region_spec
	;

declare_schema_region_stmt :
	AT_DECLARE_SCHEMA_REGION name
	| AT_DECLARE_SCHEMA_REGION name USING region_list
	;

declare_deployable_region_stmt :
	AT_DECLARE_DEPLOYABLE_REGION name
	| AT_DECLARE_DEPLOYABLE_REGION name USING region_list
	;

begin_schema_region_stmt :
	AT_BEGIN_SCHEMA_REGION name
	;

end_schema_region_stmt :
	AT_END_SCHEMA_REGION
	;

schema_ad_hoc_migration_stmt :
	AT_SCHEMA_AD_HOC_MIGRATION version_annotation
	| AT_SCHEMA_AD_HOC_MIGRATION FOR AT_RECREATE '(' name ',' name ')'
	;

emit_enums_stmt :
	AT_EMIT_ENUMS opt_name_list
	;

opt_from_query_parts :
	/* empty */
	| FROM query_parts
	;

opt_where :
	/* empty */
	| WHERE expr
	;

opt_groupby :
	/* empty */
	| GROUP BY groupby_list
	;

groupby_list :
	groupby_item
	| groupby_item ',' groupby_list
	;

groupby_item :
	expr opt_asc_desc
	;

opt_asc_desc :
	/* empty */
	| ASC
	| DESC
	;

opt_having :
	/* empty */
	| HAVING expr
	;

opt_orderby :
	/* empty */
	| ORDER BY groupby_list
	;

opt_limit :
	/* empty */
	| LIMIT expr
	;

opt_offset :
	/* empty */
	| OFFSET expr
	;

select_opts :
	/* empty */
	| ALL
	| DISTINCT
	| DISTINCTROW
	;

select_expr_list :
	select_expr
	| select_expr ',' select_expr_list
	| '*'
	;

select_expr :
	expr opt_as_alias
	| name '.' '*'
	;

opt_as_alias :
	/* empty */
	| as_alias
	;

as_alias :
	AS name
	| name
	;

query_parts :
	table_or_subquery_list
	| join_clause
	;

table_or_subquery_list :
	table_or_subquery
	| table_or_subquery ',' table_or_subquery_list
	;

join_clause :
	table_or_subquery join_target_list
	;

join_target_list :
	join_target
	| join_target join_target_list
	;

table_or_subquery :
	name opt_as_alias
	| '(' select_stmt ')' opt_as_alias
	| table_function opt_as_alias
	| '(' query_parts ')'
	;

join_type :
	/* empty */
	| LEFT
	| RIGHT
	| LEFT OUTER
	| RIGHT OUTER
	| INNER
	| CROSS
	;

join_target :
	join_type JOIN table_or_subquery opt_join_cond
	;

opt_join_cond :
	/* empty */
	| join_cond
	;

join_cond :
	ON expr
	| USING '(' name_list ')'
	;

table_function :
	name '(' arg_list ')'
	;

create_view_stmt :
	CREATE opt_temp VIEW opt_if_not_exists name AS select_stmt opt_delete_version_attr
	;

with_delete_stmt :
	with_prefix delete_stmt
	;

delete_stmt :
	DELETE FROM name opt_where
	;

opt_insert_dummy_spec :
	/* empty */
	| AT_DUMMY_SEED '(' expr ')' dummy_modifier
	;

dummy_modifier :
	/* empty */
	| AT_DUMMY_NULLABLES
	| AT_DUMMY_DEFAULTS
	| AT_DUMMY_NULLABLES AT_DUMMY_DEFAULTS
	| AT_DUMMY_DEFAULTS AT_DUMMY_NULLABLES
	;

insert_stmt_type :
	INSERT INTO
	| INSERT OR REPLACE INTO
	| INSERT OR IGNORE INTO
	| INSERT OR ROLLBACK INTO
	| INSERT OR ABORT INTO
	| INSERT OR FAIL INTO
	| REPLACE INTO
	;

with_insert_stmt :
	with_prefix insert_stmt
	;

opt_column_spec :
	/* empty */
	| '(' opt_name_list ')'
	| '(' shape_def ')'
	;

from_shape :
	FROM CURSOR name opt_column_spec
	| FROM name opt_column_spec
	| FROM ARGUMENTS opt_column_spec
	;

insert_stmt :
	insert_stmt_type name opt_column_spec select_stmt opt_insert_dummy_spec
	| insert_stmt_type name opt_column_spec from_shape opt_insert_dummy_spec
	| insert_stmt_type name DEFAULT VALUES
	| insert_stmt_type name USING select_stmt
	| insert_stmt_type name USING expr_names opt_insert_dummy_spec
	;

insert_list :
	/* empty */
	| expr
	| expr ',' insert_list
	;

basic_update_stmt :
	UPDATE opt_name SET update_list opt_where
	;

with_update_stmt :
	with_prefix update_stmt
	;

update_stmt :
	UPDATE name SET update_list opt_where opt_orderby opt_limit
	;

update_entry :
	name '=' expr
	;

update_list :
	update_entry
	| update_entry ',' update_list
	;

with_upsert_stmt :
	with_prefix upsert_stmt
	;

upsert_stmt :
	insert_stmt ON_CONFLICT conflict_target DO NOTHING
	| insert_stmt ON_CONFLICT conflict_target DO basic_update_stmt
	;

update_cursor_stmt :
	UPDATE CURSOR name opt_column_spec FROM VALUES '(' insert_list ')'
	| UPDATE CURSOR name opt_column_spec from_shape
	| UPDATE CURSOR name USING expr_names
	;

conflict_target :
	/* empty */
	| '(' indexed_columns ')' opt_where
	;

function :
	FUNC
	| FUNCTION
	;

declare_out_call_stmt :
	DECLARE OUT call_stmt
	;

declare_enum_stmt :
	DECLARE ENUM name data_type_numeric '(' enum_values ')'
	;

enum_values :
	enum_value
	| enum_value ',' enum_values
	;

enum_value :
	name
	| name '=' expr
	;

declare_func_stmt :
	DECLARE function name '(' params ')' data_type_with_options
	| DECLARE SELECT function name '(' params ')' data_type_with_options
	| DECLARE function name '(' params ')' CREATE data_type_with_options
	| DECLARE SELECT function name '(' params ')' '(' typed_names ')'
	;

procedure :
	PROC
	| PROCEDURE
	;

declare_proc_no_check_stmt :
	DECLARE procedure name NO CHECK
	;

declare_proc_stmt :
	DECLARE procedure name '(' params ')'
	| DECLARE procedure name '(' params ')' '(' typed_names ')'
	| DECLARE procedure name '(' params ')' USING TRANSACTION
	| DECLARE procedure name '(' params ')' OUT '(' typed_names ')'
	| DECLARE procedure name '(' params ')' OUT '(' typed_names ')' USING TRANSACTION
	| DECLARE procedure name '(' params ')' OUT UNION '(' typed_names ')'
	| DECLARE procedure name '(' params ')' OUT UNION '(' typed_names ')' USING TRANSACTION
	;

create_proc_stmt :
	CREATE procedure name '(' params ')' BEGIN_ opt_stmt_list END
	;

inout :
	IN
	| OUT
	| INOUT
	;

typed_name :
	name data_type_with_options
	| shape_def
	| name shape_def
	;

typed_names :
	typed_name
	| typed_name ',' typed_names
	;

param :
	name data_type_with_options
	| inout name data_type_with_options
	| shape_def
	| name shape_def
	;

params :
	/* empty */
	| param
	| param ',' params
	;

declare_stmt :
	DECLARE name_list data_type_with_options
	| DECLARE name CURSOR FOR select_stmt
	| DECLARE name CURSOR FOR explain_stmt
	| DECLARE name CURSOR FOR call_stmt
	| DECLARE name CURSOR FETCH FROM call_stmt
	| DECLARE name CURSOR shape_def
	| DECLARE name CURSOR LIKE select_stmt
	| DECLARE name CURSOR FOR name
	| DECLARE name TYPE data_type_with_options
	;

call_stmt :
	CALL name '(' ')'
	| CALL name '(' call_expr_list ')'
	;

while_stmt :
	WHILE expr BEGIN_ opt_stmt_list END
	;

switch_stmt :
	SWITCH expr switch_case switch_cases
	| SWITCH expr ALL VALUES switch_case switch_cases
	;

switch_case :
	WHEN expr_list THEN stmt_list
	| WHEN expr_list THEN NOTHING
	;

switch_cases :
	switch_case switch_cases
	| ELSE stmt_list END
	| END
	;

loop_stmt :
	LOOP fetch_stmt BEGIN_ opt_stmt_list END
	;

leave_stmt :
	LEAVE
	;

return_stmt :
	RETURN
	;

rollback_return_stmt :
	ROLLBACK RETURN
	;

commit_return_stmt :
	COMMIT RETURN
	;

throw_stmt :
	THROW
	;

trycatch_stmt :
	BEGIN_ TRY opt_stmt_list END TRY ';' BEGIN_ CATCH opt_stmt_list END CATCH
	;

continue_stmt :
	CONTINUE
	;

fetch_stmt :
	FETCH name INTO name_list
	| FETCH name
	;

fetch_values_stmt :
	FETCH name opt_column_spec FROM VALUES '(' insert_list ')' opt_insert_dummy_spec
	| FETCH name opt_column_spec from_shape opt_insert_dummy_spec
	| FETCH name USING expr_names opt_insert_dummy_spec
	;

expr_names :
	expr_name
	| expr_name ',' expr_names
	;

expr_name :
	expr as_alias
	;

fetch_call_stmt :
	FETCH name opt_column_spec FROM call_stmt
	;

open_stmt :
	OPEN name
	;

close_stmt :
	CLOSE name
	;

out_stmt :
	OUT name
	;

out_union_stmt :
	OUT UNION name
	;

if_stmt :
	IF expr THEN opt_stmt_list opt_elseif_list opt_else END IF
	;

opt_else :
	/* empty */
	| ELSE opt_stmt_list
	;

elseif_item :
	ELSE_IF expr THEN opt_stmt_list
	;

elseif_list :
	elseif_item
	| elseif_item elseif_list
	;

opt_elseif_list :
	/* empty */
	| elseif_list
	;

control_stmt :
	commit_return_stmt
	| continue_stmt
	| leave_stmt
	| return_stmt
	| rollback_return_stmt
	| throw_stmt
	;

guard_stmt :
	IF expr control_stmt
	;

transaction_mode :
	/* empty */
	| DEFERRED
	| IMMEDIATE
	| EXCLUSIVE
	;

begin_trans_stmt :
	BEGIN_ transaction_mode TRANSACTION
	| BEGIN_ transaction_mode
	;

rollback_trans_stmt :
	ROLLBACK
	| ROLLBACK TRANSACTION
	| ROLLBACK TO savepoint_name
	| ROLLBACK TRANSACTION TO savepoint_name
	| ROLLBACK TO SAVEPOINT savepoint_name
	| ROLLBACK TRANSACTION TO SAVEPOINT savepoint_name
	;

commit_trans_stmt :
	COMMIT TRANSACTION
	| COMMIT
	;

proc_savepoint_stmt :
	procedure SAVEPOINT BEGIN_ opt_stmt_list END
	;

savepoint_name :
	AT_PROC
	| name
	;

savepoint_stmt :
	SAVEPOINT savepoint_name
	;

release_savepoint_stmt :
	RELEASE savepoint_name
	| RELEASE SAVEPOINT savepoint_name
	;

echo_stmt :
	AT_ECHO name ',' str_literal
	;

alter_table_add_column_stmt :
	ALTER TABLE name ADD COLUMN col_def
	;

create_trigger_stmt :
	CREATE opt_temp TRIGGER opt_if_not_exists trigger_def opt_delete_version_attr
	;

trigger_def :
	name trigger_condition trigger_operation ON name trigger_action
	;

trigger_condition :
	/* empty */
	| BEFORE
	| AFTER
	| INSTEAD OF
	;

trigger_operation :
	DELETE
	| INSERT
	| UPDATE opt_of
	;

opt_of :
	/* empty */
	| OF name_list
	;

trigger_action :
	opt_foreachrow opt_when_expr BEGIN_ trigger_stmts END
	;

opt_foreachrow :
	/* empty */
	| FOR_EACH_ROW
	;

opt_when_expr :
	/* empty */
	| WHEN expr
	;

trigger_stmts :
	trigger_stmt
	| trigger_stmt trigger_stmts
	;

trigger_stmt :
	trigger_update_stmt ';'
	| trigger_insert_stmt ';'
	| trigger_delete_stmt ';'
	| trigger_select_stmt ';'
	;

trigger_select_stmt :
	select_stmt_no_with
	;

trigger_insert_stmt :
	insert_stmt
	;

trigger_delete_stmt :
	delete_stmt
	;

trigger_update_stmt :
	basic_update_stmt
	;

enforcement_options :
	FOREIGN KEY ON UPDATE
	| FOREIGN KEY ON DELETE
	| JOIN
	| UPSERT STATEMENT
	| WINDOW function
	| WITHOUT ROWID
	| TRANSACTION
	| SELECT IF NOTHING
	| INSERT SELECT
	| TABLE FUNCTION
	| ENCODE CONTEXT_COLUMN
	| ENCODE CONTEXT_TYPE INTEGER
	| ENCODE CONTEXT_TYPE LONG_INTEGER
	| ENCODE CONTEXT_TYPE REAL
	| ENCODE CONTEXT_TYPE BOOL_
	| ENCODE CONTEXT_TYPE TEXT
	| ENCODE CONTEXT_TYPE BLOB
	| IS_TRUE
	| CAST
	| NULL_ CHECK ON NOT NULL_
	;

enforce_strict_stmt :
	AT_ENFORCE_STRICT enforcement_options
	;

enforce_normal_stmt :
	AT_ENFORCE_NORMAL enforcement_options
	;

enforce_reset_stmt :
	AT_ENFORCE_RESET
	;

enforce_push_stmt :
	AT_ENFORCE_PUSH
	;

enforce_pop_stmt :
	AT_ENFORCE_POP
	;

//Lex

UNBOUNDED : 'UNBOUNDED' ;
EXPLAIN : 'EXPLAIN' ;
QUERY_PLAN : 'QUERY_PLAN' ;
AT_PREVIOUS_SCHEMA : 'AT_PREVIOUS_SCHEMA' ;
AT_SCHEMA_UPGRADE_SCRIPT : 'AT_SCHEMA_UPGRADE_SCRIPT' ;
AT_SCHEMA_UPGRADE_VERSION : 'AT_SCHEMA_UPGRADE_VERSION' ;
SET : 'SET' ;
FROM : 'FROM' ;
CURSOR : 'CURSOR' ;
LET : 'LET' ;
AT_RECREATE : 'AT_RECREATE' ;
AT_CREATE : 'AT_CREATE' ;
AT_DELETE : 'AT_DELETE' ;
DROP : 'DROP' ;
TABLE : 'TABLE' ;
IF : 'IF' ;
EXISTS : 'EXISTS' ;
VIEW : 'VIEW' ;
INDEX : 'INDEX' ;
TRIGGER : 'TRIGGER' ;
CREATE : 'CREATE' ;
VIRTUAL : 'VIRTUAL' ;
USING : 'USING' ;
AS : 'AS' ;
ARGUMENTS : 'ARGUMENTS' ;
FOLLOWING : 'FOLLOWING' ;
TEMP : 'TEMP' ;
WITHOUT : 'WITHOUT' ;
ROWID : 'ROWID' ;
CONSTRAINT : 'CONSTRAINT' ;
CHECK : 'CHECK' ;
AT_ATTRIBUTE : 'AT_ATTRIBUTE' ;
PRIMARY : 'PRIMARY' ;
KEY : 'KEY' ;
ON_CONFLICT : 'ON_CONFLICT' ;
ROLLBACK : 'ROLLBACK' ;
ABORT : 'ABORT' ;
FAIL : 'FAIL' ;
IGNORE : 'IGNORE' ;
REPLACE : 'REPLACE' ;
ON : 'ON' ;
DELETE : 'DELETE' ;
UPDATE : 'UPDATE' ;
NULL_ : 'NULL_' ;
DEFAULT : 'DEFAULT' ;
CASCADE : 'CASCADE' ;
RESTRICT : 'RESTRICT' ;
NO : 'NO' ;
ACTION : 'ACTION' ;
DEFERRABLE : 'DEFERRABLE' ;
NOT_DEFERRABLE : 'NOT_DEFERRABLE' ;
INITIALLY : 'INITIALLY' ;
DEFERRED : 'DEFERRED' ;
IMMEDIATE : 'IMMEDIATE' ;
FOREIGN : 'FOREIGN' ;
REFERENCES : 'REFERENCES' ;
UNIQUE : 'UNIQUE' ;
TEXT : 'TEXT' ;
TYPE : 'TYPE' ;
HIDDEN : 'HIDDEN' ;
PRIVATE : 'PRIVATE' ;
AUTOINCREMENT : 'AUTOINCREMENT' ;
AT_SENSITIVE : 'AT_SENSITIVE' ;
INT_ : 'INT_' ;
INTEGER : 'INTEGER' ;
REAL : 'REAL' ;
LONG_ : 'LONG_' ;
BOOL_ : 'BOOL_' ;
LONG_INT : 'LONG_INT' ;
LONG_INTEGER : 'LONG_INTEGER' ;
BLOB : 'BLOB' ;
OBJECT : 'OBJECT' ;
STRLIT : 'STRLIT' ;
CSTRLIT : 'CSTRLIT' ;
LONGLIT : 'LONGLIT' ;
TRUE_ : 'TRUE_' ;
FALSE_ : 'FALSE_' ;
CONST : 'CONST' ;
AT_FILE : 'AT_FILE' ;
AT_PROC : 'AT_PROC' ;
BLOBLIT : 'BLOBLIT' ;
RAISE : 'RAISE' ;
DISTINCT : 'DISTINCT' ;
AT_RC : 'AT_RC' ;
NOTHING : 'NOTHING' ;
THROW : 'THROW' ;
CASE : 'CASE' ;
END : 'END' ;
ELSE : 'ELSE' ;
CAST : 'CAST' ;
WHEN : 'WHEN' ;
THEN : 'THEN' ;
WITH : 'WITH' ;
RECURSIVE : 'RECURSIVE' ;
SELECT : 'SELECT' ;
VALUES : 'VALUES' ;
OVER : 'OVER' ;
FILTER : 'FILTER' ;
RANGE : 'RANGE' ;
ROWS : 'ROWS' ;
GROUPS : 'GROUPS' ;
EXCLUDE_NO_OTHERS : 'EXCLUDE_NO_OTHERS' ;
EXCLUDE_CURRENT_ROW : 'EXCLUDE_CURRENT_ROW' ;
EXCLUDE_GROUP : 'EXCLUDE_GROUP' ;
EXCLUDE_TIES : 'EXCLUDE_TIES' ;
PRECEDING : 'PRECEDING' ;
CURRENT_ROW : 'CURRENT_ROW' ;
PARTITION : 'PARTITION' ;
BY : 'BY' ;
WINDOW : 'WINDOW' ;
AT_DECLARE_SCHEMA_REGION : 'AT_DECLARE_SCHEMA_REGION' ;
AT_DECLARE_DEPLOYABLE_REGION : 'AT_DECLARE_DEPLOYABLE_REGION' ;
AT_BEGIN_SCHEMA_REGION : 'AT_BEGIN_SCHEMA_REGION' ;
AT_END_SCHEMA_REGION : 'AT_END_SCHEMA_REGION' ;
AT_SCHEMA_AD_HOC_MIGRATION : 'AT_SCHEMA_AD_HOC_MIGRATION' ;
FOR : 'FOR' ;
AT_EMIT_ENUMS : 'AT_EMIT_ENUMS' ;
WHERE : 'WHERE' ;
GROUP : 'GROUP' ;
ASC : 'ASC' ;
DESC : 'DESC' ;
HAVING : 'HAVING' ;
ORDER : 'ORDER' ;
LIMIT : 'LIMIT' ;
OFFSET : 'OFFSET' ;
ALL : 'ALL' ;
DISTINCTROW : 'DISTINCTROW' ;
LEFT : 'LEFT' ;
RIGHT : 'RIGHT' ;
OUTER : 'OUTER' ;
INNER : 'INNER' ;
CROSS : 'CROSS' ;
JOIN : 'JOIN' ;
AT_DUMMY_SEED : 'AT_DUMMY_SEED' ;
AT_DUMMY_NULLABLES : 'AT_DUMMY_NULLABLES' ;
AT_DUMMY_DEFAULTS : 'AT_DUMMY_DEFAULTS' ;
INSERT : 'INSERT' ;
INTO : 'INTO' ;
DO : 'DO' ;
FUNC : 'FUNC' ;
FUNCTION : 'FUNCTION' ;
DECLARE : 'DECLARE' ;
OUT : 'OUT' ;
ENUM : 'ENUM' ;
PROC : 'PROC' ;
PROCEDURE : 'PROCEDURE' ;
TRANSACTION : 'TRANSACTION' ;
BEGIN_ : 'BEGIN_' ;
INOUT : 'INOUT' ;
FETCH : 'FETCH' ;
CALL : 'CALL' ;
WHILE : 'WHILE' ;
SWITCH : 'SWITCH' ;
LOOP : 'LOOP' ;
LEAVE : 'LEAVE' ;
RETURN : 'RETURN' ;
COMMIT : 'COMMIT' ;
TRY : 'TRY' ;
CATCH : 'CATCH' ;
CONTINUE : 'CONTINUE' ;
OPEN : 'OPEN' ;
CLOSE : 'CLOSE' ;
ELSE_IF : 'ELSE_IF' ;
EXCLUSIVE : 'EXCLUSIVE' ;
TO : 'TO' ;
SAVEPOINT : 'SAVEPOINT' ;
RELEASE : 'RELEASE' ;
AT_ECHO : 'AT_ECHO' ;
ALTER : 'ALTER' ;
ADD : 'ADD' ;
COLUMN : 'COLUMN' ;
BEFORE : 'BEFORE' ;
AFTER : 'AFTER' ;
INSTEAD : 'INSTEAD' ;
OF : 'OF' ;
FOR_EACH_ROW : 'FOR_EACH_ROW' ;
UPSERT : 'UPSERT' ;
STATEMENT : 'STATEMENT' ;
ENCODE : 'ENCODE' ;
CONTEXT_COLUMN : 'CONTEXT_COLUMN' ;
CONTEXT_TYPE : 'CONTEXT_TYPE' ;
AT_ENFORCE_STRICT : 'AT_ENFORCE_STRICT' ;
AT_ENFORCE_NORMAL : 'AT_ENFORCE_NORMAL' ;
AT_ENFORCE_RESET : 'AT_ENFORCE_RESET' ;
AT_ENFORCE_PUSH : 'AT_ENFORCE_PUSH' ;
AT_ENFORCE_POP : 'AT_ENFORCE_POP' ;
UMINUS : 'UMINUS' ;

ID : "[_A-Z][A-Z0-9_]*" ;
REALLIT : "([0-9]+'.'[0-9]*|'.'[0-9]+)(E('+'|'-')?[0-9]+)?" ;
INTLIT : "[0-9]+" ;

ASSIGN : ':=' ;
CONCAT : '||' ;
EQEQ : '==' ;
GE : '>=' ;
LE : '<=' ;
LS : '<<' ;
NE : '<>' ;
NE_ : '!=' ;
RS : '>>' ;

}

Output:

/usr/bin/time ./lalr-nb cql.g
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '+'
...
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '>>'
Command exited with non-zero status 1
2.15user 0.01system 0:02.17elapsed 99%CPU (0avgtext+0avgdata 32724maxresident)k
0inputs+0outputs (0major+7428minor)pagefaults 0swaps

cwbaker self-assigned this Jan 28, 2023

@cwbaker (Owner) commented Jan 28, 2023

> I'm looking at an online playground like:
>
> * https://yhirose.github.io/cpp-peglib/
> * https://chrishixon.github.io/chpeg/playground/
> * https://peggyjs.org/online.html
> * https://mingodad.github.io/CocoR-Typescript/playground
> * https://gerhobbelt.github.io/jison/try/

That would be awesome. Thanks for taking the time!

> I'm testing the project with a SQL grammar before trying to implement it, and I didn't find any way to define case-insensitive tokens/literals.

You should be able to use a regular expression, e.g. "(SELECT)|(select)" or "[Ss][Ee][Ll][Ee][Cc][Tt]".
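For a grammar with hundreds of keywords, writing the character classes by hand gets tedious. The following is a minimal Python sketch that generates the second form of regular expression suggested above. The helper names `case_insensitive_pattern` and `token_rule` are made up for this illustration and are not part of lalr, and the output shape simply mirrors the `NAME : "regex" ;` token rules already used in the grammar posted earlier in this thread.

```python
def case_insensitive_pattern(keyword: str) -> str:
    """Expand a keyword into a regex that matches it in any letter case.

    Assumes the keyword contains only letters, digits, and underscores,
    so nothing needs regex escaping.
    """
    parts = []
    for ch in keyword:
        if ch.isalpha():
            # Each letter becomes a two-character class, e.g. 'S' -> '[Ss]'.
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            # Digits and underscores are emitted unchanged.
            parts.append(ch)
    return "".join(parts)


def token_rule(name: str, keyword: str) -> str:
    """Format one token rule in the 'NAME : "regex" ;' shape (hypothetical helper)."""
    return f'{name} : "{case_insensitive_pattern(keyword)}" ;'


if __name__ == "__main__":
    for kw in ["SELECT", "FROM", "WHERE"]:
        print(token_rule(kw, kw))
```

The generated lines can be pasted in place of the single-quoted keyword literals. Whether the resulting patterns interact well with the ID token's regular expression is something to check against the real tool.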

@cwbaker (Owner) commented Jan 28, 2023

> I've noticed that this project does not seem to accept a dummy token name to be used only for precedence purposes.

Can you give me an example to show me what you mean? I don't quite understand how that would be used. Doesn't have to be related to the SQL grammar if you have a simpler example. Thanks.

@cwbaker (Owner) commented Jan 28, 2023

> This grammar seems to be a good one for profiling this project, because it takes a while to process.

That's easily the biggest grammar that's been thrown at it.

I'm not sure how much time I have over the next few weeks to look at this, but I'll see what I can do.

I'd also be happy to accept a PR. At a guess, set<> is being used in several places where unordered_set<> would be much faster. Beyond replacing the sets, some care would be needed to ensure that subsequent runs generate deterministic state machines, i.e. the ordering of states needs to be deterministic to avoid generating different state machines from the same grammar.

Profiling it first would be a better move though; that's just my intuition as to what's taking up time.
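One common way to get hash-based lookup without losing deterministic state ordering is to number states in the order they are discovered and use the hash container only for membership tests. The sketch below shows the idea in Python rather than in the project's C++, and it is not lalr's actual algorithm: `number_states`, `successors`, and the toy `GRAPH` are all hypothetical stand-ins for real LALR item sets and transitions.

```python
def number_states(initial, successors):
    """Assign indices to states in discovery (breadth-first) order.

    `initial` is any hashable state and `successors(state)` must return
    next states in a deterministic order. Both are illustrative stand-ins.
    """
    index_of = {initial: 0}  # hash lookup, used only for membership and index
    order = [initial]        # numbering comes from this list, never from hash order
    next_unprocessed = 0
    while next_unprocessed < len(order):
        state = order[next_unprocessed]
        next_unprocessed += 1
        for successor in successors(state):
            if successor not in index_of:
                index_of[successor] = len(order)
                order.append(successor)
    return order, index_of


# Toy transition graph standing in for item sets (hypothetical data).
GRAPH = {
    "s0": ["s1", "s2"],
    "s1": ["s2", "s3"],
    "s2": ["s0"],
    "s3": [],
}

if __name__ == "__main__":
    order, index_of = number_states("s0", lambda s: GRAPH[s])
    print(order)
```

In C++ the same shape would pair a vector holding the states in discovery order with an unordered container used purely for lookup, since the iteration order of unordered_set<> is unspecified.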

@mingodad (Contributor, Author)

> > I've noticed that this project does not seem to accept a dummy token name to be used only for precedence purposes.
>
> Can you give me an example to show me what you mean? I don't quite understand how that would be used. Doesn't have to be related to the SQL grammar if you have a simpler example. Thanks.

You can see the description of it here: https://www.gnu.org/software/bison/manual/html_node/Contextual-Precedence.html

> The %prec modifier declares the precedence of a particular rule by specifying a terminal symbol whose precedence should be used for that rule. It’s not necessary for that symbol to appear otherwise in the rule. The modifier’s syntax is:
>
> %prec terminal-symbol

@cwbaker (Owner) commented Mar 8, 2023

Can you give me an example to show me what you mean? I don't quite understand how that would be used. Doesn't have to be related to the SQL grammar if you have a simpler example. Thanks.

Here https://www.gnu.org/software/bison/manual/html_node/Contextual-Precedence.html you can see the description of it.

The %prec modifier declares the precedence of a particular rule by specifying a terminal symbol whose precedence should be used for that rule. It’s not necessary for that symbol to appear otherwise in the rule. The modifier’s syntax is:
%prec terminal-symbol

The lalr equivalent is %precedence, and it's used the same way, i.e. at the end of a rule to set the precedence for that rule to match the precedence of a token. The last paragraph of Precedence and Associativity describes it:

The precedence of a production defaults to that of its right-most terminal but can be explicitly set to the precedence of a different terminal using the %precedence directive. The precedence directive appears after the right-hand side of a production before any attached action and is followed by the terminal whose precedence and associativity the production is to inherit.

Is that what you're after?
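
To make the effect of a precedence override concrete, here is a minimal sketch of the conventional yacc/bison rule for resolving a shift/reduce conflict. It is not taken from lalr's source, and whether lalr follows exactly this rule is an assumption. The names `Precedence`, `Action` and `resolve_shift_reduce` are made up for illustration.

```cpp
#include <cassert>

enum class Assoc { none, left, right };
enum class Action { shift, reduce, conflict };

// Precedence of a terminal or production.
struct Precedence
{
    int level;   // 0 means no precedence was declared; higher binds tighter
    Assoc assoc; // associativity declared together with the level
};

// `production` is the precedence of the rule that could be reduced: by default
// that of its right-most terminal, or that of the terminal named by an override.
// `lookahead` is the precedence of the token that could be shifted.
Action resolve_shift_reduce( Precedence production, Precedence lookahead )
{
    assert( production.level >= 0 && lookahead.level >= 0 );
    if ( production.level == 0 || lookahead.level == 0 )
    {
        return Action::conflict; // nothing to compare, so the conflict is reported
    }
    if ( production.level > lookahead.level )
    {
        return Action::reduce;
    }
    if ( production.level < lookahead.level )
    {
        return Action::shift;
    }
    // Same level: associativity decides.
    switch ( lookahead.assoc )
    {
        case Assoc::left:  return Action::reduce;
        case Assoc::right: return Action::shift;
        default:           return Action::conflict;
    }
}
```

With levels ordered as in the SQL grammar ('-' below '*' below UMINUS), a unary minus rule that inherits UMINUS reduces before a following '*', so -a * b groups as (-a) * b. Without the override the rule would take the precedence of '-' and the parser would shift instead.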

@mingodad
Contributor Author

mingodad commented Mar 8, 2023

Thank you again for the reply!

It seems that was my fault: I was declaring %prec UMINUS; instead of %precedence UMINUS;. Sorry about that.

I'm now attaching the naked bison/yacc grammar and the converted lalr grammar. byacc/bison report no unresolved conflicts, but lalr reports lots of them:

./lalr cql.g
...
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (542): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (555): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (556): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (557): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (558): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (559): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (560): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (561): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (562): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (565): ERROR: shift/reduce conflict for 'math_expr' on '>>'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '<'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '>'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '&'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '|'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '+'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '-'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '*'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '/'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '%'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '||'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '=='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '>='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '<='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '<<'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '<>'
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '!='
lalr (566): ERROR: shift/reduce conflict for 'math_expr' on '>>'
bison  cql-naked.y
# no error/warning 

cql-grammars.zip

@cwbaker
Owner

cwbaker commented Mar 9, 2023

There are some terminals that are undefined in that grammar, e.g. UNION_ALL, UNION, etc., and these weren't being reported as errors. That's a bug in lalr that is fixed by #12, hopefully merged to main by the time you read this. Defining these as terminals solves the shift/reduce conflicts.

There was also an error with the REALLIT regular expression. I believe it should be "([0-9]+\.[0-9]*|\.[0-9]+)(E(\+|\-)?[0-9]+)?", i.e. remove the single quotes ' and escape ., +, and - characters with \.
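
A quick way to sanity-check what the corrected pattern accepts is to run it through a general-purpose regex engine. The sketch below uses C++ `std::regex` in its default ECMAScript dialect purely as a stand-in: lalr has its own regex engine, so identical behaviour there is an assumption. `is_real_literal` is a hypothetical helper name.

```cpp
#include <regex>
#include <string>

// The corrected REALLIT pattern quoted above, written as a raw string so the
// backslashes reach the regex engine unchanged.
bool is_real_literal( const std::string& text )
{
    static const std::regex reallit( R"(([0-9]+\.[0-9]*|\.[0-9]+)(E(\+|\-)?[0-9]+)?)" );
    // regex_match requires the whole string to match, not just a prefix.
    return std::regex_match( text, reallit );
}
```

Under these assumptions `1.5`, `.5`, `1.` and `3.14E-2` are accepted, while `15` (no decimal point) and `1.5e3` (lowercase exponent marker) are not, because the pattern as written only allows an uppercase `E`.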

Fixed grammar cql.zip

@mingodad
Contributor Author

mingodad commented Mar 9, 2023

Thanks!
The original cql grammar uses Flex (https://github.com/facebookincubator/CG-SQL/blob/main/sources/cql.l) to generate the scanner and has several multi-word keywords like:

EXCLUDE{sp}NO{sp}OTHERS/{stop}   { return EXCLUDE_NO_OTHERS; }
EXCLUDE{sp}CURRENT{sp}ROW/{stop} { return EXCLUDE_CURRENT_ROW; }
EXCLUDE{sp}GROUP/{stop}          { return EXCLUDE_GROUP; }
EXCLUDE{sp}TIES/{stop}           { return EXCLUDE_TIES; }

CURRENT{sp}ROW/{stop}        { return CURRENT_ROW; }
...
QUERY{sp}PLAN/{stop}         { return QUERY_PLAN; }
...
UNION{sp}ALL/{stop}          { return UNION_ALL; }
...
ON{sp}CONFLICT/{stop}        { return ON_CONFLICT; }
...
FROM{sp}BLOB/{stop}          { return FROM_BLOB; }
...
IS{sp}NOT{sp}FALSE/{stop}    { return IS_NOT_FALSE; }
IS{sp}NOT{sp}TRUE/{stop}     { return IS_NOT_TRUE; }
IS{sp}FALSE/{stop}           { return IS_FALSE; }
IS{sp}TRUE/{stop}            { return IS_TRUE; }
IS{sp}NOT/{stop}             { return IS_NOT; }
...
NOT{sp}IN/{stop}             { return NOT_IN; }
...
NOT{sp}LIKE/{stop}           { return NOT_LIKE; }
...

How can those be achieved in lalr?

@mingodad
Contributor Author

mingodad commented Mar 9, 2023

I'm trying a multi-word keyword as shown below, and at least the grammar is accepted:

QUERY_PLAN : "QUERY[:space:]+PLAN" ;
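
One thing worth checking with a pattern like this: in POSIX-style bracket syntax a character class name has to sit inside a bracket expression, so the usual spelling is `[[:space:]]`, while a bare `[:space:]` is read as the set of the characters `:`, `s`, `p`, `a`, `c` and `e`. The sketch below demonstrates that with C++ `std::regex`, used only as a stand-in. Whether lalr's own regex dialect treats `[:space:]` the same way is an assumption, and `matches` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Returns true when the whole of `text` matches `pattern`
// (std::regex default ECMAScript dialect, case-sensitive).
bool matches( const std::string& pattern, const std::string& text )
{
    return std::regex_match( text, std::regex(pattern) );
}
```

With this helper, `matches("QUERY[[:space:]]+PLAN", "QUERY PLAN")` is true, whereas `matches("QUERY[:space:]+PLAN", "QUERY PLAN")` is false and the same pattern accepts the literal text `QUERYspacePLAN` instead.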

I'm testing a grammar against input with the code shown below. I think something like it could be added to the main lalrc binary:

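#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>   // uint8_t
#include <stdlib.h>   // EXIT_SUCCESS, EXIT_FAILURE
#include <vector>     // std::vector
#include <lalr/GrammarCompiler.hpp>
#include <lalr/Parser.ipp>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>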

static int errors_ = 0;

static void show_error( const char* format, ... )
{
    ++errors_;
    va_list args;
    va_start( args, format );
    vfprintf( stderr, format, args );
    va_end( args );
}

int read_file(const char *fname, std::vector<char> &content)
{
        struct stat stat;
        int result = ::stat( fname, &stat );
        if ( result != 0 )
        {
            show_error( "Stat failed on '%s' - result=%d\n", fname, result );
            return EXIT_FAILURE;
        }

        FILE* file = fopen( fname, "rb" );
        if ( !file )
        {
            show_error( "Opening '%s' to read failed - errno=%d\n", fname, errno );
            return EXIT_FAILURE;
        }

        int size = stat.st_size;
        content.resize( size );
        int read = int( fread(&content[0], sizeof(uint8_t), size, file) );
        fclose( file );
        file = nullptr;
        if ( read != size )
        {
            show_error( "Reading grammar from '%s' failed - read=%d\n", fname, int(read) );
            return EXIT_FAILURE;
        }
	return EXIT_SUCCESS;
}

int main(int argc, char *argv[])
{
	const char *grammar_fn = nullptr;
	const char *input_fn = nullptr;

	std::vector<char> grammar_txt;
	std::vector<char> input_txt;

	if ( argc < 2 )
	{
		printf( "%s -g|--grammar grammar_fname -i|--input input_fname\n", argv[0] );
		printf( "\n" );
		return EXIT_FAILURE;
	}

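	int argi = 1;
	while ( argi < argc )
	{
		// Both options take a value, so a flag needs one more argument after it.
		bool has_value = argi + 1 < argc;
		if ( has_value && (strcmp(argv[argi], "-g") == 0 || strcmp(argv[argi], "--grammar") == 0) )
		{
		    grammar_fn = argv[argi + 1];
		    argi += 2;
		}
		else if ( has_value && (strcmp(argv[argi], "-i") == 0 || strcmp(argv[argi], "--input") == 0) )
		{
		    input_fn = argv[argi + 1];
		    argi += 2;
		}
		else
		{
		    // Unknown option or missing value: stop instead of looping forever.
		    show_error( "Unrecognized or incomplete option '%s'\n", argv[argi] );
		    return EXIT_FAILURE;
		}
	}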

	if(grammar_fn != nullptr)
	{
		int rc = read_file(grammar_fn, grammar_txt);
		if(rc != EXIT_SUCCESS) return rc;
		printf("Grammar size = %d\n", (int)grammar_txt.size());
		lalr::GrammarCompiler compiler;
		lalr::ErrorPolicy error_policy;
		int errors = compiler.compile( &grammar_txt[0], &grammar_txt[0] + grammar_txt.size(), &error_policy );
		if(errors != 0)
		{
			printf("Error count = %d\n", errors);
			return EXIT_FAILURE;
		}
		if(input_fn != nullptr)
		{
			rc = read_file(input_fn, input_txt);
			if(rc != EXIT_SUCCESS) return rc;
			printf("Input size = %d\n", (int)input_txt.size());
			lalr::ErrorPolicy error_policy_input;
			lalr::Parser<const char*, int> parser( compiler.parser_state_machine(), &error_policy_input );
			parser.parse( &input_txt[0], &input_txt[0] + input_txt.size() );
			printf( "accepted = %d, full = %d\n", parser.accepted(),  parser.full());
		}
	}
	return EXIT_SUCCESS;
}

How do I declare comments like the ones accepted by SQL?

/*
Multi line comment
*/

-- till the end of line comment
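
For reference, the two comment forms can each be described by a regular expression without lookahead. The sketch below checks both patterns with C++ `std::regex` as a stand-in engine. Whether lalr accepts these exact patterns (for example inside its %whitespace directive) is an assumption that is not verified here, and the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// "--" followed by anything up to, but not including, the end of the line.
bool is_line_comment( const std::string& text )
{
    static const std::regex line_comment( R"(--[^\n]*)" );
    return std::regex_match( text, line_comment );
}

// "/*", then any run of non-star characters or stars not followed by '/',
// then one or more stars and the closing '/'.
bool is_block_comment( const std::string& text )
{
    static const std::regex block_comment( R"(/\*([^*]|\*+[^*/])*\*+/)" );
    return std::regex_match( text, block_comment );
}
```

The block-comment pattern is the classic lex formulation: it never matches across the first closing `*/`, so two comments separated by code are not merged into one.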

Can you help make the attached example parse cg_test.sql?

Attached zip file contains:

unzip -l cql-test.zip 
Archive:  cql-test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    27765  2023-03-09 12:45   cql.g
   193316  2023-03-09 12:45   cg_test.sql
     2794  2023-03-09 12:27   grammar_test.cpp
---------                     -------
   223875                     3 files

cql-test.zip

@mingodad
Contributor Author

mingodad commented Mar 9, 2023

When executing the example attached above I'm getting this:

valgrind ./grammar_test -g cql.g -i cg_test.sql
==19529== Memcheck, a memory error detector
==19529== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19529== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==19529== Command: ./grammar_test -g cql.g -i cg_test.sql
==19529== 
Grammar size = 27765
==19529== Invalid read of size 8
==19529==    at 0x1388A8: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:1167)
==19529==    by 0x1388F2: std::shared_ptr<lalr::RegexNode>::shared_ptr(std::shared_ptr<lalr::RegexNode> const&) (shared_ptr.h:129)
==19529==    by 0x13B6B4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:151)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==    by 0x12E49F: lalr::RegexGenerator::generate(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::ErrorPolicy*) (RegexGenerator.cpp:176)
==19529==    by 0x12BD7D: lalr::RegexCompiler::compile(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::ErrorPolicy*) (RegexCompiler.cpp:51)
==19529==    by 0x11750E: lalr::GrammarCompiler::compile(char const*, char const*, lalr::ErrorPolicy*, bool) (GrammarCompiler.cpp:118)
==19529==    by 0x10BE8C: main (grammar_test.cpp:88)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 8
==19529==    at 0x1208D8: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:734)
==19529==    by 0x1388CC: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:1167)
==19529==    by 0x1388F2: std::shared_ptr<lalr::RegexNode>::shared_ptr(std::shared_ptr<lalr::RegexNode> const&) (shared_ptr.h:129)
==19529==    by 0x13B6B4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:151)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==    by 0x12E49F: lalr::RegexGenerator::generate(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::ErrorPolicy*) (RegexGenerator.cpp:176)
==19529==    by 0x12BD7D: lalr::RegexCompiler::compile(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::ErrorPolicy*) (RegexCompiler.cpp:51)
==19529==    by 0x11750E: lalr::GrammarCompiler::compile(char const*, char const*, lalr::ErrorPolicy*, bool) (GrammarCompiler.cpp:118)
==19529==  Address 0x7eb41d8 is 8 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 4
==19529==    at 0x11F71C: __gnu_cxx::__atomic_add_single(int*, int) (atomicity.h:74)
==19529==    by 0x11F7B0: __gnu_cxx::__atomic_add_dispatch(int*, int) (atomicity.h:98)
==19529==    by 0x1217D8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (shared_ptr_base.h:139)
==19529==    by 0x1208FC: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:737)
==19529==    by 0x1388CC: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:1167)
==19529==    by 0x1388F2: std::shared_ptr<lalr::RegexNode>::shared_ptr(std::shared_ptr<lalr::RegexNode> const&) (shared_ptr.h:129)
==19529==    by 0x13B6B4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:151)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid write of size 4
==19529==    at 0x11F727: __gnu_cxx::__atomic_add_single(int*, int) (atomicity.h:74)
==19529==    by 0x11F7B0: __gnu_cxx::__atomic_add_dispatch(int*, int) (atomicity.h:98)
==19529==    by 0x1217D8: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy() (shared_ptr_base.h:139)
==19529==    by 0x1208FC: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count(std::__shared_count<(__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:737)
==19529==    by 0x1388CC: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2> const&) (shared_ptr_base.h:1167)
==19529==    by 0x1388F2: std::shared_ptr<lalr::RegexNode>::shared_ptr(std::shared_ptr<lalr::RegexNode> const&) (shared_ptr.h:129)
==19529==    by 0x13B6B4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:151)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 8
==19529==    at 0x120286: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:729)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==  Address 0x7eb41d8 is 8 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 8
==19529==    at 0x120292: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==    by 0x13B503: lalr::RegexSyntaxTree::reset(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&, lalr::RegexGenerator*) (RegexSyntaxTree.cpp:122)
==19529==  Address 0x7eb41d8 is 8 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 4
==19529==    at 0x11F6F2: __gnu_cxx::__exchange_and_add_single(int*, int) (atomicity.h:67)
==19529==    by 0x11F76C: __gnu_cxx::__exchange_and_add_dispatch(int*, int) (atomicity.h:84)
==19529==    by 0x121140: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:152)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 4
==19529==    at 0x11F6FB: __gnu_cxx::__exchange_and_add_single(int*, int) (atomicity.h:68)
==19529==    by 0x11F76C: __gnu_cxx::__exchange_and_add_dispatch(int*, int) (atomicity.h:84)
==19529==    by 0x121140: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:152)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid write of size 4
==19529==    at 0x11F706: __gnu_cxx::__exchange_and_add_single(int*, int) (atomicity.h:68)
==19529==    by 0x11F76C: __gnu_cxx::__exchange_and_add_dispatch(int*, int) (atomicity.h:84)
==19529==    by 0x121140: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:152)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==  Address 0x7eb41d0 is 16 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 8
==19529==    at 0x12114F: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:155)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==  Address 0x7eb41c8 is 24 bytes before a block of size 32 alloc'd
==19529==    at 0x4C336DB: operator new(unsigned long) (vg_replace_malloc.c:417)
==19529==    by 0x139041: __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::allocate(unsigned long, void const*) (new_allocator.h:114)
==19529==    by 0x138E29: std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::allocate(std::allocator<std::shared_ptr<lalr::RegexNode> >&, unsigned long) (alloc_traits.h:443)
==19529==    by 0x138A71: std::_Vector_base<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_allocate(unsigned long) (stl_vector.h:343)
==19529==    by 0x138213: void std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::_M_realloc_insert<std::shared_ptr<lalr::RegexNode> const&>(__gnu_cxx::__normal_iterator<std::shared_ptr<lalr::RegexNode>*, std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > > >, std::shared_ptr<lalr::RegexNode> const&) (vector.tcc:440)
==19529==    by 0x137DF1: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::push_back(std::shared_ptr<lalr::RegexNode> const&) (stl_vector.h:1195)
==19529==    by 0x13BBA0: lalr::RegexSyntaxTree::end_bracket_expression() (RegexSyntaxTree.cpp:229)
==19529==    by 0x145395: lalr::RegexParser::match_bracket_expression() (RegexParser.cpp:133)
==19529==    by 0x1451C5: lalr::RegexParser::match_base_expression() (RegexParser.cpp:86)
==19529==    by 0x145115: lalr::RegexParser::match_postfix_expression() (RegexParser.cpp:63)
==19529==    by 0x1450CB: lalr::RegexParser::match_cat_expression() (RegexParser.cpp:50)
==19529==    by 0x14505B: lalr::RegexParser::match_or_expression() (RegexParser.cpp:37)
==19529== 
==19529== Invalid read of size 8
==19529==    at 0x121156: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:155)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
==19529== 
==19529== 
==19529== Process terminating with default action of signal 11 (SIGSEGV)
==19529==  Access not within mapped region at address 0x10
==19529==    at 0x121156: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:155)
==19529==    by 0x12029C: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (shared_ptr_base.h:730)
==19529==    by 0x132E4D: std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (shared_ptr_base.h:1169)
==19529==    by 0x132E69: std::shared_ptr<lalr::RegexNode>::~shared_ptr() (shared_ptr.h:103)
==19529==    by 0x138ECF: void __gnu_cxx::new_allocator<std::shared_ptr<lalr::RegexNode> >::destroy<std::shared_ptr<lalr::RegexNode> >(std::shared_ptr<lalr::RegexNode>*) (new_allocator.h:152)
==19529==    by 0x138B23: void std::allocator_traits<std::allocator<std::shared_ptr<lalr::RegexNode> > >::destroy<std::shared_ptr<lalr::RegexNode> >(std::allocator<std::shared_ptr<lalr::RegexNode> >&, std::shared_ptr<lalr::RegexNode>*) (alloc_traits.h:496)
==19529==    by 0x13D8E6: std::vector<std::shared_ptr<lalr::RegexNode>, std::allocator<std::shared_ptr<lalr::RegexNode> > >::pop_back() (stl_vector.h:1226)
==19529==    by 0x13B6C4: lalr::RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:152)
==19529==    by 0x1450A5: lalr::RegexParser::match_or_expression() (RegexParser.cpp:41)
==19529==    by 0x145025: lalr::RegexParser::parse(char const*, char const*) (RegexParser.cpp:32)
==19529==    by 0x13CCC8: lalr::RegexSyntaxTree::parse_regular_expression(lalr::RegexToken const&) (RegexSyntaxTree.cpp:739)
==19529==    by 0x13CA4E: lalr::RegexSyntaxTree::calculate_combined_parse_tree(std::vector<lalr::RegexToken, std::allocator<lalr::RegexToken> > const&) (RegexSyntaxTree.cpp:686)
==19529==  If you believe this happened as a result of a stack
==19529==  overflow in your program's main thread (unlikely but
==19529==  possible), you can try to increase the size of the
==19529==  main thread stack using the --main-stacksize= flag.
==19529==  The main thread stack size used in this run was 8388608.
==19529== 
==19529== HEAP SUMMARY:
==19529==     in use at exit: 37,577,545 bytes in 856,454 blocks
==19529==   total heap usage: 5,431,558 allocs, 4,575,104 frees, 260,331,417 bytes allocated
==19529== 
==19529== LEAK SUMMARY:
==19529==    definitely lost: 0 bytes in 0 blocks
==19529==    indirectly lost: 0 bytes in 0 blocks
==19529==      possibly lost: 0 bytes in 0 blocks
==19529==    still reachable: 37,577,545 bytes in 856,454 blocks
==19529==         suppressed: 0 bytes in 0 blocks
==19529== Rerun with --leak-check=full to see details of leaked memory
==19529== 
==19529== For lists of detected and suppressed errors, rerun with: -s
==19529== ERROR SUMMARY: 11 errors from 11 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
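
The reports above all pass through RegexSyntaxTree::or_expression() (RegexSyntaxTree.cpp:151 and 152), which copies and pops shared_ptr elements at addresses 8 to 24 bytes before the vector's 32-byte allocation. That pattern is what reading back() or calling pop_back() on a vector with too few elements would look like. Below is a minimal sketch of a guarded version, assuming the node stack is a std::vector of std::shared_ptr as the traces suggest. Node, NodeStack and checked_or are hypothetical names for illustration, not lalr's actual code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for lalr::RegexNode.
struct Node {
    char kind;
    std::shared_ptr<Node> left;
    std::shared_ptr<Node> right;
};

using NodeStack = std::vector<std::shared_ptr<Node>>;

// Replace the top two nodes with a single '|' node.
// Throws instead of reading before the start of the vector
// when fewer than two operands are available.
std::shared_ptr<Node> checked_or(NodeStack& nodes) {
    if (nodes.size() < 2) {
        throw std::logic_error("or-expression needs two operands on the node stack");
    }
    std::shared_ptr<Node> right = nodes.back();
    nodes.pop_back();
    std::shared_ptr<Node> left = nodes.back();
    nodes.pop_back();
    std::shared_ptr<Node> node = std::make_shared<Node>(Node{'|', left, right});
    nodes.push_back(node);
    return node;
}
```

With a check like this the failure would surface as an exception at the point where the stack is short, rather than as invalid reads followed by SIGSEGV.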

@mingodad
Contributor Author

mingodad commented Mar 10, 2023

I just added this test:

    TEST( MultiWordKeywords )
    {
        void* union_all;
        RegexCompiler compiler;
        //compiler.compile( "UNION[:space:]+ALL", &union_all );
        compiler.compile( "(UNION[ \f\t\n]+ALL)|UNION", &union_all );
        Lexer<PositionIterator<const char*> > lexer( compiler.state_machine(), nullptr );

        const char* str_UA = "UNION   ALL";
        lexer.reset( PositionIterator<const char*>(str_UA, str_UA + strlen(str_UA)), PositionIterator<const char*>() );
        lexer.advance();
        CHECK( lexer.symbol() == &union_all );
        //printf("str_UA lexeme = %s\n", lexer.lexeme().c_str());
        CHECK( lexer.lexeme() == "UNION   ALL" );

        const char* str_U = "UNION";
        lexer.reset( PositionIterator<const char*>(str_U, str_U + strlen(str_U)), PositionIterator<const char*>() );
        lexer.advance();
        CHECK( lexer.symbol() == &union_all );
        CHECK( lexer.lexeme() == "UNION" );

        const char* str_UA2 = "UNION\n\tALL";
        lexer.reset( PositionIterator<const char*>(str_UA2, str_UA2 + strlen(str_UA2)), PositionIterator<const char*>() );
        lexer.advance();
        //printf("str_UA2 lexeme = %s\n", lexer.lexeme().c_str());
        CHECK( lexer.symbol() == &union_all );
        CHECK( lexer.lexeme() == "UNION\n\tALL" );
    }

If I use [ \f\t\n]+ instead of [:space:], the tests pass; with [:space:] they fail.

@mingodad
Contributor Author

I also noticed that several tests repeat string literals unnecessarily. Here is the same test as above without the repetition:

    TEST( MultiWordKeywords )
    {
        void* union_all;
        RegexCompiler compiler;
        //compiler.compile( "UNION[:space:]+ALL", &union_all );
        compiler.compile( "(UNION[ \f\t\n]+ALL)|UNION", &union_all );
        Lexer<PositionIterator<const char*> > lexer( compiler.state_machine(), nullptr );

        const char* str_UA = "UNION   ALL";
        lexer.reset( PositionIterator<const char*>(str_UA, str_UA + strlen(str_UA)), PositionIterator<const char*>() );
        lexer.advance();
        CHECK( lexer.symbol() == &union_all );
        //printf("str_UA lexeme = %s\n", lexer.lexeme().c_str());
        CHECK( lexer.lexeme() == str_UA );

        const char* str_U = "UNION";
        lexer.reset( PositionIterator<const char*>(str_U, str_U + strlen(str_U)), PositionIterator<const char*>() );
        lexer.advance();
        CHECK( lexer.symbol() == &union_all );
        CHECK( lexer.lexeme() == str_U );

        const char* str_UA2 = "UNION\n\tALL";
        lexer.reset( PositionIterator<const char*>(str_UA2, str_UA2 + strlen(str_UA2)), PositionIterator<const char*>() );
        lexer.advance();
        //printf("str_UA2 lexeme = %s\n", lexer.lexeme().c_str());
        CHECK( lexer.symbol() == &union_all );
        CHECK( lexer.lexeme() == str_UA2 );
    }

@mingodad
Contributor Author

I was mistakenly using [:space:] instead of [[:space:]], but even after fixing that, running cql.g with cg_test.sql still ends in a segmentation fault:

gdb -args ./grammar_test -g cql.g -i cg_test.sql 
GNU gdb (Ubuntu 10.2-0ubuntu1~18.04~2) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./grammar_test...
(gdb) r
Starting program: /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/lalr_examples/grammar_test -g cql.g -i cg_test.sql
Grammar size = 27821

Program received signal SIGSEGV, Segmentation fault.
0x000055555556b71c in __gnu_cxx::__atomic_add_single (__mem=0x39, __val=1) at /usr/include/c++/9/ext/atomicity.h:74
74	  { *__mem += __val; }
(gdb) bt
#0  0x000055555556b71c in __gnu_cxx::__atomic_add_single (__mem=0x39, __val=1)
    at /usr/include/c++/9/ext/atomicity.h:74
#1  0x000055555556b7b1 in __gnu_cxx::__atomic_add_dispatch (__mem=0x39, __val=1)
    at /usr/include/c++/9/ext/atomicity.h:98
#2  0x000055555556d7d9 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x31)
    at /usr/include/c++/9/bits/shared_ptr_base.h:139
#3  0x000055555556c8fd in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (this=0x7fffffffd068, 
    __r=...) at /usr/include/c++/9/bits/shared_ptr_base.h:737
#4  0x00005555555848cd in std::__shared_ptr<lalr::RegexNode, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (
    this=0x7fffffffd060) at /usr/include/c++/9/bits/shared_ptr_base.h:1167
#5  0x00005555555848f3 in std::shared_ptr<lalr::RegexNode>::shared_ptr (this=0x7fffffffd060)
    at /usr/include/c++/9/bits/shared_ptr.h:129
#6  0x00005555555876b5 in lalr::RegexSyntaxTree::or_expression (this=0x555555b8d420)
    at /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/RegexSyntaxTree.cpp:151
#7  0x00005555555910a6 in lalr::RegexParser::match_or_expression (this=0x7fffffffd110)
    at /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/RegexParser.cpp:41
#8  0x0000555555591026 in lalr::RegexParser::parse (this=0x7fffffffd110, 
    begin=0x5555557d4ae0 "([ \\t\\r\\n]*)|(--[^\\n]*)", end=0x5555557d4af7 "")
    at /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/RegexParser.cpp:32
#9  0x0000555555588cc9 in lalr::RegexSyntaxTree::parse_regular_expression (this=0x555555b8d420, token=...)
    at /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/RegexSyntaxTree.cpp:739
#10 0x0000555555588a4f in lalr::RegexSyntaxTree::calculate_combined_parse_tree (this=0x555555b8d420, 
    tokens=std::vector of length 1, capacity 1 = {...})
    at /home/mingo/dev/c/A_grammars/lalr-dad/src/lalr/RegexSyntaxTree.cpp:686
#11 0x0000555555587504 in lalr::RegexSyntaxTree::reset (this=0x555555b8d420, 
    tokens=std::vector of length 1, capacity 1 = {...}, generator=0x7fffffffd230)
--Type <RET> for more, q to quit, c to continue without paging--

@mingodad
Contributor Author

When I try this with grep I get an informative error message:

grep '[:punct:]' *
grep: character class syntax is [[:space:]], not [:space:]

@cwbaker
Owner

cwbaker commented Mar 15, 2023

How to declare comments like the ones accepted by SQL ?

The easiest way is to use lexer actions within the whitespace token. See the tests LineCommentInWhitespaceDirective and BlockCommentInWhitespaceDirective for example. The %whitespace token in your SQL grammar should look something like:

%whitespace "([ \t\r\n]|\-\-:line_comment:|/\*:block_comment:)*";

Then you need to register handlers for the line_comment and block_comment lexer actions when creating your parser in C++. With the latest simplifications I've just added to main something like the following should do it:

#include <lalr/line_comment.hpp>
#include <lalr/block_comment.hpp>

//...
lalr::Parser<const char*, int> parser( parser_state_machine, &error_policy );
parser.lexer_action_handlers()
    ("line_comment", &lalr::line_comment<const char*>)
    ("block_comment", &lalr::block_comment<const char*>);
//...

I've not tested that in your SQL grammar yet so YMMV. I'm just trying that now but I'm almost out of time for the evening.

@cwbaker
Owner

cwbaker commented Mar 15, 2023

When I try this with grep I get an informative error message:

grep '[:punct:]' *
grep: character class syntax is [[:space:]], not [:space:]

Good point. I've created #15 to track this.
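For reference, a diagnostic along the lines of grep's could be driven by a check like the one sketched below. This is standalone C++ and only an illustration: `has_bare_posix_class` is a hypothetical helper, not part of lalr, and it assumes bracket expressions follow the usual POSIX conventions (optional leading `^`, a leading `]` taken literally, backslash escapes allowed).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not lalr API): returns true when a bracket expression
// looks like a POSIX class written without its outer brackets, e.g.
// "[:space:]" where "[[:space:]]" was intended.
bool has_bare_posix_class(const std::string& regex)
{
    std::size_t i = 0;
    while (i < regex.size()) {
        if (regex[i] == '\\') { i += 2; continue; }   // skip escaped character
        if (regex[i] != '[') { ++i; continue; }

        // A bracket expression whose body starts and ends with ':' is suspicious.
        if (i + 1 < regex.size() && regex[i + 1] == ':') {
            const std::size_t close = regex.find(']', i + 2);
            if (close != std::string::npos && close > i + 2 && regex[close - 1] == ':') {
                return true;
            }
        }

        // Otherwise skip over the whole bracket expression.
        std::size_t j = i + 1;
        if (j < regex.size() && regex[j] == '^') ++j;
        if (j < regex.size() && regex[j] == ']') ++j;  // leading ']' is literal
        while (j < regex.size() && regex[j] != ']') {
            if (regex[j] == '\\') {
                j += 2;
            } else if (regex[j] == '[' && j + 1 < regex.size() && regex[j + 1] == ':') {
                const std::size_t close = regex.find(":]", j + 2);
                if (close == std::string::npos) break;
                j = close + 2;                         // skip a nested [:name:]
            } else {
                ++j;
            }
        }
        i = j + 1;
    }
    return false;
}
```

A regex compiler could call such a check before parsing and report "character class syntax is [[:space:]], not [:space:]" instead of continuing with a malformed expression.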

@cwbaker
Owner

cwbaker commented Mar 15, 2023

When executing the example attached above I'm getting this:

valgrind ./grammar_test -g cql.g -i cg_test.sql
==19529== Memcheck, a memory error detector
==19529== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==19529== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==19529== Command: ./grammar_test -g cql.g -i cg_test.sql
==19529== 
Grammar size = 27765
==19529== Invalid read of size 8

This is another problem that would be solved with better error reporting for regular expressions.

The - characters in the whitespace token need to be escaped, e.g. "([ \t\r\n]*)|(\-\-[^\n]*)" gets the grammar building but it then fails to skip the first new line. If you use the line_comment and block_comment lexer actions I described above then you can get that example running, at least to the point of reporting some syntax errors in the input.

That syntax error seems like a bug in lalr rather than a true error, at the very least it's not clear what the actual problem is.

See the cql branch for an example that works up to the syntax error.

@mingodad
Contributor Author

It would be nice to have a way to describe comments (line/block) directly with regular expressions; otherwise the online playground will not work for cases that are not hardcoded.

Also, your block_comment does not seem to increment the line count when it finds/skips a \n.
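To make the line-counting point concrete, here is a minimal standalone sketch of a block-comment skipper that also reports how many newlines it consumed. The names `SkipResult` and `skip_block_comment` are hypothetical and the signature is not lalr's lexer action signature; it assumes the scan starts just past the opening `/*`.

```cpp
// Result of skipping a block comment body (hypothetical type, not lalr API).
struct SkipResult {
    const char* position;  // first character after the closing "*/", or end
    int newlines;          // number of '\n' characters that were skipped
};

// Skips a block comment body. `begin` is assumed to point just past "/*".
// An unterminated comment consumes the rest of the input.
SkipResult skip_block_comment(const char* begin, const char* end)
{
    SkipResult result{begin, 0};
    const char* p = begin;
    while (p != end) {
        if (*p == '\n') {
            ++result.newlines;  // lets the caller keep its line counter accurate
        }
        if (*p == '*' && p + 1 != end && p[1] == '/') {
            p += 2;             // step over the closing "*/"
            break;
        }
        ++p;
    }
    result.position = p;
    return result;
}
```

The caller would add `newlines` to its current line number so that error positions after a multi-line comment stay correct.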

@mingodad
Contributor Author

When using %whitespace "([ \t\r\n]|\-\-:line_comment:|/\*:block_comment:)*"; without implementing/registering the handlers, we get an assertion failure and a core dump instead of an error message:

./grammar_test -g cql.g -i cg_test.sql 
Grammar size = 27943
Input size = 193316
grammar_test: ../../lalr/Lexer.ipp:257: void lalr::Lexer<Iterator, Char, Traits, Allocator>::skip() [with Iterator = const char*; Char = char; Traits = std::char_traits<char>; Allocator = std::allocator<char>]: Assertion `(function)' failed.
Aborted (core dumped)

In several places where LALR_ASSERT( expr ) is used, printing an error message and exiting gracefully would be better for a possible online tool.
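As an illustration of the suggestion, the sketch below looks up a named handler and records an error when it is missing, rather than asserting. `HandlerTable` and its members are hypothetical names for this example only; they do not reflect how lalr stores lexer action handlers.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical handler type; lalr's real handlers take lexer arguments.
using Handler = std::function<void()>;

struct HandlerTable {
    std::map<std::string, Handler> handlers;
    std::vector<std::string> errors;

    // Runs the handler and returns true if it is registered; otherwise
    // records an error message and returns false instead of aborting.
    bool invoke(const std::string& name)
    {
        auto it = handlers.find(name);
        if (it == handlers.end() || !it->second) {
            errors.push_back("no handler registered for lexer action '" + name + "'");
            return false;
        }
        it->second();
        return true;
    }
};
```

With this shape, a tool such as the playground can show the collected messages to the user and stop cleanly.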

@mingodad
Contributor Author

Looking at the code generated by this small grammar:

whitespace {
   %whitespace "[ \t\r\n]*";
   //%whitespace "[ \t\r\n]*|//[^\n]*";
   document: '{'  '}';
}

Partial output:

const LexerTransition whitespace_lexer_transitions [] = 
{
    {9, 11, &whitespace_lexer_states[0], nullptr},
    {13, 14, &whitespace_lexer_states[0], nullptr},
    {32, 33, &whitespace_lexer_states[0], nullptr},
    {-1, -1, nullptr, nullptr}
};

const LexerState whitespace_lexer_states [] = 
{
    {0, 3, &whitespace_lexer_transitions[0], nullptr},
    {-1, 0, nullptr, nullptr}
};

And comparing with this grammar:

whitespace {
   //%whitespace "[ \t\r\n]*";
   %whitespace "[ \t\r\n]*|//[^\n]*";
   document: '{'  '}';
}

Partial output:

const LexerTransition whitespace_lexer_transitions [] = 
{
    {9, 11, &whitespace_lexer_states[1], nullptr},
    {13, 14, &whitespace_lexer_states[1], nullptr},
    {32, 33, &whitespace_lexer_states[1], nullptr},
    {47, 48, &whitespace_lexer_states[3], nullptr},
    {9, 11, &whitespace_lexer_states[1], nullptr},
    {13, 14, &whitespace_lexer_states[1], nullptr},
    {32, 33, &whitespace_lexer_states[1], nullptr},
    {0, 10, &whitespace_lexer_states[2], nullptr},
    {11, 2147483647, &whitespace_lexer_states[2], nullptr},
    {47, 48, &whitespace_lexer_states[2], nullptr},
    {-1, -1, nullptr, nullptr}
};

const LexerState whitespace_lexer_states [] = 
{
    {0, 4, &whitespace_lexer_transitions[0], nullptr},
    {1, 3, &whitespace_lexer_transitions[4], nullptr},
    {2, 2, &whitespace_lexer_transitions[7], nullptr},
    {3, 1, &whitespace_lexer_transitions[9], nullptr},
    {-1, 0, nullptr, nullptr}
};

I've noticed this appears repeated:

    {9, 11, &whitespace_lexer_states[1], nullptr},
    {13, 14, &whitespace_lexer_states[1], nullptr},
    {32, 33, &whitespace_lexer_states[1], nullptr},

And also this strange LexerTransition:

    {11, 2147483647, &whitespace_lexer_states[2], nullptr},

@cwbaker
Owner

cwbaker commented Mar 22, 2023

It would be nice to have a way to describe comments (line/block) directly with regular expressions; otherwise the online playground will not work for cases that are not hardcoded.

Ah, of course. There's nothing stopping you from using regular expressions for these too. It was just easier for me to use the lexer actions.
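As a rough illustration of what such regular expressions can look like, the sketch below uses `std::regex` (ECMAScript syntax) so that it can be run on its own. It is not lalr's regex engine, so escaping rules may differ; earlier in this thread, for example, the `-` characters had to be written as `\-` in the grammar. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// SQL-style line comment: "--" followed by anything up to the end of the line.
const std::regex line_comment(R"(--[^\n]*)");

// C-style block comment: "/*", then any run of non-stars or stars not followed
// by '/', then the closing run of stars and '/'.
const std::regex block_comment(R"(/\*([^*]|\*+[^*/])*\*+/)");

// True when the whole string is exactly one line comment.
bool is_line_comment(const std::string& s)
{
    return std::regex_match(s, line_comment);
}

// True when the whole string is exactly one block comment.
bool is_block_comment(const std::string& s)
{
    return std::regex_match(s, block_comment);
}
```

The block comment pattern avoids `.+`, which would happily run past the first `*/`; the explicit alternation stops at the first terminator.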

Also, your block_comment does not seem to increment the line count when it finds/skips a \n.

Thanks. Fixed in main now.

@cwbaker
Owner

cwbaker commented Mar 22, 2023

Looking at the code generated by this small grammar:

...

I've noticed this appears repeated:

    {9, 11, &whitespace_lexer_states[1], nullptr},
    {13, 14, &whitespace_lexer_states[1], nullptr},
    {32, 33, &whitespace_lexer_states[1], nullptr},

It's not unexpected for transitions to be repeated: they are transitions from different states, which I don't feel are worth the trouble of combining. In this particular case, though, I can't see why the state matching //[^\n]* would ever jump back to the state matching [ \t\r\n]*.

And also this strange LexerTransition:

    {0, 10, &whitespace_lexer_states[2], nullptr},
    {11, 2147483647, &whitespace_lexer_states[2], nullptr},

This, and the transition before it, are matching [^\n]. That's any character below \n (10) and then any character above it, all the way up to INT_MAX rather than 256 because I've half a mind to support Unicode. It should probably be UINT_MAX.
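A small sketch of how these two transitions cover [^\n] follows. It assumes each pair is a half-open interval [begin, end), which is inferred from entries such as {9, 11} covering \t and \n rather than confirmed from the lalr source; `Interval` and `matches_not_newline` are hypothetical names.

```cpp
#include <climits>

// Mirrors the first two fields of a generated LexerTransition.
// Assumption: the range is half-open, i.e. begin <= c < end.
struct Interval {
    int begin;
    int end;
    bool contains(int c) const { return begin <= c && c < end; }
};

// The two transitions generated for [^\n]: everything below 10 and
// everything from 11 upwards.
const Interval not_newline[] = {
    {0, 10},
    {11, INT_MAX},
};

// True when character code c is accepted by [^\n].
bool matches_not_newline(int c)
{
    for (const Interval& interval : not_newline) {
        if (interval.contains(c)) {
            return true;
        }
    }
    return false;
}
```

Under that assumption only code 10 falls in the gap between the two intervals, so every other character, including code points well above 255, is accepted.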

@cwbaker
Owner

cwbaker commented Mar 22, 2023

I've made a few fixes to handling whitespace and to matching tokens where some tokens are prefixes of longer tokens. These fixes are all in main now.

I've also updated the CQL grammar and example so that they parse. I've updated cql.g where fixes were obvious to me and commented out parts of cg_test.sql that I couldn't work out, e.g. some keywords like @emit... and @blob... aren't set and some of the syntax in the test file I can't match to the grammar. See the cql branch which should compile and run for you.

I don't quite have time to describe the changes I've made and the parts of the example script that I can't match to the grammar right now but hopefully the branch and commits are enough of an example to get you going a bit further.

@mingodad
Contributor Author

Thank you !
I'm looking at it now.

@mingodad
Contributor Author

Looking at your fixes: achieving case insensitivity with UPDATE : "UPDATE|update" ; is not good enough, because Update is also valid.
But it's a good way to test the parser anyway.
Another problem now is the performance of parsing/evaluating the input.
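One possible workaround, until case-insensitive literals are supported directly, is to generate a regular expression that accepts each letter in either case. The sketch below is a hypothetical helper, not part of lalr; it assumes keywords contain only letters, digits, and underscores (so nothing needs escaping) and that bracket expressions such as [Uu] are accepted, as they are elsewhere in this thread.

```cpp
#include <cctype>
#include <string>

// Builds a regex that matches `keyword` regardless of letter case, e.g.
// "update" becomes "[Uu][Pp][Dd][Aa][Tt][Ee]". Non-letters are copied as-is.
std::string case_insensitive_regex(const std::string& keyword)
{
    std::string regex;
    for (unsigned char c : keyword) {
        if (std::isalpha(c)) {
            regex += '[';
            regex += static_cast<char>(std::toupper(c));
            regex += static_cast<char>(std::tolower(c));
            regex += ']';
        } else {
            regex += static_cast<char>(c);
        }
    }
    return regex;
}
```

The result could be pasted into the grammar as the token's regular expression, so that UPDATE, update, and Update all match.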

@mingodad
Contributor Author

mingodad commented Mar 23, 2023

Testing with this grammar https://github.com/AthrunArthur/cxxparser.git I'm getting 3 unresolved shift/reduce conflicts, but bison/byacc report none (see attached grammars):

$lalr-nb cxx-parser.g
lalr (157:0): ERROR: shift/reduce conflict for 'id' on '<'
lalr (625:0): ERROR: shift/reduce conflict for 'storage_class_specifier' on '[\"](\\.|[^\\"])*[\"]'
lalr (879:0): ERROR: shift/reduce conflict for 'elaborated_class_specifier' on ':'

$bison-nb cxx-parser.y # no conflict
$byacc-nb cxx-parser.y # no conflict

cxx-parser.zip

@mingodad
Contributor Author

Changing src/lalr/GrammarGenerator.cpp seems to remove the shift/reduce conflict:

@@ -741,8 +741,9 @@ void GrammarGenerator::generate_reduce_transition( GrammarState* state, const Gr
         {
             case TRANSITION_SHIFT:
             {
-                if ( production->precedence() == 0 || symbol->precedence() == 0 || (symbol->precedence() == production->precedence() && symbol->associativity() == ASSOCIATE_NULL) )
+                if ( (production->precedence() == 0 && symbol->precedence() == 0) || (symbol->precedence() == production->precedence() && symbol->associativity() == ASSOCIATE_NULL) )
                 {
+                    //printf("production->precedence = %d, symbol->precedence = %d, symbol->associativity = %d\n", production->precedence(), symbol->precedence(), symbol->associativity());
                     error( production->line(), PARSER_ERROR_PARSE_TABLE_CONFLICT, "shift/reduce conflict for '%s' on '%s'", production->symbol()->identifier().c_str(), symbol->lexeme().c_str() );
                 }
                 else if ( production->precedence() > symbol->precedence() || (symbol->precedence() == production->precedence() && symbol->associativity() == ASSOCIATE_RIGHT) )
@@ -755,7 +756,7 @@ void GrammarGenerator::generate_reduce_transition( GrammarState* state, const Gr
             
             case TRANSITION_REDUCE:
             {
-                if ( production->precedence() == 0 || transition->precedence() == 0 || production->precedence() == transition->precedence() )
+                if ( (production->precedence() == 0 && transition->precedence() == 0) || production->precedence() == transition->precedence() )
                 {
                     error( production->line(), PARSER_ERROR_PARSE_TABLE_CONFLICT, "reduce/reduce conflict for '%s' and '%s' on '%s'", production->symbol()->identifier().c_str(), transition->reduced_symbol()->identifier().c_str(), symbol->lexeme().c_str() );
                 }

@cwbaker
Owner

cwbaker commented May 31, 2023

Ah. You also need to add %precedence '(' to set the precedence of the production on line 225 that gives the error.

@cwbaker
Owner

cwbaker commented May 31, 2023

Closing now. I believe all of the problems raised here are addressed.

@mingodad
Contributor Author

I've just got an initial version of the playground working at https://meimporta.eu/lalr-playground/ see attached the source.

lalr-with-playground.zip

@mingodad
Contributor Author

It would be nice to have column information in the grammar parsing errors.
Also, if we try to parse postgresql-16.g and test.sql we get a memory out of bounds error that needs to be investigated, because it does work when executed under nodejs:

/usr/bin/time node-env node --enable-source-maps grammar_test.js -g /home/mingo/dev/c/A_grammars/lalr/src/lalr/lalr_examples/postgresql-16.g -i /home/mingo/dev/c/A_grammars/lalr/src/lalr/lalr_examples/test.sql 
read grammar: Time taken 0 seconds 2 milliseconds
Grammar size = 138971
compile grammar: Time taken 3 seconds 114 milliseconds
read input: Time taken 0 seconds 1 milliseconds
Input size = 125514
parse input: Time taken 0 seconds 11 milliseconds
accepted = 1, full = 1
3.32user 0.10system 0:03.20elapsed 106%CPU (0avgtext+0avgdata 297776maxresident)k
0inputs+0outputs (0major+72774minor)pagefaults 0swaps

It would also be nice to be able to output an AST of the parsed source.

@mingodad
Contributor Author

Notice that when there are errors we can click on them to jump to the error in the corresponding editor.

@mingodad
Contributor Author

Also notice that if we check the Lexer checkbox, then instead of fully parsing the Source Code only the lexer is executed and its output is dumped on the screen.

@mingodad
Contributor Author

Just added an option to load examples to play with (Lua parser, Json parser, Carbon parser).
To see other similar playgrounds see the first message.

@mingodad
Contributor Author

I've got postgresql-16.g to work on the playground by adding -s ALLOW_MEMORY_GROWTH=1 -s TOTAL_STACK=128MB -s INITIAL_MEMORY=256MB to the em++ compiler arguments.

@mingodad
Contributor Author

I just added a cxx grammar that somehow enters a busy loop when we uncomment any line starting with //std:: .

@mingodad
Contributor Author

In the end, -s ALLOW_MEMORY_GROWTH=1 -s TOTAL_STACK=8MB alone is enough to get the postgresql-16 grammar to work in the browser; everything is attached again below.

lalr-with-playground.zip

@mingodad
Contributor Author

Testing the playground I noticed that when checking Lexer with the Json parser I can see this:

3:9:1:22:[string]:[\"(\\[\"\\\\]|[^\"\n])*\"|'(\\['\\\\]|[^'\n])*']:["section"]

Using this rule:

   string:
	"\"(\\[\"\\\\]|[^\"\n])*\"|'(\\['\\\\]|[^'\n])*'"
	;

But when using this rule:

   string:
	"\"(\\[\"\\\\]|[^\"\n])*\""
	| "'(\\['\\\\]|[^'\n])*'"
	;

I'm seeing this:

3:9:1:23:[backslash__double_quote__left_paren__backslash__backslash__left_square_paren__backslash__double_quote__backslash__backslash__backslash__backslash__right_square_paren__pipe__left_square_paren__hat__backslash__double_quote__backslash_n_right_square_paren__right_paren__star__backslash__double_quote_terminal]:[\"(\\[\"\\\\]|[^\"\n])*\"]:["section"]

I was expecting something like:

3:9:1:22:[string]:[\"(\\[\"\\\\]|[^\"\n])*\"]:["section"]

@mingodad
Contributor Author

Also when trying the Lua parser if we replace STRING : "[\"][^\"\n]*[\"]|['][^'\n]*[']" ; by STRING : "\"[^\"\n]*\"" | "'[^'\n]*'" ; then we get this error:

lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on '\"[^\"\n]*\"'
lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on ''[^'\n]*''
Error compiling grammar. Error count = 2

It seems that %left STRING ; is not being applied to each of the individual STRING alternatives.

@mingodad
Contributor Author

Also, with the Linden Script parser, if we replace (/\*[^*]+\*/) by (/\*.+\*/) in %whitespace ; it seems to enter an infinite loop.

@mingodad
Contributor Author

Just added EBNF generation to produce railroad diagrams at https://www.bottlecaps.de/rr/ui : check the Gen. EBNF checkbox, then click Parse.

@mingodad
Contributor Author

Just added a Bison parser (not working) to show the need for a way to lex blocks of text, like the {...} blocks in bison: lex everything inside the curly braces, accounting for nested curly braces and skipping curly braces inside strings or chars.
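The kind of scanning described here could be sketched as below. This is a standalone illustration, not lalr code: `find_block_end` is a hypothetical function, and it deliberately ignores comments inside the block, which a real bison lexer would also have to handle.

```cpp
// Returns a pointer just past the '}' that closes the block starting at
// `begin`, or nullptr if `begin` is not '{' or the block is unterminated.
// Braces inside string and character literals are not counted.
const char* find_block_end(const char* begin, const char* end)
{
    if (begin == end || *begin != '{') {
        return nullptr;
    }
    int depth = 0;
    const char* p = begin;
    while (p != end) {
        const char c = *p;
        if (c == '"' || c == '\'') {
            // Skip a string or character literal, honouring backslash escapes.
            const char quote = c;
            ++p;
            while (p != end && *p != quote) {
                if (*p == '\\' && p + 1 != end) {
                    ++p;  // step onto the escaped character
                }
                ++p;
            }
            if (p == end) {
                return nullptr;  // unterminated literal
            }
        } else if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (--depth == 0) {
                return p + 1;
            }
        }
        ++p;
    }
    return nullptr;
}
```

A lexer action could call such a function when it sees `{` and hand the whole span to the parser as a single token.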

@mingodad
Contributor Author

I also added a not-yet-working grammar for https://github.com/jplevyak/dparser that shows lots of shift/reduce conflicts; it would be nice if the error messages could also include some kind of clue/suggestion to help fix them.

@mingodad
Contributor Author

Looking through the lalr code, to generate a generic parse tree it seems that something like the pseudo code shown below could be a starting point.

But there is no example of usage of set_default_action_handler and I don't know right now what to do inside it.

Can you give some help here ?

struct AstUserData {
    int data;
    AstUserData():data(0) {};
};

static AstUserData astMaker( const AstUserData* start, const lalr::ParserNode<uint8_t>* nodes, size_t length )
{
    const AstUserData* end = start + length;
    while ( start != end && !start[0].data )
    {
        ++start;
    }
    //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
    return start != end ? start[0] : AstUserData();
}
...
//lalr::Parser<const unsigned char*, int> parser( compiler.parser_state_machine(), &error_policy_input );
lalr::Parser<const uint8_t*, AstUserData> parser( compiler.parser_state_machine(), &error_policy_input );
parser.set_default_action_handler(astMaker);
parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );

@mingodad
Contributor Author

It would be nice to output a parse tree like here https://mingodad.github.io/lua-wasm-playground/ or something similar.

@mingodad
Contributor Author

Using this function while parsing with the Lua parser and the input print("Hello !") gives the output shown below:

static AstUserData astMaker( const AstUserData* start, const lalr::ParserNode<uint8_t>* nodes, size_t length )
{
    const AstUserData* end = start + length;
    while ( start != end && !start[0].data )
    {
        ++start;
    }
    //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
    const char *lexstr = (length > 0 ? (const char *)nodes[0].lexeme().c_str() : "::lnull");
    const char *idstr = (length > 0 ? nodes[0].symbol()->identifier : "::inull");
    printf("astMaker: %p %p %zd -> %s : %s\n", start, nodes, length, idstr, lexstr);
    return start != end ? start[0] : AstUserData();
}

Output:

Parser state machine stats:
    Symbols          : 127
    Actions          : 0
    States           : 242
    Transitions      : 4332
    Solved conflicts : shift/reduce = 138, reduce/reduce = 0
Lexer state machine stats:
    Strings          : 0
    Actions          : 0
    States           : 135
    Transitions      : 548
read input: Time taken 0 seconds 0 milliseconds
Input size = 17
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> IDENTIFIER : print
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> nobr_variable : 
astMaker: 0x55dbd6538d70 0x55dbd64fb348 1 -> STRING : "Hello !"
astMaker: 0x55dbd6538d70 0x55dbd64fb348 1 -> expression : 
astMaker: 0x55dbd6538d74 0x55dbd64fb310 3 -> left_paren_terminal : (
astMaker: 0x55dbd6538d6c 0x55dbd64fb2d8 2 -> nobr_prefix_expression : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> nobr_function_call : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> class_1_statement : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> statement_list_1 : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> statement_list : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> opt_block_statements : 
astMaker: 0x55dbd6538d68 0x55dbd64fb2d8 1 -> opt_block : 
parse input: Time taken 0 seconds 0 milliseconds
accepted = 1, full = 1

@mingodad
Contributor Author

After adding column info to error messages and reviewing them with the Rust grammar, I found a weird behavior: one terminal definition, MUT : 'mut' ;, was missing, but lalr was giving several error messages about shift/reduce conflicts instead of reporting a missing definition.

You can see it here https://meimporta.eu/lalr-playground/ : select the Rust parser, then scroll the grammar editor to line 1851, where I added a comment about this weird behavior. Right now it compiles the grammar and parses the input, but if we comment out the MUT definition then we get:

lalr (978:5): ERROR: shift/reduce conflict for 'maybe_mut' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (978:5): ERROR: shift/reduce conflict for 'maybe_mut' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
lalr (845:7): ERROR: shift/reduce conflict for 'binding_mode' on ''
Error compiling grammar. Error count = 10

@mingodad
Contributor Author

The reason no undefined symbol error is shown is this && !symbol->referenced_in_precedence_directive() (see below); we should also check whether the symbol is referenced in a rule.

if ( symbol->symbol_type() == SYMBOL_NON_TERMINAL && symbol->productions().empty() && !symbol->referenced_in_precedence_directive() )
{
   error(symbol->line(), 0, PARSER_ERROR_UNDEFINED_SYMBOL, "undefined symbol '%s'", symbol->identifier().c_str());
}

@mingodad
Contributor Author

I did a possible fix by adding another flag GrammarSymbol::referenced_in_rule_ setting it in Grammar& Grammar::identifier( const char* identifier, int line, int column ) and checking it in void GrammarGenerator::check_for_undefined_symbol_errors().

 if ( symbol->symbol_type() == SYMBOL_NON_TERMINAL && symbol->productions().empty() && (symbol->referenced_in_rule() || !symbol->referenced_in_precedence_directive()) )
{
   error(symbol->line(), 0, PARSER_ERROR_UNDEFINED_SYMBOL, "undefined symbol '%s'", symbol->identifier().c_str());
}
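To see what the changed condition does in each case, it can be isolated as a small predicate. The sketch below uses a hypothetical `SymbolInfo` struct as a stand-in for the handful of GrammarSymbol properties the check reads; it is not the lalr class itself.

```cpp
// Hypothetical stand-in for the GrammarSymbol properties used by the check.
struct SymbolInfo {
    bool non_terminal;
    bool has_productions;
    bool referenced_in_rule;
    bool referenced_in_precedence_directive;
};

// Same logic as the proposed condition: a non-terminal without productions is
// undefined unless it appears only in a precedence directive.
bool is_undefined(const SymbolInfo& s)
{
    return s.non_terminal && !s.has_productions &&
        (s.referenced_in_rule || !s.referenced_in_precedence_directive);
}
```

A symbol like the missing MUT, used in a rule and in a precedence directive but never defined, is now reported, while a dummy name that appears only in a precedence directive is still accepted.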

@cwbaker
Owner

cwbaker commented Jul 15, 2023

Also when trying the Lua parser if we replace STRING : "[\"][^\"\n]*[\"]|['][^'\n]*[']" ; by STRING : "\"[^\"\n]*\"" | "'[^'\n]*'" ; then we get this error:

lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on '\"[^\"\n]*\"'
lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on ''[^'\n]*''
Error compiling grammar. Error count = 2

It seems that %left STRING ; is not being applied to each of the individual STRING alternatives.

Moving the | operator out of the regular expression and into the grammar makes STRING a non-terminal that is reduced from either of those regular expressions and introduces the shift/reduce conflicts. In that case it's no longer valid to say %left STRING at all as precedence directives should only apply to tokens.

Put the | inside the regular expression to have STRING evaluate as a terminal, i.e. a token that is handled by the lexer and isn't reduced by the parser.

@cwbaker
Owner

cwbaker commented Jul 15, 2023

Looking through the lalr code, to generate a generic parse tree it seems that something like the pseudo code shown below could be a starting point.

But there is no example of usage of set_default_action_handler and I don't know right now what to do inside it.

Can you give some help here ?

struct AstUserData {
    int data;
    AstUserData():data(0) {};
};

static AstUserData astMaker( const AstUserData* start, const lalr::ParserNode<uint8_t>* nodes, size_t length )
{
    const AstUserData* end = start + length;
    while ( start != end && !start[0].data )
    {
        ++start;
    }
    //printf("astMaker: %s\n", nodes[0].lexeme().c_str());
    return start != end ? start[0] : AstUserData();
}
...
//lalr::Parser<const unsigned char*, int> parser( compiler.parser_state_machine(), &error_policy_input );
lalr::Parser<const uint8_t*, AstUserData> parser( compiler.parser_state_machine(), &error_policy_input );
parser.set_default_action_handler(astMaker);
parser.parse( &input_txt[0], &input_txt[0] + input_txt_size );

From looking at your playground online it looks like you've solved this already.

But, if not, then set a default action handler function to handle all reductions. Inside that handler you'd copy out, at least, the symbols and lexemes in the array of ParserNode objects passed to the handler. You could copy that array of ParserNode objects and return it within the UserData for the parser, or copy that array into a tree of data that you manage yourself. Either way is equally valid in my mind, depending on which is most convenient.
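The second option, copying into a tree that you manage yourself, could look roughly like the sketch below. It is standalone and simplified: `NodeInfo` is a hypothetical stand-in for the symbol identifier and lexeme that a ParserNode exposes, `make_ast` is not the exact handler signature lalr expects, and it assumes the user data for plain tokens is a null pointer.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for what a ParserNode exposes to the handler.
struct NodeInfo {
    std::string symbol;
    std::string lexeme;
};

struct Ast {
    std::string symbol;
    std::string lexeme;
    std::vector<std::shared_ptr<Ast>> children;
};
using AstPtr = std::shared_ptr<Ast>;

// Intended to run once per reduction. data[i] is the subtree returned by an
// earlier reduction (null for plain tokens) and nodes[i] describes the same
// right-hand-side element. The returned node is named later, when it shows
// up as a child in the next reduction.
AstPtr make_ast(const AstPtr* data, const NodeInfo* nodes, std::size_t length)
{
    auto parent = std::make_shared<Ast>();
    for (std::size_t i = 0; i < length; ++i) {
        AstPtr child = data[i];
        if (!child) {
            // Token: create a leaf and keep its text.
            child = std::make_shared<Ast>();
            child->lexeme = nodes[i].lexeme;
        }
        child->symbol = nodes[i].symbol;
        parent->children.push_back(child);
    }
    return parent;
}
```

Naming a subtree only when it reappears as a child follows the debug output earlier in this thread, where the non-terminal names (nobr_variable, expression, and so on) show up in the nodes of the following reduction. The node returned by the last reduction is the root of the tree.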

@mingodad
Contributor Author

Also when trying the Lua parser if we replace STRING : "[\"][^\"\n]*[\"]|['][^'\n]*[']" ; by STRING : "\"[^\"\n]*\"" | "'[^'\n]*'" ; then we get this error:

lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on '\"[^\"\n]*\"'
lalr (259:0): ERROR: shift/reduce conflict for 'nobr_variable' on ''[^'\n]*''
Error compiling grammar. Error count = 2

It seems that %left STRING ; is not being applied to each of the individual STRING alternatives.

Moving the | operator out of the regular expression and into the grammar makes STRING a non-terminal that is reduced from either of those regular expressions and introduces the shift/reduce conflicts. In that case it's no longer valid to say %left STRING at all as precedence directives should only apply to tokens.

Put the | inside the regular expression to have STRING evaluate as a terminal, i.e. a token that is handled by the lexer and isn't reduced by the parser.

It's probably better to check whether precedence is assigned to a non-terminal and show an error message, and maybe to have an explicit grammar section for the lexer that allows more flexible regex constructions.

@mingodad
Contributor Author

Now the playground is also available at https://mingodad.github.io/lalr/
