all my GSoC works are in pull request(in review) and one commit(merged)
This project's autocompletion is based on popular python packages - mycli. And do some enhancement, bug fix, and extend its functionality.
the presentation's google slides' link
There are four components.
SQLAnalyze
is responsible for analyzing the SQL statement and providing a function that inputs the SQL statement and text cursor's position, then gets the suggestion.
SQLFetch
is responsible for providing convenient function fetching data. Such as provide a databases
function that would return a list of database names that are current DB's databases.
Autocompleter
is responsible for simplifying the use of SQLAnalyze
and SQLFetch
. It would provide two functions. One is a refresh that can refresh the DB's info. The refresh way can decide to be sync or async, and another is get_suggestions. It would get suggestions based on text and cursor position.
The introspector
is responsible for providing a inspect
function that can given text and cursor position return the HTML that would be shown to users.
The relationship of these four components
Autocompleter
uses SQLFetch
to update the data inside SQLAnalyze
. Such as update dbmetadata. So in SQLAnalyze
, when need to suggest tables in the currently selected database. It can find in dbmetadata.
Introspector
uses autocompleter
because it needs to shares the data that autocompleter's SQLAnalyze
has. And use autocompleter’s SQLFetch
to get some HTML(such as when inspecting table word, need to get the table's schema HTML for showing to the user)
implement three main classes - 「SQLFetch, SQLAnalyze, Autocompleter」 describe in this comment
-
Successfully create basic autocomplete feature on mariadb_kernel
-
Refactor for SQL command position and SQLFetch.Refresher's arguments passing
-
let autocompletion can show the type of completion in the list
show_type_of_completion.mp4
-
asyc refresh autocompleter's data
-
can set the keyword and function list and would automatically fetch the lastest keyword and function list from new information_schema tables that work in my GSoC(link)
-
Database suggestion before 「.」. Ex: 「insert into db_name_to_be_completed.table_name VALUES (...) 」
commit
database_suggestion_before_dot.mp4
-
add global and session variable suggestions
commit
- add global and session variable suggestion on completion_engine and update sql_fetch for them
- Add global and session variable suggestions and fix some problem
session_variable_suggestion.mp4
-
enhancement of database suggestions
commit
- Add column suggest for system table like mysql.user. And fix the problem like 「select user,
<tab>
from mysql.user;」 would get a user list - Autocompletion about database_name.table_name_to_be_completed could suggest the table not in currently seleted database
- Autocompletion for database in select, insert into, ... , like statement
database_and_table_and_column_suggestion_in_system_table.mp4
more_database_suggestion.mp4
- Add column suggest for system table like mysql.user. And fix the problem like 「select user,
-
fix some bugs related to the name
user
is the column name and keywordcommit:
fix_keyword_suggestion_after_user_bug.mp4
-
Remove autocompletion suggest column name from a statement like 「insert into table_name values ( 」
commit:
- Remove autocompletion suggest column name from statement like 「insert into table_name values ( 」
- fix the bug for 【Remove autocompletion suggest column name from statement like 「insert into table_name values ( 」】
not_suggest_column_after_values.mp4
According to a different type of word show different info. like below table:
type | show Infos |
---|---|
database | show the tables in this database |
table | show its schema, and partial table rows |
column | show the data type it stores, and some data in this column |
function | show its documentation(has syntax, description, example sections) |
user | show user list. |
-
Let introspection show some real data
introspection_demo.mp4
-
feature about introspection provide column hint can handle multi values
introspection_column_hint_demo.mp4
-
introspect function would provide documentation
introspection_function_doc_demo.mp4
-
Solve some introspection problems about the different type of words have the same name
- fix mariadb_client's run_statement courrent problem
- Add more testing about mariadb_client concurrent execute run_statement and the multi mariadb_client's selected database sync
This project encounters some tricky problems. In this section, I would discuss them.
First, use the mycli's way to implement introspection is difficult in some situations. Such as column name is same as keyword name. This result in introspection is difficult on mycli way to resolve word to the specific type.
The reason needs to describe the internal of mycli to implement autocompletion. mycli parsing base on the python package - sqlparse. And sqlparse have some limitation. Such as, it can't resolve the name into the database, table, column, and so on.
ex:
from sqlparse import parse
result = parse("select min(col1), col2 from tabl1;")
result[0]._pprint_tree()
the code above would get parse tree be printed at below
sqlparse
can recognize DML like known this statement is a select statement. And Keyword, Punctuation, Identifier, Identifier List, Function, and so on.
But it can’t resolve database, column, table, and so on. These would all be Name. Look at the place I mark green color, the token min
should be function; The token col1
should be column; The token col2
should be column; The token tabl1
should be table
And the mycli look at the parse tree, resolve Name to more specific type, such as database, table, column. For example: There are a statement.
select min(col1), col2 from tabl1;
we need suggest type after min( the token before the cursor is left parenthesis So we need to consider five cases list below:
- In Where statement
=>
look at more cases~ - After Using - tbl1 INNER JOIN tbl2 USING (col1, col2)
=>
Suggest columns/functions - Subquery expression like "select … from (“
=>
Suggest keyword - show statement like “show “
=>
Suggest the list of special - keyword appear after show
function call like “select min(“
=>
Suggest column
these five case is under the condition is top level’s last token before the text cursor
is punctuation - left parenthesis. And have their own conditions.
First, check the top level’s last token is where instance, then look at more cases. Here not discuss that.
Second, check the top level’s second last token before the text cursor
, check it is using word
or not. If it is, suggest column
Third, check the top level’s first token is select, and the last token before the text cursor
is start with left parenthesis. If it is, suggest keyword. But in this example, last word would be min(, so it is not belong this case.
Fourth, check the top level’s first token is show word, if it is, suggest a list of special keyword appear after show
Fifiveth, is not above cases, it would be in simple function call, so suggest column
In conclusion, mycli suggestion is multi options in most scenarios. But introspection needs to resolve to the specific type. There is a way to solve this problem. It is looking at the databases’ info. For example: for sql statement
select column1 from table1;
and complter.dbmetadata
{'tables': {'db1': {
“table1”: [“*”, “column1”]
}
},
'views': {'db1': {}},
'functions': {'db1': {}}}
check priority : column_hint -> column -> table -> database -> function -> keyword
the suggestion in column1 would be column, function or keyword or alias. we have special resolve order. So first we check it is column or not. We look at dbmetadata’s tables field. Find in current database db1 have table named table1 or not. If exists, then check in that table has column1 or not. In this example. There exists column1. So this word is column. The introspector would get this(point to introspector.get_introspection) And it not found, would check the next suggestion type in the priority - function type. final the introspector.get_introspection return :
{'word': 'column1', 'type': 'column',
'database': 'db1', 'table': 'table1'}
But there exists a problem if there exists the same name be multiple types. It may resolve to the wrong type.
For example : for sql statement:
select min(column1) from table1;
if table1 has a column named min, and there exists a function named min the word 「min 」 would be recognized as a column in the check priority
So I need to do more check on that. In this case would check the next token after cursor is left parenthesis or not. If it is left parenthesis, it would be function.
And the check priority is also important. For example: there are two priority:
check priority 1: column_hint -> column -> table -> database -> function -> keyword
check priority 2: keyword -> function -> database -> table -> column -> column_hint
if the check priority is the check priority 1
. Then user word would be resolved to keyword. Because there has a user column in mysql.user and has user keyword. According check priority 2
. Function has higher priority than column.
And in the right check priority, it can solve most of wrong cases. Because mycli the most suggestion redundant types is fucntion and keyword, so their priority is lowest.
Now I would talk about the comparison of two types of autocompletion engines. There are two type of completion engine
- use parser like sqlparse, which is hand-writing parser
- use bison like parser, which is auto-generated by grammar One is to use the hand-writing parser,if this kind of parser if maintained by the compiler or interpreter team, then it is fine to use it to implement the tool about autocompletion, go to definition, and so on. But if use the parser like sqlparse, which is not maintained by original team. The real-world parser is very complex. So sqlparse has many defects. Such as this does not match the real-world database grammar and it commonly parses the wrong result. Like the cases I show here. sqlprase mistake 【t1 (a, b, c)】 「table name and column list」 as function. This is a big problem. For exmaple:
insert into t1 (a, b, c) VALUES (1 ,2 ,3);
And for the second, if the grammar feed into bison is complete and no ambiguity, then the parse result would not be wrong. Can make sure it cover the whole grammar and nearly no error. And the mariadb is maintained the 「sql/sql_yacc.yy」, it can be used and can has the same parser with the mariadb sql database. And we can easily annotated something into parsing process and get the information that help a lot in autocompletion.
there are three main component lexer_caller, bison_xml_reader and autocompletion_engine.
Because our project is using python but bison generated parser and lexer is written in C. So I need to build a .so file for lexer and create a lexer_caller to load the library file and execute it get a list of tokens.
And I would need to annotated info at parsing process. So I create a bison_xml_reader read the xml report get the LR1 parsing need info like goto_table and action_table. So I can parse on them.
If we parse the statement at the left, then we will get the result at the right.
And in this example, I only show the name token.
So I can base on the token matched rule number and matched index to deduce it should be database, table or function, and so on.
Such as the word Customers. It match on rule 45 - table_factor. And in this rule first token should be table, so the word Customers is table name.
And the autocompletion is not only parsed the complete SQL statement. It needs to predicate the next word is what. With the info in the bison report, it would be easy. According to the current state's transitions info. We can know the next word is what kind of token. In this case, the next token can be NAME, STRING, INTNUM, BOOL, APPROXNUM, USERVAR, NOT, ... etc. And it can reduce the matched symbols to select_expr_list, select_expr, expr. And the NAME token needs to digging more details types like database, column, table, and so on. So we look at the NAME token's transition. It would be from state 25 to state 51. And in State 51 has three rules that consume the NAME token. Rule 221's NAME token consumed is a column; rule 223's NAME token consumed is a table. Rule 264's NAME token consumed is function. So it not only suggests keywords and functions like STRING, INTNUM, BOOL, ... etc but also suggests a column, table, and function.
But this autocompletion engine is still in early stage. So it is not adopted in this project.
my autocompletion engine link(still in progress) - currently support basic introspection ability