Skip to content

a97410985/GSoC2021_MariaDB_autocomplete_and_related_feature_report

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 

Repository files navigation

all my GSoC works are in pull request(in review) and one commit(merged)

This project's autocompletion is based on popular python packages - mycli. And do some enhancement, bug fix, and extend its functionality.

the presentation video

Everything Is AWESOME

the presentation's google slides' link

The design of this project

mariadb_kernel_autocompletion_feature_design

There are four components. SQLAnalyze is responsible for analyzing the SQL statement and providing a function that inputs the SQL statement and text cursor's position, then gets the suggestion.

SQLFetch is responsible for providing convenient function fetching data. Such as provide a databases function that would return a list of database names that are current DB's databases.

Autocompleter is responsible for simplifying the use of SQLAnalyze and SQLFetch. It would provide two functions. One is a refresh that can refresh the DB's info. The refresh way can decide to be sync or async, and another is get_suggestions. It would get suggestions based on text and cursor position.

The introspector is responsible for providing a inspect function that can given text and cursor position return the HTML that would be shown to users.

The relationship of these four components

Autocompleter uses SQLFetch to update the data inside SQLAnalyze. Such as update dbmetadata. So in SQLAnalyze, when need to suggest tables in the currently selected database. It can find in dbmetadata.

Introspector uses autocompleter because it needs to shares the data that autocompleter's SQLAnalyze has. And use autocompleter’s SQLFetch to get some HTML(such as when inspecting table word, need to get the table's schema HTML for showing to the user)

the list of my work(not list trivial commits)

integrate mycli with mariadb_kernel

implement three main classes - 「SQLFetch, SQLAnalyze, Autocompleter」 describe in this comment

related commits

  1. implement SQLFetch class

  2. Implement AutoCompleter and SQLAnalyze

  3. Successfully create basic autocomplete feature on mariadb_kernel

  4. fix not select any database error in SQLFetch

  5. Refactor for SQL command position and SQLFetch.Refresher's arguments passing

  6. let autocompletion can show the type of completion in the list

    show_type_of_completion.mp4
  7. asyc refresh autocompleter's data

    1. create basic async refresh by thread
    2. Merge branch 'async_refresh' into integrated_with_mycli

enhancement of mycli

  1. can set the keyword and function list and would automatically fetch the lastest keyword and function list from new information_schema tables that work in my GSoC(link)

    commit

  2. Database suggestion before 「.」. Ex: 「insert into db_name_to_be_completed.table_name VALUES (...) 」

    commit

    1. Database suggestion before .table_name
    2. Enhancement of database suggestion before 「.」
    database_suggestion_before_dot.mp4
  3. add global and session variable suggestions

    commit

    1. add global and session variable suggestion on completion_engine and update sql_fetch for them
    2. Add global and session variable suggestions and fix some problem
    session_variable_suggestion.mp4
  4. enhancement of database suggestions

    commit

    1. Add column suggest for system table like mysql.user. And fix the problem like 「select user, <tab> from mysql.user;」 would get a user list
    2. Autocompletion about database_name.table_name_to_be_completed could suggest the table not in currently seleted database
    3. Autocompletion for database in select, insert into, ... , like statement
    database_and_table_and_column_suggestion_in_system_table.mp4
    more_database_suggestion.mp4
  5. fix some bugs related to the name user is the column name and keyword

    commit:

    1. Autocompletion for keyword after user column name
    fix_keyword_suggestion_after_user_bug.mp4
  6. Remove autocompletion suggest column name from a statement like 「insert into table_name values ( 」

    commit:

    1. Remove autocompletion suggest column name from statement like 「insert into table_name values ( 」
    2. fix the bug for 【Remove autocompletion suggest column name from statement like 「insert into table_name values ( 」】
    not_suggest_column_after_values.mp4

extend of autocomplete feature - introspection

According to a different type of word show different info. like below table:

type show Infos
database show the tables in this database
table show its schema, and partial table rows
column show the data type it stores, and some data in this column
function show its documentation(has syntax, description, example sections)
user show user list.
  1. Complete basic introspection functionality

  2. Let introspection show some real data

    introspection_demo.mp4
  3. Change setup.py to include data folder and create a new feature about introspection provide column hint

  4. feature about introspection provide column hint can handle multi values

    introspection_column_hint_demo.mp4
  5. introspect function would provide documentation

    introspection_function_doc_demo.mp4
  6. introsptect user keyword would pop up user list

  7. Merge branch use_mycli_for_introspection('febd264e8a0e841aa850b5b84a22a9aea491c67d') into integrated_with_mycli

  8. Solve some introspection problems about the different type of words have the same name

related to mariadb_kernel

  1. fix mariadb_client's run_statement courrent problem
  2. Add more testing about mariadb_client concurrent execute run_statement and the multi mariadb_client's selected database sync

The difficulty of this project

This project encounters some tricky problems. In this section, I would discuss them.

First, use the mycli's way to implement introspection is difficult in some situations. Such as column name is same as keyword name. This result in introspection is difficult on mycli way to resolve word to the specific type.

The reason needs to describe the internal of mycli to implement autocompletion. mycli parsing base on the python package - sqlparse. And sqlparse have some limitation. Such as, it can't resolve the name into the database, table, column, and so on.

ex:

from sqlparse import parse
result = parse("select min(col1), col2 from tabl1;")
result[0]._pprint_tree()

the code above would get parse tree be printed at below

sqlparse_parse_tree

sqlparse can recognize DML like known this statement is a select statement. And Keyword, Punctuation, Identifier, Identifier List, Function, and so on. But it can’t resolve database, column, table, and so on. These would all be Name. Look at the place I mark green color, the token min should be function; The token col1 should be column; The token col2 should be column; The token tabl1 should be table

And the mycli look at the parse tree, resolve Name to more specific type, such as database, table, column. For example: There are a statement.

select min(col1), col2 from tabl1;

we need suggest type after min( the token before the cursor is left parenthesis So we need to consider five cases list below:

  1. In Where statement => look at more cases~
  2. After Using - tbl1 INNER JOIN tbl2 USING (col1, col2) => Suggest columns/functions
  3. Subquery expression like "select … from (“ => Suggest keyword
  4. show statement like “show “ => Suggest the list of special
  5. keyword appear after show function call like “select min(“ => Suggest column

these five case is under the condition is top level’s last token before the text cursor is punctuation - left parenthesis. And have their own conditions. First, check the top level’s last token is where instance, then look at more cases. Here not discuss that. Second, check the top level’s second last token before the text cursor, check it is using word or not. If it is, suggest column Third, check the top level’s first token is select, and the last token before the text cursor is start with left parenthesis. If it is, suggest keyword. But in this example, last word would be min(, so it is not belong this case. Fourth, check the top level’s first token is show word, if it is, suggest a list of special keyword appear after show Fifiveth, is not above cases, it would be in simple function call, so suggest column

In conclusion, mycli suggestion is multi options in most scenarios. But introspection needs to resolve to the specific type. There is a way to solve this problem. It is looking at the databases’ info. For example: for sql statement

select column1 from table1;

and complter.dbmetadata

{'tables': {'db1': {
                     “table1”: [“*”, “column1”]            
                }
  }, 
 'views': {'db1': {}}, 
 'functions': {'db1': {}}}

check priority : column_hint -> column -> table -> database -> function -> keyword

the suggestion in column1 would be column, function or keyword or alias. we have special resolve order. So first we check it is column or not. We look at dbmetadata’s tables field. Find in current database db1 have table named table1 or not. If exists, then check in that table has column1 or not. In this example. There exists column1. So this word is column. The introspector would get this(point to introspector.get_introspection) And it not found, would check the next suggestion type in the priority - function type. final the introspector.get_introspection return :

{'word': 'column1', 'type': 'column',   
 'database': 'db1', 'table': 'table1'}

But there exists a problem if there exists the same name be multiple types. It may resolve to the wrong type.

For example : for sql statement:

select min(column1) from table1;

if table1 has a column named min, and there exists a function named min the word 「min 」 would be recognized as a column in the check priority

So I need to do more check on that. In this case would check the next token after cursor is left parenthesis or not. If it is left parenthesis, it would be function.

And the check priority is also important. For example: there are two priority:

check priority 1: column_hint -> column -> table -> database -> function -> keyword

check priority 2: keyword -> function -> database -> table -> column -> column_hint

if the check priority is the check priority 1. Then user word would be resolved to keyword. Because there has a user column in mysql.user and has user keyword. According check priority 2. Function has higher priority than column. And in the right check priority, it can solve most of wrong cases. Because mycli the most suggestion redundant types is fucntion and keyword, so their priority is lowest.

Two type of completion engine discussion

Now I would talk about the comparison of two types of autocompletion engines. There are two type of completion engine

  1. use parser like sqlparse, which is hand-writing parser
  2. use bison like parser, which is auto-generated by grammar One is to use the hand-writing parser,if this kind of parser if maintained by the compiler or interpreter team, then it is fine to use it to implement the tool about autocompletion, go to definition, and so on. But if use the parser like sqlparse, which is not maintained by original team. The real-world parser is very complex. So sqlparse has many defects. Such as this does not match the real-world database grammar and it commonly parses the wrong result. Like the cases I show here. sqlprase mistake 【t1 (a, b, c)】 「table name and column list」 as function. This is a big problem. For exmaple:

insert into t1 (a, b, c) VALUES (1 ,2 ,3);

sql_parse_error

And for the second, if the grammar feed into bison is complete and no ambiguity, then the parse result would not be wrong. Can make sure it cover the whole grammar and nearly no error. And the mariadb is maintained the 「sql/sql_yacc.yy」, it can be used and can has the same parser with the mariadb sql database. And we can easily annotated something into parsing process and get the information that help a lot in autocompletion.

my autocompletion engine

the_design_of_my_autocompletion_engine there are three main component lexer_caller, bison_xml_reader and autocompletion_engine. Because our project is using python but bison generated parser and lexer is written in C. So I need to build a .so file for lexer and create a lexer_caller to load the library file and execute it get a list of tokens.

And I would need to annotated info at parsing process. So I create a bison_xml_reader read the xml report get the LR1 parsing need info like goto_table and action_table. So I can parse on them.

myautocomplete_engine_internal1 If we parse the statement at the left, then we will get the result at the right. And in this example, I only show the name token. So I can base on the token matched rule number and matched index to deduce it should be database, table or function, and so on. Such as the word Customers. It match on rule 45 - table_factor. And in this rule first token should be table, so the word Customers is table name.

myautocomplete_engine_internal2 And the autocompletion is not only parsed the complete SQL statement. It needs to predicate the next word is what. With the info in the bison report, it would be easy. According to the current state's transitions info. We can know the next word is what kind of token. In this case, the next token can be NAME, STRING, INTNUM, BOOL, APPROXNUM, USERVAR, NOT, ... etc. And it can reduce the matched symbols to select_expr_list, select_expr, expr. And the NAME token needs to digging more details types like database, column, table, and so on. So we look at the NAME token's transition. It would be from state 25 to state 51. And in State 51 has three rules that consume the NAME token. Rule 221's NAME token consumed is a column; rule 223's NAME token consumed is a table. Rule 264's NAME token consumed is function. So it not only suggests keywords and functions like STRING, INTNUM, BOOL, ... etc but also suggests a column, table, and function. But this autocompletion engine is still in early stage. So it is not adopted in this project. my autocompletion engine link(still in progress) - currently support basic introspection ability

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published