Mechanism for ranking results from SQLite full-text search #268

Open
simonw opened this issue May 16, 2018 · 12 comments

Comments

@simonw
Owner

simonw commented May 16, 2018

This isn't particularly straightforward - all the more reason for Datasette to implement it for you. This article is helpful: http://charlesleifer.com/blog/using-sqlite-full-text-search-with-python/

@simonw
Owner Author

simonw commented Jun 24, 2019

I did a bunch of research relevant to this a while ago: https://simonwillison.net/2019/Jan/7/exploring-search-relevance-algorithms-sqlite/

@simonw
Owner Author

simonw commented Aug 18, 2020

I want this on the table page - but that means that the table page will need to run a slightly more complex query since it needs access to a rank column to sort by - which it gets from running a join.

BUT... that join needs to be constructed in a way that keeps existing filters, ?_where= clauses etc intact.

Here's a prototype using SQLite CTEs: https://register-of-members-interests.datasettes.com/regmem?sql=with+original+as+%28select+rowid%2C+*+from+items%29%0D%0Aselect%0D%0A++original.*%2C%0D%0A++items_fts.rank+as+items_fts_rank%0D%0Afrom%0D%0A++original+join+items_fts+on+original.rowid+%3D+items_fts.rowid%0D%0Awhere%0D%0A++items_fts+match+escape_fts%28%3Asearch%29%0D%0Aorder+by+items_fts_rank+desc+limit+10&search=hotel

with original as (
  select
    rowid,
    *
  from
    items
)
select
  original.*,
  items_fts.rank as items_fts_rank
from
  original
  join items_fts on original.rowid = items_fts.rowid
where
  items_fts match escape_fts(:search)
order by
  items_fts_rank desc
limit
  10
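
A minimal, self-contained sketch of the same query run through Python's sqlite3 module, assuming the SQLite build has FTS5 enabled. The items rows are made up, and escape_fts here is a simplified stand-in registered for illustration, not Datasette's implementation. One difference from the prototype above: FTS5 documents rank as lower-is-better, so this sketch sorts ascending to put the best match first.

```python
import sqlite3


def escape_fts(query):
    # Simplified stand-in: wrap each whitespace-separated term in double
    # quotes so FTS5 treats it as a plain string rather than query syntax.
    return " ".join(
        '"{}"'.format(term.replace('"', '""')) for term in query.split()
    )


conn = sqlite3.connect(":memory:")
conn.create_function("escape_fts", 1, escape_fts)
conn.executescript(
    """
    create table items (id integer primary key, description text);
    insert into items (description) values
      ('Hotel accommodation in Paris'),
      ('Train ticket to Leeds'),
      ('Hotel, hotel bar and hotel parking'),
      ('Gift of a framed print'),
      ('Taxi fare');
    create virtual table items_fts using fts5(
      description, content='items', content_rowid='id'
    );
    insert into items_fts (rowid, description)
      select id, description from items;
    """
)

RANKED_SQL = """
with original as (
  select rowid, * from items
)
select
  original.*,
  items_fts.rank as items_fts_rank
from
  original
  join items_fts on original.rowid = items_fts.rowid
where
  items_fts match escape_fts(:search)
order by
  items_fts_rank
limit 10
"""

# Each row is (rowid, id, description, items_fts_rank)
rows = conn.execute(RANKED_SQL, {"search": "hotel"}).fetchall()
for row in rows:
    print(row)
```

The row that mentions "hotel" three times should come out ahead of the row that mentions it once.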

@simonw
Owner Author

simonw commented Nov 4, 2020

Worth noting that joining to get the rank works for FTS5 but not for FTS4 - see comment here: simonw/sqlite-utils#192 (comment)

Easiest solution would be to only support sort-by-rank for FTS5 tables. Alternative would be to depend on https://github.com/simonw/sqlite-fts4
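
To illustrate why FTS4 needs extra help: FTS4 has no rank column, so a relevance score has to be computed from matchinfo() by an application-defined function. The sketch below is a stdlib-only illustration of that approach with a deliberately naive scoring formula. The simple_rank function and the docs table are hypothetical, and this is not the sqlite-fts4 implementation. It assumes the SQLite build includes FTS4.

```python
import sqlite3
import struct


def decode_matchinfo(buf):
    # matchinfo() returns a blob of 32-bit unsigned integers in native byte order
    return struct.unpack("={}I".format(len(buf) // 4), buf)


def simple_rank(raw_matchinfo):
    # Default matchinfo format is "pcx": phrase count, column count, then three
    # integers per (phrase, column) pair: hits in this row, hits in all rows,
    # and number of rows with at least one hit.
    info = decode_matchinfo(raw_matchinfo)
    phrase_count, column_count = info[0], info[1]
    score = 0.0
    for phrase in range(phrase_count):
        for column in range(column_count):
            base = 2 + (phrase * column_count + column) * 3
            hits_this_row = info[base]
            hits_all_rows = info[base + 1]
            if hits_all_rows:
                score += hits_this_row / hits_all_rows
    return score


conn = sqlite3.connect(":memory:")
conn.create_function("simple_rank", 1, simple_rank)
conn.executescript(
    """
    create virtual table docs using fts4(body);
    insert into docs (body) values ('dog dog dog'), ('dog cat'), ('cat');
    """
)

rows = conn.execute(
    """
    select rowid, body, simple_rank(matchinfo(docs)) as score
    from docs
    where docs match :search
    order by score desc
    """,
    {"search": "dog"},
).fetchall()
for row in rows:
    print(row)
```

Here higher scores are better, which is the opposite of the FTS5 rank convention, so the sort direction has to differ between the two code paths.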

@simonw
Owner Author

simonw commented Nov 9, 2020

I should depend on sqlite-fts4 - I'm doing that in sqlite-utils now and it works great: simonw/sqlite-utils#198

@simonw
Owner Author

simonw commented Nov 13, 2020

Part of the challenge here is that this is the first time the TableView will have had a complete rewrite of the SQL it is going to execute. That SQL is currently constructed here:

sql_no_limit = "select {select} from {table_name} {where}{order_by}".format(
    select=select,
    table_name=escape_sqlite(table),
    where=where_clause,
    order_by=order_by,
)
sql = "{sql_no_limit} limit {limit}{offset}".format(
    sql_no_limit=sql_no_limit.rstrip(), limit=page_size + 1, offset=offset
)
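
One way the rewritten SQL could be assembled is sketched below: the existing where clauses move inside the CTE, so they keep referring to the original table's columns, and the join to the FTS table happens outside it. This is a hypothetical illustration, not Datasette's code. build_ranked_sql, quote_identifier (a stand-in for escape_sqlite) and the sample table are all assumptions, and it assumes an FTS5 table.

```python
import sqlite3


def quote_identifier(name):
    # Hypothetical stand-in for Datasette's escape_sqlite(); assumes the
    # name contains no "]" character.
    return "[{}]".format(name)


def build_ranked_sql(table, fts_table, where_clauses, page_size):
    # Existing filters go inside the CTE so the join cannot interfere with them.
    where = ""
    if where_clauses:
        where = " where " + " and ".join("({})".format(c) for c in where_clauses)
    return (
        "with original as (select rowid, * from {table}{where}) "
        "select original.*, {fts}.rank as {rank_col} "
        "from original join {fts} on original.rowid = {fts}.rowid "
        "where {fts} match :search "
        "order by {fts}.rank limit {limit}"
    ).format(
        table=quote_identifier(table),
        where=where,
        fts=quote_identifier(fts_table),
        rank_col=quote_identifier(fts_table + "_rank"),
        limit=page_size + 1,  # one extra row to detect a next page, as above
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table items (id integer primary key, description text, category text);
    insert into items (description, category) values
      ('Hotel stay', 'visits'),
      ('Hotel hotel gift', 'gifts'),
      ('Hotel conference', 'visits'),
      ('Train ticket', 'visits');
    create virtual table items_fts using fts5(
      description, content='items', content_rowid='id'
    );
    insert into items_fts (rowid, description)
      select id, description from items;
    """
)

sql = build_ranked_sql("items", "items_fts", ["category = :category"], page_size=10)
# Each row is (rowid, id, description, category, items_fts_rank)
rows = conn.execute(sql, {"search": "hotel", "category": "visits"}).fetchall()
print(sql)
print(rows)
```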

@simonw simonw modified the milestones: Datasette 0.52, Datasette Next Nov 28, 2020
@mhalle

mhalle commented Mar 3, 2021

In FTS5, I think doing an FTS search is actually much easier than doing a join against the main table like datasette does now. In fact, FTS5 external content tables provide a transparent interface back to the original table or view.

Here's what I'm currently doing:

  • Build a view that joins whatever tables I want and rename the columns to non-joiny names (e.g., chapter.name AS chapter_name in the view where needed)
  • Create an FTS5 table with content="viewname"
  • As described in the "external content tables" section (https://www.sqlite.org/fts5.html#external_content_tables), SQL queries can be made directly to the FTS table, which behind the scenes makes select calls to the content table when the content of the original columns is needed.
  • In addition, you get "rank" and "bm25()" available to you when you select on the _fts table.
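
The steps above can be sketched with Python's sqlite3 module as follows. The book and chapter tables, the chapter_search view and the chapter_fts table are made-up names for illustration, and the sketch assumes an FTS5-enabled SQLite build. The view exposes an explicit id column that is passed as content_rowid, on the assumption that the view's own rowid cannot be relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table book (id integer primary key, title text);
    create table chapter (
      id integer primary key, book_id integer, name text, body text
    );
    insert into book (title) values ('SQLite Guide');
    insert into chapter (book_id, name, body) values
      (1, 'Indexes', 'B-tree indexes speed up lookups'),
      (1, 'Full-text search', 'FTS5 builds a full-text index for search queries');

    -- View that joins the tables and gives the columns non-joiny names
    create view chapter_search as
      select
        chapter.id as id,
        book.title as book_title,
        chapter.name as chapter_name,
        chapter.body as body
      from chapter join book on book.id = chapter.book_id;

    -- External content FTS5 table that reads its content from the view
    create virtual table chapter_fts using fts5(
      book_title, chapter_name, body,
      content='chapter_search', content_rowid='id'
    );
    -- Populate the index from the content view
    insert into chapter_fts (chapter_fts) values ('rebuild');
    """
)

# Query the FTS table directly; column values are fetched from the view
rows = conn.execute(
    """
    select book_title, chapter_name, rank
    from chapter_fts
    where chapter_fts match :search
    order by rank
    """,
    {"search": "lookups"},
).fetchall()
print(rows)
```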

Unfortunately, datasette doesn't currently seem happy being coerced into doing a real query on an fts5 table. This works:
select col1, col2, col3 from table_fts where col1 = 'value' and table_fts match escape_fts('search term') order by rank

But this doesn't work in the datasette SQL query interface:
select col1, col2, col3 from table_fts where col1 = 'value' and table_fts match escape_fts(:search) order by rank (the "search" input text field doesn't show up)

For what datasette is doing right now, I think you could just use contentless fts5 tables (content=""), since all you care about is the rowid: the current query only does a subselect against the FTS table to get the rowid anyway.

I guess if you want to follow this suggestion, you'd need a somewhat different code path for fts5.
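
A small sketch of the contentless variant, assuming an FTS5-enabled SQLite build. The items table and its rows are made up. A contentless table stores only the index, so rows are inserted with an explicit rowid and the query uses the FTS table purely to find matching rowids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table items (id integer primary key, description text);
    insert into items (description) values
      ('Hotel accommodation in Paris'),
      ('Train ticket to Leeds'),
      ('Hotel parking');

    -- content='' makes this a contentless FTS5 table: index only, no stored text
    create virtual table items_fts using fts5(description, content='');
    insert into items_fts (rowid, description)
      select id, description from items;
    """
)

# The FTS table is used only in a subselect that returns matching rowids
rows = conn.execute(
    """
    select id, description
    from items
    where rowid in (
      select rowid from items_fts where items_fts match :search
    )
    order by id
    """,
    {"search": "hotel"},
).fetchall()
print(rows)
```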

@mhalle

mhalle commented Mar 4, 2021

It's kind of an ugly hack, but you can try out what using the fts5 table as an actual datasette-accessible table looks like without changing any datasette code by creating yet another view on top of the fts5 table:

create view proxyview as select *, rank, table_fts as fts from table_fts;

That's now visible from datasette, just like any other view, but you can use fts match escape_fts(search_string) order by rank.

This is only good as a proof of concept because you're inefficiently going from view -> fts5 external content table -> view -> data table. However, it does show it works.

@rayvoelker

I had set up a full text search on my instance of Datasette for title data for our public library, and was noticing that some of the features of the SQLite FTS weren't working as expected ... and maybe the issue is in the escape_fts() function

image
vs removing the function...
image

Also, on the issue of sorting by rank by default .. perhaps something like this could work for the baked-in default SQL query for Datasette?
image

link to the above search in my instance of Datasette

@simonw
Owner Author

simonw commented Jul 8, 2021

I had set up a full text search on my instance of Datasette for title data for our public library, and was noticing that some of the features of the SQLite FTS weren't working as expected ... and maybe the issue is in the escape_fts() function

That's a deliberate feature (albeit controversial, see #759) - part of the main problem here is that it's easy to construct a SQLite full-text search string which results in a database error. This is a bad user experience!

You can opt-in to raw SQL queries by appending ?_searchmode=raw to the page, see https://docs.datasette.io/en/stable/full_text_search.html#advanced-sqlite-search-queries

But maybe there should be an option for turning that on by default without needing the query string?
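
The trade-off can be sketched like this, assuming an FTS5-enabled SQLite build. The quote_terms function is a simplified stand-in for escaping, not Datasette's escape_fts(), and the places table and run_search helper are hypothetical. An unbalanced double quote is one input that FTS5 rejects when passed through raw, while raw mode is what makes operators such as OR available.

```python
import sqlite3


def quote_terms(query):
    # Simplified stand-in for escaping: quote every whitespace-separated term
    # so FTS5 treats it as a literal string, doubling any embedded quotes.
    return " ".join(
        '"{}"'.format(term.replace('"', '""')) for term in query.split()
    )


def run_search(conn, query, raw=False):
    # Returns (rows, error); error is None when the query succeeded.
    match = query if raw else quote_terms(query)
    try:
        rows = conn.execute(
            "select name from places where places match ? order by rowid",
            (match,),
        ).fetchall()
        return rows, None
    except sqlite3.Error as ex:
        return [], str(ex)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create virtual table places using fts5(name);
    insert into places (name) values ('Golden Gate Park'), ('City Museum');
    """
)

print(run_search(conn, '"park', raw=True))   # raw: FTS5 reports an error
print(run_search(conn, '"park'))             # escaped: matches the park row
print(run_search(conn, "park OR museum", raw=True))  # raw: OR is an operator
print(run_search(conn, "park OR museum"))    # escaped: OR is just another term
```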

@rayvoelker

I do like the idea of there being an option for turning that on by default so that you could use those terms in the default "Search" bar presented when you browse to a table where FTS has been enabled. Maybe even a small inline pop-up with a short bit explaining the FTS feature and the keywords (e.g. case matters). What are the side-effects of turning that on in the query string, or even by default as you suggested? I see that you stated in the docs... "to ensure they do not cause any confusion for users who are not aware of them", but I'm not sure what those could be.

Isn't it the case that those keywords are only picked up by sqlite where you're using the MATCH clause?

Seems like a really powerful feature (even though there are a lot of hurdles around setting it up in the sqlite db ... sqlite-utils makes that so simple by the way!)

@simonw
Owner Author

simonw commented Jul 14, 2021

What are the side-effects of turning that on in the query string, or even by default as you suggested? I see that you stated in the docs... "to ensure they do not cause any confusion for users who are not aware of them", but I'm not sure what those could be.

Mainly that it's possible to generate SQL queries that crash with an error. This was the example that convinced me to default to escaping:

@simonw
Owner Author

simonw commented Jul 14, 2021

... though interestingly I can't replicate that error on latest.datasette.io - https://latest.datasette.io/fixtures/searchable?_search=park.&_searchmode=raw

That's running SQLite 3.35.4 (https://latest.datasette.io/-/versions) whereas https://www.niche-museums.com/-/versions is running 3.27.2 (the most recent version available with Vercel) - but there's nothing in the SQLite changelog between those two versions that suggests changes to how the FTS5 parser works. https://www.sqlite.org/changes.html

@simonw simonw removed this from the Datasette Next milestone Jan 13, 2022