Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite-utils search command #192

Closed
simonw opened this issue Nov 3, 2020 · 9 comments
Closed

sqlite-utils search command #192

simonw opened this issue Nov 3, 2020 · 9 comments
Labels
cli-tool enhancement New feature or request
Milestone

Comments

@simonw
Copy link
Owner

simonw commented Nov 3, 2020

A command that knows how to run a search against a FTS enabled table and return results ranked by relevance.

@simonw simonw added enhancement New feature or request cli-tool labels Nov 3, 2020
@simonw
Copy link
Owner Author

simonw commented Nov 3, 2020

Relevant example using a SQLite CTE: simonw/datasette#268 (comment)

CTEs were added in SQLite 3.8.3 in 2014-02-03 so they should be safe to use. If someone tries to run sqlite-utils search no an older version of SQLite they'll get an error, which I think is OK.

@simonw
Copy link
Owner Author

simonw commented Nov 3, 2020

Just realized there's a problem with the SQL I am using here: joining to get rank in this way only works against FTS5 tables, it doesn't work against FTS4.

with {original} as (
select
rowid,
{columns}
from [{dbtable}]
)
select
{original}.*,
[{fts}].rank as {rank}
from
[{original}]
join [{fts}] on [{original}].rowid = [{fts}].rowid
where
[{fts}] match :query
order by
{order}
{limit}

@simonw
Copy link
Owner Author

simonw commented Nov 3, 2020

I could depend on my sqlite-fts4 library to solve this: https://github.com/simonw/sqlite-fts4

Or I could say that only FTS5 is supported for ranked searches - but still support .search() against FTS4 just without the option to sort by relevance.

@simonw
Copy link
Owner Author

simonw commented Nov 4, 2020

FTS5 was added in SQLite 3.9.0 in 2015-10-14 - so about a year after CTEs, which means CTEs will always be safe to use with FTS5 queries.

@simonw
Copy link
Owner Author

simonw commented Nov 5, 2020

I want .search() to work against both FTS5 and FTS4 tables - but sort by rank should only work for FTS5.

This means I need to be able to introspect and tell if a table is FTS4 or FTS5.

@simonw
Copy link
Owner Author

simonw commented Nov 6, 2020

This is a bit surprising:

(sqlite-utils) sqlite-utils % sqlite-utils search 24ways-fts4.db articles maps -c title      
[{"rowid": 41, "title": "What Is Vagrant and Why Should I Care?", "rank": -1.9252039178908076},
 {"rowid": 298, "title": "First Steps in VR", "rank": -1.9945466378736434},
 {"rowid": 43, "title": "Content Production Planning", "rank": -2.1928058363046143},
 {"rowid": 100, "title": "Moo'y Christmas", "rank": -2.2698482999851675},
 {"rowid": 91, "title": "Infinite Canvas: Moving Beyond the Page", "rank": -2.290928999035195},
 {"rowid": 175, "title": "Front-End Code Reusability with CSS and JavaScript", "rank": -2.498731782924352},
 {"rowid": 209, "title": "Feeding the Audio Graph", "rank": -2.619968930100356},
 {"rowid": 296, "title": "Animation in Design Systems", "rank": -2.62060151817201},
 {"rowid": 118, "title": "Ghosts On The Internet", "rank": -2.7224894534521087},
 {"rowid": 77, "title": "Colour Accessibility", "rank": -2.7389782859427343},
 {"rowid": 245, "title": "Web Content Accessibility Guidelines 2.1\u2014for People Who Haven\u2019t Read the Update", "rank": -2.9750992611162888},
 {"rowid": 56, "title": "Helping VIPs Care About Performance", "rank": -3.0819662908932535},
 {"rowid": 109, "title": "Geotag Everywhere with Fire Eagle", "rank": -3.1371975973877277},
 {"rowid": 203, "title": "Jobs-to-Be-Done in Your UX Toolbox", "rank": -3.2416719461682733},
 {"rowid": 276, "title": "Your jQuery: Now With 67% Less Suck", "rank": -3.4947916564653028},
 {"rowid": 58, "title": "Beyond the Style Guide", "rank": -3.7508321464447905},
 {"rowid": 225, "title": "Good Ideas Grow on Paper", "rank": -4.120077674716844},
 {"rowid": 168, "title": "Unobtrusively Mapping Microformats with jQuery", "rank": -4.662224207228984},
 {"rowid": 27, "title": "Putting Design on the Map", "rank": -5.667327088267961},
 {"rowid": 220, "title": "Finding Your Way with Static Maps", "rank": -9.952534352591737}]

I requested just -c title but also got back rowid and rank.

@simonw
Copy link
Owner Author

simonw commented Nov 6, 2020

Also that order looks incorrect. It looks like most relevant came back last, not first.

@simonw simonw added this to the 3.0 milestone Nov 6, 2020
@simonw
Copy link
Owner Author

simonw commented Nov 7, 2020

I'm going to have it only output the exact -c columns you requested (if you requested any). Add --rank to specify the rank column, since you may not know what its name is.

@simonw
Copy link
Owner Author

simonw commented Nov 8, 2020

This looks pretty good now!

% sqlite-utils search 24ways.db articles simon -c title -c author -t 
title                                                                          author
-----------------------------------------------------------------------------  ------------------
Don't be eval()                                                                Simon Willison
DOM Scripting Your Way to Better Blockquotes                                   Jeremy Keith
Swooshy Curly Quotes Without Images                                            Simon Collison
The Articulate Web Designer of Tomorrow                                        Simon Collison
Writing Responsible JavaScript                                                 Drew McLellan
Going Nuts with CSS Transitions                                                Natalie Downe
Managing a Mind                                                                Christopher Murphy
Design Systems                                                                 Laura Kalbag
Bringing Your Code to the Streets                                              Ruth John
Taming Complexity                                                              Simon Collison
Unobtrusively Mapping Microformats with jQuery                                 Simon Willison
Crafting the Front-end                                                         Ben Bodien
Develop Your Naturalist Superpowers with Observable Notebooks and iNaturalist  Natalie Downe
Fast Autocomplete Search for Your Website                                      Simon Willison

@simonw simonw closed this as completed Nov 8, 2020
simonw added a commit that referenced this issue Nov 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cli-tool enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant