"datasette insert" command and plugin hook #1160
Basic command design:

The options can include:

The UI can live at

Other names I considered:

Default formats to support:

- CSV
- TSV
- JSON
- JSON-NL (newline-delimited JSON)

Each of these will be implemented as a default plugin.
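Purely as an illustration of the shape such a command could take (every flag name below is an assumption, not something settled in this thread), here is a click-based sketch:

```python
# Hypothetical sketch only - the real "datasette insert" command was still
# being designed in this thread; every option name below is an assumption.
import click


@click.command()
@click.argument("database", type=click.Path())
@click.argument("sources", nargs=-1, required=True)  # one or more files or URLs
@click.option("--format", "format_", help="Input format - guessed from the filename if omitted")
@click.option("--table", help="Table to insert into - defaults to a name derived from the source")
@click.option("--pk", multiple=True, help="Primary key column(s)")
def insert(database, sources, format_, table, pk):
    "Insert data from one or more files or URLs into a SQLite database."
    for source in sources:
        click.echo(f"Would insert {source} into {database} (format={format_}, table={table})")


if __name__ == "__main__":
    insert()
```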
It should accept more than one file name at a time for bulk inserts. If using a URL, that URL will be passed to the method that decides if a plugin implementation can handle the import or not. This will allow plugins to register themselves for specific websites.
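A sketch of that dispatch step, using a hypothetical `can_handle_url()` classmethod on format plugins (the method name and shape are my invention, not from the thread):

```python
# Hypothetical dispatch sketch: each registered format plugin gets a chance
# to claim a URL before falling back to guessing the format another way.
from urllib.parse import urlparse


class GitHubIssuesFormat:
    @classmethod
    def can_handle_url(cls, url: str) -> bool:
        # A plugin could register itself for a specific website like this
        return urlparse(url).netloc == "api.github.com"


REGISTERED_FORMATS = [GitHubIssuesFormat]


def find_format_for_url(url: str):
    for format_class in REGISTERED_FORMATS:
        if format_class.can_handle_url(url):
            return format_class
    return None  # fall back to filename or content-based format detection
```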
It would be pretty cool if you could launch Datasette directly against an insert-compatible file or URL, without first having to load it into a SQLite database file. Or imagine being able to tail a log file and pipe that directly into a new Datasette process, which then runs a web server with the UI while simultaneously continuing to load new entries from that log into the in-memory SQLite database it is serving... Not quite sure what that CLI interface would look like. Maybe treat that as a future stretch goal for the moment.
Potential design for this: a
Given the URL option, could it be possible for plugins to "subscribe" to URLs that keep on streaming?
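One way that could work, sketched under the assumption that a format plugin is allowed to be an async generator (which this thread doesn't confirm) and that httpx is used for async HTTP:

```python
# Hypothetical sketch: a format plugin as an async generator that keeps
# yielding rows for as long as the remote URL keeps streaming JSON lines.
import json

import httpx  # assumption: httpx for async streaming HTTP


async def rows_from_streaming_url(url: str):
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream("GET", url) as response:
            async for line in response.aiter_lines():
                if line.strip():
                    yield json.loads(line)
```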
More thoughts on this: the key mechanism that populates the tables needs to be async.
If I'm going to execute 1000s of writes in an asyncio task, other pending tasks need a chance to run in between them. https://stackoverflow.com/a/36648102 and python/asyncio#284 confirm that `await asyncio.sleep(0)` is the way to explicitly yield control back to the event loop.
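A minimal sketch of that pattern (the `execute_write` callable and the yield interval are placeholders):

```python
# Sketch: explicitly yield to the event loop every N writes so that other
# pending tasks (e.g. incoming HTTP requests) are not starved.
import asyncio


async def bulk_insert(execute_write, rows, yield_every=100):
    # execute_write is a placeholder for whatever performs a single write
    for i, row in enumerate(rows):
        execute_write(row)
        if i % yield_every == 0:
            await asyncio.sleep(0)  # hand control back to the event loop
```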
It would be neat if
Figuring out the API design

I want to be able to support different formats, and be able to parse them into tables either streaming or in one go, depending on whether the format supports that. Ideally I want to be able to pull the first 1,024 bytes for the purpose of detecting the format, then replay those bytes again later - I'm considering this a stretch goal though. CSV is easy to parse as a stream - here's how sqlite-utils does it:
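A minimal sketch in the same spirit (my approximation, not the verbatim sqlite-utils code):

```python
# Sketch: stream CSV rows as dictionaries without reading the whole file
# into memory - csv.reader accepts any iterator of strings, so a file
# handle works directly.
import csv


def stream_csv_rows(path):
    with open(path, newline="") as fp:
        reader = csv.reader(fp)
        headers = next(reader)  # first row provides the column names
        for row in reader:
            yield dict(zip(headers, row))
```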
Problem: using
Important detail from https://docs.python.org/3/library/csv.html#csv.reader:

> csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called - file objects and list objects are both suitable.
Does it definitely make sense to break this operation up into two parts - the code that turns the incoming format into an iterator of dictionaries, then the code that inserts those into the database? That seems right for simple imports, where the incoming file represents a sequence of records in a single table. But what about more complex formats? What if a format needs to be represented as multiple tables?
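One hedged way to accommodate both cases would be for a parser to return a mapping of table names to row iterators, so single-table formats return one key and complex formats can return many (my sketch; the thread doesn't settle this):

```python
# Sketch: represent a parsed source as {table_name: iterator_of_row_dicts}.
import json
from typing import Dict, Iterator


def parse_nested_json(path) -> Dict[str, Iterator[dict]]:
    # Hypothetical example: a JSON document whose top-level keys each map
    # to a list of records, imported as one table per key.
    with open(path) as fp:
        document = json.load(fp)
    return {
        table_name: iter(rows)
        for table_name, rows in document.items()
        if isinstance(rows, list)
    }
```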
Aside: maybe plugins could register extra options on the insert command. This would be useful for import mechanisms that are likely to need their own custom set of command-line options unique to that source.
What's the simplest thing that could possibly work? I think it's
It would be nice if this abstraction could support progress bars as well. These won't necessarily work for every format - or they might work for things loaded from files but not for things loaded over URLs (if the length of the incoming data isn't known in advance, for example).
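A sketch of how a format source might expose enough information to drive a progress bar (the `total_bytes()` method is hypothetical):

```python
# Sketch: a source that reports how many bytes it has consumed, so a CLI
# progress bar can be driven off (bytes_read, total_bytes) when the total
# is knowable (a local file), and fall back to a spinner when it is not.
import os
from typing import Optional


class FileProgress:
    def __init__(self, path: str):
        self.path = path
        self.bytes_read = 0

    def total_bytes(self) -> Optional[int]:
        # None would mean "unknown" - e.g. a URL with no content-length
        return os.path.getsize(self.path)

    def lines(self):
        with open(self.path, "rb") as fp:
            for line in fp:
                self.bytes_read += len(line)
                yield line
```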
I'm going to break out some separate tickets.
How much of this should I get done in a branch before merging into main? The challenge here is the plugin hook design: ideally I don't want an incomplete plugin hook design in a release.
If I design this right I can ship a full version of the command-line `datasette insert` command without committing to the plugin hook design.
The documentation for this plugin hook is going to be pretty detailed, since it involves writing custom classes. I'll stick it all on the existing hooks page for the moment, but I should think about breaking up the plugin hook documentation into a page-per-hook in the future. |
If I can get this working for CSV, TSV, JSON and JSON-NL, that should be enough to exercise the API design pretty well across both streaming and non-streaming formats.
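The streaming/non-streaming split in one illustrative sketch: JSON-NL can be consumed a line at a time, while a plain JSON array has to be parsed in full before any rows come out:

```python
# Sketch: the same "iterator of row dicts" interface backed by a streaming
# format (JSON-NL) versus a non-streaming one (a plain JSON array).
import json


def rows_from_json_nl(fp):
    # Streaming: one complete JSON object per line, yielded as it is read
    for line in fp:
        if line.strip():
            yield json.loads(line)


def rows_from_json_array(fp):
    # Non-streaming: the whole document must be parsed before iterating
    for row in json.load(fp):
        yield row
```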
Should this command include a batch size option? I thought about doing that for sqlite-utils. But maybe I can set sensible defaults for that instead.
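For what a sensible default could look like, a chunked-insert sketch (the batch size of 100 and the `data` table schema are arbitrary placeholders):

```python
# Sketch: insert rows in fixed-size batches so each transaction stays small
# and memory use is bounded regardless of input size.
import itertools
import sqlite3


def insert_in_batches(db: sqlite3.Connection, rows, batch_size: int = 100):
    rows = iter(rows)
    while True:
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        with db:  # one transaction per batch
            db.executemany(
                "INSERT INTO data (id, value) VALUES (:id, :value)", batch
            )
```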
Tools for loading data into Datasette currently mostly exist as separate utilities - yaml-to-sqlite and csvs-to-sqlite and suchlike. Bringing these into Datasette could have some interesting properties:

- The datasette insert command could be extended with plugins to handle more formats
- yaml-to-postgresql and suchlike