There will at some point be a proxy that intercepts and authenticates the request. There will also be some amount of catalog lookup here to ensure the query we're sending to the node has all info it needs for planning/executing.
Right now we don't need to do that, since the client is able to send that info directly, and it doesn't need to be injected or read from anywhere.
For example (assuming wasm):
-- Stores attach info on client.
ATTACH postgres DATABASE AS mypg ...
-- Plans remotely by sending attach info to server.
SELECT * FROM mypg.schema.table
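A minimal Rust sketch of the idea above: the client ships its attach info with the query, so the server can plan without doing its own catalog lookup. The AttachInfo and QueryRequest types and the resolve_table function are hypothetical names for illustration, not rayexec's API. The sketch only resolves the database part of mypg.schema.table against the attachments carried in the request.

```rust
use std::collections::HashMap;

/// Hypothetical: what the client keeps after running `ATTACH ... AS <alias>`.
#[derive(Debug, Clone, PartialEq)]
struct AttachInfo {
    /// Data source name, e.g. "postgres".
    datasource: String,
    /// Options taken from the ATTACH statement (connection string, etc.).
    options: HashMap<String, String>,
}

/// Hypothetical request body: the SQL text plus every attachment the client
/// knows about, keyed by alias (e.g. "mypg").
#[derive(Debug)]
struct QueryRequest {
    sql: String,
    attachments: HashMap<String, AttachInfo>,
}

/// Resolve `database.schema.table` using only the attach info carried in the
/// request, so the server needs no catalog lookup of its own.
fn resolve_table<'a>(
    req: &'a QueryRequest,
    qualified: &str,
) -> Result<(&'a AttachInfo, String, String), String> {
    let parts: Vec<&str> = qualified.split('.').collect();
    let [db, schema, table] = parts.as_slice() else {
        return Err(format!("expected database.schema.table, got '{qualified}'"));
    };
    // Look up the database alias among the attachments the client sent.
    let info = req
        .attachments
        .get(*db)
        .ok_or_else(|| format!("database '{db}' was not attached by the client"))?;
    Ok((info, schema.to_string(), table.to_string()))
}
```

With the example above, a request whose attachments map "mypg" to a postgres AttachInfo would resolve mypg.schema.table without the server ever having seen the ATTACH statement.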
What's left before we can begin implementing more data sources (db vs files)?
Either extend FileLocation or have a wrapper around FileLocation that is able to handle globbing and produce FileLocations. Personally I like how simple FileLocation is right now, so I'm leaning more towards a FileList struct which can parse/handle hive & glob.
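A rough sketch of the FileList idea, assuming it sits on top of a plain location type (a String stands in for FileLocation here). It expands a glob against candidate paths and extracts hive-style key=value partition segments from each match. The FileList type, its expand method, and the glob rules (* and ? within a single path segment, no **) are all assumptions; a real version would list objects from storage rather than take a precomputed slice.

```rust
/// Hypothetical wrapper: every path matching a glob, paired with the hive-style
/// `key=value` partition values found in its directory segments.
#[derive(Debug)]
struct FileList {
    files: Vec<(String, Vec<(String, String)>)>,
}

impl FileList {
    /// Keep each candidate whose segments match the pattern's segments one-to-one.
    fn expand(pattern: &str, candidates: &[&str]) -> FileList {
        let pat: Vec<&str> = pattern.split('/').collect();
        let files = candidates
            .iter()
            .filter(|path| {
                let segs: Vec<&str> = path.split('/').collect();
                segs.len() == pat.len()
                    && pat
                        .iter()
                        .zip(&segs)
                        .all(|(p, s)| glob_segment(p.as_bytes(), s.as_bytes()))
            })
            .map(|path| (path.to_string(), hive_partitions(path)))
            .collect();
        FileList { files }
    }
}

/// Match one path segment against a pattern where `*` matches any run of
/// characters and `?` matches exactly one character.
fn glob_segment(pat: &[u8], s: &[u8]) -> bool {
    match (pat.first(), s.first()) {
        (None, None) => true,
        // `*` either matches nothing, or consumes one character and stays active.
        (Some(b'*'), _) => {
            glob_segment(&pat[1..], s) || (!s.is_empty() && glob_segment(pat, &s[1..]))
        }
        (Some(b'?'), Some(_)) => glob_segment(&pat[1..], &s[1..]),
        (Some(p), Some(c)) if p == c => glob_segment(&pat[1..], &s[1..]),
        _ => false,
    }
}

/// Collect `key=value` directory segments (the hive partitioning convention).
fn hive_partitions(path: &str) -> Vec<(String, String)> {
    let mut segs: Vec<&str> = path.split('/').collect();
    segs.pop(); // the last segment is the file name, not a partition
    segs.into_iter()
        .filter_map(|seg| seg.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Keeping glob and hive handling in a separate wrapper like this leaves the single-location type untouched, which is the trade-off described above.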
How much work is it going to be to support different cloud provider object storage?
S3 is already in. GCS will require the service account flow (#190). I don't think GCS would take too long, maybe a day or two.
I have no clue about Azure.
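One way to keep each new provider cheap is to dispatch on the URL scheme when parsing a location. This is a hypothetical sketch, not rayexec's FileLocation: the StoreLocation enum, parse_location, and the gs:// and az:// scheme choices are assumptions. It only parses; credential handling such as the GCS service account flow is out of scope.

```rust
/// Hypothetical: which object store backend a location should be read through.
#[derive(Debug, PartialEq)]
enum StoreLocation {
    S3 { bucket: String, key: String },
    Gcs { bucket: String, key: String },
    Azure { container: String, path: String },
    Local(String),
}

/// Pick a backend from the URL scheme; no scheme means a local path.
/// Only parsing happens here; credentials would be resolved elsewhere.
fn parse_location(s: &str) -> Result<StoreLocation, String> {
    let Some((scheme, rest)) = s.split_once("://") else {
        return Ok(StoreLocation::Local(s.to_string()));
    };
    // First path segment is the bucket/container; the remainder is the object key.
    let (first, remainder) = rest.split_once('/').unwrap_or((rest, ""));
    let (first, remainder) = (first.to_string(), remainder.to_string());
    match scheme {
        "s3" => Ok(StoreLocation::S3 { bucket: first, key: remainder }),
        "gs" => Ok(StoreLocation::Gcs { bucket: first, key: remainder }),
        "az" => Ok(StoreLocation::Azure { container: first, path: remainder }),
        other => Err(format!("unsupported scheme '{other}'")),
    }
}
```

Under this shape, adding a provider means one new variant and match arm plus the actual client code, which is likely where most of the work (auth flows, request signing) would be.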
Where do we stand with the "native" storage? Updates? Inserts?
Where does the delta-lake implementation stand? Vacuuming? Write operations?
Very simple reads right now. No writes yet.
Where does catalog persistence (databases? tables?) stand, and what remains?
See below for CatalogStorage & TableStorage. These are implemented for memory tables, and postgres has an implementation of CatalogStorage that only does table lookup, with no actual catalog modifications.
There is currently no authentication. Anyone is able to connect to "https://server.rayexec.glaredb.com".
Files should be mostly ready to go.
E.g. parquet: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_parquet/src/lib.rs#L18-L65
There might be some slight changes but nothing major.
I'm slightly less sure about databases. They'll need to implement connect: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_execution/src/datasource.rs#L64-L73
The semantics of DataSourceConnection still need to be hammered out around actually making changes to catalogs in remote databases, but for reads I imagine it should be ready enough. E.g. postgres: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_postgres/src/lib.rs#L55-L68
https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_io/src/location.rs#L66-L85
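A simplified, synchronous Rust sketch of the connect shape for databases: a data source whose connect call returns a connection, backed by a catalog that can answer table lookups but rejects catalog modifications (mirroring the postgres situation described in this thread). DataSource, DataSourceConnection, and CatalogStorage reuse the thread's names, but every signature here, along with TableEntry, ReadOnlyCatalog, FakeDatabaseSource, and the connection_string option, is a hypothetical stand-in; the real traits at the links above will differ.

```rust
use std::collections::HashMap;

/// Hypothetical: minimal table metadata returned by a catalog lookup.
#[derive(Debug, Clone, PartialEq)]
struct TableEntry {
    name: String,
    columns: Vec<String>,
}

/// Hypothetical catalog interface: lookups plus (possibly unsupported) modifications.
trait CatalogStorage {
    fn get_table(&self, schema: &str, table: &str) -> Option<TableEntry>;
    fn create_table(&mut self, schema: &str, entry: TableEntry) -> Result<(), String>;
}

/// Hypothetical: what `connect` hands back (table storage omitted for brevity).
struct DataSourceConnection {
    catalog_storage: Box<dyn CatalogStorage>,
}

trait DataSource {
    fn connect(&self, options: &HashMap<String, String>) -> Result<DataSourceConnection, String>;
}

/// Stand-in for a remote database catalog: lookups work, modifications do not.
struct ReadOnlyCatalog {
    tables: HashMap<(String, String), TableEntry>, // (schema, table) -> entry
}

impl CatalogStorage for ReadOnlyCatalog {
    fn get_table(&self, schema: &str, table: &str) -> Option<TableEntry> {
        self.tables.get(&(schema.to_string(), table.to_string())).cloned()
    }
    fn create_table(&mut self, _schema: &str, _entry: TableEntry) -> Result<(), String> {
        Err("catalog modifications are not supported for this data source".to_string())
    }
}

/// Hypothetical database source. A real one would open a connection and query
/// the remote system catalog instead of copying a fixed table map.
struct FakeDatabaseSource {
    tables: HashMap<(String, String), TableEntry>,
}

impl DataSource for FakeDatabaseSource {
    fn connect(&self, options: &HashMap<String, String>) -> Result<DataSourceConnection, String> {
        if !options.contains_key("connection_string") {
            return Err("missing 'connection_string' option".to_string());
        }
        Ok(DataSourceConnection {
            catalog_storage: Box::new(ReadOnlyCatalog { tables: self.tables.clone() }),
        })
    }
}
```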
Temp tables exist and support inserts.
"Native storage" will just be another data source. The idea is that data sources will return both a TableStorage and a CatalogStorage implementation from connect (https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_execution/src/datasource.rs#L37-L41), and that's what I plan to use for native storage (just plop in delta for table storage, plus whatever catalog stuff is needed).
TableStorage implements this trait: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_execution/src/storage/table_storage.rs#L8-L18
This will be how tables are physically created/dropped/scanned.
DataTables implement this trait: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_execution/src/storage/table_storage.rs#L20-L40
CatalogStorage will implement this trait: https://github.com/glaredb/rayexec/blob/cfd482eba4020acf9d211138c520c62f5e081737/crates/rayexec_execution/src/storage/catalog_storage.rs#L7-L13
CatalogStorage is the larger unknown right now.
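An in-memory sketch of the TableStorage/DataTable split, in the spirit of the memory and temp tables mentioned in this thread: table storage creates, drops, and hands out tables, and each data table supports inserts and scans. The trait methods, MemoryTableStorage, MemoryDataTable, and the Vec<i64> row type are simplified assumptions, not the traits at the links above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical row representation for this sketch.
type Row = Vec<i64>;

/// Hypothetical: operations on a single physical table.
trait DataTable {
    fn insert(&self, rows: Vec<Row>) -> Result<usize, String>;
    fn scan(&self) -> Vec<Row>;
}

/// Hypothetical: how tables are physically created, dropped, and fetched for scans.
trait TableStorage {
    fn create_physical_table(&mut self, schema: &str, name: &str) -> Result<(), String>;
    fn drop_physical_table(&mut self, schema: &str, name: &str) -> Result<(), String>;
    fn data_table(&self, schema: &str, name: &str) -> Option<Arc<dyn DataTable>>;
}

#[derive(Default)]
struct MemoryDataTable {
    rows: Mutex<Vec<Row>>,
}

impl DataTable for MemoryDataTable {
    /// Append rows and report how many were inserted.
    fn insert(&self, rows: Vec<Row>) -> Result<usize, String> {
        let n = rows.len();
        self.rows.lock().map_err(|e| e.to_string())?.extend(rows);
        Ok(n)
    }
    /// Return a copy of all rows (empty if the lock was poisoned).
    fn scan(&self) -> Vec<Row> {
        self.rows.lock().map(|r| r.clone()).unwrap_or_default()
    }
}

#[derive(Default)]
struct MemoryTableStorage {
    tables: HashMap<(String, String), Arc<MemoryDataTable>>, // (schema, name) -> table
}

impl TableStorage for MemoryTableStorage {
    fn create_physical_table(&mut self, schema: &str, name: &str) -> Result<(), String> {
        let key = (schema.to_string(), name.to_string());
        if self.tables.contains_key(&key) {
            return Err(format!("table {schema}.{name} already exists"));
        }
        self.tables.insert(key, Arc::new(MemoryDataTable::default()));
        Ok(())
    }
    fn drop_physical_table(&mut self, schema: &str, name: &str) -> Result<(), String> {
        self.tables
            .remove(&(schema.to_string(), name.to_string()))
            .map(|_| ())
            .ok_or_else(|| format!("table {schema}.{name} does not exist"))
    }
    fn data_table(&self, schema: &str, name: &str) -> Option<Arc<dyn DataTable>> {
        self.tables
            .get(&(schema.to_string(), name.to_string()))
            .map(|t| t.clone() as Arc<dyn DataTable>)
    }
}
```

Returning Arc<dyn DataTable> lets a scan or insert hold onto a table independently of the storage map, which is one reasonable choice; the real traits may handle ownership differently.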