Rethink the whole way we interact with data: Session, CheckedSession, FileHandler, LazySession (#727), open_excel, ... See also the refactoring in #761 and #614.
Dataset API:
__init__(connect_string, max_memory=None, **kwargs) -- file path or connection string; kwargs are passed to the underlying Dataset implementation (compression option, Excel option, ...). If max_memory is not None, the Dataset will transparently flush some of its content (probably based on LRU) to "disk" when more memory is needed.
open(**kwargs) -- open/connect to the underlying storage. Kwargs here override those passed in __init__. Normally called via __enter__.
__enter__ and __exit__ (to be usable as a context manager)
read(key=None) -- read a single key, multiple keys (when key is a list), or everything (if key is None) and return the values. It is unclear whether this explicit method makes sense; maybe __getitem__, with an optional load(), is enough.
load(key=None) -- load a single key, multiple keys (when key is a list), or everything (if key is None) and return nothing.
open_key(key=None) -- in the future, return a lazy object which loads data only when it is actually accessed. It could potentially load only part of that key (array/...). This needs further thought.
__getattr__ -> forwards to __getitem__
__getitem__(key) -> equivalent to load(key) if the key is not loaded yet, then return the array (or should it use open_key(key) instead?)
__setitem__(key, value) -> add a new value or change an existing one.
close() -- close file/connection to underlying storage. Normally called via __exit__
Misc thoughts:
I think excel.Workbook should be a subclass of Dataset
We could/should also implement a generic "read" top-level function which would open a dataset, read the array and close it, to replace/complement the read_* functions.
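A generic top-level read could simply wrap the context manager. In this sketch, the signature read(connect_string, key=None, dataset_cls=..., **kwargs) and the DictDataset stand-in (backed by a hypothetical in-memory _FAKE_DISK dict) are assumptions made so the snippet runs on its own.

```python
# Hypothetical stand-in for files on disk: connect_string -> {key: value}
_FAKE_DISK = {'demo.h5': {'pop': [1, 2, 3], 'births': [4, 5]}}


class DictDataset:
    """Hypothetical stand-in for any class implementing the Dataset API."""

    def __init__(self, connect_string, **kwargs):
        self.connect_string = connect_string
        self.options = kwargs
        self._data = None

    def __enter__(self):
        self._data = _FAKE_DISK[self.connect_string]  # "open" the storage
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._data = None  # "close" the storage
        return False

    def read(self, key=None):
        if key is None:
            return dict(self._data)
        if isinstance(key, list):
            return {k: self._data[k] for k in key}
        return self._data[key]


def read(connect_string, key=None, dataset_cls=DictDataset, **kwargs):
    """Open a dataset, read one key, several keys or everything, then close it."""
    with dataset_cls(connect_string, **kwargs) as ds:
        return ds.read(key)
```

A real version would presumably choose the Dataset subclass from the connection string or file extension instead of taking dataset_cls as an argument.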