-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add topostats file helper class #945
base: main
Are you sure you want to change the base?
Conversation
Not ignoring this, have had a scan through and it looks good but focusing on the various outstanding issues with the better tracing merger. |
All good, not wanting this to take time away from more important stuff, it can wait and I want user opinions first too :) |
Can we work into this a notebook on opening and extracting data from the file too, to address the comments from the workshop day? |
ye |
Added a notebook showing how to use the class in |
topostats/io.py
Outdated
Examples | ||
-------- | ||
Creating a helper object. | ||
```python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about how this renders when Sphinx's Autoapi-doc parses it to generate API docs in the webpage as docstrings are Restructured text. Might be worth using the .. code-block::
approach, see @MaxGamill-Sheffield solution in commit b953634 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point thank you, I'll do that
I think the tests under Python 3.9 failing because under that version we need the following import... from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Notebook would be easier to read if the comments were moved to Markdown sections to delineate the code examples. Otherwise may as well just include a codeblock in docs/advanced/topostats_file_helper.md
and be a simpler solution to documentation as its a web-page people can go to, they wouldn't need to activate a virtual environment and then start a Jupyter. Code chunks could be copy and pasted (I'd have to work out how to enable a button to support that though).
Just came across h5glance after the Skan developer mentioned it on Mastodon. I wonder if using this would be a simpler solution, it sounds as though it might work within Jupyter Notebooks too. |
Good find! It does work Wonderfully in notebooks and it've even interactive! You can click on the items to expand / hide sub-fields. However this is just for looking at I propose that I replace the code I wrote to display the contents of the file with |
Sounds like a plan. From memory these are thin wrappers around being able to access dictionary items directly and I feel that its substituting learning how to work with dictionaries directly with learning how to use the wrappers. I'm of the opinion that the more general skill (working directly with dictionaries) has broader benefits to users in the long term. ⚖️ |
There is an issue with I tried this: def pretty_print_structure(self) -> None:
"""
Print the structure of the data in the data dictionary.
The structure is printed with the keys indented to show the hierarchy of the data.
"""
LOGGER.info(f"running h5glance")
H5Glance(self.topofile) and this: def pretty_print_structure(self) -> None:
"""
Print the structure of the data in the data dictionary.
The structure is printed with the keys indented to show the hierarchy of the data.
"""
LOGGER.info(f"running h5glance")
result = H5Glance(self.topofile)
print(result) and neither produce the nested output needed. If users want the interactive So either they must remember |
I guess it depends how people are using the I'm not a great fan of re-inventing the 🛞 and if a tool exists I'll tend to advocate for its use over making something new which adds to our codebase and the overhead of maintenance. As a general rule though, and perhaps I'm missing something, but I think I recall writing such when I wrote original Notebooks (see for example the line after The h5glance README points to using the Having re-read the |
Plan from the TopoStats code clean 08/01/25 is:
|
Sorry to miss code clean, was embroiled in some bioinformatics work and didn't notice the time. With regards to Notebook this might be an ideal opportunity to migrate to the newer marimo which among other things has a major advantage of updating all dependent cells when an earlier one is re-run. For more on the problems marimo solves see the faq. I doubt there will be much of an overhead in migrating since its still a notebook running the cells so both Markdown and code cells could be copied over. There is a section on migrating from Jupyter. |
e06ed14
to
279bc00
Compare
This PR would add a small helper class in
topostats.io
to assist users that want to explore & retrieve data contained in.topostats
files, as we have had feedback from the experimentalists that navigating the.hdf5
file structure is prohibitively complex / difficult to do manually.Previously, to load the file in a notebook, one had to:
The TopoFileHelper class adds some methods to help with this:
pretty_print_structure()
will print the entire structure (but not messy dictionaries of arrays!):find_data()
will perform a strict search for the keys (given in a list) and if no match is found, perform a partial search to find possible matches that the user intended. Eg:get_data()
Simply retries data when provided with a key string separated by "/"s:data_info()
Prints a little information about the value at a specific key:No tests yet, would want feedback first