-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: New MATLAB loader #4441
WIP: New MATLAB loader #4441
Conversation
Some issues with character encoding that needs to be clarified. Changed the API of ZlibInputStream to return empty reads at EOF rather than raise IOError, like other streams.
See test_classdef.py and gen_classdefmat.m. References to other objects is not implemented yet.
Did you run the benchmarks? |
Now I have. Memory usage is basically the same as the Cython version after providing |
b99f9b1
to
538d38b
Compare
GenericStream.readinto was made cpdef and its signature changed accordingly to allow reading into a buffer that'll directly be turned into an ndarray.
538d38b
to
a3df69a
Compare
I think you will find that you won't be able to optimize much more in Python, but I am happy to be proved wrong. I wrote the Cython reader because people were complaining rather vigorously that the original reader was too slow (about the same speed as you have now). I am happy to port what you did to the current code, if it is not obvious how to do that. |
Yes, once Re: simplification: I read and yield tags in a big generator loop (which may or may not be a good idea...) in If you want to port the object loading part to the old version (I honestly don't have the courage to do this now), you may want to have a look at Or, perhaps we can just cythonize the new code... Let me know if I can be of any help. |
... which I had missed initially.
On another topic, I was worried about what to do in case of self-containing objects, but apparently MATLAB avoids creating them so we may be fine:
|
Taking back what I just said; self-containing objects can be created by MATLAB too, one just needs to derive the class from |
I'm glad to see you're finding my notes useful. I managed to access the data I needed and then the project fell off my desk. If you have any questions about my notes or run into any inconsistencies you are very welcome to contact me. It's really exciting to see support for this in any language! |
Some more fun on MAT files: object references to other objects are represented as uint32 column arrays of 6 elements, with the first element equal to
|
:) Rather remarkable, isn't it? If I remember right they don't even check the length. See JuliaIO/MAT.jl#23 (comment) |
@mbauman I have finally figured out a case where "segment 5" is populated: when saving classes with dynamic properties, e.g. below.
Not that I care much about them, and it's not totally clear how to represent such objects (and especially arrays of such objects with varying defined properties) on the numpy side of things. |
FWIW I repackaged the whole thing as an independent package (http://github.com/anntzer/matloader) for my own use, so that you can have the best of both worlds. |
No movement on this in 3.5 years and this is available as https://github.com/anntzer/matloader. Hence closing this PR. Thanks @anntzer, all! |
Re #4411 (loading of MATLAB classdef objects):
I gave in and just rewrote the entire MAT5 loader in pure Python -- it is now much shorter (I essentially got rid of the 800-line
mio5_utils.pyx
while keepingmio5.py
nearly the same size) -- then added MATLAB object support (based on the the notes of @mbauman) on top of that.Some private APIs have been changed and docstrings still need to be updated.