Test mdspan for handling memory #91
base: master
Conversation
- Support n-dimensional mdspan as in- and output.
- The iteration order is fixed: iterate over the rightmost dimension first, then move from the rightmost to the leftmost dimension (see the sketch below).
- Only single-core CPU is supported at the moment.
- Some code for developing with cppinsights is left in.
- Requires the alpaka branch: https://github.com/bernhardmgruber/alpaka/tree/mdspan
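A minimal sketch of that iteration order, independent of this PR: the rightmost index varies fastest, which matches `layout_right` (row-major) storage. The header, namespace, and `operator()` element access assume a recent Kokkos reference implementation of mdspan on a pre-C++23 compiler; with C++23 `std::mdspan` the access would be `m[i, j]`.

```cpp
#include <experimental/mdspan> // Kokkos reference implementation; header/namespace vary by version
#include <cstdio>
#include <vector>

namespace stdex = std::experimental;

int main()
{
    std::vector<int> storage(2 * 3);
    // 2 x 3 view over the flat storage; layout_right (row-major) is the default layout.
    stdex::mdspan<int, stdex::dextents<std::size_t, 2>> m(storage.data(), 2, 3);

    int value = 0;
    for(std::size_t i = 0; i < m.extent(0); ++i)     // leftmost dimension: slowest
        for(std::size_t j = 0; j < m.extent(1); ++j) // rightmost dimension: fastest
            m(i, j) = value++;

    // storage is now {0, 1, 2, 3, 4, 5}: contiguous along the rightmost dimension.
    for(int x : storage)
        std::printf("%d ", x);
}
```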
@bernhardmgruber ping
I think you should use an example where an mdspan actually makes sense over a 1D span. For the transform you have here, a 1D span is easier and potentially faster to handle.
Maybe mdspan is generally not a good fit for vikunja, since vikunja's primitives (e.g. transform, scan, for_each) are typically used on 1D sequences. Or in other words: there is no benefit of the n-dimensionality of the data structure.
One example I could think of is batched matrix multiplication: you would have e.g. three mdspan<dynamic_extent, 4, 4> a, b, c and compute as many individual 4x4 matrix multiplications as the dynamic extent.
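A hedged, host-only sketch of that idea, not part of the PR: the alias `Batched4x4` and the function name `batchedMatMul` are made up for illustration, the loops run on a single CPU core rather than through vikunja/alpaka, and the Kokkos reference mdspan is assumed as above (`dynamic_extent` may also be `std::dynamic_extent` depending on the implementation).

```cpp
#include <experimental/mdspan> // Kokkos reference implementation
#include <cstddef>

namespace stdex = std::experimental;

// One dynamic batch dimension followed by two fixed 4x4 dimensions (hypothetical alias).
template<typename T>
using Batched4x4 = stdex::mdspan<T, stdex::extents<std::size_t, stdex::dynamic_extent, 4, 4>>;

// c[n] = a[n] * b[n] for every 4x4 matrix n in the batch.
template<typename T>
void batchedMatMul(Batched4x4<const T> a, Batched4x4<const T> b, Batched4x4<T> c)
{
    for(std::size_t n = 0; n < a.extent(0); ++n)
        for(std::size_t i = 0; i < 4; ++i)
            for(std::size_t j = 0; j < 4; ++j)
            {
                T sum{};
                for(std::size_t k = 0; k < 4; ++k)
                    sum += a(n, i, k) * b(n, k, j);
                c(n, i, j) = sum;
            }
}
```

Each batch element is an independent 4x4 product, which is where the two static inner extents pay off compared to a flat 1D span.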
// Copy the data back to the host for validation.
alpaka::memcpy(queueAcc, hostMem, deviceMem, extent);

Data resultSum = std::accumulate(hostNativePtr, hostNativePtr + extent.prod(), 0);
Suggested change:
Data resultSum = std::accumulate(hostNativePtr, hostNativePtr + extent.prod(), 0);
Data resultSum = std::reduce(hostNativePtr, hostNativePtr + extent.prod());
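One likely motivation for this suggestion (not stated explicitly in the review): with `std::accumulate` the literal `0` fixes the accumulator type to `int`, which silently truncates the partial sums if `Data` is a floating-point type, whereas `std::reduce(first, last)` starts from a value-initialized iterator `value_type`. A standalone illustration under that assumption:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

int main()
{
    std::vector<float> data{0.5f, 0.5f, 1.0f};

    // int accumulator: every partial sum is truncated back to int.
    auto truncated = std::accumulate(data.begin(), data.end(), 0); // yields 1
    // float accumulator deduced from the iterators' value_type.
    auto summed = std::reduce(data.begin(), data.end());           // yields 2.0f

    assert(truncated == 1);
    assert(summed == 2.0f);
}
```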
- only for testing purposes
- fix bug that wrote everything to the input
I think any tensor example would be an extremely good fit for mdspan. Tensors are THE thing for machine learning.
I fully agree that
The PR tests mdspan as an alternative to raw pointers for handling the input and output memory.
Benefits of mdspan:
The PR requires alpaka's mdspan support, which is not merged yet: alpaka-group/alpaka#1788
It solves the same problem as PR #88, using a different approach.