Releases: gorgonia/cu
CUDA 12 Support (Windows)
What's Changed
CUDA 12 now works on Windows as well. Thanks to Mike (@hunjixin).
New Contributors
Full Changelog: v0.9.5...v0.9.6
CUDA 12 Support
What's Changed
- Incorrect variable passed in cuLaunchAndSync by @tkunicki in #58
- Fixed install command by @MarvinJWendt in #59
- Add CUDA 11.8 and alternative means of setting up cgo on Windows by @dalva24 in #60
- Cuda 12 by @neurlang in #69
New Contributors
- @tkunicki made their first contribution in #58
- @MarvinJWendt made their first contribution in #59
- @dalva24 made their first contribution in #60
- @neurlang made their first contribution in #69
Full Changelog: v0.9.4...v0.9.5
CUDA 11 supported
* CUDA11 initial work. First, we generate the new enums
* Added generateEnums, which generates the Go version of the CUresult type
* Updated tests such that they no longer fail. Added a Signal() method to BatchedContext, to force the BatchedContext to DoWork
* Updated benchmarking of batched vs no batched context. It would appear that for now Batching no longer confers a benefit
* Attempt #4 at getting CUDA11. Previous attempts were working based off a faulty copy of `cuda.h`
  - Updated Device to support UUID
  - Updated README
  - Updated genlib to do more things more carefully
* More work on CUDA11
  - Added more mappings into mappings.go to generate stuff
  - Changed the definition of Context, by adding one additional method to clear L2Cache
  - Added stubs for LaunchCooperativeKernel
  - Added Graph types. TODO next: add all the basic Graph data structure and then autogenerate all the things!
* Fixed mappings to also include @egonelbre's change in 2e25e65507. Fixed a bug where Fix() wasn't called, leading to weird generations
* Added some graph stuff, fixed some mappings stuff for genAPI. It seems that the graph functions will have to be manually written for now
* Updated graph.go from ages ago
* Updated more of CUDA11 Graph API into the library. Slowly getting there.
* Added the body of CopyParams
* Added AddMemsetNode method for Graph.
* Fixed a bunch of things
* Switched to modernc.org/cc instead of using the older github.com/cznic/cc
* cuDNN updated their website. So parse.py also has to change. As a result moredecls.go also changed
* Sorted the data in mappings.go. This will allow for better diffing
* Updated the generatethis pipeline
* Initial mappings generation.
* Mapped the old commented out mappings to new commented out mappings (see mappings.ods)
* Generated enums.
* Updated enums and enum strings
* Added more generated data structures
* Added methods
* Generated stubs. 7 TODOs
* Added more incompletes report
* Manually fixed the TODO of SpatialTransformer
* Manually fixed generated_rnndata.go
* Manually fixed generated_seqdata.go
* Manually fixed generated_backend.go
* Manually fixed generated_tensortransform.go
* Fixed the missing getters
* Fixed all the .C()s of the generated types
* Generated a new API
* Fixed random C int issues. Now to handle the rest
* Updated INCOMPLETES_REPORTS
* Fixed variable collision in _BackendAttributeTypeNames
* gencudnn enum generation syntax fixes added
* Updated INCOMPLETES
* Variable renaming added as per the review
* AlgorithmDescriptor syntax fixes added
* AlgorithmPerformance syntax fixes added
* Activation cudnnActivationDescriptor_t return method name change added
* Syntax fixes added on FusedOpVariantParams
* FusedOpConsts syntax fixes added
* C type retrieve function added for cudnnStatus
* Tensor file syntax fixes added; tensor file unreachable code removed
* Method receiver renaming added
* optensor syntax fixes added
* generated_api syntax fixes added
* Code review changes added
* Go modules updated; algorithmdescriptor Algorithm type changes added
* Review changes added; GetRNNLinLayerBiasParams & GetRNNLinLayerMatrixParams methods moved to manually written API.go file
* Fixed a bug in parse.py where when parsing the documentation for CUDA11, the function names have `()`
* Removed deprecated functions from being generated
* More deprecated stuff no longer generated
* Fixed up algorithmdescriptor.go
* Fixed some auto generated issues
* Manually fixed the fused ops generation
* Fixed even more autogenerated errors
* Fixed up more of the auto generated issues
* Renamed API to todo, because eh, I'll figure it out later

Co-authored-by: Aruna Prabhashwara
CUDA 10.2 supported
v0.9.3: Added more documentation and support for CUDA 10.2.
New CUDA versions supported
Fixed the convolution.c import. Use CUDA 10.1.
v0.9.1
Beta release of v0.9.0
Features:
- CUDA 9 support
- cuDNN 7 support
- JIT support (thanks @egonelbre)
- NVRTC support (thanks @egonelbre)
- Full cuBLAS support
- Move towards a unified generation method
- Various API changes
- Various fixes (@egonelbre)
- Bug fixes (thanks to @egonelbre):
  - CString not freed