HDF5: Finish Chunking JSON/Env control
ax3l committed Jun 23, 2021
1 parent 2e01d59 commit 5610b2d
Showing 8 changed files with 64 additions and 11 deletions.
4 changes: 4 additions & 0 deletions docs/source/backends/hdf5.rst
@@ -26,6 +26,7 @@ environment variable default description
===================================== ========= ====================================================================================
``OPENPMD_HDF5_INDEPENDENT`` ``ON`` Sets the MPI-parallel transfer mode to collective (``OFF``) or independent (``ON``).
``OPENPMD_HDF5_ALIGNMENT`` ``1`` Tuning parameter for parallel I/O, choose an alignment which is a multiple of the disk block size.
``OPENPMD_HDF5_CHUNKS`` ``auto`` Defaults for ``H5Pset_chunk``: ``"auto"`` (heuristic) or ``"none"`` (no chunking).
``H5_COLL_API_SANITY_CHECK`` unset Set to ``1`` to perform an ``MPI_Barrier`` inside each meta-data operation.
===================================== ========= ====================================================================================

@@ -40,6 +41,9 @@ According to the `HDF5 documentation <https://support.hdfgroup.org/HDF5/doc/RM/H
*For MPI IO and other parallel systems, choose an alignment which is a multiple of the disk block size.*
On Lustre filesystems, according to the `NERSC documentation <https://www.nersc.gov/users/training/online-tutorials/introduction-to-scientific-i-o/?start=5>`_, it is advised to set this to the Lustre stripe size. In addition, ORNL Summit GPFS users are advised to set the alignment value to 16777216 (16 MiB).

``OPENPMD_HDF5_CHUNKS``: this sets defaults for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
Chunking generally improves performance and only needs to be disabled in corner cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.

``H5_COLL_API_SANITY_CHECK``: this is an HDF5 control option for debugging parallel I/O logic (API calls).
Debugging a parallel program with that option enabled can help to spot bugs such as collective MPI calls that are not called by all participating MPI ranks.
Do not use it in production, since it slows down parallel I/O operations.
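
As an illustration of how an environment default such as ``OPENPMD_HDF5_CHUNKS`` can be read, here is a minimal standalone sketch. The helper name ``envOrDefault`` is hypothetical; it is only assumed to behave like the ``auxiliary::getEnvString`` call used in this commit, i.e. to return the fallback when the variable is unset.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for auxiliary::getEnvString (assumed behavior):
// return the value of the environment variable `key`, or `fallback`
// when the variable is not set.
std::string envOrDefault( std::string const & key, std::string const & fallback )
{
    char const * value = std::getenv( key.c_str() );
    return value != nullptr ? std::string( value ) : fallback;
}
```

With this sketch, ``envOrDefault( "OPENPMD_HDF5_CHUNKS", "auto" )`` yields ``"auto"`` unless the variable is exported, for example as ``none``, before the program starts.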
17 changes: 17 additions & 0 deletions docs/source/details/backendconfig.rst
@@ -74,6 +74,23 @@ Explanation of the single keys:
Any setting specified under ``adios2.dataset`` is applicable globally as well as on a per-dataset level.
Any setting under ``adios2.engine`` is applicable globally only.

HDF5
^^^^

A full configuration of the HDF5 backend:

.. literalinclude:: hdf5.json
:language: json

All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
Explanation of the single keys:

* ``hdf5.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
The default is ``"auto"``, which selects chunk sizes with a heuristic.
``"none"`` can be used to disable chunking.
Chunking generally improves performance and only needs to be disabled in corner cases, e.g. when heavily relying on independent, parallel I/O that non-collectively declares data records.
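
The interplay between the environment default and the JSON key can be sketched as follows. This is a simplified, standalone illustration (C++17) and not the code of this commit: the function name ``resolveChunks`` is hypothetical, the JSON value is modeled as a ``std::optional`` instead of a ``nlohmann::json`` object, and the validity check runs unconditionally, whereas the hunk in ``HDF5IOHandler.cpp`` appears to perform it only when an ``hdf5`` block is present in the configuration.

```cpp
#include <iostream>
#include <optional>
#include <string>

// Hypothetical helper: pick the effective chunking mode.
//   envValue:  value taken from OPENPMD_HDF5_CHUNKS (or its default "auto")
//   jsonValue: value of the JSON key hdf5.dataset.chunks, if it was given
std::string resolveChunks(
    std::string const & envValue,
    std::optional< std::string > const & jsonValue )
{
    // the JSON option, when present, overrides the environment option
    std::string chunks = jsonValue.value_or( envValue );

    // only "auto" and "none" are accepted; anything else falls back to "auto"
    if( chunks != "auto" && chunks != "none" )
    {
        std::cerr << "Warning: HDF5 chunking option set to an invalid value '"
                  << chunks << "'. Reset to 'auto'." << std::endl;
        chunks = "auto";
    }
    return chunks;
}
```

For example, an environment value of ``none`` combined with a JSON value of ``auto`` resolves to ``auto``, because the JSON setting takes precedence.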


Other backends
^^^^^^^^^^^^^^

7 changes: 7 additions & 0 deletions docs/source/details/json.json
@@ -0,0 +1,7 @@
{
"hdf5": {
"dataset": {
"chunks": "auto"
}
}
}
2 changes: 1 addition & 1 deletion include/openPMD/IO/HDF5/HDF5Auxiliary.hpp
@@ -66,7 +66,7 @@ namespace openPMD
* @param[in] typeSize size of each element in bytes
* @return array for resulting chunk dimensions
*/
inline std::vector< hsize_t >
std::vector< hsize_t >
getOptimalChunkDims( std::vector< hsize_t > const dims,
size_t const typeSize );
} // namespace openPMD
1 change: 1 addition & 0 deletions include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp
@@ -79,6 +79,7 @@ namespace openPMD

private:
auxiliary::TracingJSON m_config;
std::string m_chunks = "auto";
struct File
{
std::string name;
4 changes: 2 additions & 2 deletions src/IO/AbstractIOHandlerHelper.cpp
@@ -46,7 +46,7 @@ namespace openPMD
{
case Format::HDF5:
return std::make_shared< ParallelHDF5IOHandler >(
path, access, comm, std::move( optionsJson ) );
path, access, comm, std::move( options ) );
case Format::ADIOS1:
# if openPMD_HAVE_ADIOS1
return std::make_shared< ParallelADIOS1IOHandler >( path, access, comm );
@@ -82,7 +82,7 @@ namespace openPMD
{
case Format::HDF5:
return std::make_shared< HDF5IOHandler >(
path, access, std::move( optionsJson ) );
path, access, std::move( options ) );
case Format::ADIOS1:
#if openPMD_HAVE_ADIOS1
return std::make_shared< ADIOS1IOHandler >( path, access );
34 changes: 28 additions & 6 deletions src/IO/HDF5/HDF5IOHandler.cpp
@@ -20,6 +20,7 @@
*/
#include "openPMD/IO/HDF5/HDF5IOHandler.hpp"
#include "openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp"
#include "openPMD/auxiliary/Environment.hpp"

#if openPMD_HAVE_HDF5
# include "openPMD/Datatype.hpp"
@@ -84,12 +85,31 @@ HDF5IOHandlerImpl::HDF5IOHandlerImpl(
H5Tinsert(m_H5T_CLONG_DOUBLE, "r", 0, H5T_NATIVE_LDOUBLE);
H5Tinsert(m_H5T_CLONG_DOUBLE, "i", sizeof(long double), H5T_NATIVE_LDOUBLE);

m_chunks = auxiliary::getEnvString( "OPENPMD_HDF5_CHUNKS", "auto" );
// JSON option can overwrite env option:
if( config.contains( "hdf5" ) )
{
m_config = std::move( config[ "hdf5" ] );
/*
* @todo Apply global configuration options from the JSON config here.
*/

// check for global dataset configs
if( m_config.json().contains( "dataset" ) )
{
auto datasetConfig = m_config[ "dataset" ];
if( datasetConfig.json().contains( "chunks" ) )
{
m_chunks = datasetConfig.json()[ "chunks" ];
}
datasetConfig.declareFullyRead();
}
if( m_chunks != "auto" && m_chunks != "none" )
{
std::cerr << "Warning: HDF5 chunking option set to an invalid "
"value '" << m_chunks << "'. Reset to 'auto'."
<< std::endl;
m_chunks = "auto";
}

// unused params
auto shadow = m_config.invertShadow();
if( shadow.size() > 0 )
{
@@ -304,12 +324,14 @@ HDF5IOHandlerImpl::createDataset(Writable* writable,
/* enable chunking on the created dataspace */
hid_t datasetCreationProperty = H5Pcreate(H5P_DATASET_CREATE);

if( num_elements != 0u )
if( num_elements != 0u && m_chunks != "none" )
{
//! @todo add per dataset chunk control from JSON config

// get chunking dimensions
std::vector< hsize_t > chunk_dims = getOptimalChunkDims(dims, toBytes(d));

// TODO: allow overwrite with user-provided chunk size
//! @todo allow overwrite with user-provided chunk size
//for( auto const& val : parameters.chunkSize )
// chunk_dims.push_back(static_cast< hsize_t >(val));

@@ -1891,7 +1913,7 @@ HDF5IOHandler::flush()
return m_impl->flush();
}
#else
HDF5IOHandler::HDF5IOHandler(std::string path, Access at)
HDF5IOHandler::HDF5IOHandler(std::string path, Access at, nlohmann::json /* config */)
: AbstractIOHandler(std::move(path), at)
{
throw std::runtime_error("openPMD-api built without HDF5 support");
6 changes: 4 additions & 2 deletions src/IO/HDF5/ParallelHDF5IOHandler.cpp
@@ -102,14 +102,16 @@ ParallelHDF5IOHandlerImpl::~ParallelHDF5IOHandlerImpl()
# if openPMD_HAVE_MPI
ParallelHDF5IOHandler::ParallelHDF5IOHandler(std::string path,
Access at,
MPI_Comm comm)
MPI_Comm comm,
nlohmann::json /* config */)
: AbstractIOHandler(std::move(path), at, comm)
{
throw std::runtime_error("openPMD-api built without HDF5 support");
}
# else
ParallelHDF5IOHandler::ParallelHDF5IOHandler(std::string path,
Access at)
Access at,
nlohmann::json /* config */)
: AbstractIOHandler(std::move(path), at)
{
throw std::runtime_error("openPMD-api built without parallel support and without HDF5 support");
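
For reference, the condition that this commit changes in ``HDF5IOHandlerImpl::createDataset`` can be isolated in a small standalone sketch. The function name ``shouldChunk`` is hypothetical and does not exist in openPMD-api; it only restates the condition from the diff so that its two cases are easy to see.

```cpp
#include <cstdint>
#include <string>

// Hypothetical restatement of the condition in createDataset:
// a chunked layout is requested only for non-empty datasets and only
// when the chunking mode has not been set to "none".
bool shouldChunk( std::uint64_t numElements, std::string const & chunks )
{
    return numElements != 0u && chunks != "none";
}
```

When the condition is false, the diff skips the chunk setup entirely, so the dataset creation property list keeps the HDF5 default layout, which is contiguous.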
