HDF5: Empiric for Optimal Chunk Size (#916)
* HDF5: Empiric for Optimal Chunk Size

This ports a prior empirical algorithm from libSplash to determine
an optimal (large) chunk size for an HDF5 dataset based on its
datatype and global extent.

Original implementation by Felix Schmitt @f-schmitt (ZIH, TU Dresden)
in
[libSplash](https://github.com/ComputationalRadiationPhysics/libSplash).

Original source:
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/DCDataSet.cpp
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/include/splash/core/DCHelper.hpp

Co-authored-by: Felix Schmitt <[email protected]>

* Add scaffolding for JSON options in HDF5

* HDF5: Finish Chunking JSON/Env control

* HiPACE (legacy) pipeline: no chunking

The parallel, independent I/O pattern used here is a corner case of what
HDF5 can support, due to its non-collective declaration of data sets.
Testing shows that it does not work with chunking.

* CI: no HDF5 Chunking with Sanitizer

Runs into a timeout for unclear reasons with this patch:
```
15/32 Test #15: MPI.8_benchmark_parallel ...............***Timeout 1500.17 sec
```

* Apply suggestions from code review

Co-authored-by: Franz Pöschel <[email protected]>

Co-authored-by: Felix Schmitt <[email protected]>
Co-authored-by: Franz Pöschel <[email protected]>
3 people authored Jun 24, 2021
1 parent baaf349 commit 8c2d9ce
Showing 14 changed files with 268 additions and 36 deletions.
13 changes: 10 additions & 3 deletions .github/workflows/unix.yml
@@ -43,16 +43,23 @@ jobs:
python3 -m pip install -U numpy
sudo .github/workflows/dependencies/install_spack
- name: Build
env: {CC: mpicc, CXX: mpic++, OMPI_CC: clang-10, OMPI_CXX: clang++-10, CXXFLAGS: -Werror -Wno-deprecated-declarations}
env: {CC: mpicc, CXX: mpic++, OMPI_CC: clang-10, OMPI_CXX: clang++-10, CXXFLAGS: -Werror -Wno-deprecated-declarations, OPENPMD_HDF5_CHUNKS: none}
run: |
eval $(spack env activate --sh .github/ci/spack-envs/clangtidy_nopy_ompi_h5_ad1_ad2/)
spack install
SOURCEPATH="$(pwd)"
mkdir build && cd build
../share/openPMD/download_samples.sh && chmod u-w samples/git-sample/*.h5
export LDFLAGS="${LDFLAGS} -fsanitize=address,undefined -shared-libsan"
CXXFLAGS="${CXXFLAGS} -fsanitize=address,undefined -shared-libsan"
CXXFLAGS="${CXXFLAGS}" cmake -S .. -B . -DopenPMD_USE_MPI=ON -DopenPMD_USE_PYTHON=ON -DopenPMD_USE_HDF5=ON -DopenPMD_USE_ADIOS2=ON -DopenPMD_USE_ADIOS1=ON -DopenPMD_USE_INVASIVE_TESTS=ON -DCMAKE_VERBOSE_MAKEFILE=ON
export CXXFLAGS="${CXXFLAGS} -fsanitize=address,undefined -shared-libsan"
cmake -S .. -B . \
-DopenPMD_USE_MPI=ON \
-DopenPMD_USE_PYTHON=ON \
-DopenPMD_USE_HDF5=ON \
-DopenPMD_USE_ADIOS2=ON \
-DopenPMD_USE_ADIOS1=ON \
-DopenPMD_USE_INVASIVE_TESTS=ON \
-DCMAKE_VERBOSE_MAKEFILE=ON
cmake --build . --parallel 2
export ASAN_OPTIONS=detect_stack_use_after_return=1:detect_leaks=1:check_initialization_order=true:strict_init_order=true:detect_stack_use_after_scope=1:fast_unwind_on_malloc=0
export LSAN_OPTIONS=suppressions="$SOURCEPATH/.github/ci/sanitizer/clang/Leak.supp"
4 changes: 4 additions & 0 deletions docs/source/backends/hdf5.rst
@@ -26,6 +26,7 @@ environment variable default description
===================================== ========= ====================================================================================
``OPENPMD_HDF5_INDEPENDENT`` ``ON`` Sets the MPI-parallel transfer mode to collective (``OFF``) or independent (``ON``).
``OPENPMD_HDF5_ALIGNMENT`` ``1`` Tuning parameter for parallel I/O, choose an alignment which is a multiple of the disk block size.
``OPENPMD_HDF5_CHUNKS`` ``auto`` Defaults for ``H5Pset_chunk``: ``"auto"`` (heuristic) or ``"none"`` (no chunking).
``H5_COLL_API_SANITY_CHECK`` unset Set to ``1`` to perform an ``MPI_Barrier`` inside each meta-data operation.
===================================== ========= ====================================================================================

@@ -40,6 +41,9 @@ According to the `HDF5 documentation <https://support.hdfgroup.org/HDF5/doc/RM/H
*For MPI IO and other parallel systems, choose an alignment which is a multiple of the disk block size.*
On Lustre filesystems, according to the `NERSC documentation <https://www.nersc.gov/users/training/online-tutorials/introduction-to-scientific-i-o/?start=5>`_, it is advised to set this to the Lustre stripe size. In addition, ORNL Summit GPFS users are recommended to set the alignment value to 16777216 (16 MiB).

``OPENPMD_HDF5_CHUNKS``: this sets the default for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
Chunking generally improves performance and only needs to be disabled in corner cases, e.g. when heavily relying on independent, parallel I/O that declares data records non-collectively.
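
For illustration, a minimal sketch (not part of this patch; the file name is hypothetical) of selecting the chunking default from application code. The variable must be set before the ``Series`` (and thus the HDF5 backend) is created; ``setenv`` is POSIX.

```cpp
// Sketch: disable HDF5 chunking for this process via the environment
// variable documented above; set it before the Series is opened.
#include <openPMD/openPMD.hpp>

#include <cstdlib> // setenv (POSIX)

int main()
{
    // "auto" (default) selects heuristic chunk sizes, "none" disables chunking
    setenv("OPENPMD_HDF5_CHUNKS", "none", /* overwrite = */ 1);

    openPMD::Series series("data_%T.h5", openPMD::Access::CREATE);
    // ... declare records and store data as usual ...
    return 0;
}
```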

``H5_COLL_API_SANITY_CHECK``: this is an HDF5 control option for debugging parallel I/O logic (API calls).
Debugging a parallel program with this option enabled can help to spot bugs such as collective MPI calls that are not issued by all participating MPI ranks.
Do not use it in production; it will slow down parallel I/O operations.
17 changes: 17 additions & 0 deletions docs/source/details/backendconfig.rst
@@ -74,6 +74,23 @@ Explanation of the single keys:
Any setting specified under ``adios2.dataset`` is applicable globally as well as on a per-dataset level.
Any setting under ``adios2.engine`` is applicable globally only.

HDF5
^^^^

A full configuration of the HDF5 backend:

.. literalinclude:: hdf5.json
:language: json

All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
Explanation of the single keys:

* ``hdf5.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
The default is ``"auto"`` for a heuristic.
``"none"`` can be used to disable chunking.
Chunking generally improves performance and only needs to be disabled in corner cases, e.g. when heavily relying on independent, parallel I/O that declares data records non-collectively.
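
For illustration, this key can also be passed per ``Series`` through the JSON options parameter of its constructor; a minimal sketch (not part of this patch; the file name is hypothetical):

```cpp
// Sketch: select the HDF5 chunking default via the JSON options string
// accepted as the third constructor argument of Series.
#include <openPMD/openPMD.hpp>

int main()
{
    auto const cfg = R"({
        "hdf5": {
            "dataset": {
                "chunks": "none"
            }
        }
    })";
    openPMD::Series series("data_%T.h5", openPMD::Access::CREATE, cfg);
    return 0;
}
```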


Other backends
^^^^^^^^^^^^^^

7 changes: 7 additions & 0 deletions docs/source/details/hdf5.json
@@ -0,0 +1,7 @@
{
"hdf5": {
"dataset": {
"chunks": "auto"
}
}
}
20 changes: 17 additions & 3 deletions include/openPMD/IO/HDF5/HDF5Auxiliary.hpp
@@ -1,4 +1,4 @@
/* Copyright 2017-2021 Fabian Koller
/* Copyright 2017-2021 Fabian Koller, Felix Schmitt, Axel Huebl
*
* This file is part of openPMD-api.
*
@@ -34,7 +34,6 @@

namespace openPMD
{
#if openPMD_HAVE_HDF5
struct GetH5DataType
{
std::unordered_map< std::string, hid_t > m_userTypes;
@@ -54,5 +53,20 @@ namespace openPMD
std::string
concrete_h5_file_position(Writable* w);

#endif
/** Computes the chunk dimensions for a dataset.
*
* Chunk dimensions are selected to create chunk sizes between
* 64 KiB and 4 MiB. Smaller chunk sizes are inefficient due to overhead,
* while larger chunks do not map well to file system blocks and striding.
*
* Chunk dimensions are less than or equal to the dataset dimensions and
* do not need to be a factor of the respective dataset dimension.
*
* @param[in] dims dimensions of dataset to get chunk dims for
* @param[in] typeSize size of each element in bytes
* @return array for resulting chunk dimensions
*/
std::vector< hsize_t >
getOptimalChunkDims( std::vector< hsize_t > const dims,
size_t const typeSize );
} // namespace openPMD
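
For intuition, a small usage sketch of the declaration above (not part of the patch): a 2048 x 2048 dataset of ``double`` holds 32 MiB in total, so the heuristic can reach its upper chunk-size target of 4 MiB.

```cpp
// Sketch: expected behavior of the chunking heuristic for a 2D
// dataset of doubles (8 bytes per element).
#include "openPMD/IO/HDF5/HDF5Auxiliary.hpp"

#include <iostream>
#include <vector>

int main()
{
    std::vector<hsize_t> const dims{2048u, 2048u};
    auto const chunks = openPMD::getOptimalChunkDims(dims, sizeof(double));
    for (auto const extent : chunks)
        std::cout << extent << ' '; // 1024 512 -> 1024 * 512 * 8 B = 4 MiB
    std::cout << '\n';
    return 0;
}
```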
4 changes: 3 additions & 1 deletion include/openPMD/IO/HDF5/HDF5IOHandler.hpp
@@ -22,6 +22,8 @@

#include "openPMD/IO/AbstractIOHandler.hpp"

#include <nlohmann/json.hpp>

#include <future>
#include <memory>
#include <string>
@@ -34,7 +36,7 @@ class HDF5IOHandlerImpl;
class HDF5IOHandler : public AbstractIOHandler
{
public:
HDF5IOHandler(std::string path, Access);
HDF5IOHandler(std::string path, Access, nlohmann::json config);
~HDF5IOHandler() override;

std::string backendName() const override { return "HDF5"; }
5 changes: 4 additions & 1 deletion include/openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp
@@ -24,6 +24,7 @@
#if openPMD_HAVE_HDF5
# include "openPMD/IO/AbstractIOHandlerImpl.hpp"

# include "openPMD/auxiliary/JSON.hpp"
# include "openPMD/auxiliary/Option.hpp"

# include <hdf5.h>
@@ -38,7 +39,7 @@ namespace openPMD
class HDF5IOHandlerImpl : public AbstractIOHandlerImpl
{
public:
HDF5IOHandlerImpl(AbstractIOHandler*);
HDF5IOHandlerImpl(AbstractIOHandler*, nlohmann::json config);
~HDF5IOHandlerImpl() override;

void createFile(Writable*, Parameter< Operation::CREATE_FILE > const&) override;
@@ -77,6 +78,8 @@
hid_t m_H5T_CLONG_DOUBLE;

private:
auxiliary::TracingJSON m_config;
std::string m_chunks = "auto";
struct File
{
std::string name;
7 changes: 5 additions & 2 deletions include/openPMD/IO/HDF5/ParallelHDF5IOHandler.hpp
@@ -23,6 +23,8 @@
#include "openPMD/config.hpp"
#include "openPMD/IO/AbstractIOHandler.hpp"

#include <nlohmann/json.hpp>

#include <future>
#include <memory>
#include <string>
@@ -36,9 +38,10 @@ namespace openPMD
{
public:
#if openPMD_HAVE_MPI
ParallelHDF5IOHandler(std::string path, Access, MPI_Comm);
ParallelHDF5IOHandler(
std::string path, Access, MPI_Comm, nlohmann::json config);
#else
ParallelHDF5IOHandler(std::string path, Access);
ParallelHDF5IOHandler(std::string path, Access, nlohmann::json config);
#endif
~ParallelHDF5IOHandler() override;

4 changes: 3 additions & 1 deletion include/openPMD/IO/HDF5/ParallelHDF5IOHandlerImpl.hpp
@@ -27,6 +27,7 @@
# include <mpi.h>
# if openPMD_HAVE_HDF5
# include "openPMD/IO/HDF5/HDF5IOHandlerImpl.hpp"
# include <nlohmann/json.hpp>
# endif
#endif

@@ -37,7 +38,8 @@ namespace openPMD
class ParallelHDF5IOHandlerImpl : public HDF5IOHandlerImpl
{
public:
ParallelHDF5IOHandlerImpl(AbstractIOHandler*, MPI_Comm);
ParallelHDF5IOHandlerImpl(
AbstractIOHandler*, MPI_Comm, nlohmann::json config);
~ParallelHDF5IOHandlerImpl() override;

MPI_Comm m_mpiComm;
6 changes: 4 additions & 2 deletions src/IO/AbstractIOHandlerHelper.cpp
@@ -45,7 +45,8 @@ namespace openPMD
switch( format )
{
case Format::HDF5:
return std::make_shared< ParallelHDF5IOHandler >( path, access, comm );
return std::make_shared< ParallelHDF5IOHandler >(
path, access, comm, std::move( options ) );
case Format::ADIOS1:
# if openPMD_HAVE_ADIOS1
return std::make_shared< ParallelADIOS1IOHandler >( path, access, comm );
@@ -80,7 +81,8 @@
switch( format )
{
case Format::HDF5:
return std::make_shared< HDF5IOHandler >( path, access );
return std::make_shared< HDF5IOHandler >(
path, access, std::move( options ) );
case Format::ADIOS1:
#if openPMD_HAVE_ADIOS1
return std::make_shared< ADIOS1IOHandler >( path, access );
92 changes: 91 additions & 1 deletion src/IO/HDF5/HDF5Auxiliary.cpp
@@ -1,4 +1,4 @@
/* Copyright 2017-2021 Fabian Koller, Axel Huebl
/* Copyright 2017-2021 Fabian Koller, Felix Schmitt, Axel Huebl
*
* This file is part of openPMD-api.
*
@@ -30,10 +30,12 @@

# include <array>
# include <complex>
# include <map>
# include <stack>
# include <stdexcept>
# include <string>
# include <typeinfo>
# include <vector>

# if openPMD_USE_VERIFY
# define VERIFY(CONDITION, TEXT) { if(!(CONDITION)) throw std::runtime_error((TEXT)); }
@@ -306,4 +308,92 @@ openPMD::concrete_h5_file_position(Writable* w)
return auxiliary::replace_all(pos, "//", "/");
}


std::vector< hsize_t >
openPMD::getOptimalChunkDims( std::vector< hsize_t > const dims,
size_t const typeSize )
{
auto const ndims = dims.size();
std::vector< hsize_t > chunk_dims( dims.size() );

// chunk sizes in KiByte
constexpr std::array< size_t, 7u > CHUNK_SIZES_KiB
{{4096u, 2048u, 1024u, 512u, 256u, 128u, 64u}};

size_t total_data_size = typeSize;
size_t max_chunk_size = typeSize;
size_t target_chunk_size = 0u;

// compute the order of dimensions (descending)
// large dataset dimensions should have larger chunk sizes
std::multimap<hsize_t, uint32_t> dims_order;
for (uint32_t i = 0; i < ndims; ++i)
dims_order.insert(std::make_pair(dims[i], i));

for (uint32_t i = 0; i < ndims; ++i)
{
// start with a chunk extent of 1 in each dimension
chunk_dims[i] = 1;

// try to make at least two chunks for each dimension
size_t half_dim = dims[i] / 2;

// compute sizes
max_chunk_size *= (half_dim > 0) ? half_dim : 1;
total_data_size *= dims[i];
}

// compute the target chunk size
for( auto const & chunk_size : CHUNK_SIZES_KiB )
{
target_chunk_size = chunk_size * 1024;
if (target_chunk_size <= max_chunk_size)
break;
}
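// example: for a 2048 x 2048 dataset of doubles (8 bytes per element),
// max_chunk_size is 1024 * 1024 * 8 B = 8 MiB, so the largest candidate
// that fits, 4096 KiB = 4 MiB, becomes the target chunk size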

size_t current_chunk_size = typeSize;
size_t last_chunk_diff = target_chunk_size;
std::multimap<hsize_t, uint32_t>::const_iterator current_index =
dims_order.begin();

while (current_chunk_size < target_chunk_size)
{
// test if increasing chunk size optimizes towards target chunk size
size_t chunk_diff = target_chunk_size - (current_chunk_size * 2u);
if (chunk_diff >= last_chunk_diff)
break;

// find next dimension to increase chunk size for
int can_increase_dim = 0;
for (uint32_t d = 0; d < ndims; ++d)
{
int current_dim = current_index->second;

// increasing chunk size possible
if (chunk_dims[current_dim] * 2 <= dims[current_dim])
{
chunk_dims[current_dim] *= 2;
current_chunk_size *= 2;
can_increase_dim = 1;
}

current_index++;
if (current_index == dims_order.end())
current_index = dims_order.begin();

if (can_increase_dim)
break;
}

// cannot increase the chunk size in any dimension:
// we must use the current chunk sizes
if (!can_increase_dim)
break;

last_chunk_diff = chunk_diff;
}

return chunk_dims;
}

#endif
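
To connect the helper to the HDF5 C API, a hedged sketch (the backend's actual call site lives in the handler sources not shown on this page): the returned extents feed directly into ``H5Pset_chunk`` on a dataset-creation property list.

```cpp
// Sketch: apply the computed chunk extents to a dataset creation
// property list; assumes a non-scalar (rank >= 1) dataset.
#include "openPMD/IO/HDF5/HDF5Auxiliary.hpp"

#include <hdf5.h>

#include <cstddef>
#include <vector>

hid_t makeChunkedDCPL(std::vector<hsize_t> const &dims, std::size_t typeSize)
{
    auto const chunks = openPMD::getOptimalChunkDims(dims, typeSize);
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    // pass the rank and one chunk extent per dimension
    H5Pset_chunk(dcpl, static_cast<int>(chunks.size()), chunks.data());
    return dcpl; // use with H5Dcreate2; caller releases it via H5Pclose
}
```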