Add sparse matrix usage to example #13

Merged · 8 commits · Jan 20, 2025
28 changes: 20 additions & 8 deletions README.md
@@ -71,7 +71,7 @@ The distributed-ranges library provides data-structures, algorithms and views de
Algorithms and data structures are designed to relieve the user of the need to worry about the technical details of their parallelism. An example is the definition of a distributed vector in the memory of multiple nodes connected using MPI.

```cpp
dr::mp::distributed_vector<double> dv(N);
```

Such a vector, containing N elements, is automatically distributed among all the nodes involved in the calculation, with each node storing an equal (where possible) share of the data.
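A minimal sketch of how such a vector might be created and filled on every rank. The exact call signatures are assumed from the examples further down in this PR (`init`, `iota`, `for_each`, `finalize`), so treat it as illustrative rather than canonical:

```cpp
#include <dr/mp.hpp>

int main() {
  // Every MPI rank executes the same program (SPMD).
  dr::mp::init(sycl::default_selector_v);

  const std::size_t N = 100;
  dr::mp::distributed_vector<double> dv(N); // segments live on different ranks
  dr::mp::iota(dv, 0);                      // distributed fill: 0, 1, ..., N-1
  dr::mp::for_each(dv, [](auto &&x) { x = x * 2; }); // each rank updates its own segment

  dr::mp::finalize();
  return 0;
}
```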
@@ -82,12 +82,12 @@ In this way, many of the technical details related to the parallel execution of
### Namespaces

The general namespace used in the library is `dr::`.
For programs using a single node with shared memory available to multiple CPUs and one or more GPUs, data structures and algorithms from the `dr::sp::` namespace are provided.
For the distributed memory model, use the `dr::mp::` namespace.
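In practice this amounts to picking the matching header and namespace. The `mp` include below is the one used by the examples in this PR; the `sp` header name is an assumption by analogy and may differ:

```cpp
// Distributed-memory (MPI) flavour, as used by the examples in this PR:
#include <dr/mp.hpp>
using mp_vec = dr::mp::distributed_vector<double>;

// Single-node shared-memory flavour (assumed header and name, shown for contrast):
// #include <dr/sp.hpp>
// using sp_vec = dr::sp::distributed_vector<double>;
```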

### Data structures

The content of distributed-ranges' data structures is distributed over the available nodes. For example, segments of `dr::mp::distributed_vector` are located in the memory of different nodes (MPI processes). Still, the global view of the `distributed_vector` is uniform, with contiguous indices.
<!-- TODO: some pictures here -->

#### Halo concept
@@ -98,7 +98,7 @@ To support this situation, the concept of halo was introduced. A halo is an area

### Algorithms

The following algorithms are included in distributed-ranges, in both mp and sp versions:

```cpp
copy()
```
@@ -151,16 +151,28 @@ The example shows the distributed nature of dr data structures. The distributed_

[./src/example4.cpp](src/example4.cpp)

This example illustrates adding two distributed, multidimensional arrays. Each array has two dimensions and is initialized from an `std::array`. The arrays are populated with sequential values using a distributed version of iota called `mp::iota`. A for_each loop is the main part of the code, computing the sum on a specified number of nodes. It takes a lambda function along with two input arrays (a and b) and an output array (c) as parameters. The result is printed on node 0.
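A condensed sketch of the pattern described above; the `dr::mp::views::zip` spelling and the exact `for_each` signature are assumptions based on this description rather than a verbatim excerpt from example4.cpp:

```cpp
#include <dr/mp.hpp>
#include <array>

int main() {
  dr::mp::init(sycl::default_selector_v);

  std::array<std::size_t, 2> shape = {4, 4};
  dr::mp::distributed_mdarray<int, 2> a(shape), b(shape), c(shape);
  dr::mp::iota(a, 0);   // 0, 1, 2, ...
  dr::mp::iota(b, 100); // 100, 101, 102, ...

  // c = a + b, computed in parallel; each rank touches only the part it owns.
  dr::mp::for_each(dr::mp::views::zip(a, b, c), [](auto elems) {
    auto [x, y, z] = elems;
    z = x + y;
  });

  dr::mp::finalize();
  return 0;
}
```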

### Example 5

[./src/example5.cpp](src/example5.cpp)

Example 5 outlines a method for calculating a 2D 5-point stencil on distributed multidimensional arrays, specifically utilizing `dr::mp::distributed_mdarray`. Initially, it involves setting up key parameters such as the radius for element exchange between nodes through `dr::mp::halo`, and defining the start and end points of the array slice. The example's core is the `mp::stencil_for_each` function, which applies a lambda function to two subsets of the array, designated as input and output. The `mdspan_stencil_op` lambda performs a simple calculation: it adds together the values of an element and its adjacent elements and then computes their average. The `mp::halo().exchange()` call enables values to be shared across distinct nodes, making this process feasible. Finally, the outcome of the calculation is displayed on node 0 using mdspan(), giving a clear view of the modifications made to the 2D array. This example is a practical demonstration of executing stencil operations on distributed arrays.
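For reference, the arithmetic performed by the `mdspan_stencil_op` lambda is a plain 5-point average. A non-distributed C++ sketch of that computation on an ordinary 2D array (this is only the math, not the dr::mp API used in example5.cpp):

```cpp
#include <cstddef>
#include <vector>

// out[i][j] becomes the average of in[i][j] and its four direct neighbours.
void five_point_average(const std::vector<std::vector<double>> &in,
                        std::vector<std::vector<double>> &out) {
  for (std::size_t i = 1; i + 1 < in.size(); i++) {
    for (std::size_t j = 1; j + 1 < in[i].size(); j++) {
      out[i][j] = (in[i][j] + in[i - 1][j] + in[i + 1][j] + in[i][j - 1] +
                   in[i][j + 1]) / 5.0;
    }
  }
}
```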

### Example 6

[./src/example6.cpp](src/example6.cpp)

This example demonstrates a 2D pattern search in a distributed, multidimensional array (`mp::distributed_mdarray<float, 2>`). It initializes a 2D array, populates it with `mp::iota`, converts it to binary values using `mp::transform`, and defines a 2x2 pattern. A lambda function scans the array and marks occurrences of the pattern in a separate array. The process is similar to the one demonstrated in example 5.
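The marking logic itself can be pictured with plain C++; this is only an illustration of the 2x2 comparison, not the distributed `mp::` code from example6.cpp:

```cpp
#include <cstddef>
#include <vector>

// marks[i][j] is set to 1 when the 2x2 block of img starting at (i, j)
// matches pattern; img is assumed to hold 0/1 values already.
void mark_pattern(const std::vector<std::vector<float>> &img,
                  const std::vector<std::vector<float>> &pattern, // 2x2
                  std::vector<std::vector<float>> &marks) {
  for (std::size_t i = 0; i + 1 < img.size(); i++) {
    for (std::size_t j = 0; j + 1 < img[i].size(); j++) {
      bool match = img[i][j] == pattern[0][0] &&
                   img[i][j + 1] == pattern[0][1] &&
                   img[i + 1][j] == pattern[1][0] &&
                   img[i + 1][j + 1] == pattern[1][1];
      marks[i][j] = match ? 1.0f : 0.0f;
    }
  }
}
```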

### Example 7

[./src/example7.cpp](src/example7.cpp)

This example showcases the usage of `mp::distributed_sparse_matrix`. It reads data from the `resources/example.mtx` file on the root node and distributes it among all nodes. The root node initializes the input vector and broadcasts it to every other node. After that, the `mp::gemv` operation is performed and the result is returned to a `std::vector<double>` on the root. Finally, the root prints the input vector and the result.

### Example 8

[./src/example8.cpp](src/example8.cpp)

Example 8 is essentially the same as example 7; the only difference is the initialization of the matrix data. Here the matrix is generated in the code, has a different shape, and uses random values. Additionally, we print the matrix data together with the input vector and the result.
51 changes: 51 additions & 0 deletions resources/example.mtx
@@ -0,0 +1,51 @@
%%MatrixMarket matrix coordinate real general
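% 25 x 25 matrix with 49 non-zero entries; each data line below is: row column value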
25 25 49
1 1 1.000e+00
2 2 2.000e+00
3 3 3.000e+00
4 4 4.000e+00
5 5 5.000e+00
6 6 6.000e+00
7 7 7.000e+00
8 8 8.000e+00
9 9 9.000e+00
10 10 1.000e+01
11 11 2.000e+01
12 12 3.000e+01
13 13 4.000e+01
14 14 5.000e+01
15 15 6.000e+01
16 16 7.000e+01
17 17 8.000e+01
18 18 8.000e+01
19 19 9.000e+01
20 20 1.000e+02
21 21 2.000e+02
22 22 2.000e+02
23 23 3.000e+02
24 24 4.000e+02
25 25 5.000e+02
1 2 1.000e+00
2 3 2.000e+00
3 4 3.000e+00
4 5 4.000e+00
5 6 5.000e+00
6 7 6.000e+00
7 8 7.000e+00
8 9 8.000e+00
9 10 9.000e+00
10 11 1.000e+01
11 12 2.000e+01
12 13 3.000e+01
13 14 4.000e+01
14 15 5.000e+01
15 16 6.000e+01
16 17 7.000e+01
17 18 8.000e+01
18 19 9.000e+01
19 20 1.000e+01
20 21 2.000e+01
21 22 3.000e+01
22 23 4.000e+01
23 24 5.000e+01
24 25 6.000e+01
2 changes: 2 additions & 0 deletions scripts/build_run.sh
@@ -16,3 +16,5 @@ mpirun -n 3 ./build/src/example3
mpirun -n 3 ./build/src/example4
mpirun -n 3 ./build/src/example5
mpirun -n 3 ./build/src/example6
mpirun -n 3 ./build/src/example7
mpirun -n 3 ./build/src/example8
10 changes: 10 additions & 0 deletions src/CMakeLists.txt
@@ -41,3 +41,13 @@ add_executable(example6 example6.cpp)

target_compile_definitions(example6 INTERFACE DR_FORMAT)
target_link_libraries(example6 DR::mpi fmt::fmt)

add_executable(example7 example7.cpp)

target_compile_definitions(example7 INTERFACE DR_FORMAT)
target_link_libraries(example7 DR::mpi fmt::fmt)

add_executable(example8 example8.cpp)

target_compile_definitions(example8 INTERFACE DR_FORMAT)
target_link_libraries(example8 DR::mpi fmt::fmt)
68 changes: 68 additions & 0 deletions src/example7.cpp
@@ -0,0 +1,68 @@
// SPDX-FileCopyrightText: Intel Corporation
//
// SPDX-License-Identifier: BSD-3-Clause

#include <dr/mp.hpp>
#include <fmt/core.h>
#include <ranges>

/* Sparse band matrix vector multiplication */
int main() {
dr::mp::init(sycl::default_selector_v);
using I = long;
using V = double;

dr::views::csr_matrix_view<V, I> local_data;
auto root = 0;
if (root == dr::mp::rank()) {
// x x 0 0 ... 0
// 0 x x 0 ... 0
// .............
// 0 ... 0 0 x x
auto source = "./resources/example.mtx";
local_data = dr::read_csr<double, long>(source);
}

dr::mp::distributed_sparse_matrix<
V, I, dr::mp::MpiBackend,
dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
matrix(local_data, root);

dr::mp::broadcasted_vector<double> broadcasted_b;
Contributor comment: In lines 26-31 all ranks already computed their vector B. What is the step with "broadcasted_vector" for?

std::vector<double> b;
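  // The root rank fills b and broadcasts it; the other ranks join the same
  // broadcast with an empty view, so the data becomes available on every rank.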
if (root == dr::mp::rank()) {
b.resize(matrix.shape().second);
std::iota(b.begin(), b.end(), 1);

broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
dr::mp::default_comm());
} else {
broadcasted_b.broadcast_data(matrix.shape().second, 0,
std::ranges::empty_view<V>(),
dr::mp::default_comm());
}
std::vector<double> res(matrix.shape().first);
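  // Distributed sparse matrix-vector product; the result is gathered into res
  // on the root rank.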
gemv(root, res, matrix, broadcasted_b);

if (root == dr::mp::rank()) {
fmt::print("Matrix imported from {}\n", "./resources/example.mtx");
fmt::print("Input: ");
for (auto x : b) {
fmt::print("{} ", x);
}
fmt::print("\n");
fmt::print("Matrix vector multiplication res: ");
for (auto x : res) {
fmt::print("{} ", x);
}
fmt::print("\n");
}

if (root == dr::mp::default_comm().rank()) {
dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<V>{});
}

dr::mp::finalize();

return 0;
}
93 changes: 93 additions & 0 deletions src/example8.cpp
@@ -0,0 +1,93 @@
// SPDX-FileCopyrightText: Intel Corporation
//
// SPDX-License-Identifier: BSD-3-Clause

#include <dr/mp.hpp>
#include <fmt/core.h>
#include <random>
#include <ranges>

/* Sparse band matrix vector multiplication */
int main() {
dr::mp::init(sycl::default_selector_v);
using I = long;
using V = double;
dr::views::csr_matrix_view<V, I> local_data;
auto root = 0;
if (root == dr::mp::rank()) {
auto size = 10;
auto nnz = 20;
auto colInd = new I[nnz];
auto rowInd = new I[size + 1];
auto values = new V[nnz];
std::uniform_real_distribution<double> unif(0, 1);
std::default_random_engine re;
// x x 0 0 ... 0
// x x 0 0 ... 0
// x 0 x 0 ... 0
// x 0 0 x ... 0
// .............
// x ... 0 0 0 x
for (auto i = 0; i <= size; i++) {
rowInd[i] = i * 2; // two elements per row
}
for (auto i = 0; i < nnz; i++) {
colInd[i] =
(i % 2) * (std::max(i / 2, 1)); // column on 0 and diagonal (with
Contributor comment: can we draw that matrix in a comment? Something like this (not sure if this is your matrix shape, just an example):

// X X X ...     X
// 0 X 0 ...     0
// 0 0 X 0 ... 0
// 0 0 ...       X

// additional entry in first row)
values[i] = unif(re);
}

local_data = dr::views::csr_matrix_view<V, I>(values, rowInd, colInd,
{size, size}, nnz, root);
}

dr::mp::distributed_sparse_matrix<
V, I, dr::mp::MpiBackend,
dr::mp::csr_eq_distribution<V, I, dr::mp::MpiBackend>>
matrix(local_data, root);

dr::mp::broadcasted_vector<double> broadcasted_b;
std::vector<double> b;
Contributor comment: same comments as in the first example in this code.

if (root == dr::mp::rank()) {
b.resize(matrix.shape().second);
std::iota(b.begin(), b.end(), 1);

broadcasted_b.broadcast_data(matrix.shape().second, 0, b,
dr::mp::default_comm());
} else {
broadcasted_b.broadcast_data(matrix.shape().second, 0,
std::ranges::empty_view<V>(),
dr::mp::default_comm());
}

std::vector<double> res(matrix.shape().first);
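  // Same distributed matrix-vector product as in example 7; res is filled on
  // the root rank.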
gemv(root, res, matrix, broadcasted_b);

if (root == dr::mp::rank()) {
fmt::print("Matrix with {} x {} and number of non-zero entries equal to {} "
"and entries:\n",
matrix.shape().first, matrix.shape().second, matrix.size());
for (auto [i, v] : matrix) {
auto [n, m] = i;
fmt::print("Matrix entry <{}, {}, {}>\n", n, m, v);
}
fmt::print("Input: ");
for (auto x : b) {
fmt::print("{} ", x);
}
fmt::print("\n");
fmt::print("Matrix vector multiplication res: ");
for (auto x : res) {
fmt::print("{} ", x);
}
fmt::print("\n");
}

if (root == dr::mp::default_comm().rank()) {
dr::__detail::destroy_csr_matrix_view(local_data, std::allocator<double>{});
}
dr::mp::finalize();

return 0;
}