diff --git a/SUMMARY.md b/SUMMARY.md index 06f76de47..412b7fd12 100644 --- a/SUMMARY.md +++ b/SUMMARY.md @@ -20,6 +20,8 @@ * [Multiplication as a Convolution](contents/convolutions/multiplication/multiplication.md) * [Convolutions of Images (2D)](contents/convolutions/2d/2d.md) * [Convolutional Theorem](contents/convolutions/convolutional_theorem/convolutional_theorem.md) + * [Box Muller Transform](contents/box_muller/box_muller.md) + * [How costly is rejection sampling?](contents/box_muller/box_muller_rejection.md) * [Probability Distributions](contents/probability_distributions/distributions.md) * [Tree Traversal](contents/tree_traversal/tree_traversal.md) * [Euclidean Algorithm](contents/euclidean_algorithm/euclidean_algorithm.md) diff --git a/contents/box_muller/box_muller.md b/contents/box_muller/box_muller.md new file mode 100644 index 000000000..2c7ce34da --- /dev/null +++ b/contents/box_muller/box_muller.md @@ -0,0 +1,240 @@ +# The Box—Muller Transform + +The Box—Muller transform holds a special place in my heart as it was the first method I ever had to implement for my own research. + +The purpose of this transformation is simple. +It takes a uniform (probably random) distribution and turns it into a Gaussian one. + +

+ +

+
+That's it.
+
+It was originally developed by George Box (yes, Box is his last name) and Mervin Muller in 1958 and is one of the most common methods to create a random Gaussian distribution of points {{ "box1958" | cite }}.
+It's particularly useful when initializing a set of particles for a physical N-body simulation.
+This chapter will be divided into a few subsections:
+
+1. How to initialize the Box—Muller transform
+2. How to use the Box—Muller transform in Cartesian coordinates {{ "box_muller_wiki" | cite }}.
+3. How to use the Box—Muller transform in polar coordinates, also called the Marsaglia transform {{ "marsaglia_wiki" | cite }}.
+
+Of course, there will be full code examples at the bottom.
+So, let's get to it!
+
+## How to initialize the Box—Muller transform
+
+The main thing to mention here is that the Box—Muller transform requires some form of uniform distribution as its input.
+One obvious way to initialize a random distribution of points is to start with a grid, like so:
+
+{% method %}
+{% sample lang="jl" %}
+[import:3-32, lang:"julia"](code/julia/box_muller.jl)
+{% endmethod %}
+
+This will create the following set of points for $$n=100$$:
+

+ +

+
+To be honest, there are a bunch of ways to generate this exact same distribution.
+Here, we simply walked backward by half of the grid extent, determined the step size, and then placed a particle at each step.
+Note that there is an inherent limitation with this method in that it only works for square numbers.
+Because of this, we decided to round $$n$$ up to the nearest square to make a nice grid.
+It's not the cleanest implementation, but the grid will mainly be used for debugging anyway, so it's OK to be a *little* messy here.
+
+The real star of the show is the uniform random distribution, which can be generated like this:
+
+{% method %}
+{% sample lang="jl" %}
+[import:34-37, lang:"julia"](code/julia/box_muller.jl)
+{% endmethod %}
+
+This will create the following set of points for $$n=100$$:
+

+ +

+ +OK, but how do we know this is uniform? +Good question! + +The easiest way is to plot a histogram of a super large number of points. +If the random distribution is uniform, then all the bins should be roughly the same value. +The more points we have, the smaller the percent difference between the bins will be. +Here is a set of images generated by `rand()` for $$n=100$$, $$1,000$$, and $$10,000$$ all in one dimension: + + +| $$100$$ | $$1,000$$ | $$10,000$$ | +|---------|-----------|------------| +|![100 points](res/rand100.png)|![1000 points](res/rand1000.png)|![10000 points](res/rand10000.png)| + +It is clear that the $$10,000$$ case looks the most uniform. +Note that for two dimensions, the same logic applies, but we need to create separate histograms for the $$x$$ and $$y$$ coordinates. + +Once this test is complete, we can be fairly sure that the function we are using to generate the initial distribution is uniform and ready for the next step of the process: actually using the Box—Muller transform! + +## How to use the Box—Muller transform in Cartesian coordinates + +The two dimensional Cartesian version of the Box—Muller transform starts with two random input values ($$u_1$$ and $$u_2$$), both of which come from their own uniform distributions that are between $$0$$ and $$1$$. +It then creates two output points ($$z_1$$ and $$z_2$$). +For this, $$u_1$$ is used to create a Gaussian distribution along some radial value $$r$$, and $$u_2$$ is used to spin that around a circle with some angular component $$\theta$$, such that + +$$ +\begin{align} +r &= \sqrt{-2\ln(u_1)} \\ +\theta &= 2\pi u_2. +\end{align} +$$ + +Looking at these equations, $$\theta$$ seems to make a decent amount of sense. +After all, angles typically vary from $$0 \rightarrow 2\pi$$, and our input distribution varies from $$0 \rightarrow 1$$, so we can get some value between $$0$$ and $$2\pi$$ by multiplying $$2\pi$$ by one of our input values. + +So what about $$r$$? 
+Well, remember that if we want $$u$$ to be in a Gaussian form, then we might say something like $$u = e^{-\frac{r^2}{2}}$$; solving this for $$r$$ gives $$r=\sqrt{-2\ln(u)}$$.
+
+From these values, we can calculate our new $$x,y$$ points as,
+
+$$
+\begin{align}
+x &= r\cos(\theta) \\
+y &= r\sin(\theta).
+\end{align}
+$$
+
+Finally, in order to specify the size and shape of the generated Gaussian distribution, we can use the standard deviation, $$\sigma$$, and the mean, $$\mu$$, like so:
+
+$$
+\begin{align}
+z_1 &= x\sigma + \mu \\
+z_2 &= y\sigma + \mu.
+\end{align}
+$$
+
+In general, this can be written in code like so:
+
+{% method %}
+{% sample lang="jl" %}
+[import:41-49, lang:"julia"](code/julia/box_muller.jl)
+{% endmethod %}
+
+Which produces the following output:
+

+ + +

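As a quick aside, the entire Cartesian transform can also be condensed into a single standalone function. This is only a sketch with my own function and variable names (the archive's implementation is the imported code above), but it shows the equations in action, along with a sanity check of the output's mean and standard deviation:

```julia
# A minimal sketch of the Cartesian Box--Muller transform for a
# single pair of uniform inputs (u1, u2), with u1 in (0, 1].
function cartesian_box_muller(u1, u2; sigma = 1.0, mu = 0.0)
    r = sqrt(-2 * log(u1))
    theta = 2 * pi * u2
    return (sigma * r * cos(theta) + mu,
            sigma * r * sin(theta) + mu)
end

# Sanity check: many transformed pairs should have a mean near 0 and
# a standard deviation near 1. Note that `1 - rand()` keeps u1 in
# (0, 1], avoiding log(0).
using Statistics
pts = [cartesian_box_muller(1 - rand(), rand()) for _ = 1:100_000]
zs = vcat(first.(pts), last.(pts))
println(mean(zs), " ", std(zs))
```
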
+
+Note that we have written the code to work on a single set of input values, but it could also be written to read in the entire distribution of points all at once.
+As this particular technique is usually implemented in parallel, it's up to you to decide which is the fastest for your own individual use-case.
+
+At this stage, we have a good idea of how the transform works, but some people shy away from the Cartesian method in practice and instead opt for the polar form, which will be discussed next!
+
+## How to use the Box—Muller transform in polar coordinates
+
+The Cartesian form of the Box—Muller transform is relatively intuitive.
+The polar method is essentially the same, but without the costly $$\sin$$ and $$\cos$$ operations.
+In this case, we use the input values to create an initial radial value (to be scaled later):
+
+$$
+r_0 = u_1^2 + u_2^2.
+$$
+
+This means that we are essentially trying to transform our set of $$u$$ values into a new input value $$r_0$$.
+To do this, we need to start with a uniformly distributed *circle*, so we must reject any values for $$u_1$$ and $$u_2$$ where $$r_0$$ is either $$0$$ or greater than $$1$$.
+This also means that the initial distributions of $$u_1$$ and $$u_2$$ must range from $$-1 \rightarrow +1$$.
+
+From here, we can use basic trigonometric identities to redefine the $$\sin$$ and $$\cos$$ to be
+
+$$
+\begin{align}
+\cos(\theta) &= u_1/\sqrt{r_0} \\
+\sin(\theta) &= u_2/\sqrt{r_0}.
+\end{align}
+$$
+
+This changes the output equations (using $$r = \sqrt{-2\ln(r_0)}$$, with $$r_0$$ taking the place of $$u_1$$ from before) to be
+
+$$
+\begin{align}
+x &= r\cos(\theta) = \sqrt{-2\ln(r_0)}\left(\frac{u_1}{\sqrt{r_0}}\right) = u_1 \sqrt{\frac{-2\ln(r_0)}{r_0}} \\
+y &= r\sin(\theta) = \sqrt{-2\ln(r_0)}\left(\frac{u_2}{\sqrt{r_0}}\right) = u_2 \sqrt{\frac{-2\ln(r_0)}{r_0}}.
+\end{align}
+$$
+
+Again, the final values are:
+
+$$
+\begin{align}
+z_1 &= \sigma x + \mu \\
+z_2 &= \sigma y + \mu.
+\end{align} +$$ + +In code, it might look like this: + +{% method %} +{% sample lang="jl" %} +[import:52-63, lang:"julia"](code/julia/box_muller.jl) +{% endmethod %} + +This will produce the following output: + +

+ + +

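As with the Cartesian case, the polar method can be condensed into a standalone sketch. Here, the rejection is folded into a serial retry loop for simplicity (the names are mine; the archive's version instead reads pre-generated points and returns NaNs on rejection):

```julia
# A minimal sketch of the polar Box--Muller (Marsaglia) method:
# draw (u1, u2) uniformly from the [-1, 1] square and reject pairs
# outside of the unit circle (or exactly at the origin).
function polar_box_muller(; sigma = 1.0, mu = 0.0)
    while true
        u1 = 2 * rand() - 1
        u2 = 2 * rand() - 1
        r_0 = u1^2 + u2^2
        if 0 < r_0 <= 1
            s = sqrt(-2 * log(r_0) / r_0)
            return (sigma * u1 * s + mu, sigma * u2 * s + mu)
        end
    end
end

# Sanity check: the outputs should again have mean ~0 and
# standard deviation ~1.
using Statistics
pts = [polar_box_muller() for _ = 1:100_000]
zs = vcat(first.(pts), last.(pts))
println(mean(zs), " ", std(zs))
```
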
+
+Again, this is ultimately the same as the Cartesian method, except that it:
+1. Rejects points in the initial distribution that are outside of the unit circle (also called rejection sampling)
+2. Avoids costly $$\sin$$ and $$\cos$$ operations
+
+Point 2 means that the polar method *should be* way faster than the Cartesian one, but rejection sampling is somewhat interesting in its own right, which is why we have discussed it in a [separate chapter](box_muller_rejection.md).
+
+## Example Code
+
+The example code here is straightforward: we start with a uniform distribution of points (both on a grid and a uniform random distribution) and then we perform the Box—Muller transform to see how far off it is from the Gaussian we expect.
+
+{% method %}
+{% sample lang="jl" %}
+[import, lang:"julia"](code/julia/box_muller.jl)
+{% endmethod %}
+
+### Bibliography
+
+{% references %} {% endreferences %}
+
+
+## License
+
+##### Code Examples
+
+The code examples are licensed under the MIT license (found in [LICENSE.md](https://github.com/algorithm-archivists/algorithm-archive/blob/main/LICENSE.md)).
+
+##### Text
+
+The text of this chapter was written by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+
+[

](https://creativecommons.org/licenses/by-sa/4.0/)
+
+#### Images/Graphics
+
+- The image "[Cartesian grid](res/cartesian_grid.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Cartesian random distribution](res/cartesian_rand.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Cartesian grid transform](res/cartesian_grid_transform.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Cartesian random transform](res/cartesian_rand_transform.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Polar grid](res/polar_grid.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Polar random distribution](res/polar_rand.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Polar grid transform](res/polar_grid_transform.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[Polar random transform](res/polar_rand_transform.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[100 random points](res/rand100.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[1000 random points](res/rand1000.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
+- The image "[10000 random points](res/rand10000.png)" was created by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode).
diff --git a/contents/box_muller/box_muller_rejection.md b/contents/box_muller/box_muller_rejection.md
new file mode 100644
index 000000000..37e1c0e3d
--- /dev/null
+++ b/contents/box_muller/box_muller_rejection.md
@@ -0,0 +1,126 @@
+# Just how costly is rejection sampling anyway?
+
+Let's imagine we want to have a final Gaussian distribution with $$n$$ particles in it.
+With the Cartesian Box—Muller method, this is easy: start the initial distribution(s) with $$n$$ particles and then do the transform.
+Things *can* be just as easy with the Polar Box—Muller method as well, so long as we start with a uniformly distributed *circle* instead of a uniformly distributed *square*.
+That is to say, so long as we do the rejection sampling beforehand, the Polar Box—Muller method will always be more efficient.
+To be fair, there are methods to generate a uniform distribution of points within a circle without rejection sampling, but let's assume that we require rejection sampling for this example.
+
+This means that someone somehow needs to do the rejection sampling for the Polar method, which is sometimes a painful process.
+This also means that the Box—Muller method can be used to teach some of the fundamentals of general-purpose GPU computing.
+Note that because of the specificity of this problem, all the code in this subsection will be in Julia, using the package KernelAbstractions.jl, which allows us to execute the same kernels on either CPU or GPU hardware depending on how we configure things.
+
+Let's first consider the case where we do the rejection sampling as a part of the polar Box—Muller kernel instead of as a pre-processing step.
+In this case, we can imagine 2 separate ways of writing our kernel: +1. With replacement: In this case, we *absolutely require* the final number of points in our Gaussian distribution to be $$n$$, so if we find a point outside of the unit circle while running the kernel, we will "re-roll" again for a new point that *is* within the circle. +2. Without replacement: This means that we will start with a uniform distribution of $$n$$ points, but end with a Gaussian of $$m < n$$ points. In this case, if we find a point outside of the unit circle while running the kernel, we just ignore it by setting the output values to NaNs (or something similar). + +OK, so first with replacement: + +[import:70-84, lang:"julia"](code/julia/performance.jl) + +This is an awful idea for a number of reasons. +Here are a few: +1. If we find a point outside of the unit circle, we have to continually look for new points until we *do* find one inside of the circle. Because we are running this program in parallel, where each thread transforms one point at a time, some threads might take literally forever to find a new point (if we are really unlucky). +2. To generate new points, we need to re-generate a uniform distribution, but what if our uniform distribution is not random? What if it's a grid (or something similar) instead? In this case, we really shouldn't look for a new point on the inside of the circle as all those points have already been accounted for. +3. The `rand()` function is kinda tricky on some parallel platforms (like GPUs) and might not work out of the box. In fact, the implementation shown above can only be run on the CPU. + +OK, fine. +I don't think anyone expected a kernel with a `while` loop inside of it to be fast. +So what about a method without replacement? +Surely there is no problem if we just ignore the `while` loop altogether! 
+Well, the problem with this approach is a bit less straightforward, but first, code:
+
+[import:53-68, lang:"julia"](code/julia/performance.jl)
+
+To start discussing why a polar kernel without replacement is *also* a bad idea, let's go back to the [Monte Carlo chapter](../monte_carlo/monte_carlo.md), where we calculated the value of $$\pi$$ by embedding it into a circle.
+There, we found the probability of a randomly chosen point falling within the unit circle to be $$\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4} \sim 78.54\%$$, shown in the visual below:
+

+ +

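This acceptance rate is easy to verify numerically. Here is a quick sketch that samples the $$[-1, 1]$$ square and counts the fraction of points landing inside the unit circle:

```julia
# Estimate the fraction of points in the [-1, 1] square that land
# inside the unit circle; it should approach pi / 4 (~0.7854).
function acceptance_rate(n)
    hits = 0
    for _ = 1:n
        x, y = 2 * rand() - 1, 2 * rand() - 1
        hits += (x^2 + y^2 <= 1)
    end
    return hits / n
end

println(acceptance_rate(1_000_000))
```
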
+
+This means that generating a uniform distribution of points within a circle by sampling the enclosing square will reject $$\sim 21.46\%$$ of the points on the square.
+This also means that if we have a specific $$n$$ value we want for the final distribution, we will need $$\frac{1}{0.7854} \sim 1.273 \times$$ more input values on average!
+
+No problem!
+In this hypothetical case, we don't need *exactly* $$n$$ points, so we can just start the initial distributions with $$1.273 \times n$$ points, right?
+
+Right.
+That will work well on parallel CPU hardware, but on the GPU this will still have an issue.
+
+On the GPU, computation is all done in parallel, but there is a minimum unit of parallelism called a *warp*.
+The warp is the smallest number of threads that can execute something in parallel, and it is typically 32.
+This means that if an operation is queued, all 32 threads will do it at the same time.
+If 16 threads need to execute something and the other 16 threads need to execute something else, this will lead to *warp divergence*, where two actions need to be performed instead of one:
+

+ +

+
+In this image, every odd thread needs to perform the pink action, while the even threads need to perform the blue action.
+As a result, two separate parallel tasks will be performed, one for the even threads and another for the odd threads.
+In general, if $$\ell$$ separate operations are queued, it could take $$\ell\times$$ as long for all the threads to do their work!
+This is why `if` statements in a kernel can be dangerous!
+If used improperly, they can cause certain threads in a warp to do different things!
+
+So let's imagine that the above image is part of a larger array of values, such that there are a bunch of warps with the same divergence issue.
+In this case, we could sort the array beforehand so that all even elements come before all odd elements.
+This would mean that the warps will almost certainly not diverge, because the elements queued will all be of the same type and require the same operations.
+Unfortunately, this comes at the cost of a sorting operation, which is prohibitively expensive.
+
+If we look at the above kernel, we are essentially asking $$21.46\%$$ of our threads to do something different from the rest, and because we are usually inputting a uniform random distribution, this means that *most* warps will have to queue up two parallel actions instead of one.
+
+Essentially, we need to pick our poison:
+* Slow $$\sin$$ and $$\cos$$ operations with the Cartesian method
+* Warp divergence with the Polar method
+
+The only way to know which is better is to perform benchmarks, which we will show in a bit, but there is one final scenario we should consider: what about doing the rejection sampling as a pre-processing step?
+This would mean that we pre-initialize the polar kernel with a uniform distribution of points in the unit circle.
+This means no warp divergence, so we can get the best of both worlds, right?
+
+Well, not exactly.
+The polar Box—Muller method will definitely be faster, but again: someone somewhere needed to do the rejection sampling, and if we include that step in the process, things become complicated again.
+The truth is that this pre-processing step is difficult to get right, so it might require a chapter in its own right.
+
+In many cases, it's worth spending a little time beforehand to make sure subsequent operations are fast, but in this case, we only have a single operation, not a set of operations.
+The Box—Muller method will usually only be used once at the start of the simulation, which means that the pre-processing step of rejection sampling might end up being overkill.
+
+No matter the case, benchmarks will show the true nature of what we are dealing with here:
+
+| Method                    | CPU                    | GPU                    |
+| ------------------------- | ---------------------- | ---------------------- |
+| Cartesian                 | $$385.819 \pm 1.9$$ms  | $$19.347 \pm 0.618$$ms |
+| Polar without replacement | $$273.308 \pm 2.81$$ms | $$26.712 \pm 0.592$$ms |
+| Polar with replacement    | $$433.644 \pm 2.64$$ms | NA                     |
+
+These were run with an Nvidia GTX 970 GPU and a Ryzen 3700X CPU (8 cores, 16 threads).
+For those interested, the code can be found below.
+For these benchmarks, we used Julia's inbuilt benchmarking suite from `BenchmarkTools`, making sure to sync the GPU kernels with `CUDA.@sync`.
+We also ran with $$4096^2$$ input points.
+
+Here, we see an interesting divergence in the results.
+On the CPU, the polar method without replacement is always faster, but on the GPU, both methods are comparable.
+I believe this is the most important lesson to be learned from the Box—Muller method: sometimes, no matter how hard you try to optimize your code, different hardware can provide radically different results!
+It's incredibly important to benchmark code to make sure it actually is as performant as you think it is!
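The warm-up-then-time pattern used for these measurements can be sketched in a few lines. Note that `transform!` here is a hypothetical stand-in, not one of the kernels in the full script:

```julia
# Warm up once so that JIT compilation is not included in the timing,
# then time the second call. On the GPU, the timed call would
# additionally be wrapped in CUDA.@sync so the kernel finishes
# before the clock stops.
transform!(out, inp) = (out .= sqrt.(-2 .* log.(inp)); nothing)  # stand-in kernel

input_pts  = rand(1024^2)
output_pts = similar(input_pts)

transform!(output_pts, input_pts)               # warm-up run
t = @elapsed transform!(output_pts, input_pts)  # timed run
println("elapsed: ", t, " s")
```
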
+ +## Full Script + +[import, lang:"julia"](code/julia/performance.jl) + + + +## License + +##### Code Examples + +The code examples are licensed under the MIT license (found in [LICENSE.md](https://github.com/algorithm-archivists/algorithm-archive/blob/main/LICENSE.md)). + +##### Text + +The text of this chapter was written by [James Schloss](https://github.com/leios) and is licensed under the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/legalcode). + +[

](https://creativecommons.org/licenses/by-sa/4.0/) + diff --git a/contents/box_muller/code/julia/box_muller.jl b/contents/box_muller/code/julia/box_muller.jl new file mode 100644 index 000000000..82f614bd6 --- /dev/null +++ b/contents/box_muller/code/julia/box_muller.jl @@ -0,0 +1,156 @@ +using DelimitedFiles, LinearAlgebra +using Test + +function create_grid(n, endpoints) + + grid_extents = endpoints[2] - endpoints[1] + + # number of points along any given axis + # For 2D, we take the sqrt(n) and then round up + axis_num = ceil(Int, sqrt(n)) + + # we are now rounding n up to the nearest square if it was not already one + if sqrt(n) != axis_num + n = axis_num^2 + end + + # Distance between each point + dx = grid_extents / (axis_num) + + # Initializing the array, particles along the column, dimensions along rows + a = zeros(n, 2) + + # This loops over the relevant dimensions + for i = 1:axis_num + for j = 1:axis_num + a[(i - 1) * axis_num + j, :] .= + [(i - 0.5) * dx + endpoints[1], + (j - 0.5) * dx + endpoints[1]] + end + end + + return a +end + +function create_rand_dist(n, endpoints) + grid_extents = endpoints[2] - endpoints[1] + return(rand(n,2) * grid_extents .+ endpoints[1]) +end + +# This function reads in a pair of input points and performs the Cartesian +# Box--Muller transform +function cartesian_box_muller(input_pts, sigma, mu) + r = sqrt(-2 * log(input_pts[1])) + theta = 2 * pi * input_pts[2] + + return [sigma * r * cos(theta) + mu[1], + sigma * r * sin(theta) + mu[2]] + +end + +# This function reads in a pair of input points and performs the Cartesian +# Box--Muller transform +function polar_box_muller(input_pts, sigma, mu) + r_0 = input_pts[1]^2 + input_pts[2]^2 + + # this method is only valid for points within the unit circle + if r_0 == 0 || r_0 > 1 + return [NaN, NaN] + end + + return [sigma * input_pts[1] * sqrt(-2 * log(r_0) / r_0) + mu[1], + sigma * input_pts[2] * sqrt(-2 * log(r_0) / r_0) + mu[2]] + +end + +function is_gaussian(input_pts; bounds 
= [-1 1; -1 1], dx = 0.1, + sigma = 1, mu = [0,0], threshold = 0.1) + histogram = zeros(ceil(Int,(bounds[1,2]-bounds[1,1])/dx), + ceil(Int,(bounds[2,2]-bounds[2,1])/dx)) + + for i = 1:size(input_pts)[1] + input_x = input_pts[i, 1] + input_y = input_pts[i, 2] + if !(isnan(input_x) || isnan(input_y)) + + bin = CartesianIndex(ceil(Int, (input_x - bounds[1,1]) / dx), + ceil(Int, (input_y - bounds[2,1]) / dx)) + + if bin[1] <= size(histogram)[1] && bin[1] > 0 && + bin[2] <= size(histogram)[2] && bin[2] > 0 + histogram[bin] += 1 + end + end + end + + n = sum(histogram) + normalize!(histogram) + + rms = 0 + for i = 1:size(histogram)[1] + x = bounds[1,1] + i*dx + for j = 1:size(histogram)[2] + y = bounds[2,1] + j*dx + gaussian_value = exp(-(((x+mu[1])^2)/(2*sigma^2) + + ((y+mu[2])^2)/(2*sigma^2))) + rms += (gaussian_value - histogram[i,j])^2 + end + end + + return sqrt(rms/n) < threshold +end + +function main(n) + + # This casts the input onto the nearest square for the cartesian grids + n = Int(ceil(sqrt(n))^2) + + cartesian_grid = create_grid(n, [0,1]) + polar_grid = create_grid(n, [-1,1]) + cartesian_rand = create_rand_dist(n, [0,1]) + polar_rand = create_rand_dist(n, [-1,1]) + + cartesian_grid_output = similar(cartesian_grid) + polar_grid_output = similar(polar_grid) + cartesian_rand_output = similar(cartesian_rand) + polar_rand_output = similar(polar_rand) + + # going through each pair of points and using the x,y coordinates in + # their respective functions + for i = 1:size(cartesian_grid)[1] + cartesian_grid_output[i,:] .= + cartesian_box_muller(cartesian_grid[i,:], 1, [0,0]) + + polar_grid_output[i,:] .= polar_box_muller(polar_grid[i,:], 1, [0,0]) + + cartesian_rand_output[i,:] .= + cartesian_box_muller(cartesian_rand[i,:], 1, [0,0]) + + polar_rand_output[i,:] .= polar_box_muller(polar_rand[i,:], 1, [0,0]) + end + + @testset "histogram tests of Box--Muller Gaussianness" begin + @test is_gaussian(cartesian_grid_output; + bounds = [-3 3; -3 3], dx = 0.3, + sigma = 1, 
mu = [0,0]) + @test is_gaussian(cartesian_rand_output; + bounds = [-3 3; -3 3], dx = 0.3, + sigma = 1, mu = [0,0]) + @test is_gaussian(polar_grid_output; + bounds = [-3 3; -3 3], dx = 0.3, + sigma = 1, mu = [0,0]) + @test is_gaussian(polar_rand_output; + bounds = [-3 3; -3 3], dx = 0.3, + sigma = 1, mu = [0,0]) + end + + writedlm("cartesian_grid_output.dat", cartesian_grid_output) + writedlm("polar_grid_output.dat", polar_grid_output) + writedlm("cartesian_rand_output.dat", cartesian_rand_output) + writedlm("polar_rand_output.dat", polar_rand_output) + + writedlm("cartesian_grid.dat", cartesian_grid) + writedlm("polar_grid.dat", polar_grid) + writedlm("cartesian_rand.dat", cartesian_rand) + writedlm("polar_rand.dat", polar_rand) +end diff --git a/contents/box_muller/code/julia/performance.jl b/contents/box_muller/code/julia/performance.jl new file mode 100644 index 000000000..d8732df33 --- /dev/null +++ b/contents/box_muller/code/julia/performance.jl @@ -0,0 +1,142 @@ +using KernelAbstractions +using CUDA + +if has_cuda_gpu() + using CUDAKernels +end + +function create_grid(n, endpoints; AT = Array) + + grid_extents = endpoints[2] - endpoints[1] + + # number of points along any given axis + # For 2D, we take the sqrt(n) and then round up + axis_num = ceil(Int, sqrt(n)) + + # we are now rounding n up to the nearest square if it was not already one + if sqrt(n) != axis_num + n = axis_num^2 + end + + # Distance between each point + dx = grid_extents / (axis_num) + + # This is warning in the case that we do not have a square number + if sqrt(n) != axis_num + println("Cannot evenly divide ", n, " into 2 dimensions!") + end + + # Initializing the array, particles along the column, dimensions along rows + a = AT(zeros(n, 2)) + + # This works by firxt generating an N dimensional tuple with the number + # of particles to be places along each dimension ((10,10) for 2D and n=100) + # Then we generate the list of all CartesianIndices and cast that onto a + # grid by 
multiplying by dx and subtracting grid_extents/2
+    for i = 1:axis_num
+        for j = 1:axis_num
+            a[(i - 1) * axis_num + j, 1] = i * dx + endpoints[1]
+            a[(i - 1) * axis_num + j, 2] = j * dx + endpoints[1]
+        end
+    end
+
+    return a
+end
+
+function create_rand_dist(n, endpoints; AT = Array)
+    grid_extents = endpoints[2] - endpoints[1]
+    return(AT(rand(n,2)) * grid_extents .+ endpoints[1])
+end
+
+# This kernel reads in a pair of input points and performs the polar
+# Box--Muller transform, rejecting points outside of the unit circle
+@kernel function polar_muller_noreplacement!(input_pts, output_pts, sigma, mu)
+    tid = @index(Global, Linear)
+    @inbounds r_0 = input_pts[tid, 1]^2 + input_pts[tid, 2]^2
+
+    # this method is only valid for points within the unit circle
+    if r_0 == 0 || r_0 > 1
+        @inbounds output_pts[tid,1] = NaN
+        @inbounds output_pts[tid,2] = NaN
+    else
+        @inbounds output_pts[tid,1] = sigma * input_pts[tid,1] *
+                                      sqrt(-2 * log(r_0) / r_0) + mu
+        @inbounds output_pts[tid,2] = sigma * input_pts[tid, 2] *
+                                      sqrt(-2 * log(r_0) / r_0) + mu
+    end
+
+end
+
+@kernel function polar_muller_replacement!(input_pts, output_pts, sigma, mu)
+    tid = @index(Global, Linear)
+    @inbounds p1 = input_pts[tid, 1]
+    @inbounds p2 = input_pts[tid, 2]
+    r_0 = p1^2 + p2^2
+
+    # "re-roll" points outside of the unit circle, keeping the new
+    # coordinates so they match the final radial value
+    while r_0 > 1 || r_0 == 0
+        p1 = rand()*2-1
+        p2 = rand()*2-1
+        r_0 = p1^2 + p2^2
+    end
+
+    @inbounds output_pts[tid,1] = sigma * p1 *
+                                  sqrt(-2 * log(r_0) / r_0) + mu
+    @inbounds output_pts[tid,2] = sigma * p2 *
+                                  sqrt(-2 * log(r_0) / r_0) + mu
+end
+
+
+function polar_box_muller!(input_pts, output_pts, sigma, mu;
+                           numthreads = 256, numcores = 4,
+                           f = polar_muller_noreplacement!)
+    if isa(input_pts, Array)
+        kernel! = f(CPU(), numcores)
+    else
+        kernel!
= f(CUDADevice(), numthreads) + end + kernel!(input_pts, output_pts, sigma, mu, ndrange=size(input_pts)[1]) +end + + +@kernel function cartesian_kernel!(input_pts, output_pts, sigma, mu) + tid = @index(Global, Linear) + + @inbounds r = sqrt(-2 * log(input_pts[tid,1])) + @inbounds theta = 2 * pi * input_pts[tid, 2] + + @inbounds output_pts[tid,1] = sigma * r * cos(theta) + mu + @inbounds output_pts[tid,2] = sigma * r * sin(theta) + mu +end + +function cartesian_box_muller!(input_pts, output_pts, sigma, mu; + numthreads = 256, numcores = 4) + if isa(input_pts, Array) + kernel! = cartesian_kernel!(CPU(), numcores) + else + kernel! = cartesian_kernel!(CUDADevice(), numthreads) + end + + kernel!(input_pts, output_pts, sigma, mu, ndrange=size(input_pts)[1]) +end + +function main() + + input_pts = create_rand_dist(4096^2,[0,1]) + output_pts = create_rand_dist(4096^2,[0,1]) + + wait(cartesian_box_muller!(input_pts, output_pts, 1, 0)) + @time wait(cartesian_box_muller!(input_pts, output_pts, 1, 0)) + wait(polar_box_muller!(input_pts, output_pts, 1, 0)) + @time wait(polar_box_muller!(input_pts, output_pts, 1, 0)) + + if has_cuda_gpu() + input_pts = create_rand_dist(4096^2,[0,1], AT = CuArray) + output_pts = create_rand_dist(4096^2,[0,1], AT = CuArray) + + wait(cartesian_box_muller!(input_pts, output_pts, 1, 0)) + CUDA.@time wait(cartesian_box_muller!(input_pts, output_pts, 1, 0)) + wait(polar_box_muller!(input_pts, output_pts, 1, 0)) + CUDA.@time wait(polar_box_muller!(input_pts, output_pts, 1, 0)) + end + +end + +main() diff --git a/contents/box_muller/code/julia/plot.gp b/contents/box_muller/code/julia/plot.gp new file mode 100644 index 000000000..7dbbf4808 --- /dev/null +++ b/contents/box_muller/code/julia/plot.gp @@ -0,0 +1,31 @@ +set terminal pngcairo +set size square + +set output "cartesian_grid.png" +p "cartesian_grid.dat" pt 7 title "" + +set output "cartesian_rand.png" +p "cartesian_rand.dat" pt 7 title "" + +set output "polar_rand.png" +p "polar_rand.dat" pt 7 
title "" + +set output "polar_grid.png" +p "polar_grid.dat" pt 7 title "" + + +set xrange[-3:3] +set yrange[-3:3] + +set output "polar_grid_output.png" +p "polar_grid_output.dat" pt 7 title "" + +set output "polar_rand_output.png" +p "polar_rand_output.dat" pt 7 title "" + +set output "cartesian_rand_output.png" +p "cartesian_rand_output.dat" pt 7 title "" + +set output "cartesian_grid_output.png" +p "cartesian_grid_output.dat" pt 7 title "" + diff --git a/contents/box_muller/res/cartesian_grid.png b/contents/box_muller/res/cartesian_grid.png new file mode 100644 index 000000000..c96144a65 Binary files /dev/null and b/contents/box_muller/res/cartesian_grid.png differ diff --git a/contents/box_muller/res/cartesian_grid_output.png b/contents/box_muller/res/cartesian_grid_output.png new file mode 100644 index 000000000..daf0151af Binary files /dev/null and b/contents/box_muller/res/cartesian_grid_output.png differ diff --git a/contents/box_muller/res/cartesian_grid_transform.png b/contents/box_muller/res/cartesian_grid_transform.png new file mode 100644 index 000000000..c5e88c484 Binary files /dev/null and b/contents/box_muller/res/cartesian_grid_transform.png differ diff --git a/contents/box_muller/res/cartesian_grid_transform.xcf b/contents/box_muller/res/cartesian_grid_transform.xcf new file mode 100644 index 000000000..b81accf3d Binary files /dev/null and b/contents/box_muller/res/cartesian_grid_transform.xcf differ diff --git a/contents/box_muller/res/cartesian_rand.png b/contents/box_muller/res/cartesian_rand.png new file mode 100644 index 000000000..8c21b32d7 Binary files /dev/null and b/contents/box_muller/res/cartesian_rand.png differ diff --git a/contents/box_muller/res/cartesian_rand_output.png b/contents/box_muller/res/cartesian_rand_output.png new file mode 100644 index 000000000..3be2945ce Binary files /dev/null and b/contents/box_muller/res/cartesian_rand_output.png differ diff --git a/contents/box_muller/res/cartesian_rand_transform.png 
b/contents/box_muller/res/cartesian_rand_transform.png new file mode 100644 index 000000000..08018bf34 Binary files /dev/null and b/contents/box_muller/res/cartesian_rand_transform.png differ diff --git a/contents/box_muller/res/grid.png b/contents/box_muller/res/grid.png new file mode 100644 index 000000000..bc4e80890 Binary files /dev/null and b/contents/box_muller/res/grid.png differ diff --git a/contents/box_muller/res/polar_grid.png b/contents/box_muller/res/polar_grid.png new file mode 100644 index 000000000..36088f773 Binary files /dev/null and b/contents/box_muller/res/polar_grid.png differ diff --git a/contents/box_muller/res/polar_grid_output.png b/contents/box_muller/res/polar_grid_output.png new file mode 100644 index 000000000..5fb774e3f Binary files /dev/null and b/contents/box_muller/res/polar_grid_output.png differ diff --git a/contents/box_muller/res/polar_grid_transform.png b/contents/box_muller/res/polar_grid_transform.png new file mode 100644 index 000000000..1335cf06a Binary files /dev/null and b/contents/box_muller/res/polar_grid_transform.png differ diff --git a/contents/box_muller/res/polar_rand.png b/contents/box_muller/res/polar_rand.png new file mode 100644 index 000000000..e4ff7ac58 Binary files /dev/null and b/contents/box_muller/res/polar_rand.png differ diff --git a/contents/box_muller/res/polar_rand_output.png b/contents/box_muller/res/polar_rand_output.png new file mode 100644 index 000000000..52944893f Binary files /dev/null and b/contents/box_muller/res/polar_rand_output.png differ diff --git a/contents/box_muller/res/polar_rand_transform.png b/contents/box_muller/res/polar_rand_transform.png new file mode 100644 index 000000000..aeefa3e4d Binary files /dev/null and b/contents/box_muller/res/polar_rand_transform.png differ diff --git a/contents/box_muller/res/rand100.png b/contents/box_muller/res/rand100.png new file mode 100644 index 000000000..5a09a248a Binary files /dev/null and b/contents/box_muller/res/rand100.png differ diff 
--git a/contents/box_muller/res/rand1000.png b/contents/box_muller/res/rand1000.png
new file mode 100644
index 000000000..94609737d
Binary files /dev/null and b/contents/box_muller/res/rand1000.png differ
diff --git a/contents/box_muller/res/rand10000.png b/contents/box_muller/res/rand10000.png
new file mode 100644
index 000000000..62adeaa1f
Binary files /dev/null and b/contents/box_muller/res/rand10000.png differ
diff --git a/contents/box_muller/res/rand_dist.png b/contents/box_muller/res/rand_dist.png
new file mode 100644
index 000000000..d56d8f588
Binary files /dev/null and b/contents/box_muller/res/rand_dist.png differ
diff --git a/contents/box_muller/res/right_arrow.pdf b/contents/box_muller/res/right_arrow.pdf
new file mode 100644
index 000000000..9105474fa
Binary files /dev/null and b/contents/box_muller/res/right_arrow.pdf differ
diff --git a/contents/box_muller/res/warp_divergence.png b/contents/box_muller/res/warp_divergence.png
new file mode 100644
index 000000000..7b385b82c
Binary files /dev/null and b/contents/box_muller/res/warp_divergence.png differ
diff --git a/literature.bib b/literature.bib
index 1ef1e5c12..80e55f787 100644
--- a/literature.bib
+++ b/literature.bib
@@ -447,7 +447,6 @@ @misc{rmsd_wiki
   year={2021}
 }
 
-
 #------------------------------------------------------------------------------#
 # Cryptography
 #------------------------------------------------------------------------------#
@@ -518,3 +517,27 @@ @misc{ECC_crypto_wiki
   year={2022}
 }
 
+#------------------------------------------------------------------------------#
+# BOX--MULLER
+#------------------------------------------------------------------------------#
+
+@article{box1958,
+  title={A note on the generation of random normal deviates},
+  author={Box, George E. P. and Muller, Mervin E.},
+  journal={Ann. Math. Statist.},
+  volume={29},
+  pages={610--611},
+  year={1958}
+}
+
+@misc{marsaglia_wiki,
+  title={Wikipedia: Marsaglia polar method},
+  url={https://en.wikipedia.org/wiki/Marsaglia_polar_method},
+  year={2022}
+}
+
+@misc{box_muller_wiki,
+  title={Wikipedia: Box-Muller Transform},
+  url={https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform},
+  year={2022}
+}
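As a quick sanity check on the Cartesian kernel in the code above, the core transform can be sketched for a single pair of uniform samples. This is an illustrative standalone snippet, not part of the chapter's code; the helper name `box_muller_pair` is hypothetical:

```julia
# Scalar sketch of the cartesian Box--Muller step used by cartesian_kernel!:
# two uniform samples u1 in (0, 1] and u2 in [0, 1) map to two independent
# normally distributed samples with standard deviation sigma and mean mu.
function box_muller_pair(u1, u2; sigma = 1.0, mu = 0.0)
    r = sqrt(-2 * log(u1))  # radius from the first uniform sample
    theta = 2 * pi * u2     # angle from the second uniform sample
    return sigma * r * cos(theta) + mu, sigma * r * sin(theta) + mu
end
```

For example, with `u1 = exp(-1/2)` the radius is exactly 1, so `u2 = 0.25` (an angle of pi/2) should map to a point at (0, 1) up to floating-point error.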