Starting a full read-through #174

Open: wants to merge 1 commit into base `main`
4 changes: 2 additions & 2 deletions content/05.introduction.md
@@ -15,10 +15,10 @@ When viewing this from the perspective of the landscape of inquiry, the most sta

In [@doi:10.1088/0067-0049/192/1/9], the analysis platform `yt` was described.
At the time, `yt` was focused on analyzing and visualizing the output of grid-based adaptive mesh refinement hydrodynamic simulations; while these were used to study many different physical phenomena, they all were laid out in roughly the same way, in rectilinear meshes of data.
In this paper, we present the current version of `yt`, which enables identical scripts to analyze and visualize data stored as [rectilinear grids](#sec:grid_analysis) as before, but additionally [particle or discrete data](#sec:sph-analysis), [octree-based data](#sec:octree_analysis), and data stored as [unstructured meshes](#sec:unstructured_mesh).
This has been the result of a large-scale effort to rewrite the underlying machinery within `yt` for accessing data, indexing that data, and providing it in efficient ways to higher-level routines, as discussed in Section Something.
While this was underway, `yt` has also been considerably reinstrumented with [metadata-aware array infrastructure](#sec:units), the [volume rendering infrastructure](#sec:vr) has been rewritten to be more user-friendly and capable, and support for [non-Cartesian geometries](#sec:noncartesian) has been added.

The single biggest update or addition to `yt` since that paper was published has not been technical in nature.
In the intervening years, a directed and intense community-building effort has resulted in the contributions from over a hundred different individuals, many of them early-stage researchers, and a [thriving community of both users and developers](#sec:community).
This is the crowning achievement of development, as we have attempted to build `yt` into a tool that enables inquiry from a technical level as well as fosters a supportive, friendly community of individuals engaged in self-directed inquiry.
8 changes: 5 additions & 3 deletions content/10.community_building.md
@@ -47,10 +47,12 @@ Participation in code review, providing comments, feedback and suggestions to ot
But it does arise from a pragmatic (ensuring code reliability) or altruistic (the public good of the software) motivation, and is thus a deeper level of engagement.

The final two activities, drafting enhancement proposals and closing bug reports, are the most engaged, and often the most removed from the academic motivation structure.
Developing an [enhancement proposal](#sec:ytep) for `yt` means iterating with other developers on the motivation behind and implementation of a large piece of functionality; it requires both motivation to engage with the community and the patience to build consensus among stakeholders.
Closing bug reports -- and the development work associated with identifying, tracking and fixing bugs -- requires patience and often repeated engagement with stakeholders.

### Engagement Metrics

We include here plots of the level of engagement on mailing list discussions and the citation count of the original method paper.

Typically, measuring the degree of engagement in a project is done by examining the amount of activity that surrounds it; this can be through development, mailing list or other discussion forum engagement, or through citations of a paper.
These metrics are valuable, but incomplete.
Furthermore, their quantification presents challenges: how does migration of a project (and a community) from one form of interaction (such as a mailing list) to another (such as Slack or GitHub Issues) impact the perceived growth or health of that project?
As such, we have attempted to build a proxy for the development metrics by examining activity around pull requests (as below in Figure #fig:pr-closing-time) and have opted to elide discussion of the activity of the project through the currently dominant medium of Slack.
135 changes: 68 additions & 67 deletions content/15.development_procedure.md


67 changes: 33 additions & 34 deletions content/20.data_objects.md
@@ -5,29 +5,31 @@ The basic principles by which `yt` operates are built on the notion of selecting
Selections in `yt` are usually spatial in nature, although several non-spatial mechanisms focused on queries can be utilized as well.
The objects that conduct selection are called selectors; they are designed to expose as small an API as possible, to ease the development and deployment of new selectors.

Implementing a new "selector" in `yt` requires defining several functions that return true or false depending on whether a given point is included in the selected region, with the option of defining additional functions for optimization.
These functions include selection of a rectilinear grid (or any point within that grid), selection of a point with zero extent, and selection of a point with a non-zero spherical radius.
Implementing new selectors is uncommon, as many basic selectors have been defined, along with the ability to combine these through boolean operations.
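To make the shape of this API concrete, here is a minimal, hypothetical sketch of a sphere selector in pure Python. yt's actual selectors are compiled and their signatures differ; the class and method names here are illustrative only.

```python
import numpy as np

class SphereSelector:
    """Hypothetical sketch of a yt-style selector: a small API of
    inclusion tests that higher-level indexing code can call."""

    def __init__(self, center, radius):
        self.center = np.asarray(center, dtype="float64")
        self.radius = radius

    def select_point(self, pos):
        # A point with zero extent: inside if within the radius.
        return float(np.linalg.norm(np.asarray(pos) - self.center)) <= self.radius

    def select_sphere(self, pos, r):
        # A point with non-zero spherical radius: the two spheres overlap.
        return float(np.linalg.norm(np.asarray(pos) - self.center)) <= self.radius + r

    def select_bbox(self, left_edge, right_edge):
        # A rectilinear grid: the box's closest point to the center
        # must lie within the radius.
        closest = np.clip(self.center, left_edge, right_edge)
        return float(np.linalg.norm(closest - self.center)) <= self.radius
```

Because the API is just these few predicates, a new selector only has to answer "is this point / sphere / box inside?"; all indexing machinery is shared.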

The base selector object utilizes these routines during a selection operation to maximize the amount of code reused between particle, patch, and octree selection of data.
These three types of data are selected through specific routines designed to minimize the number of times that the selection function must be called, as they can be quite expensive.

Selecting data from a dataset composed of grids is a two-step process.
The first step is identifying which grids intersect a given data selector; this is done through a sequence of bounding box intersection checks.
Within a given grid, the cells which are intersected are identified.
This results in the selection routine being called once for each grid object in the simulation and once for each cell located within an intersecting grid (unless additional short-circuit paths, specific to the selector, are available).
This can be conducted hierarchically, but due to implementation details around how the grid index is stored, this is not yet cost-effective.
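The two-step process can be sketched as follows. The grid representation and function names here are hypothetical, and real selectors short-circuit far more aggressively; this only illustrates the cheap grid-level cull followed by the per-cell test.

```python
import numpy as np

def sphere_bbox_test(center, radius, left_edge, right_edge):
    # Does the sphere intersect the axis-aligned box?
    closest = np.clip(center, left_edge, right_edge)
    return float(np.linalg.norm(closest - center)) <= radius

def sphere_point_test(center, radius, point):
    return float(np.linalg.norm(np.asarray(point) - center)) <= radius

def select_from_grids(center, radius, grids):
    """Two-step grid selection: cull whole grids by bounding-box
    intersection, then test individual cell centers of survivors."""
    center = np.asarray(center, dtype="float64")
    selected = []
    for left_edge, right_edge, ncells in grids:
        left_edge = np.asarray(left_edge, dtype="float64")
        right_edge = np.asarray(right_edge, dtype="float64")
        # Step 1: reject non-intersecting grids cheaply.
        if not sphere_bbox_test(center, radius, left_edge, right_edge):
            continue
        # Step 2: test each cell center within the intersecting grid.
        dx = (right_edge - left_edge) / np.asarray(ncells)
        for idx in np.ndindex(*ncells):
            cell_center = left_edge + (np.asarray(idx) + 0.5) * dx
            if sphere_point_test(center, radius, cell_center):
                selected.append(cell_center)
    return np.array(selected)
```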

Selecting data from an octree-organized dataset utilizes a recursive scheme that selects individual oct nodes and then, for each cell within that oct, determines which cells must be selected or which child nodes must be recursed into.
This system is designed to allow leaf nodes with varying numbers of cells per side, for instance 1, 2, 4, or 8.
However, the number of child nodes is fixed at eight, with subdivision always occurring at the midplane.
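A minimal sketch of this recursive scheme, assuming a sphere selector and midplane subdivision; the helper names are illustrative, not yt's internals. Note how pruning a non-intersecting oct terminates an entire subtree early.

```python
import numpy as np

def sphere_intersects_box(center, radius, left, right):
    closest = np.clip(center, left, right)
    return float(np.linalg.norm(closest - center)) <= radius

def select_leaf_octs(center, radius, left, right, depth, out):
    """Recursive octree selection sketch: if an oct's bounding box
    intersects the selector, either collect it (leaf) or recurse into
    its eight children, formed by splitting at the midplane."""
    left = np.asarray(left, dtype="float64")
    right = np.asarray(right, dtype="float64")
    center = np.asarray(center, dtype="float64")
    if not sphere_intersects_box(center, radius, left, right):
        return  # early termination: prune this entire subtree
    if depth == 0:
        out.append((left, right))
        return
    mid = 0.5 * (left + right)
    for i in range(2):
        for j in range(2):
            for k in range(2):
                # Each child takes either the lower or upper half per axis.
                child_left = np.where([i, j, k], mid, left)
                child_right = np.where([i, j, k], right, mid)
                select_leaf_octs(center, radius, child_left, child_right,
                                 depth - 1, out)
```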

The final mechanism by which data is selected is for discrete data points, typically particles in astrophysical simulations.
Often these particles are stored in multiple files, or multiple _virtual_ files can be identified by `yt` through applying range or subsetting to the full dataset.
Selection is conducted by first identifying which data files (or data file subsets) intersect with a given selector, then selecting individual points in those data files.
There is only a single level of hierarchical data selection in this system, as we do not yet allow for re-ordering of data on disk or in-memory which would facilitate multi-level hierarchical selection through the use of operations such as Morton indices.
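As a rough sketch, assuming each file (or virtual file) carries a precomputed bounding box; the data layout and function name here are hypothetical:

```python
import numpy as np

def select_particles(center, radius, files):
    """One-level hierarchical selection sketch for discrete data: first
    cull whole data files by their stored bounding boxes, then test the
    individual particle positions of the surviving files."""
    center = np.asarray(center, dtype="float64")
    out = []
    for left, right, positions in files:
        closest = np.clip(center, left, right)
        if float(np.linalg.norm(closest - center)) > radius:
            continue  # the whole file misses the selector
        positions = np.asarray(positions, dtype="float64")
        dist = np.linalg.norm(positions - center, axis=1)
        out.append(positions[dist <= radius])
    return np.concatenate(out) if out else np.empty((0, 3))
```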

### Selection Routines

Given this set of hierarchical selection methods, all of which are designed to provide opportunities for early termination, each _geometric_ selector object is required to implement a small set of methods to expose its functionality to the hierarchical selection process.
Some functions are duplicative by design: they allow expensive calculations that account for boundary conditions such as periodicity and reflectivity to be skipped unless they are necessary.
Additionally, by providing some routines as options, we can in some instances specialize them for the specific geometric operation.

@@ -56,13 +58,12 @@ A selection of data in a low-resolution simulation from a sphere.
The logical `A AND NOT B` for regions `A` and `B` from Figures @fig:reg2 and @fig:sp2 respectively.
](images/selectors/reg2_not_sp2.svg){#fig:reg2_not_sp2}


### Fast and Slow Paths

Given an ensemble of objects, the simplest way of testing for inclusion in a selector is to call the operation `select_cell` on each individual object.
Where the objects are organized in a regular fashion, for instance a "grid" that contains many "cells," we can apply both "first pass" and "second pass" fast-path operations.
The "first pass" checks whether or not the given ensemble of objects is included, and only iterates inward if there is partial or total inclusion.
The "second pass" fast pass is specialized to both the organization of the objects _and_ the selector itself, and is used to determine whether either only a specific (and well-defined) subset of the objects is included or the entirety of them.

For instance, we can examine the specific case of selecting grid cells within a rectangular prism.
When we select a "grid" of cells within a rectangular prism, we can have either total inclusion, partial inclusion, or full exclusion.
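A sketch of this three-way classification, with hypothetical function and variable names. Only the "partial" case needs the per-cell slow path; the other two cases resolve the whole grid with a handful of comparisons.

```python
import numpy as np

def classify_grid(prism_left, prism_right, grid_left, grid_right):
    """Fast-path sketch for a rectangular-prism selector: classify a
    grid as fully included, fully excluded, or partially included."""
    prism_left, prism_right = np.asarray(prism_left), np.asarray(prism_right)
    grid_left, grid_right = np.asarray(grid_left), np.asarray(grid_right)
    if np.all(grid_left >= prism_left) and np.all(grid_right <= prism_right):
        return "total"      # take every cell; no per-cell tests needed
    if np.any(grid_right <= prism_left) or np.any(grid_left >= prism_right):
        return "exclusion"  # skip the grid entirely
    return "partial"        # fall back to the per-cell slow path
```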
@@ -79,31 +80,29 @@ We do make a distinction between "selection" operations and "reduction" or "cons
Additionally, some have been marked as not "user-facing," in the sense that they are not expected to be constructed directly by users, but instead are utilized internally for indexing purposes.
In columns to the right, we provide information as to whether there is an available "fast" path for grid objects.

| Object Name | Object Type |
| ---------------------- | ---------------------- |
| Arbitrary grid | Resampling |
| Boolean object | Selection (Base Class) |
| Covering grid | Resampling |
| Cut region | Selection |
| Cutting plane | Selection |
| Data collection | Selection |
| Disk | Selection |
| Ellipsoid | Selection |
| Intersection | Selection (Bool) |
| Octree | Internal index |
| Orthogonal ray | Selection |
| Particle projection | Reduction |
| Point | Selection |
| Quadtree projection | Reduction |
| Ray | Selection |
| Rectangular Prism | Selection |
| Slice | Selection |
| Smoothed covering grid | Resampling |
| Sphere | Selection |
| Streamline | Selection |
| Surface | Selection |
| Union | Selection (Bool) |

Table: Selection objects and their types. {#tbl:selection-objects}


22 changes: 13 additions & 9 deletions content/25.processing_and_analysis.md
@@ -50,6 +50,10 @@ Derived fields are an extremely integral component of `yt` and are the gateway t
In addition, `yt` includes a large number of predefined fields, many of which are dynamically constructed according to metadata available in the dataset, to jump-start analysis.
Researchers using `yt` can load a dataset and immediately compute, for instance, the velocity divergence; `yt` will construct the appropriate finite difference stencil, fill in any missing zones at the edges of individual grids, and return an array that can be accessed, visualized, or processed.

`yt` also provides, and utilizes internally, methods for constructing derived fields from "templates."
For instance, generation of mass fraction fields (as demonstrated above) is conducted internally by `yt` by iterating over all known density fields and applying the same function template to each.
This is applied for quantities such as atomic and molecular species as well as for vector fields, where operators such as divergence and gradient are available through templated field operations.
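The templating idea can be illustrated with a small, self-contained sketch; the field names and registry structure here are hypothetical and do not reflect yt's actual field system.

```python
import numpy as np

# Hypothetical field registry: names mapped to functions of the data.
fields = {
    "H_density": lambda data: data["H_density"],
    "He_density": lambda data: data["He_density"],
    "density": lambda data: data["H_density"] + data["He_density"],
}

def make_fraction_field(species_field):
    # One template function, closed over the species field name.
    def _fraction(data):
        return fields[species_field](data) / fields["density"](data)
    return _fraction

# Apply the same template to every known density-like field.
for name in ("H_density", "He_density"):
    fields[name.replace("_density", "_fraction")] = make_fraction_field(name)

data = {"H_density": np.array([0.75, 0.9]),
        "He_density": np.array([0.25, 0.1])}
```

One template thus yields a whole family of derived fields without repeating the defining logic per species.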

#### Particle Filters {#sec:particle_filters}

Many of the data formats that `yt` accepts define particles as mixtures of a single set of attributes (such as position, velocity, etc) and then a "type" -- for instance, intermingling dark matter particles with "star" particles.
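Conceptually, a particle filter reduces to masking every attribute array by the type attribute, as in this hypothetical sketch (the function name and encoding of types are illustrative):

```python
import numpy as np

def filter_particles(ptype, particle_type, **attributes):
    """Particle-filter sketch: given intermingled particles with a
    "type" attribute, return only the attributes of the requested type."""
    mask = particle_type == ptype
    return {name: arr[mask] for name, arr in attributes.items()}

ptypes = np.array([0, 1, 0, 1, 1])          # e.g. 0 = dark matter, 1 = star
mass = np.array([1.0, 0.1, 1.0, 0.2, 0.3])
stars = filter_particles(1, ptypes, mass=mass)
```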
@@ -141,13 +145,13 @@ The array-like operations utilized in `yt` attempt to map to conceptually simila
Unlike numpy, however, these utilize `yt`'s dataset-aware "chunking" operations, in a manner philosophically similar to the chunking operations used in the parallel computation library dask.
Below, we outline the three classes of operations that are available, based on the type of their return value.

#### Reduction to Scalars {#sec:arrayops-scalar}

Traditional array operations that map from an array to a scalar are accessible utilizing familiar syntax. These include:

- `min(field_specification)`, `max(field_specification)`, and `ptp(field_specification)`
- `argmin(field_specification, axis)`, and `argmax(field_specification, axis)`
- `mean(field_specification, weight)`, `std(field_specification, weight)`, and `sum(field_specification)`

In addition to the advantages of allowing parallelism and memory management to be handled by `yt`, these operations are also able to accept multiple fields.
This allows multiple fields to be queried in a single pass over the data, rather than multiple passes.
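The single-pass idea can be sketched with plain NumPy; the chunk structure and function name here are hypothetical, standing in for yt's dataset-aware chunking:

```python
import numpy as np

def chunked_max(chunks, field_names):
    """Sketch of a single-pass, multi-field reduction over data chunks:
    every requested field is reduced while each chunk is read only once."""
    maxima = {name: -np.inf for name in field_names}
    for chunk in chunks:          # each chunk is a dict of field arrays
        for name in field_names:
            maxima[name] = max(maxima[name], float(chunk[name].max()))
    return maxima
```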
@@ -160,17 +164,17 @@ The operations `mean` and `sum` are available here in a non-spatial form, where

#### Reduction to Vectors {#sec:arrayops-vector}

- `profile(axes, fields, profile_specification)`

The `profile` operation provides weighted or unweighted histogramming in one or two dimensions.
This function accepts the axes along which to compute the histogram as well as the fields to compute, and information about whether the binning should be an accumulation, an average, or a weighted average.
These operations are described in more detail in **reference profile section**.
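A weighted 1D profile can be sketched with NumPy histograms; this illustrates the concept only and is not yt's implementation (the function name and parameters are hypothetical):

```python
import numpy as np

def profile_1d(bin_field, value_field, weight=None, bins=8, extent=None):
    """Sketch of a 1D profile: without a weight, accumulate values per
    bin; with a weight, compute the weighted average per bin."""
    if weight is None:
        acc, edges = np.histogram(bin_field, bins=bins, range=extent,
                                  weights=value_field)
        return edges, acc
    wsum, edges = np.histogram(bin_field, bins=bins, range=extent,
                               weights=weight)
    wval, _ = np.histogram(bin_field, bins=bins, range=extent,
                           weights=weight * value_field)
    with np.errstate(invalid="ignore"):  # empty bins yield NaN
        return edges, wval / wsum
```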

#### Remapping Operations {#sec:arrayops-remap}

- `mean(field_specification, weight, axis)`
- `sum(field_specification, axis)`
- `integrate(field_specification, weight, axis)`

These functions map directly to different methods used by the projection data object.
Both `mean` and `sum`, when supplied a spatial axis, will compute a dimensionally-reduced projection, remapped into a pixel coordinate plane.
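The line-integral behavior of `integrate` can be sketched for a uniform-resolution field; the uniform cell width is a simplifying assumption, and real projections additionally handle varying resolution and remapping into the pixel plane:

```python
import numpy as np

def integrate_along_axis(field, dx, axis):
    """Sketch of the `integrate` remapping: reduce a 3D cell-centered
    field to a 2D plane by summing field * path length along one axis
    (a line integral through the domain)."""
    return (field * dx).sum(axis=axis)
```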