Original proposal suggestion
Mentor(s) Blazej Krzeminski, Francesca Di Giuseppe, Kevin Marsh
NetCDF Linux Python
Development of software
Represent the hierarchical structure of a NetCDF dataset as a virtual file system. Easily manipulate NetCDF dataset structure using filesystem operations and general-purpose Unix tools.
The NetCDF data format is widely used in the atmospheric science community, including at ECMWF. The CF Community Standard for NetCDF in particular is becoming widely adopted. The proposed software would simplify many tasks, such as creating and modifying NetCDF datasets, e.g. for quick prototyping and debugging. It would be potentially useful for anyone working with weather and climate data in NetCDF format. At ECMWF, it would be particularly useful for the EFAS, GloFAS and EFFIS/GEFF projects, which all use NetCDF.
What data/system do you plan to use?
Implementation will be based on FUSE (Filesystem in Userspace) technology. We will implement the project in Python, using the “netcdf” and “fuse” modules. Additionally, we will use the ECMWF NetCDF compliance checker to prepare a set of standard NetCDF templates and compliance rules.
Often there is a need to quickly create new NetCDF datasets or adapt existing ones, e.g. when prototyping a new data processing application. Tasks such as modifying the name or value of a NetCDF attribute, or deleting an unnecessary variable or attribute, typically require specialised NetCDF tools and libraries. Modifying a NetCDF dataset is not as straightforward as editing a text file or deleting or renaming a directory. We want to make it simple!
Our solution is to represent NetCDF structure as a virtual filesystem. We were inspired by a sentence from the NetCDF4 manual: "Groups are like directories in a file system, except they are all within a file" and took it literally.
Our idea also follows the “everything is a file” philosophy of Unix, where the same set of utilities and APIs can be used on a wide range of resources exposed through the filesystem.
Exposing the contents of a NetCDF dataset as a hierarchy of directories and files would make it easy to manipulate the structure of the dataset using common and familiar tools - UNIX commands like “cp” or “mkdir”, text editors, graphical file managers and so on.
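As a rough illustration of the "groups are directories" idea, the sketch below walks a nested description of a dataset and lists the directory paths a mounted volume might show. The `dataset` dictionary and the `list_directories` function are hypothetical stand-ins invented for this example; a real driver would read the hierarchy from an open NetCDF file, and showing variables as directories is an assumption at this stage.

```python
# Hypothetical in-memory stand-in for a NetCDF-4 dataset: every group is a
# dict with child groups and a list of variable names.
dataset = {
    "groups": {
        "forecast": {
            "groups": {"surface": {"groups": {}, "variables": ["t2m"]}},
            "variables": ["time"],
        },
    },
    "variables": ["lat", "lon"],
}


def list_directories(group, prefix="/"):
    """Return the virtual directory paths for a group and everything below it."""
    paths = [prefix]
    # Assumption: each variable is shown as a directory inside its group.
    for name in sorted(group["variables"]):
        paths.append(prefix + name + "/")
    # Each child group becomes a subdirectory, visited recursively.
    for name, child in sorted(group["groups"].items()):
        paths.extend(list_directories(child, prefix + name + "/"))
    return paths


for path in list_directories(dataset):
    print(path)
```

With such a listing in place, tools like “ls” or a graphical file manager would need no knowledge of NetCDF to browse the dataset.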
We will implement a FUSE driver that translates filesystem operations such as open, read, write, unlink, mkdir and rmdir into operations on NetCDF objects (Variables, Groups, Attributes, Data Arrays, Dimensions) represented as virtual files and directories.
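The translation step could look roughly like the sketch below: work out which kind of NetCDF object a path refers to, then look up what a given filesystem operation means for that kind of object. Everything here is an assumption made for illustration: the in-memory `dataset`, the `DATA` file name, the `ACTIONS` table and the `classify` and `translate` helpers are hypothetical, and no real FUSE or NetCDF library is called. Only paths that already exist are handled, so mkdir is left out.

```python
# Hypothetical in-memory stand-in for an open NetCDF dataset.
dataset = {
    "attributes": {"title": "demo"},
    "groups": {},
    "variables": {
        "t2m": {"attributes": {"units": "K"}, "data": [280.1, 281.4]},
    },
}

DATA_FILE = "DATA"  # assumed name of the file holding a variable's data array

# (filesystem operation, kind of object at the path) -> NetCDF-level action
ACTIONS = {
    ("rmdir", "group"): "delete group",
    ("rmdir", "variable"): "delete variable",
    ("unlink", "attribute"): "delete attribute",
    ("read", "attribute"): "get attribute value",
    ("write", "attribute"): "set attribute value",
    ("read", "data array"): "read data array",
    ("write", "data array"): "write data array",
}


def classify(root, path):
    """Return the kind of NetCDF object that an existing virtual path names."""
    node, kind = root, "group"
    for part in (p for p in path.split("/") if p):
        if kind == "group" and part in node["groups"]:
            node = node["groups"][part]
        elif kind == "group" and part in node["variables"]:
            node, kind = node["variables"][part], "variable"
        elif kind in ("group", "variable") and part in node["attributes"]:
            node, kind = None, "attribute"
        elif kind == "variable" and part == DATA_FILE:
            node, kind = None, "data array"
        else:
            raise FileNotFoundError(path)
    return kind


def translate(root, operation, path):
    """Map a filesystem operation on a path to a NetCDF-level action name."""
    kind = classify(root, path)
    try:
        return ACTIONS[(operation, kind)]
    except KeyError:
        raise PermissionError(f"{operation} not supported on {kind}: {path}") from None


print(translate(dataset, "rmdir", "/t2m"))
print(translate(dataset, "write", "/t2m/units"))
```

Keeping the mapping in a table makes it easy to see which combinations are deliberately unsupported, for example unlink on a directory.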
The NetCDF dataset will be “mounted” as a disk volume, and its contents will look and behave like a filesystem. For example, a NetCDF Variable could be represented as a directory containing:
- the Variable’s data array as a text/binary/PNG/other file (to be selected at mount time?)
- the Variable’s attributes as text files (one file per attribute)
Operations on the Variable then become ordinary file operations:
- Deleting a variable = deleting the directory
- Adding an attribute = creating a new text file in the directory
- Modifying an attribute = editing the text file (with your favourite text editor!)
- Copying a variable = copying the directory
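The correspondence in the list above can be tried out without FUSE by letting an ordinary temporary directory play the role of the mounted dataset. In this sketch the `DATA` file name and the `read_variable` and `read_group` helpers are hypothetical, the group is flat (every subdirectory is taken to be a variable), and attribute values are treated as plain text.

```python
import shutil
import tempfile
from pathlib import Path

DATA_FILE = "DATA"  # assumed name of the file holding the data array


def read_variable(directory):
    """Interpret a directory as a variable: one text file per attribute."""
    attributes = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and entry.name != DATA_FILE:
            attributes[entry.name] = entry.read_text().strip()
    return attributes


def read_group(directory):
    """Interpret each subdirectory of a (flat) group as one variable."""
    return {
        entry.name: read_variable(entry)
        for entry in sorted(Path(directory).iterdir())
        if entry.is_dir()
    }


with tempfile.TemporaryDirectory() as mount_point:
    root = Path(mount_point)
    t2m = root / "t2m"
    t2m.mkdir()                                # a variable is a directory
    (t2m / DATA_FILE).write_text("280.1 281.4\n")
    (t2m / "units").write_text("K\n")          # adding an attribute
    (t2m / "units").write_text("degC\n")       # modifying an attribute
    shutil.copytree(t2m, root / "t2m_copy")    # copying a variable
    shutil.rmtree(t2m)                         # deleting a variable
    result = read_group(root)

print(result)
```

In the proposed driver the same file operations would act on the NetCDF dataset directly, rather than on a directory that is read back afterwards.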
To promote CF standards, we would include a set of CF-compliant NetCDF templates, e.g. for use as a starting point when creating new datasets. Additionally, the software package could generate CF compliance rules (JSON files) for the ECMWF NetCDF compliance checker.