
Original proposal suggestion

Declan Valters edited this page May 30, 2018 · 1 revision

NetCDF managed as a filesystem

Mentor(s) Blazej Krzeminski, Francesca Di Giuseppe, Kevin Marsh

Skills required

NetCDF, Linux, Python

Project type

Development of software

Project goal

Represent the hierarchical structure of a NetCDF dataset as a virtual file system. Easily manipulate NetCDF dataset structure using filesystem operations and general-purpose Unix tools.

Benefits to ECMWF

The NetCDF data format is widely used in the atmospheric science community, including at ECMWF. The CF Community Standard for NetCDF in particular is becoming widely adopted. The proposed software would simplify many tasks, such as creating and modifying NetCDF datasets, e.g. for quick prototyping and debugging. It would be potentially useful for anyone working with weather and climate data in NetCDF format. At ECMWF, it would be particularly useful within several projects (EFAS, GloFAS and EFFIS/GEFF), all of which use NetCDF.

Description

What data/system do you plan to use?

Implementation will be based on FUSE (Filesystem in Userspace) technology. We will implement the project in Python, using the “netcdf” and “fuse” modules. Additionally, we will use the ECMWF NetCDF compliance checker to prepare a set of standard NetCDF templates and compliance rules.

What is the current problem/limitation?

Often there is a need to quickly create new NetCDF datasets or adapt existing ones, e.g. when prototyping a new data processing application. Tasks such as modifying the name or value of a NetCDF attribute, or deleting an unnecessary variable or attribute, typically require specialised NetCDF tools and libraries. Modifying a NetCDF dataset is not as straightforward as editing a text file or deleting or renaming a directory. We want to make it simple!

What could be the solution?

Our solution is to represent the structure of a NetCDF dataset as a virtual filesystem. We were inspired by a sentence from the NetCDF4 manual, "Groups are like directories in a file system, except they are all within a file", and took it literally.

Our idea also follows the “everything is a file” philosophy of Unix, where the same set of utilities and APIs can be used on a wide range of resources exposed through the filesystem.

Exposing the contents of a NetCDF dataset as a hierarchy of directories and files would make it easy to manipulate the structure of the dataset using common and familiar tools: Unix commands like “cp” or “mkdir”, text editors, graphical file managers and so on.

Give some ideas for the implementation (optional)

We will implement a FUSE driver that translates filesystem operations such as open, read, write, unlink, mkdir and rmdir into operations on NetCDF objects (Variables, Groups, Attributes, Data Arrays, Dimensions) represented as virtual files and directories.
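The translation layer described above can be sketched without any FUSE or NetCDF dependency. The following is a minimal illustration only: the class name `DatasetTree`, the nested-dictionary model and the error choices are all assumptions made for this sketch, not part of the proposal. A real driver would back these methods with a NetCDF library and register them with a FUSE binding.

```python
import errno


class DatasetTree:
    """In-memory stand-in for a NetCDF dataset (hypothetical, illustration only).

    Groups and Variables are modelled as dicts (directories);
    Attributes are modelled as bytes (files).
    """

    def __init__(self):
        self.root = {}

    def _split(self, path):
        return [part for part in path.split("/") if part]

    def _lookup(self, path):
        node = self.root
        for part in self._split(path):
            if not isinstance(node, dict) or part not in node:
                raise OSError(errno.ENOENT, "no such object", path)
            node = node[part]
        return node

    def _parent(self, path):
        parts = self._split(path)
        if not parts:
            raise OSError(errno.EINVAL, "root has no parent", path)
        parent = self._lookup("/".join(parts[:-1]))
        if not isinstance(parent, dict):
            raise OSError(errno.ENOTDIR, "not a directory", path)
        return parent, parts[-1]

    # Method names mirror the filesystem operations listed in the text.
    def readdir(self, path):
        node = self._lookup(path)
        if not isinstance(node, dict):
            raise OSError(errno.ENOTDIR, "not a directory", path)
        return sorted(node)

    def mkdir(self, path):
        parent, name = self._parent(path)
        if name in parent:
            raise OSError(errno.EEXIST, "already exists", path)
        parent[name] = {}

    def write(self, path, data):
        # Creates or overwrites a file; data must be bytes-like.
        parent, name = self._parent(path)
        if isinstance(parent.get(name), dict):
            raise OSError(errno.EISDIR, "is a directory", path)
        parent[name] = bytes(data)

    def read(self, path):
        node = self._lookup(path)
        if isinstance(node, dict):
            raise OSError(errno.EISDIR, "is a directory", path)
        return node

    def unlink(self, path):
        parent, name = self._parent(path)
        if name not in parent:
            raise OSError(errno.ENOENT, "no such object", path)
        if isinstance(parent[name], dict):
            raise OSError(errno.EISDIR, "is a directory", path)
        del parent[name]

    def rmdir(self, path):
        parent, name = self._parent(path)
        node = parent.get(name)
        if node is None:
            raise OSError(errno.ENOENT, "no such object", path)
        if not isinstance(node, dict):
            raise OSError(errno.ENOTDIR, "not a directory", path)
        if node:
            raise OSError(errno.ENOTEMPTY, "directory not empty", path)
        del parent[name]


tree = DatasetTree()
tree.mkdir("/forecast")                    # new Group
tree.mkdir("/forecast/t2m")                # new Variable inside the Group
tree.write("/forecast/t2m/units", b"K")    # new Attribute
print(tree.readdir("/forecast/t2m"))       # ['units']
tree.unlink("/forecast/t2m/units")         # delete the Attribute
tree.rmdir("/forecast/t2m")                # delete the now-empty Variable
print(tree.readdir("/forecast"))           # []
```

Raising `OSError` with standard `errno` codes is a deliberate choice in this sketch, since a filesystem driver ultimately has to report failures to the kernel in those terms.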

The NetCDF dataset will be “mounted” as a disk volume and the contents of the NetCDF dataset will look and behave like a filesystem. For example, a NetCDF Variable could be represented as a directory containing:

  • Variable’s data array as a text/binary/PNG/other file (to be selected at mount time?)
  • Variable’s attributes as text files (one file per attribute)
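To make the proposed layout concrete, here is a small sketch that renders one Variable as a set of directory entries. The function name `variable_as_directory`, the file name `DATA` and the one-value-per-line text encoding are assumptions chosen for illustration; the proposal leaves the data file format open.

```python
def variable_as_directory(name, values, attributes):
    """Return {relative_path: file_contents} for a single Variable.

    Hypothetical layout: the data array goes into "<name>/DATA" as text,
    and each attribute becomes its own text file "<name>/<attribute>".
    """
    entries = {}
    # One value per line is an assumed text encoding, not a fixed format.
    entries[f"{name}/DATA"] = "\n".join(str(v) for v in values) + "\n"
    for attr_name, attr_value in attributes.items():
        entries[f"{name}/{attr_name}"] = f"{attr_value}\n"
    return entries


layout = variable_as_directory(
    "t2m",
    [271.5, 272.0, 273.25],
    {"units": "K", "long_name": "2 metre temperature"},
)
for path in sorted(layout):
    print(path)
# t2m/DATA
# t2m/long_name
# t2m/units
```

One open design question this sketch exposes is name collisions: an attribute that happens to be called `DATA` would clash with the data file, so a real layout would need a reserved name or a separate subdirectory for attributes.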

Some example operations:

  • Deleting a variable = deleting the directory
  • Adding an attribute = creating a new text file in the directory
  • Modifying an attribute = editing the text file (with your favourite text editor!)
  • Copying a variable = copying the directory
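The operations listed above need nothing beyond ordinary file APIs, which is the point of the design. The sketch below performs them with the Python standard library on a temporary directory that stands in for the mount point. This is an assumption for illustration: no driver is involved and no NetCDF file is touched, and the variable and attribute names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# A temporary directory stands in for the mount point of a mounted dataset
# (assumption: nothing here is backed by a real NetCDF file).
mount = Path(tempfile.mkdtemp())
t2m = mount / "t2m"
t2m.mkdir()
(t2m / "units").write_text("K\n")

# Adding an attribute = creating a new text file in the directory
(t2m / "long_name").write_text("temperature\n")

# Modifying an attribute = editing the text file
(t2m / "long_name").write_text("2 metre temperature\n")

# Copying a variable = copying the directory
shutil.copytree(t2m, mount / "t2m_copy")

# Deleting a variable = deleting the directory
shutil.rmtree(t2m)

remaining = sorted(p.name for p in mount.iterdir())
copied_long_name = (mount / "t2m_copy" / "long_name").read_text()
print(remaining)          # ['t2m_copy']
print(copied_long_name)   # 2 metre temperature

shutil.rmtree(mount)  # clean up the stand-in mount point
```

With a working driver, the same calls (or their shell equivalents such as “cp -r” and “rm -r”) would be translated into changes to the underlying dataset.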

To promote CF standards, we would include a set of CF-compliant NetCDF templates, e.g. for use as a starting point when creating new datasets. Additionally, the software package could generate CF compliance rules (JSON files) for the ECMWF NetCDF compliance checker.
