Folder structure

How data is structured on disk should not be enforced by any SOP (standard operating procedure). Nevertheless, we recommend the following structure to support the automation of data curation and publication workflows. It is based on several best-practice guides (e.g. https://github.com/drivendata/cookiecutter-data-science).

/<volume>/<project>/
├── <event_1>
│   ├── <sensor_x>
│   │   ├── external/ (Optional) External data that affects the creation of the raw data (e.g. calibration curves)
│   │   ├── raw/ Raw data as recorded by the sensor (e.g. acoustic soundings)
│   │   ├── intermediate/ (Optional) Intermediate data that will not be archived; a playground or sandbox for working with the raw data
│   │   ├── processed/ Processed data that has been QA/QC'd and is ready for publication (e.g. map grids)
│   │   ├── products/ (Optional) Data products derived from the raw or processed data, for visualization or by combining data from several events (e.g. geological maps)
│   │   └── protocol/ Documentation of how the data was created, curated, processed, visualized, etc.
│   ├── <sensor_y> Same as above, for the next sensor deployed during this event
│   └── protocol/ General information on this event (e.g. ROV deployment plan)
└── <event_2> Same as above, for the next event
    ├── <sensor_x>
    └── <sensor_z>
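The layout above lends itself to scripting. The following Python sketch shows one possible way to create the folder skeleton for a new event and to collect the files that would be handed to an archive, skipping intermediate/ since that data is not meant to be archived. The function names create_event_skeleton and archivable_files, and the decision to treat external/, intermediate/ and products/ as optional, are illustrative assumptions read off the tree, not part of any existing tool.

```python
from pathlib import Path

# Sensor-level subfolders from the recommended layout (assumption: taken
# from the tree above); True marks the folders the tree labels "(Optional)".
SENSOR_SUBFOLDERS = {
    "external": True,
    "raw": False,
    "intermediate": True,
    "processed": False,
    "products": True,
    "protocol": False,
}


def create_event_skeleton(root, event, sensors, include_optional=True):
    """Hypothetical helper: create <root>/<event>/<sensor>/<subfolder>/
    for every sensor, plus the event-level <root>/<event>/protocol/.
    Returns the list of created (or already existing) folders."""
    event_dir = Path(root) / event
    created = []
    for sensor in sensors:
        for name, optional in SENSOR_SUBFOLDERS.items():
            if optional and not include_optional:
                continue
            path = event_dir / sensor / name
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    # Event-level protocol folder (e.g. for an ROV deployment plan)
    protocol = event_dir / "protocol"
    protocol.mkdir(parents=True, exist_ok=True)
    created.append(protocol)
    return created


def archivable_files(root):
    """Hypothetical helper: list all files below root, skipping any file
    that lies inside a folder named intermediate/."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        # parts[:-1] checks only the folders, not the file name itself
        if p.is_file() and "intermediate" not in p.relative_to(root).parts[:-1]
    )


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "my_project"  # hypothetical project folder
        create_event_skeleton(project, "event_1", ["sensor_x", "sensor_y"])
        sensor_x = project / "event_1" / "sensor_x"
        (sensor_x / "raw" / "line_001.dat").write_text("...")
        (sensor_x / "intermediate" / "scratch.tmp").write_text("...")
        for f in archivable_files(project):
            print(f.relative_to(project))
```

Running the module prints only the file under raw/; the scratch file under intermediate/ is skipped. Passing include_optional=False creates just raw/, processed/ and protocol/ for each sensor.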

On German research vessels, the “scientists folder” on the network or the “Mass-Data-Module” (MDM) will usually serve as the root folder /<volume>/<project>/. Researchers who bring their own mass storage or NAS devices may instead use a path on their own hardware.

Some disciplines or groups prefer to split their data by sensor first. This is not recommended, but it is possible. In that case, the paths would look like this:

/<volume>/<project>/
├── <sensor_x>
│   ├── <event_1>
│   └── <event_2>
└── <sensor_y>
    ├── <event_1>
    └── <event_3>
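Because the two layouts differ only in the order of the <event> and <sensor> levels below the project root, paths can be translated mechanically. The sketch below is a hypothetical helper (to_sensor_first is not part of any existing tool) that maps an event-first path to its sensor-first equivalent. It assumes POSIX-style paths and deliberately rejects event-level folders such as <event_1>/protocol/, because the sensor-first tree shown here does not say where they belong.

```python
from pathlib import PurePosixPath


def to_sensor_first(path, root):
    """Hypothetical helper: map <root>/<event>/<sensor>/<rest> to
    <root>/<sensor>/<event>/<rest>. Uses PurePosixPath so the result does
    not depend on the operating system running the script."""
    root = PurePosixPath(root)
    rel = PurePosixPath(path).relative_to(root)  # raises ValueError if outside root
    if len(rel.parts) < 2:
        raise ValueError(f"{path} has no <event>/<sensor> part below {root}")
    event, sensor, *rest = rel.parts
    if sensor == "protocol":
        # Event-level protocol/ has no sensor; the sensor-first layout
        # does not define a place for it, so refuse to guess.
        raise ValueError(f"{path} is event-level data, not sensor data")
    return root.joinpath(sensor, event, *rest)


if __name__ == "__main__":
    # "/mdm/proj" is a hypothetical project root
    print(to_sensor_first("/mdm/proj/event_1/sensor_x/raw/line_001.dat", "/mdm/proj"))
```

Swapping the first two levels is its own inverse, so the same idea (without the protocol/ check) converts sensor-first paths back to the recommended event-first layout.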