-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
75 lines (50 loc) · 2.16 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
---
output:
md_document:
variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# raadfiles
The goal of `raadfiles` is to manage information about the large collection of files used by raadtools, and related systems.
## Motivation
You have a huge set of files you need to access regularly, and the information about those files is the first thing you need.
This project aims to speed up and help you control the following:
* raw file listing
* seaching file names for patterns
* caching metadata extracted from the files
The overall goal is to help you write code to access and manipulate the data in those files. This is a natural complement to
schemes that automatically obtain files and build file collections such as [raadsync](https://github.com/AustralianAntarcticDataCentre/raadsync.git) but can be used for other collections as well.
## Set up
1. Get a huge file collection. (You probably have lots, but see https://github.com/rOpensci/bowerbird for a possible way forward if not).
2. Install `raadfiles`.
3. Set up the automated file listing and caching mechanism.
## Install raadfiles
```R
## install.packages("remotes")
remotes::install_github("AustralianAntarcticDivision/raadfiles")
```
## Set up the automated file listing and caching mechanism
1. An R script to list all the files and save to a cache, use raw text or R workspace `saveRDS` or feather or an actual database.
2. Create a cron job to run that script every day/hour/minute.
3. Configure `raadfiles::custom_setup` (TBD, see R/zzz.R for the in-built mechanism)
## Why raadfiles for raadtools?
* consistent convention around "file", "root" in the file cache to have a clear separation on the configured path versus the data
* mechanism to load file cache into memory on load, for all functions to share
*
```{r timing, eval=FALSE}
library(raadtools)
system.time(rt_files <- sstfiles())
library(raadfiles)
system.time(rf_files <- oisst_daily_files())
range(rt_files$date)
range(rf_files$date)
length(rt_files$date)
length(rf_files$date)
```