download a zenodo archive #31
Comments
Thanks @hansvancalster i'll have a look ASAP |
Thanks for looking into this. Note that downloading files might also be possible via the API, but I haven't seen examples yet. As a side-note, I noticed that the github LICENSE file should be changed to be in accordance with the license that is mentioned in the description of the package (and also the license mentioned on your deposit of the package on zenodo): |
For the LICENSE, it has been set up this way to fulfill CRAN expectations. The full license text is not kept in the LICENSE file because CRAN splits this into two separate things: the original license text and the license details specific to the software, which in the case of MIT is the copyright holder. See https://cran.r-project.org/package=zen4R, where you can see both the MIT and LICENSE links. All R packages on CRAN are handled this way, see e.g. https://cran.r-project.org/package=rdflib |
Thanks for the explanation. Didn't know that... |
Just sticking my nose in here to say that I found this package by looking for exactly this functionality. I'd like to be able to download the data archive in R and use the data directly. I can write something to do this, but proper functionality that builds on a package like |
Thanks @adamhsparks, I'm very busy currently, but ASAP I will look into adding a function to download data archives from Zenodo records. Indeed, I have colleagues who are also interested in this feature. |
No worries. I was just voicing community support for the idea. |
Hi all, i've added 2 new functions associated to a
By default it will be done sequentially and files are downloaded into the target dir (default is the current wd). Since you were handling parallel downloads, I've added some code that gives more flexibility in how parallel is handled. By default it will use standard
To reuse zen4R within your package and download files from a particular record, you will have to proceed in 2 steps: first reach the record, and then download its files. You can reach the record by ID or DOI (or by concept ID / DOI):
#instantiate Zen4R client
zenodo <- ZenodoManager$new(
token = "<your_token>", # placeholder: replace with your own Zenodo token
logger = "INFO" # use "DEBUG" to see detailed API operation logs, use NULL if you don't want logs at all
)
#reach your record by Id or DOI (or same by concept ID / DOI)
rec = zenodo$getRecordById("<your id>")
rec = zenodo$getRecordByDOI("<your doi>")
rec = zenodo$getRecordByConceptId("<your concept id>")
rec = zenodo$getRecordByConceptDOI("<your concept doi>")
#list files
rec$listFiles()
#download files as seq
rec$downloadFiles(path = "<my target dir>")
#download files as parallel (standard mclapply in Unix)
rec$downloadFiles(path = "<my target dir>", parallel = TRUE, mc.cores = 4)
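#download files as parallel (using a cluster, compatible with Win OS)
library(parallel) # provides makeCluster(), parLapply() and stopCluster()
cl <- makeCluster(4)
rec$downloadFiles(path = "<my target dir>", parallel = TRUE, parallel_handler = parLapply, cl = cl)
stopCluster(cl) # release the workers once the downloads are done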
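As a rough illustration of how a `parallel`, `parallel_handler` and `cl` combination like the one above could be dispatched, here is a minimal sketch. The helper `apply_with_handler()` is hypothetical and is not zen4R's code; it only assumes that a cluster-based handler such as `parallel::parLapply` takes the cluster as its first argument, while a fork-based handler such as `parallel::mclapply` does not.

```r
# Hypothetical helper (not part of zen4R): apply `fun` to each item either
# sequentially or through a user-supplied parallel handler.
apply_with_handler <- function(items, fun, parallel = FALSE,
                               parallel_handler = NULL, cl = NULL, ...) {
  if (!parallel) {
    # Sequential mode: plain lapply, extra arguments are ignored.
    return(lapply(items, fun))
  }
  if (is.null(parallel_handler)) {
    # Fork-based default; `...` can carry handler options such as mc.cores.
    return(parallel::mclapply(items, fun, ...))
  }
  if (!is.null(cl)) {
    # Cluster-based handlers such as parLapply take the cluster first.
    return(parallel_handler(cl, items, fun))
  }
  parallel_handler(items, fun, ...)
}

square <- function(i) i^2

# Sequential run.
seq_res <- apply_with_handler(1:4, square)

# Cluster run; names are namespace-qualified so `parallel` need not be attached.
cl <- parallel::makeCluster(2)
par_res <- apply_with_handler(1:4, square, parallel = TRUE,
                              parallel_handler = parallel::parLapply, cl = cl)
parallel::stopCluster(cl)
```

Qualifying the handler as `parallel::parLapply` avoids depending on whether the `parallel` package is attached in the calling session.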
Last but not least, i did a round of other improvements in zen4R, and i'm planning to do a CRAN release soon, in case you have comments / suggestions on the new functions. |
Good timing. I have a workshop coming up at the end of the month where I want to download data from Zenodo. I'll install this and give it a go and let you know if I have any feedback for your CRAN submission. |
Thanks for the efforts @eblondel, this seems promising! (I didn't test it yet.) Thanks also for providing sample code! My colleague @hansvancalster will be back online in the week of 24 Aug. Meanwhile I'll try to have a closer look at this. At first sight (note that I'm not yet familiar with
From your code, I see you also return helpful feedback messages 👍 . In |
Thanks @florisvdh for your feedback,
zenodo = ZenodoManager$new()
rec = zenodo$getRecordByConceptDOI("10.5281/zenodo.2547036")
rec$downloadFiles()
|
@florisvdh I've just added a download_zenodo("10.5281/zenodo.2547036") |
@florisvdh just added the missing md5sum integrity check. |
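For readers unfamiliar with this kind of check, a minimal sketch of an md5sum comparison in base R follows. The helper name `verify_md5()` is an assumption made for illustration and the thread does not show how zen4R implements the check; the sketch relies only on `tools::md5sum()`, which ships with R.

```r
# Hypothetical helper: TRUE when the file's MD5 digest equals `expected`.
verify_md5 <- function(path, expected) {
  # tools::md5sum() returns a named character vector (NA for unreadable files).
  actual <- unname(tools::md5sum(path))
  !is.na(actual) && identical(tolower(actual), tolower(expected))
}

# Demonstration on a temporary file holding the 5 bytes "hello".
tmp <- tempfile()
writeBin(charToRaw("hello"), tmp)
ok  <- verify_md5(tmp, "5d41402abc4b2a76b9719d911017c592")  # known digest of "hello"
bad <- verify_md5(tmp, "00000000000000000000000000000000")  # deliberately wrong
unlink(tmp)
```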
Hi @eblondel, I've taken a closer look. It's great that you added a Some further tweaks and fixes are proposed in PR #35. Further aspects to be discussed / solved IMO are below. Some points have to do with our wish to drop our 'miscellaneous' function
Code, output, session info
> system.time(
+ inborutils::download_zenodo("10.5281/zenodo.2682323") #doi
+ )
Will download 1 file (total size: 37.5 MiB) from https://doi.org/10.5281/zenodo.2682323 (GRTS master sample for habitat monitoring in Flanders; version: 2)
[100%] Downloaded 39306606 bytes...
Verifying file integrity...
GRTSmaster_habitats.tif was downloaded and its integrity verified (md5sum: 20de76e1abfbafd6edcc00e1a9cf87a0)
user system elapsed
1.278 1.959 7.751
> system.time(
+ download_zenodo("10.5281/zenodo.2682323") #doi
+ )
[zen4R][INFO] ZenodoRecord - Download in sequential mode
[zen4R][INFO] ZenodoRecord - Will download 1 file from record '2682323' (doi: '10.5281/zenodo.2682323') - total size: 39306606
[zen4R][INFO] Downloading file 'GRTSmaster_habitats.tif' from record '2682323' (doi: '10.5281/zenodo.2682323') - size: 39306606
trying URL 'https://zenodo.org/api/files/ca78f68d-9753-4223-8115-4b8717760e96/GRTSmaster_habitats.tif'
Content type 'image/tiff' length 39306606 bytes (37.5 MB)
==================================================
downloaded 37.5 MB
[zen4R][INFO] File downloaded at '/media/floris/DATA/git_repositories/zen4R'.
[zen4R][INFO] ZenodoRecord - Verifying file integrity...
[zen4R][INFO] File 'GRTSmaster_habitats.tif': integrity verified (md5sum: 20de76e1abfbafd6edcc00e1a9cf87a0)
[zen4R][INFO] ZenodoRecord - End of download
user system elapsed
0.881 1.398 9.418
Warning messages:
1: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
3: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
Session info
─ Session info ────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.3 (2020-02-29)
os Linux Mint 18.1
system x86_64, linux-gnu
ui RStudio
language (EN)
collate nl_BE.UTF-8
ctype nl_BE.UTF-8
tz Europe/Brussels
date 2020-08-13
─ Packages ────────────────────────────────────────────────────────────────────────
! package * version date lib source
assertable 0.2.7 2019-09-21 [1] CRAN (R 3.6.1)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.8 2020-06-17 [1] CRAN (R 3.6.3)
bit 4.0.4 2020-08-04 [1] CRAN (R 3.6.3)
bit64 4.0.2 2020-07-30 [1] CRAN (R 3.6.3)
blob 1.2.1 2020-01-20 [1] CRAN (R 3.6.2)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.3)
class 7.3-17 2020-04-26 [4] CRAN (R 3.6.3)
classInt 0.4-3 2020-04-07 [1] CRAN (R 3.6.3)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.3)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
conditionz 0.1.0 2019-04-24 [1] CRAN (R 3.6.3)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
crosstalk 1.1.0.1 2020-03-13 [1] CRAN (R 3.6.3)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.2)
data.table 1.13.0 2020-07-24 [1] CRAN (R 3.6.3)
DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.3.1 2020-07-21 [1] CRAN (R 3.6.3)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3)
dplyr 1.0.1 2020-07-31 [1] CRAN (R 3.6.3)
drat 0.1.8 2020-07-18 [1] CRAN (R 3.6.3)
e1071 1.7-3 2019-11-26 [1] CRAN (R 3.6.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 3.6.3)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 3.6.3)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
geoaxe 0.1.0 2016-02-19 [1] CRAN (R 3.6.0)
ggplot2 3.3.2 2020-06-19 [1] CRAN (R 3.6.3)
glue 1.4.1 2020-05-13 [1] CRAN (R 3.6.3)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.2)
htmltools 0.5.0 2020-06-16 [1] CRAN (R 3.6.3)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 3.6.1)
httr 1.4.2 2020-07-20 [1] CRAN (R 3.6.3)
inborutils 0.1.0.9086 2020-07-10 [1] Github (inbo/inborutils@e07eec1)
iterators 1.0.12 2019-07-26 [1] CRAN (R 3.6.1)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 3.6.3)
KernSmooth 2.23-17 2020-04-26 [4] CRAN (R 3.6.3)
keyring 1.1.0 2018-07-16 [1] CRAN (R 3.6.3)
knitr 1.29 2020-06-23 [1] CRAN (R 3.6.3)
lattice 0.20-41 2020-04-02 [4] CRAN (R 3.6.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.3)
leaflet 2.0.3 2019-11-16 [1] CRAN (R 3.6.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.3)
lubridate 1.7.9 2020-06-08 [1] CRAN (R 3.6.3)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
oai 0.3.0 2019-09-07 [1] CRAN (R 3.6.1)
odbc 1.2.3 2020-06-18 [1] CRAN (R 3.6.3)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
pillar 1.4.6 2020-07-10 [1] CRAN (R 3.6.3)
pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 3.6.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 3.6.3)
plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.3)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.3)
processx 3.4.3 2020-07-05 [1] CRAN (R 3.6.3)
ps 1.3.3 2020-05-08 [1] CRAN (R 3.6.3)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.2)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 3.6.3)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.2)
remotes 2.2.0 2020-07-21 [1] CRAN (R 3.6.3)
rgbif 3.2.0 2020-07-23 [1] CRAN (R 3.6.3)
rgeos 0.5-3 2020-05-08 [1] CRAN (R 3.6.3)
rlang 0.4.7 2020-07-09 [1] CRAN (R 3.6.3)
rmarkdown 2.3 2020-06-18 [1] CRAN (R 3.6.3)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
RSQLite 2.2.0 2020-01-07 [1] CRAN (R 3.6.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.3)
scales 1.1.1 2020-05-11 [1] CRAN (R 3.6.3)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
sf 0.9-5 2020-07-14 [1] CRAN (R 3.6.3)
sp 1.4-2 2020-05-20 [1] CRAN (R 3.6.3)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.3)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
testthat * 2.3.2 2020-03-02 [1] CRAN (R 3.6.3)
tibble 3.0.3 2020-07-10 [1] CRAN (R 3.6.3)
tidyr 1.1.1 2020-07-31 [1] CRAN (R 3.6.3)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 3.6.3)
units 0.6-7 2020-06-13 [1] CRAN (R 3.6.3)
usethis 1.6.1 2020-04-29 [1] CRAN (R 3.6.3)
uuid 0.1-4 2020-02-26 [1] CRAN (R 3.6.3)
vctrs 0.3.2 2020-07-15 [1] CRAN (R 3.6.3)
whisker 0.4 2019-08-28 [1] CRAN (R 3.6.1)
withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.3)
xfun 0.16 2020-07-24 [1] CRAN (R 3.6.3)
xml2 1.3.2 2020-04-23 [1] CRAN (R 3.6.3)
yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2)
P zen4R * 0.4 2020-08-11 [?] local
[1] /home/floris/lib/R/library
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
P ── Loaded and on-disk path mismatch.
The difference in elapsed time is especially noticeable for small downloads, e.g. the 56.9 KiB (2 files) from "10.5281/zenodo.3378733" (
The warning appears to come from the
> zenodo = ZenodoManager$new()
Warning message:
In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
> rec = zenodo$getRecordByConceptDOI("10.5281/zenodo.2547036")
Warning messages:
1: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
BTW Warnings by
|
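The two outputs above report the same download size differently (37.5 MiB versus 39306606 bytes). A minimal sketch of a byte formatter with binary units is given below; `format_size()` is a hypothetical name, not a function from zen4R or inborutils.

```r
# Hypothetical helper: format a byte count using binary (1024-based) units.
format_size <- function(bytes) {
  units <- c("B", "KiB", "MiB", "GiB", "TiB")
  idx <- 1
  value <- bytes
  # Divide by 1024 until the value fits the unit or units run out.
  while (value >= 1024 && idx < length(units)) {
    value <- value / 1024
    idx <- idx + 1
  }
  paste(format(round(value, 1), nsmall = 1), units[idx])
}

size_tif <- format_size(39306606)  # size of GRTSmaster_habitats.tif in the log above
```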
@florisvdh I've carefully read your notes/requirements and updated the R code accordingly:
Fixed in #39 (I will see if there is room for improvement based on the Zenodo API in the next release)
Fixed here in #31
Since many changes were made in the current milestone, I'm going to initiate a CRAN release for 0.4. Best |
Thanks for the follow-up @eblondel ! 👍 Thank you for providing a solution to the parallel download. I prepared a small PR (#40) for you to get rid of the extra dependencies. Looking forward to the CRAN release! Below is the current behaviour, which indeed works well. Some items to track for later, if you like:
Code and output
> download_zenodo("10.5281/zenodo.2547036")
[zen4R][INFO] ZenodoRecord - Download in sequential mode
[zen4R][INFO] ZenodoRecord - Will download 2 files from record '3378733' (doi: '10.5281/zenodo.3378733') - total size: 56.9 KiB
[zen4R][INFO] Downloading file 'zen4R-0.3.tar.gz' - size: 24.8 KiB
trying URL 'https://zenodo.org/api/files/c8a4b50b-27ce-4a03-85aa-27c631219b98/zen4R-0.3.tar.gz'
Content type 'application/octet-stream' length 25350 bytes (24 KB)
==================================================
downloaded 24 KB
[zen4R][INFO] Downloading file 'zen4R-0.3.zip' - size: 32.2 KiB
trying URL 'https://zenodo.org/api/files/c8a4b50b-27ce-4a03-85aa-27c631219b98/zen4R-0.3.zip'
Content type 'application/octet-stream' length 32957 bytes (32 KB)
==================================================
downloaded 32 KB
[zen4R][INFO] Files downloaded at '/media/floris/DATA/git_repositories/zen4R'.
[zen4R][INFO] ZenodoRecord - Verifying file integrity...
[zen4R][INFO] File 'zen4R-0.3.tar.gz': integrity verified (md5sum: 66c585a0398d81b741c19029292c7e3f)
[zen4R][INFO] File 'zen4R-0.3.zip': integrity verified (md5sum: be1ce3a0e52f83ab1c42fa058d6b5451)
[zen4R][INFO] ZenodoRecord - End of download
Warning messages:
1: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
3: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
4: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
5: In Sys.setlocale("LC_TIME", "us_US") :
OS reports request to set locale to "us_US" cannot be honored
6: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
7: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
> download_zenodo("10.5281/zenodo.3630532")
[zen4R][INFO] ZenodoRecord - Download in sequential mode
[zen4R][INFO] ZenodoRecord - Will download 1 file from record '3836625' (doi: '10.5281/zenodo.3836625') - total size: 97.6 KiB
[zen4R][INFO] Downloading file 'inbo_watina-v0.3.0.zip' - size: 97.6 KiB
trying URL 'https://zenodo.org/api/files/28df5d2b-40f5-43d6-a2f0-822ec2270733/inbo/watina-v0.3.0.zip'
Content type 'application/octet-stream' length 99960 bytes (97 KB)
==================================================
downloaded 97 KB
[zen4R][INFO] File downloaded at '/media/floris/DATA/git_repositories/zen4R'.
[zen4R][INFO] ZenodoRecord - Verifying file integrity...
[zen4R][INFO] File 'inbo_watina-v0.3.0.zip': integrity verified (md5sum: 4c0f952cbd1e70195f957688428af960)
[zen4R][INFO] ZenodoRecord - End of download
Warning messages:
1: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
3: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
4: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
5: In Sys.setlocale("LC_TIME", "us_US") :
OS reports request to set locale to "us_US" cannot be honored
6: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
7: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
> download_zenodo("10.5281/zenodo.2547036",
+ parallel = TRUE, parallel_handler = parLapply, cl = makeCluster(2))
[zen4R][INFO] ZenodoRecord - Download in parallel mode
Error in rec$downloadFiles(path = path, quiet = quiet, ...) :
object 'parLapply' not found
In addition: Warning messages:
1: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
3: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
4: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
5: In Sys.setlocale("LC_TIME", "us_US") :
OS reports request to set locale to "us_US" cannot be honored
6: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
7: In default_backend_auto() :
Selecting ‘env’ backend. Secrets are stored in environment variables
Session info
─ Session info ──────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.0.2 (2020-06-22)
os Linux Mint 20
system x86_64, linux-gnu
ui RStudio
language nl_BE:nl
collate nl_BE.UTF-8
ctype nl_BE.UTF-8
tz Europe/Brussels
date 2020-09-02
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────
! package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.2)
callr 3.4.3 2020-03-28 [1] CRAN (R 4.0.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
devtools 2.3.1 2020-07-21 [1] CRAN (R 4.0.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
jsonlite 1.7.0 2020-06-25 [1] CRAN (R 4.0.2)
keyring 1.1.0 2018-07-16 [1] CRAN (R 4.0.2)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
memoise 1.1.0 2017-04-21 [1] CRAN (R 4.0.2)
pkgbuild 1.1.0 2020-07-13 [1] CRAN (R 4.0.2)
pkgload 1.1.0 2020-05-29 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
processx 3.4.3 2020-07-05 [1] CRAN (R 4.0.2)
ps 1.3.4 2020-08-11 [1] CRAN (R 4.0.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
remotes 2.2.0 2020-07-21 [1] CRAN (R 4.0.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
testthat * 2.3.2 2020-03-02 [1] CRAN (R 4.0.2)
usethis 1.6.1 2020-04-29 [1] CRAN (R 4.0.2)
withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
R zen4R * 0.4 <NA> [?] <NA>
[1] /home/floris/lib/R/library
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
R ── Package was removed from disk. |
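In the second download above, the URL ends in `inbo/watina-v0.3.0.zip` while the local file is reported as `inbo_watina-v0.3.0.zip`. A minimal sketch of that kind of mapping is shown below; `flatten_file_key()` is a hypothetical name and how zen4R handles this internally is not shown in the thread.

```r
# Hypothetical helper: turn a file key that may contain "/" into a flat
# local file name, so no unintended subdirectory is needed in the target dir.
flatten_file_key <- function(key) {
  gsub("/", "_", key, fixed = TRUE)
}

local_name <- flatten_file_key("inbo/watina-v0.3.0.zip")
target <- file.path(tempdir(), local_name)  # where the file would be written
```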
Thanks, I've made a slight change regarding the 'slashed' case. Warnings: to be checked later to see what's happening there; here they don't show up. zen4R 0.4 has just been submitted to the CRAN team for review. |
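Until the source of the repeated `default_backend_auto()` warnings is identified, a caller could silence only those warnings while letting others through. The sketch below uses base R condition handling; the helper `muffle_matching_warnings()` and the stand-in function `noisy()` are hypothetical and only imitate the warning text seen in the logs above.

```r
# Hypothetical helper: evaluate `expr`, muffling only warnings whose
# message matches `pattern`; all other warnings propagate as usual.
muffle_matching_warnings <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for a call that emits the backend warning and returns a value.
noisy <- function() {
  warning("Selecting 'env' backend. Secrets are stored in environment variables")
  42
}

# The "." wildcards match either straight or curly quotes around env.
result <- muffle_matching_warnings(noisy(), "Selecting .env. backend")
```

This treats the symptom only; it does not explain why the warning is raised several times per call.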
We recently wrote a function to download a Zenodo archive. See this discussion. The function is available through a package that bundles useful utilities for our institution, but we feel it would be better placed in a focused package like zen4R.
We have discussed this here, where we also refer to the zenodo package, which has a similar goal to this one but no longer seems very actively maintained (though it might still be a good idea to join forces?).
Our question is whether you think inborutils::download_zenodo() could be a useful addition to your package.