Let's publish this! #17

Open
wants to merge 32 commits into
base: master
Commits
f89266e
Sunday - Monday
graciellehigino Jun 17, 2021
99554f4
why isn't the pdf embeding workiiinnnggg
graciellehigino Jun 18, 2021
b4eb604
:hatching_chick: initial commit
TanyaS08 Jun 29, 2021
7fe18ab
basic ideas are there...
TanyaS08 Jun 29, 2021
b33ae5a
start wednesday
graciellehigino Jun 29, 2021
aaee584
added most of the NB content
TanyaS08 Jun 30, 2021
d12dd32
updated self care task
TanyaS08 Jun 30, 2021
c9f5b6c
finish wednesday
graciellehigino Jun 30, 2021
f7edf6b
:arrow_right: migrated post to 08/07
TanyaS08 Jul 1, 2021
d9c94b7
text update
TanyaS08 Jul 1, 2021
6d5d7b6
:tada: gifs!!!
TanyaS08 Jul 1, 2021
c9c7afd
edits
TanyaS08 Jul 1, 2021
c266d0c
added reproducibility task
TanyaS08 Jul 1, 2021
a150787
added a note on retroffitng for {groundhog}
TanyaS08 Sep 23, 2021
b33c36b
post update
TanyaS08 Sep 26, 2021
c292539
some grammars while I was reading through
TanyaS08 Feb 15, 2022
1d948df
Merge branch 'master' into tanya_post
TanyaS08 Oct 24, 2022
e3dec21
:ship: move post over
TanyaS08 Oct 24, 2022
bd3783b
💄 formatting and grammars
TanyaS08 Oct 24, 2022
882ef3e
TanyaS08 Oct 24, 2022
2259963
🏗 build updated post
TanyaS08 Oct 24, 2022
0fd2b89
🐛 change chink options
TanyaS08 Oct 24, 2022
c15d5a1
spellings
TanyaS08 Oct 27, 2022
c2d72d5
formatTing
TanyaS08 Oct 27, 2022
04204d5
:racehorse: upgrade reproducibility task
TanyaS08 Oct 27, 2022
2555425
:sparkles: some suggested reading
TanyaS08 Oct 27, 2022
b3d4453
:put_litter_in_its_place: activity book stuff
TanyaS08 Oct 27, 2022
bbbb3e7
😓 .md formatting is hard
TanyaS08 Oct 27, 2022
a219c7f
Merge pull request #1 from graciellehigino/tanya_post
graciellehigino Oct 31, 2022
238bb05
Add note for Project Set-Up course
graciellehigino Jan 31, 2024
427c8fc
📝 grammars are the worst™
TanyaS08 Feb 5, 2024
15ea8db
Merge pull request #2 from graciellehigino/TanyaS08-patch-1
graciellehigino May 5, 2024
📝 grammars are the worst™
TanyaS08 authored Feb 5, 2024
commit 427c8fc615771e0cf3a7880308698fc233093522
@@ -568,7 +568,7 @@ Do you already have all your manuscripts in a reproducible format? Congratulatio

## Why do we need to preserve our tools?

So you've commented, documented, and shared your code meaning that it's ready to be used by the rest of the world, right? Well maybe for now but you know what they say about time - *all hours wound; the last one kills*. Okay so it might not be that dramatic but there is of course the problem that as time progresses our code becomes out-dated and (worst case scenario) non-functional. Programming languages (and packages) are continually evolving as developers work at squashing bugs and making performance upgrades. Sometimes these upgrades might result in a fundamental change in how the a language or package functions _e.g._ a function name might change or some functionality will be removed in favour of another. This means that in a few years that beautifully documented chunk of code that we've written today might not even run.
So you've commented, documented, and shared your code, meaning that it's ready to be used by the rest of the world, right? Well, maybe for now, but you know what they say about time - *all hours wound; the last one kills*. Okay, that was maybe a bit dramatic, but there is of course the problem that as time progresses and languages/packages are updated our code becomes outdated and (worst case scenario) non-functional. Programming languages (and packages) are continually evolving as developers work at squashing bugs and making performance upgrades. Sometimes these upgrades result in a fundamental change in how a language or package functions, _e.g._ a function name might change or some functionality might be removed in favour of another. This means that in a few years that beautifully documented chunk of code that we've written today might not even run.

Oh dear...

@@ -578,23 +578,23 @@ Oh dear...

</center>

What this boils down to is that we need to not only think about documenting the code itself but also all the 'backend' features that make it tick _i.e._ not only what packages we're using but also what version. This can also extend to language and operating system (OS) type or version used.
What this boils down to is that we need to think not only about documenting the code itself but also all the 'backend' features that make it tick, _i.e._ not only what packages we're using but also what versions. In the bigger scheme of things this should also extend to the version of the language you are using and even the OS (operating system).

Although this may seem daunting it's important to remember that the journey to
reproducibility is much like how one approaches eating an elephant - we take
it one bit~~e~~ at a time. So don't be afraid to take a little nibble before biting off more than you can chew.

## How do we _keep_ our work reproducible?

The good news is that there is a lot of functionality out there to help us on our reproducibility journey. Different languages have different ways we can document and 'keep' the package version that we are using. The main focus will be using `R` as it is the current *lingua franca* of most ecologists and it also straddles the middle ground between being very 'picky' like `python` and literally having a built in system like `Julia`.
The good news is that there is a lot of functionality out there to help us on our reproducibility journey. Different languages have different ways we can document and 'keep' the package versions that we are using. The main focus here will be on `R`, as it is the current *lingua franca* of most ecologists and it also straddles the middle ground between being very 'picky' like `python` and literally having a built-in (although not always perfect) system like `Julia`.

The big (language agnostic) take home message here though is that it's important to (at minimum) keep record of the versions of things you used if you want your work to work a few months/years down the line. By keeping a record of the package, software and OS versions used we give other users (and our future selves) a chance to recreate the environment that allowed our project/code to run should things change or be updated.
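
As a minimal sketch of what such a record could look like in `R`, base `R`'s `sessionInfo()` reports the `R` version, the OS, and the versions of the packages loaded in the current session. The file name `session_info.txt` is just a placeholder for illustration, so use whatever fits your project.

```r
# Capture the R version, OS, and loaded package versions for this session
info <- sessionInfo()

# Save a human-readable copy alongside the project
# ("session_info.txt" is a hypothetical file name)
writeLines(capture.output(print(info)), "session_info.txt")

# The individual pieces can also be pulled out if you only need one of them
R.version.string          # the R version
info$running              # the operating system
packageVersion("stats")   # the version of a single package
```

Committing that text file next to your code is not a full solution, but it does mean the versions are written down somewhere.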

The three main approaches and packages I will discuss are `{groundhog}`, `{renv}` and, `docker`. There are of course other ways to document package versions but these are (somewhat user friendly) and will give you different 'levels' of reproducibility. It is of course also possible to mix and match these different platforms.
The three main approaches and packages we will discuss are `{groundhog}`, `{renv}`, and `docker`. There are of course other ways to document package versions but these are (somewhat) user friendly and will give you different 'levels' of reproducibility. It is of course also possible to mix and match these different platforms. So let's start from the bottom and work our way up:

### `{groundhog}`

[`{groundhog}`](http://groundhogr.com/using/) is a relatively new kid on the block -and apparently refers to a film of the same name (no comment on my side as this is a facet of pop culture the eludes me). This is a super easy package to implement (think one function easy) and is a really nice way to 'retrofit' some of your older code.
[`{groundhog}`](http://groundhogr.com/using/) is a relatively new kid on the block - and apparently refers to a film of the same name (no comment on my side as this is a facet of pop culture that eludes me). This is a super easy package to implement (think one function easy) and is a really nice way to 'retrofit' some of your older code.

**How it works:** Essentially `{groundhog}` will install the version of a package that was available on CRAN on a specified date. This is done by 'replacing' `library("package")` with `groundhog.library("package", date)`. This means it's easy to go back and set a more suitable date for your script, e.g. the date it was created or the last time it was saved.

@@ -615,39 +615,39 @@ groundhog.library(pkgs, groundhog.day)

```

**Limitations:** Although `{groundhog}` will call the correct/desired packages version there is of course the potential problem that that package version is no longer compatible with the version of `R` that you're running on your machine --- this means you might have to have multiple version of `R` on you machine and have to switch between them depending on what project you're using. Another issue could arise when retrofitting your workflow. Although you might have a starting date/groundhog day you might not have been using the most up-to-date version available at that date - so you would be retrieving the wrong version.
**Limitations:** Although `{groundhog}` will call the correct/desired package version, there is of course the potential problem that that package version is no longer compatible with the version of `R` that you're running on your machine - this means you might need multiple versions of `R` on your machine and have to switch between them depending on which project you're working on. Another issue could arise when retrofitting your workflow. Although you might have a starting date/groundhog day, you might not have been using the most up-to-date version available at that date - so you would still be retrieving the wrong version.

**Pros:** To end on a positive note though - `{groundhog}` is at least a solid starting point for documenting package versions _and_ it's very easy to implement, especially if you are retrofitting your code.

### `{renv}`

As highlighted above one of the potential issues with {groundhog} is that you might run into language version incompatibility - and by extension still have non-working code (bleak). Enter [`{renv}`](https://rstudio.github.io/renv/articles/renv.html), a handy-dandy, easy to use, dependency management package for your projects. `{renv}` records both `R` and package versions through a series of user called functions. This is very similar to `Julia` where all packages are 'stored' in `Project.toml`. `{renv}` works by crawling through your project directory and recording package version and dependencies in use. This is then saved in the `renv.lock` file and is used to 'load' the project state further down the line.
As highlighted above, one of the potential issues with `{groundhog}` is that you might run into language version incompatibility - and by extension still have non-working code (bleak). Enter [`{renv}`](https://rstudio.github.io/renv/articles/renv.html), a handy-dandy, easy to use, dependency management package for your projects. `{renv}` records both the `R` and package versions through a series of user-called functions. This is very similar to `Julia` where all packages are 'stored' in `Project.toml`. `{renv}` works by crawling through your project directory and recording the package versions and dependencies in use. This is then saved in the `renv.lock` file and is used to 'load' the project state further down the line.

**How it works:** The bare bones overview is that you 1) initialise the project-local environment using `renv::init()`, 2) continue tinkering as you go, 3) call `renv::snapshot()` to update `renv.lock` with any new additions, and 4) if things broke along the way you can call `renv::restore()` to revert back to the previous project state you had saved in your lock file (which hopefully did run).
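
In practice those four steps might look something like the sketch below. It assumes `{renv}` is already installed and that you are running it from the root of your project, so treat it as an outline rather than a complete workflow.

```r
# install.packages("renv")   # one-off, if {renv} is not yet installed

renv::init()       # 1) set up the project-local library and write renv.lock
# 2) ...keep tinkering: install, update, and use packages as usual...
renv::snapshot()   # 3) record the package versions currently in use in renv.lock
renv::restore()    # 4) if things broke, reinstall the versions saved in renv.lock
```

Sharing `renv.lock` along with your code is what lets someone else (or future you) call `renv::restore()` and get the same package versions back.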

**Limitations:** One limitation is that `{renv}` relies on you saving a _currently_ working/functioning state (if you want recall it and have it to work in the future). This makes it a bit tricky to try and quickly 'fix' old code - something that `{groundhog}` is probably more suited for, whereas `{renv}` is a solid choice when starting a new project form scratch.
**Limitations:** One limitation is that `{renv}` relies on you saving a _currently_ working/functioning state (if you want to recall it and have it work in the future). This makes it a bit tricky to try and quickly 'fix' old code - something that `{groundhog}` is probably more suited for, whereas `{renv}` is a solid choice when starting a new project from scratch.

**Pros:** `{renv}` saves both package and `R` versions - which is great as it 'doubles down' on having things work in harmony. It is also very easy to use - once again you can get away by using a few lines of code.
**Pros:** `{renv}` saves both package and `R` versions - which is great as it 'doubles down' on having things work in harmony. It is also very easy to use - once again you can get away with using a few lines of code. This makes it a really useful tool to try and make an unconscious part of your day-to-day coding workflow.

### Docker

Docker, a term that can strike trepidation in even some of the most hardened of researchers (although they have the cutest whale as a logo and that 100% drops the scary factor if you as me). Briefly Docker is a program that allows you to host different mini computers on your computer. This of course means its not just an R-specific tool but one that could probably cover a lot of reproducibility bases for most languages. But there is a reason this is last on the list and that is because it takes a bit more work to implement. So think of this as a long-term project/goal to set yourself up for.
Docker, a term that can strike trepidation in even some of the most hardened of researchers (although they have the cutest whale as a logo and that 100% drops the scary factor if you ask me). Briefly, Docker is a program that allows you to host what are essentially different mini computers on your computer. This of course means it's not just an R-specific tool but one that can cover a lot of reproducibility bases for most languages. But there is a reason this is last on the list and that is because it takes a bit more work to implement. So think of this as a long-term project/goal to set yourself up for.

**How it works:** As I said earlier with Docker you can run multiple mini computers (containers) built from an 'image' of your machine (the host). The catch though - you need to build the image from scratch from OS all the way through to you specific script/code chunk. These build instructions are contained in a `Dockerfile` - which you save in your working directory. Inside this file is the 'recipe' for building your image (and spoiler alert it looks a lot like a series of command line calls). Colin Fay wrote [this](https://colinfay.me/docker-r-reproducibility/) really nice blog about using docker and `R` for beginners. If your interested I suggest starting there! Alternatively `{renv}` also plays well with Docker - have a look at [this vignette](https://rstudio.github.io/renv/articles/docker.html)
**How it works:** As mentioned earlier, with Docker you can run multiple mini computers (containers) on your machine (the host), each built from an 'image'. The catch though - you need to build the image from scratch, from the OS all the way through to your specific script/code chunk. These build instructions are contained in a `Dockerfile` - which you save in your working directory. Inside this file is the 'recipe' for building your image (and spoiler alert, it looks a lot like a series of command line calls). Colin Fay wrote [this](https://colinfay.me/docker-r-reproducibility/) really nice blog post about using Docker and `R` for beginners. If you're interested I suggest starting there! Alternatively `{renv}` also plays well with Docker - have a look at [this vignette](https://rstudio.github.io/renv/articles/docker.html).

**Limitations:** In the context of what has been discussed in this post Docker is _hard_ yo! In order to write a Docker file you will benefit a lot from being comfortable using and thinking of things in terms of command line. Since you are 'creating' you mini computer you need to install a lot of moving parts and components. This means you might be moving from your comfort zone when it comes to programming and could put you off trying the whole reproducibility thing all together. So set realistic expectations here and don't be too hard on yourself!
**Limitations:** In the context of what has been discussed in this post Docker is _hard_ yo! In order to write a `Dockerfile` you will benefit a lot from being comfortable using and thinking of things in terms of the command line. Since you are 'creating' your mini computer you need to install a lot of moving parts and components. This means you might be moving out of your comfort zone when it comes to programming, which could put you off trying the whole reproducibility thing altogether. So set realistic expectations here and don't be too hard on yourself!

**Pros:** Docker is very flexible! You can build your mini computer to your specifications and keep your 'normal computer' intact. For example if I am running MacOS, `R` 3.5 on my normal computer but can build an image that runs Linux and `R` 3.1. Also because the recipe is contained in the `Dockerfile` anyone can build the image for that project on their machine and have it all 'just' work (avoiding the whole 'but it works on my machine' scenario).
**Pros:** Docker is very flexible! You can build your mini computer to your specifications and keep your 'normal computer' intact. For example, if I am running macOS and `R` 3.5 on my normal computer, I can still build an image that runs Linux and `R` 3.1. Also, because the recipe is contained in the `Dockerfile`, anyone can build the image for that project on their machine and have it all 'just' work (avoiding the whole 'but it works on my machine' scenario).

## Closing thoughts

If you want to keep your project pipeline working in the long term it is important to account for the fact that languages are evolving - which means the scaffold on which your code rests also needs to be documented in some way. That being said, asking yourself how _paramount_ the longevity of your project is can be a good way to decide how many resources to allocate to documenting and accommodating this. For smaller projects you could probably get away with a simple documentation process, e.g. `Julia`'s `Project.toml` system or `{renv}` for `R`. But if the longevity of the project is of high importance it's probably worth giving something like Docker a try.

## Reproducibility task of the day

First sit down and think about your project and how important longevity is. Do future generations depend on your code being able to run and execute tasks flawlessly? Or it it more important that the workflow is well documented and understood _i.e._ it could be easily be 'translated' to the shiny new programming language people are using?
First sit down and think about your project and how important its longevity is. Do future generations depend on your code being able to run and execute tasks flawlessly? Or is it more important that the workflow is well documented and understood, _i.e._ it could easily be 'translated' to the shiny new programming language people are using?

Pick and choose the task(s) that you want to take on (or remix one of them)
Pick and choose the task(s) that you want to take on (or remix one of them).

1. Open one of the older projects on your computer. Does the code run? If not, see if you can retrofit it using `{groundhog}`.