
Homework 3 clarification #36

Open
sandraemry opened this issue Feb 20, 2017 · 6 comments
sandraemry commented Feb 20, 2017

Hi @aammd

Do we add assertions to our script that cleans the raw data? Or should I read in my tidy data set and write assertions for that one?

Thanks!

Sandra


aammd commented Feb 21, 2017

Hi @sandraemry, that is a good question! I think both are fine. Just make sure your reviewer knows where to find the assertions, perhaps by labelling that section of your R script with a prominent comment.
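For instance, a clearly labelled assertion block in base R might look like the following sketch (the data frame and column names here are hypothetical, not from the actual assignment):

```r
## ---- Assertions on the tidy data (reviewers: checks live here) ----------

# Hypothetical tidy data set, stands in for the real one.
tidy <- data.frame(
  tank.no = factor(c("1", "2", "3")),
  biomass = c(0.5, 1.2, 0.8)
)

stopifnot(
  is.factor(tidy$tank.no),   # tank number stored as a factor
  all(tidy$biomass > 0),     # biomass values must be positive
  !anyNA(tidy)               # no missing values remain
)
```

`stopifnot()` halts the script with an error naming the first failed condition, so a reviewer running the script top to bottom will see immediately if the data violate an expectation.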

katcheung commented:

Hi @aammd,
I tried to read in my tidied data and verify that certain columns were set up as factors, but the assertion fails. I verified with my original tidying scripts that the columns were set up correctly, but I seem to lose that information (e.g. tank.no as a factor) in my saved csv file. Is this normal? If so, should we assume that we're continuing to work with the final product of our tidying script (assignment 2), rather than reading in our tidied csv file? Sorry if this is confusing.
Thanks,
Katherine

sandraemry commented:

Hi @katcheung, you can read in your csv files with the type of each column specified explicitly. For me it looks like this:

mydata <- read_csv("./data/flowcam_sum_tidy.csv", col_types = cols(
  temp = col_integer(),
  litter = col_factor(c("H", "L")),
  rep = col_integer(),
  cell_density = col_integer(),
  cell_volume = col_double(),
  biomass = col_double()
))

Is that what you were asking about? Or maybe @aammd has a better solution?


aammd commented Feb 23, 2017

Hi @sandraemry & @katcheung ,

I think Sandra has a good answer here! You're right: factor information is not stored in a csv file; factors are created only when a csv or other file is read into R. So if you change the way you are reading the file, you change the way the result is represented in R. Sandra's example code shows one way to control exactly how each column is read.

Another answer to your question @katcheung is that you can choose to work in a clean script (reading in your tidy CSV) or on the bottom of your old one. Just make sure it is clear for your peer reviewer.
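To see the round-trip problem concretely, here is a small sketch with readr (the file name `demo.csv` is just for illustration): writing a factor column to csv stores only its labels, and the default `read_csv()` guesses a plain character column on the way back in.

```r
library(readr)

# A factor column, as produced by a tidying script.
df <- data.frame(litter = factor(c("H", "L", "H")))
write_csv(df, "demo.csv")        # the csv stores only "H"/"L" text

# Default read: the column comes back as character, not factor.
reread <- read_csv("demo.csv")
class(reread$litter)

# Restoring the factor at read time, as in Sandra's example:
reread2 <- read_csv("demo.csv",
                    col_types = cols(litter = col_factor(c("H", "L"))))
is.factor(reread2$litter)
```

This is why an assertion like `is.factor(...)` fails on a freshly re-read csv unless the `col_types` argument recreates the factor.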

LinneaSandell commented:

@aammd Regarding the metadata, should we make it a routine to work only with files that carry metadata? For example, should I save all my data files as csvy? It doesn't seem very useful to have metadata for only one part of a script (you add metadata in 01_rscript, but read the data in as csv in 02_analyse_data).
Let me know which files metadata should be attached to, and when it is optional.
Thank you.


aammd commented Feb 28, 2017

@LinneaSandell this is an interesting question, and one we should return to in class! Briefly, I think that we are drawing a distinction here between "in progress" data and the "final version" of the dataset. So we add metadata only when we are "happy" with the way the dataset is organized. However, there are many other workflows that could be imagined, where metadata is created at the beginning, or in the middle, of a project.
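For reference, a csvy file is just an ordinary csv with a YAML metadata block at the top, delimited by `---` lines. A rough sketch (the field names and units here are hypothetical):

```
---
name: flowcam_sum_tidy
fields:
  - name: tank.no
    type: string
    description: tank identifier, treated as a factor in R
  - name: biomass
    type: number
    unit: mg
---
tank.no,biomass
1,0.5
2,1.2
```

Because the metadata travels inside the same file as the data, it survives being passed between scripts, which addresses the 01_rscript/02_analyse_data split Linnea describes.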
