Read the functions that read your data #18

Open
aammd opened this issue Feb 22, 2016 · 0 comments

aammd commented Feb 22, 2016

Hello @BIOL548O/all,

Start your week with some free R advice! 💻 💸

Everyone is doing a great job so far with their data cleaning scripts!
I've noticed that often, after reading in data, lots of time is being spent "fixing" the resulting table:

  • correcting column types (e.g. turning a column that was read as "character" into "numeric")
  • fixing NA values (e.g. when missing values are recorded with a code that R reads as an ordinary word rather than as missing)
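Both problems tend to show up together. Here is a minimal base-R sketch (no packages needed) that reads a small made-up CSV from a string through the `text =` argument of `read.csv()`; the column names, values and the "N/A" code are all hypothetical.

```r
# A made-up CSV held in a string, so the example needs no file.
# The column names and the "N/A" missing-value code are hypothetical.
raw <- "id,mass
001,12.5
002,N/A
003,9.8"

# Read it with no extra arguments and let R guess the column types.
naive <- read.csv(text = raw)

str(naive)
# id was guessed as integer, so the leading zeros in "001" are gone;
# mass is not numeric, because "N/A" is not recognised as missing.
```

Converting these columns after the fact is exactly the "fixing" work described above; the reading functions can avoid it if you tell them what to expect.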

Most of these issues are easiest to fix at the source, by mastering the function that reads in the data. There are three good ways to read in data that will cover most of the cases in this class. Save yourself lots of time and read their help files!

  • ?read.delim (same help page as ?read.csv; the two differ only in their default separator, tab vs comma)
  • ?readxl::read_excel
  • ?readr::read_csv
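`read.delim()` and `read.csv()` are both thin wrappers around `read.table()`, so the practical difference is the default separator. A small base-R sketch on made-up data shows why `read.delim()` needs `sep = ","` when a file is comma-separated:

```r
# The same made-up two-column data, once comma-separated and once tab-separated.
csv_text <- "id,mass\na,1.5\nb,2.0"
tsv_text <- "id\tmass\na\t1.5\nb\t2.0"

from_csv   <- read.csv(text = csv_text)    # default sep = ","
from_delim <- read.delim(text = tsv_text)  # default sep = "\t"

# read.delim() on comma-separated text finds no tabs,
# so each whole line lands in a single column.
wrong <- read.delim(text = csv_text)
ncol(wrong)

# Overriding the separator gives the same table as read.csv().
fixed <- read.delim(text = csv_text, sep = ",")
identical(fixed, from_csv)
identical(from_delim, from_csv)
```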

For example:

column types

Let's say you have one character column (an id variable) and one numeric column (e.g. mass):

  • read.csv("data-raw/mydata.csv", colClasses = c("character", "numeric"))
  • readxl::read_excel("data-raw/mydata.xlsx", col_types = c("text", "numeric"))
  • there are two ways to specify column types with readr (see vignette("column-types")):
readr::read_csv("data-raw/mydata.csv",
                col_types = readr::cols(first_col  = readr::col_character(),
                                        second_col = readr::col_double()))

or, equivalently, with readr's compact string (one letter per column, in order: c = character, d = double):

readr::read_csv("data-raw/mydata.csv", col_types = "cd")
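The same idea works in base R with no extra packages. This minimal sketch reads made-up data from a string and uses `colClasses` to keep a hypothetical `id` column as text (so its leading zeros survive) while reading `mass` as a number:

```r
# Made-up data: an id with leading zeros and a numeric mass.
raw <- "id,mass
001,12.5
002,7.25"

# Letting read.csv() guess turns "001" into the integer 1.
guessed <- read.csv(text = raw)

# colClasses gives one class per column, in column order.
typed <- read.csv(text = raw, colClasses = c("character", "numeric"))

str(guessed)
str(typed)
```

Like readr's compact string, an unnamed `colClasses` vector is matched to columns by position, so it has to follow the column order of the file.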

specify your "missing" code:

All three functions also let you specify what your code for "missing" is. Let's say that in your data the missing value is coded "N/A":

read.csv("data-raw/mydata.csv", na.strings = "N/A")

readxl::read_excel("data-raw/mydata.xlsx", na = "N/A")

readr::read_csv("data-raw/mydata.csv", na = "N/A")
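Here is a minimal base-R sketch of the "N/A" case, again on made-up data read from a string. `na.strings` replaces the default (which is just "NA"), so this example lists both codes; the data and the codes are hypothetical.

```r
# Made-up data where one mass is recorded as "N/A".
raw <- "id,mass
a,12.5
b,N/A
c,9.8"

# List every code that means "missing"; keeping "NA" preserves the default.
cleaned <- read.csv(text = raw, na.strings = c("NA", "N/A"))

str(cleaned)
# mass is now numeric, with a real NA in the second row.
mean(cleaned$mass, na.rm = TRUE)
```

readr::read_csv() behaves similarly: its `na` argument defaults to c("", "NA"), and passing na = "N/A" on its own replaces that default rather than adding to it.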