Added section on data processing
abner-hb committed Apr 8, 2024
1 parent 4cea2ed commit b18e8b0
Showing 27 changed files with 1,157 additions and 3,440 deletions.
40 changes: 40 additions & 0 deletions 04_basic_data_processing.qmd
@@ -0,0 +1,40 @@
# Basic data processing

Now that we understand how **R** handles data, we can start working with pre-existing data files. These files need to be correctly formatted and in a file format that **R** can recognize. Don't worry, there are plenty of options.

The first step when loading data in **R** is to locate our working directory. This is the default location where **R** will look for files we want to load and where it will put any files we save. The working directory differs from computer to computer. To determine which directory **R** is using as your working directory, run:

```{r get working directory}
getwd()
```

You can place data files straight into the folder that is your working directory, or you can move your working directory to where your data files are. You can move your working directory to any folder on your computer with the function `setwd()`. Just give `setwd()` the [file path](https://www.codecademy.com/resources/docs/general/file-paths) to your new working directory. I prefer to set my working directory to a folder dedicated to whichever project I am currently working on. That way I can keep all of my data, scripts, graphs, and reports in the same place. For example:

```{r}
#| eval: false
setwd("C:/Users/user_name/workshop_folder/learning_r/code")
```
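
If you only need to switch folders temporarily, it can help to remember where you started. The sketch below relies on the fact that `setwd()` invisibly returns the previous working directory; it uses `tempdir()` purely as a stand-in for a real project folder, and `old_wd` is a name I made up for this illustration.

```{r}
# Move to a scratch folder and keep the previous working directory.
# tempdir() is only a placeholder for a real project folder.
old_wd <- setwd(tempdir())

# Confirm that the working directory has changed
getwd()

# Go back to where we started
setwd(old_wd)
getwd()
```

Saving the old directory this way means you never have to retype the original path by hand.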

You can also change your working directory by clicking on Session > Set Working Directory > Choose Directory in the RStudio menu bar. The **R** graphical user interfaces for Windows and Mac have similar options. If you start **R** from a UNIX command line (as on Linux machines), the working directory will be whichever directory you were in when you called **R**.

`list.files()` will show you what files are in your working directory. If the file that you want to open is in your working directory, then you are ready to proceed.
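
As a rough illustration, the chunk below builds a small scratch folder and then inspects it. The folder and file names (`wd_demo`, `example.csv`, `notes.txt`) are hypothetical and exist only for this sketch; with your own data you would call `list.files()` with no arguments to inspect the working directory itself.

```{r}
# Create a scratch folder with two small files (hypothetical names)
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,2"), file.path(demo_dir, "example.csv"))
writeLines("some notes", file.path(demo_dir, "notes.txt"))

# List everything in the folder
list.files(demo_dir)

# List only the files whose names end in .csv
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
csv_files

# Check for one specific file before trying to load it
file.exists(file.path(demo_dir, "example.csv"))
```

The `pattern` argument takes a regular expression, which is handy when a folder holds many files and you only care about one format.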

## Loading data

Once we know where to find data files on our computer, we can start loading them into **R**. Note, however, that different file formats need different functions to open them.

### Plain text files

Plain-text files are simple and many programs can read them. This is why many organizations (such as the Census Bureau and the Social Security Administration) publish their data as plain-text files.

#### read.table

A plain-text file stores a table of data in a text document. Each row of the table is saved on its own line, and a simple symbol separates the cells within a row. This symbol is often a comma, but it can also be a tab, a pipe delimiter `|`, or any other character. Each file uses only one symbol to separate cells, which minimizes confusion.
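
Here is a minimal sketch of how `read.table()` can load such a file. To keep it self-contained, it first writes the header and the first three rows of `flower.csv` to a temporary file; the object names `flower_lines`, `tmp_file`, and `flower_sample` are my own choices for this illustration. With the real file you would pass its path instead of `tmp_file`.

```{r}
# Header and first three rows of flower.csv, written to a temporary file
flower_lines <- c(
  "treat,nitrogen,block,height,weight,leafarea,shootarea,flowers",
  "tip,medium,1,7.5,7.62,11.7,31.9,1",
  "tip,medium,1,10.7,12.14,14.1,46,10",
  "tip,medium,1,11.2,12.76,7.1,66.7,10"
)
tmp_file <- tempfile(fileext = ".csv")
writeLines(flower_lines, tmp_file)

# header = TRUE: the first line holds the column names
# sep = ",": a comma separates the cells within each row
flower_sample <- read.table(tmp_file, header = TRUE, sep = ",")

# Inspect the structure of the resulting data frame
str(flower_sample)
```

If the file used a different separator, such as a tab or a pipe, only the `sep` argument would need to change (for example, `sep = "\t"` or `sep = "|"`).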

## Cleaning data

## Data summaries and visualizations

## References

Most of this section is based on ["Hands-On Programming with R"](https://rstudio-education.github.io/hopr/) by Garrett Grolemund, and on ["An Introduction to R"](https://intro2r.com/) by Alex Douglas, Deon Roos, Francesca Mancini, Ana Couto & David Lusseau.
1 change: 1 addition & 0 deletions _quarto.yml
@@ -11,6 +11,7 @@ book:
- 01_the_r_environment.qmd
- 02_getting_started_with_r.qmd
- 03_data_in_r.qmd
- 04_basic_data_processing.qmd
# - 03_using_scripts.qmd
# - 04_objects.qmd
- 05_data_for_analysis.qmd
97 changes: 97 additions & 0 deletions data_files/flower.csv
@@ -0,0 +1,97 @@
treat,nitrogen,block,height,weight,leafarea,shootarea,flowers
tip,medium,1,7.5,7.62,11.7,31.9,1
tip,medium,1,10.7,12.14,14.1,46,10
tip,medium,1,11.2,12.76,7.1,66.7,10
tip,medium,1,10.4,8.78,11.9,20.3,1
tip,medium,1,10.4,13.58,14.5,26.9,4
tip,medium,1,9.8,10.08,12.2,72.7,9
tip,medium,1,6.9,10.11,13.2,43.1,7
tip,medium,1,9.4,10.28,14,28.5,6
tip,medium,2,10.4,10.48,10.5,57.8,5
tip,medium,2,12.3,13.48,16.1,36.9,8
tip,medium,2,10.4,13.18,11.1,56.8,12
tip,medium,2,11,11.56,12.6,31.3,6
tip,medium,2,7.1,8.16,29.6,9.7,2
tip,medium,2,6,11.22,13,16.4,3
tip,medium,2,9,10.2,10.8,90.1,6
tip,medium,2,4.5,12.55,13.4,14.4,6
tip,high,1,12.6,18.66,18.6,54,9
tip,high,1,10,18.07,16.9,90.5,3
tip,high,1,10,13.29,15.8,142.7,12
tip,high,1,8.5,14.33,13.2,91.4,5
tip,high,1,14.1,19.12,13.1,113.2,13
tip,high,1,10.1,15.49,12.6,77.2,12
tip,high,1,8.5,17.82,20.5,54.4,3
tip,high,1,6.5,17.13,24.1,147.4,6
tip,high,2,11.5,23.89,14.3,101.5,12
tip,high,2,7.7,14.77,17.2,104.5,4
tip,high,2,6.4,13.6,13.6,152.6,7
tip,high,2,8.8,16.58,16.7,100.1,9
tip,high,2,9.2,13.26,11.3,108,9
tip,high,2,6.2,17.32,11.6,85.9,5
tip,high,2,6.3,14.5,18.3,55.6,8
tip,high,2,17.2,19.2,10.9,89.9,14
tip,low,1,8,6.88,9.3,16.1,4
tip,low,1,8,10.23,11.9,88.1,4
tip,low,1,6.4,5.97,8.7,7.3,2
tip,low,1,7.6,13.05,7.2,47.2,8
tip,low,1,9.7,6.49,8.1,18,3
tip,low,1,12.3,11.27,13.7,28.7,5
tip,low,1,9.1,8.96,9.7,23.8,3
tip,low,1,8.9,11.48,11.1,39.4,7
tip,low,2,7.4,10.89,13.3,9.5,5
tip,low,2,3.1,8.74,16.1,39.1,3
tip,low,2,7.9,8.89,8.4,34.1,4
tip,low,2,8.8,9.39,7.1,38.9,4
tip,low,2,8.5,7.16,8.7,29.9,4
tip,low,2,5.6,8.1,10.1,5.8,2
tip,low,2,11.5,8.72,10.2,28.3,6
tip,low,2,5.8,8.04,5.8,30.7,7
notip,medium,1,5.6,11.03,18.6,49.9,8
notip,medium,1,5.3,9.29,11.5,82.3,6
notip,medium,1,7.5,13.6,13.6,122.2,11
notip,medium,1,4.1,12.58,13.9,136.6,11
notip,medium,1,3.5,12.93,16.6,109.3,3
notip,medium,1,8.5,10.04,12.3,113.6,4
notip,medium,1,4.9,6.89,8.2,52.9,3
notip,medium,1,2.5,14.85,17.5,77.8,10
notip,medium,2,5.4,11.36,17.8,104.6,12
notip,medium,2,3.9,9.07,9.6,90.4,7
notip,medium,2,5.8,10.18,15.7,88.8,6
notip,medium,2,4.5,13.68,14.8,125.5,9
notip,medium,2,8,11.43,12.6,43.2,14
notip,medium,2,1.8,10.47,11.8,120.8,9
notip,medium,2,2.2,10.7,15.3,97.1,7
notip,medium,2,3.9,12.97,17,97.5,5
notip,high,1,8.5,22.53,20.8,166.9,16
notip,high,1,8.5,17.33,19.8,184.4,12
notip,high,1,6.4,11.52,12.1,140.5,7
notip,high,1,1.2,18.24,16.6,148.1,7
notip,high,1,2.6,16.57,17.1,141.1,3
notip,high,1,10.9,17.22,49.2,189.6,17
notip,high,1,7.2,15.21,15.9,135,14
notip,high,1,2.1,19.15,15.6,176.7,6
notip,high,2,4.7,13.42,19.8,124.7,5
notip,high,2,5,16.82,17.3,182.5,15
notip,high,2,6.5,14,10.1,126.5,7
notip,high,2,2.6,18.88,16.4,181.5,14
notip,high,2,6,13.68,16.2,133.7,2
notip,high,2,9.3,18.75,18.4,181.1,16
notip,high,2,4.6,14.65,16.7,91.7,11
notip,high,2,5.2,17.7,19.1,181.1,8
notip,low,1,3.9,7.17,13.5,52.8,6
notip,low,1,2.3,7.28,13.8,32.8,6
notip,low,1,5.2,5.79,11,67.4,5
notip,low,1,2.2,9.97,9.6,63.1,2
notip,low,1,4.5,8.6,9.4,113.5,7
notip,low,1,1.8,6.01,17.6,46.2,4
notip,low,1,3,9.93,12,56.6,6
notip,low,1,3.7,7.03,7.9,36.7,5
notip,low,2,2.4,9.1,14.5,78.7,8
notip,low,2,5.7,9.05,9.6,63.2,6
notip,low,2,3.7,8.1,10.5,60.5,6
notip,low,2,3.2,7.45,14.1,38.1,4
notip,low,2,3.9,9.19,12.4,52.6,9
notip,low,2,3.3,8.92,11.6,55.2,6
notip,low,2,5.5,8.44,13.5,77.6,9
notip,low,2,4.4,10.6,16.2,63.3,6
20 changes: 13 additions & 7 deletions docs/01_the_r_environment.html
@@ -127,47 +127,53 @@
<a href="./03_data_in_r.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">3</span>&nbsp; <span class="chapter-title">Data in R</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./04_basic_data_processing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Basic data processing</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./05_data_for_analysis.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./06_student_t-tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./07_chi-square_tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./08_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./09_lists.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./10_generalized_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./12_programming.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
</div>
</li>
</ul>
20 changes: 13 additions & 7 deletions docs/02_getting_started_with_r.html
@@ -163,47 +163,53 @@
<a href="./03_data_in_r.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">3</span>&nbsp; <span class="chapter-title">Data in R</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./04_basic_data_processing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Basic data processing</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./05_data_for_analysis.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./06_student_t-tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./07_chi-square_tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./08_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./09_lists.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./10_generalized_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./12_programming.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
</div>
</li>
</ul>
26 changes: 16 additions & 10 deletions docs/03_data_in_r.html
@@ -64,7 +64,7 @@
<script src="site_libs/quarto-search/fuse.min.js"></script>
<script src="site_libs/quarto-search/quarto-search.js"></script>
<meta name="quarto:offset" content="./">
<link href="./05_data_for_analysis.html" rel="next">
<link href="./04_basic_data_processing.html" rel="next">
<link href="./02_getting_started_with_r.html" rel="prev">
<script src="site_libs/quarto-html/quarto.js"></script>
<script src="site_libs/quarto-html/popper.min.js"></script>
@@ -161,47 +161,53 @@
<a href="./03_data_in_r.html" class="sidebar-item-text sidebar-link active">
<span class="menu-text"><span class="chapter-number">3</span>&nbsp; <span class="chapter-title">Data in R</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./04_basic_data_processing.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Basic data processing</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./05_data_for_analysis.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./06_student_t-tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">5</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Student t-tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./07_chi-square_tests.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">6</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Chi-Square tests</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./08_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">7</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Linear Models</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./09_lists.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">8</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Lists</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./10_generalized_linear_models.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">9</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Generalized Linear Models (GLM)</span></span></a>
</div>
</li>
<li class="sidebar-item">
<div class="sidebar-item-container">
<a href="./12_programming.html" class="sidebar-item-text sidebar-link">
<span class="menu-text"><span class="chapter-number">10</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
<span class="menu-text"><span class="chapter-number">11</span>&nbsp; <span class="chapter-title">Programming</span></span></a>
</div>
</li>
</ul>
@@ -1038,8 +1044,8 @@ <h2 data-number="3.3" class="anchored" data-anchor-id="references"><span class="
</a>
</div>
<div class="nav-page nav-page-next">
<a href="./05_data_for_analysis.html" class="pagination-link">
<span class="nav-page-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Data for Analysis</span></span> <i class="bi bi-arrow-right-short"></i>
<a href="./04_basic_data_processing.html" class="pagination-link">
<span class="nav-page-text"><span class="chapter-number">4</span>&nbsp; <span class="chapter-title">Basic data processing</span></span> <i class="bi bi-arrow-right-short"></i>
</a>
</div>
</nav>