From 0bfabcb49c194e4283856033a270af575c88f479 Mon Sep 17 00:00:00 2001
From: Jonathan J Chipman
Date: Mon, 26 Aug 2024 23:57:40 -0600
Subject: [PATCH] Update slides.html

---
 02-essentials/slides.html | 5433 +++++++++++++++++--------------------
 1 file changed, 2426 insertions(+), 3007 deletions(-)

diff --git a/02-essentials/slides.html b/02-essentials/slides.html
index 166a98b..6649b2b 100644
--- a/02-essentials/slides.html
+++ b/02-essentials/slides.html
@@ -1,5 +1,62 @@
+R Essentials for advanced computing
-R Essentials for advanced computing
-
- -
-

R Essentials for advanced computing

- -
-
-
-Jonathan Chipman, Ph.D. -
-
-
- -
-
-
-

Introduction

- -
-
-

Collaborative learning

-

Google Doc for sharing code/notes

-
-
-

Preparation

-

Read/Watch: Selections from R Programming for Data Science

- -

Read: Selections from The Art of R Programming

- -

Note: Big-picture principles in chapter 2, on vectors, have relevance to matrixes, lists, and data frames.

-

Aside: A reference for good coding practices

-
-
-

Warm-up problem

-

Install the package beepr and run the command beepr::beep(). beepr::beep(k) can play k=1-11 sounds. (See ?beep for a list of sounds).

-

A. Write loops and functions.

-
    -
  1. Write a loop to listen to each sound. Use Sys.sleep() to pause 2 seconds between each call to beep.

  2. -
  3. Modify the loop to pause a random duration of time. You can set your own parameters for ‘what a random duration of time’ means.

  4. -
-

B. How can the beepr::beep() function be helpful? What does this say about R being a scripting language?

-

C. After calling library(beepr), the beep function can be directly called as beep(). Why can it be helpful to call beepr::beep()?

-

D. (If time allows) Create a function that takes two inputs: a numeric vector and sound. Write a loop that calculates, one element at a time, the cumulative sum of the vector (don’t use the cumsum function). Play a beep at the end of the calculation using the sound input. The output should be a two-column matrix with the original vector (column 1) and the cumulative sum (column 2). Check your answer using the cumsum function.

-

Note: Question D is designed to practice working with loops and building up the output result. One-at-a-time calculations are discouraged whenever avoidable. In practice, it would be better to use the cumsum function.

-
-
-

This week’s lesson

-

Focus on foundations

-

Many of this week’s topics will be familiar, and it may be tempting to gloss over them. However, there are important foundational concepts that can strengthen understanding and lead to more efficient code.

-

Key concepts

-

In addition to general concepts, the below strategies improve efficient, reproducible code:

-
    -
  1. Use matrices rather than data.frames whenever possible; they use less memory
  2. -
  3. Use names for indexing and filtering; it is transparent and less error-prone
  4. -
  5. Pre-initialize data-structures to be filled rather than saving over existing objects
  6. -
  7. Consider if there are ways to reduce unnecessary calculations
  8. -
-
-
-
-

R and RStudio

- -
-
-

What is R?

-

R is an object-oriented, functional, and scripting programming language.

-

Object-oriented programming language

-
    -
  • R Manual: Objects
  • -
  • Everything in R is an object.
  • -
  • Data are stored in objects (vectors, matrixes, lists, data.frames) and manipulated using objects (functions).
    -
  • -
  • Objects: -
      -
    • Are assigned a value via <- (preferred), =, or ->.
    • -
    • Have basic, intrinsic properties (aka attributes): mode (data type) and length
    • -
    • May have additional attributes such as (list from link) -
        -
      • class (a character vector with the classes that an object inherits from).
      • -
      • comment
      • -
      • dim (which is used to implement arrays)
      • -
      • dimnames
      • -
      • names (to label the elements of a vector or a list).
      • -
      • row.names
      • -
      • levels (for factors)
      • -
    • -
    • Have a class which may behave differently for generic functions (such as plot and summary) … we’ll discuss classes later.
    • -
  • -
-

The attributes of an object can be seen through str() and attributes():

-
-
-
, , Sex = Male
-
-       Eye
-Hair    Brown Blue Hazel Green
-  Black    32   11    10     3
-  Brown    53   50    25    15
-  Red      10   10     7     7
-  Blond     3   30     5     8
-
-, , Sex = Female
-
-       Eye
-Hair    Brown Blue Hazel Green
-  Black    36    9     5     2
-  Brown    66   34    29    14
-  Red      16    7     7     7
-  Blond     4   64     5     8
-
-
-
 'table' num [1:4, 1:4, 1:2] 32 53 10 3 11 50 10 30 10 25 ...
- - attr(*, "dimnames")=List of 3
-  ..$ Hair: chr [1:4] "Black" "Brown" "Red" "Blond"
-  ..$ Eye : chr [1:4] "Brown" "Blue" "Hazel" "Green"
-  ..$ Sex : chr [1:2] "Male" "Female"
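As a complement to the output above, here is a minimal sketch of how intrinsic properties and additional attributes can be inspected. The matrix `m` and the factor `f` are small illustrative objects made up for this example; they are not part of the slides.

```r
# A small matrix built for illustration (not from the slides)
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

mode(m)        # intrinsic property: mode (data type) of the elements
length(m)      # intrinsic property: number of elements
attributes(m)  # additional attributes: dim and dimnames
str(m)         # compact summary of structure and attributes

# A factor stores its labels in the 'levels' attribute
f <- factor(c("low", "high", "low"))
attributes(f)
```

Comparing `attributes(m)` with `attributes(f)` shows that different kinds of objects carry different additional attributes on top of mode and length.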
+$text +[1] "fun4"
-
-

Functional programming language

-
    -
  • Perform operations on object(s) (ex: sum, '+', rnorm)

  • -
  • Functions come pre-installed (base), installed, and custom-defined

  • -
  • Using functions is a major theme of good R programming. Avoid explicit iteration (loops and copy-paste) as much as possible. [Matloff (pg xxii)]:

    -
      -
    • Clearer, more compact code
    • -
    • Potentially much faster execution speed
    • -
    • Less debugging, because the code is simpler
    • -
    • Easier transition to parallel programming
    • -
  • -
  • R Manual: Writing-your-own-functions

    -
    # Example of base function (with R installation)
    -1+1
    -
    -# Example of installed function from a package
    -install.packages("beepr")
    -beepr::beep(0)
    -
    -# General structure of custom function
    -< name of your function > <- function(
    -    < argument 1>,
    -    < argument 2> = < default value >,
    -    ...,
    -    < argument n>
    -) {
    -
    -  < some R code here >
    -
    -  return(< some result >)
    -
    -}
    -
    # Example of custom function
    -fun <- function(v1,v2) {v1+v2}
    -fun(1,2)
  • +
  • Scripting language

    • R Manual: Scripting-with-R
    • A script is run top-to-bottom. It is reproducible and transparent.
    • Can be run in ‘interactive’ (ex: RStudio) and ‘batch’ modes (ex: command-line call)
    • +
    +
    % Rscript file.R > output.out
    +
  • -
    -

    What is RStudio?

    +
    +
    +

    What is RStudio?

    • An Integrated Development Environment (IDE) to organize and facilitate common coding tasks in R
    -

    -

    -

    -

    -

    -

    -

    -

    -

    +

    +

    +

    +

    +

    +

    +

    +

    +

    +
    -
    - -
    -
    -
    +
    +

    Data modes and structures

    @@ -4432,9 +3373,8 @@

    Data modes and structures

    -
    -
    -

    Modes

    +
    +

    Modes

    Six primitive modes (data types) of R:

    2 Modes less-commonly (for me never) created in practice:

      @@ -4452,30 +3392,143 @@

      Modes

    • Character / String x <- "hello world"
    -
    -

    Structures

    +
    +

    Structures

    Data are stored in any of the following structures (starting from most primitive):

    -
      -
    • vector: A set of elements having the same mode (data type).
    • -
    • matrix/array: A vector with dimensions. -
        -
      • Matrices have rows and columns.
        -
      • -
      • Arrays can have any number of dimensions.
      • -
    • -
    • list: A set of elements which may have different modes / structures
    • -
    • data.frames: A list shaped like a matrix
    • -
    +
    Feature | Vectors | Matrices | Lists | Data Frames
    Memory Usage | Low | Low to Moderate | Moderate | Moderate to High
    Dimensionality | 1D (vector) | 2D (rows and columns) | 1D (but can contain elements of varying dimensions) | 2D (rows and columns)
    Homogeneity | Homogeneous (all elements of the same mode) | Homogeneous (all elements of the same mode) | Heterogeneous (elements can be of different modes and structures) | Heterogeneous (different columns can be different modes)
    Creation | c(), seq(), rep() | matrix(), cbind(), rbind() | list() | data.frame()
    Combining Elements | c() | cbind(), rbind() | c(), append() | cbind(), rbind()
    Indexing | Single bracket [ ] | Single bracket with row and column [ , ] | Double bracket [[ ]], Single bracket [ ] | $, [[ ]], or [ ]
    Arithmetic Operations | Yes | Yes | No (operations on elements) | Not directly (need to extract columns first)
    Advantages | Memory-efficient, fast for operations | Efficient for mathematical and matrix operations | Highly flexible, can store complex structures | Optimized for tabular data, handles mixed types
    Disadvantages | Limited to a single data type | Limited to a single data type | Higher memory usage for large/complex elements | Slightly higher memory overhead due to attributes
    Usage | Simple collections like numerical data | Mathematical computations, matrices | Grouping mixed-type data, complex structures | Tabular data, similar to Excel spreadsheets
    +
    +
    +

    Questions 1

    +
      +
    1. What structure would you want to perform a computationally-heavy task?

    2. +
    3. What structure would you want to read in data from a colleague?

    4. +
    5. What kind of object does the following return?

    6. +
    +
    +
    y <- 1:10 
    +x <- 1:10
    +f <- lm(y~x)
    +
    +
    +
    mode(f)
    +
    +
    +
    names(f)
    +
    +
     [1] "coefficients"  "residuals"     "effects"       "rank"         
    + [5] "fitted.values" "assign"        "qr"            "df.residual"  
    + [9] "xlevels"       "call"          "terms"         "model"        
    +
    +
    +
    +

    Examples

    +
    # Numeric vector
    +x <- 1:10
    +x
     [1]  1  2  3  4  5  6  7  8  9 10
    +
    # Matrix of numerics with rownames 1, 2, and 3
    +rbind("1"=x,"2"=x,"3"=x)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     1    1    2    3    4    5    6    7    8    9    10
     2    1    2    3    4    5    6    7    8    9    10
     3    1    2    3    4    5    6    7    8    9    10
    +
    # List with element names 1, 2, and 3
    +x <- list("1"=1, "2"=2, "3"="A")
    +x
    $`1`
     [1] 1
    @@ -4486,67 +3539,68 @@ 

    Structures

    $`3` [1] "A"
    +
    # data.frame
    +as.data.frame(x)
      X1 X2 X3
     1  1  2  A
    +
    +
    +

    Atomic modes

    Back to modes … The 6 primitive modes are also called ‘atomic’ modes. This means that when data are stored together in a vector, they must be of the same mode.

    When modes are mixed (ex: x <- c("A",0,1L,TRUE)), R will force all elements to be of the same mode.

    Question: What is your guess for which modes receive greatest priority?
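One way to check a guess empirically is to combine two modes at a time and inspect the result. The pairs below are illustrative choices, not taken from the slides; run them and compare the results against your guess.

```r
# Each vector mixes two modes; R coerces every element to a common mode
class(c(TRUE, 1L))    # logical combined with integer
class(c(1L, 1.5))     # integer combined with double
class(c(1.5, "A"))    # double combined with character

# The mixed vector from the text
x <- c("A", 0, 1L, TRUE)
x
mode(x)
```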

    -
    -

    Missing values

    -

    R has different types of missing values:

    +
    +
    +

    Missing values

    +

    R has different types of missing values (in order of most primitive):

      -
    • NA: no information, has length 1,

    • NULL: which has length 0,

    • -
    • Inf: Infinite, and

    • NaN: Not a Number

    • +
    • NA: no information, has length 1,

    • +
    • Inf: Infinite, and

    • These have companion functions is.na(), is.null(), is.infinite() (or is.finite(), which returns FALSE for NA, Inf, and NaN), and is.nan().
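A minimal sketch of how the companion functions behave. The vector `v` is a hypothetical example chosen so that each kind of value appears once.

```r
v <- c(1, NA, Inf, -Inf, NaN)

is.na(v)        # TRUE for NA and also for NaN
is.nan(v)       # TRUE only for NaN
is.infinite(v)  # TRUE for Inf and -Inf
is.finite(v)    # TRUE only for ordinary numbers
is.null(v)      # a single FALSE: is.null() tests the whole object

length(NULL)    # 0
length(NA)      # 1
```

Note that is.null() is not applied element by element, which is consistent with NULL having length 0.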

    -
    -

    Questions 1

    -
      -
    1. What is the mode of the following vector myVector <- c(NA, NaN, Inf)? (First try to answer without coding, then check using the mode() function in R)

    2. -
    3. The c() function can be used with other vectors, for example

      -

      What is the mode of the vector c(myStringVector, myStringVector)?

    4. -
    5. What do each one of the functions is.na, is.null, is.finite, is.infinite, is.nan return on the vector myVector?

    6. -
    7. What are the attributes of the following object myMat <- matrix(runif(12), ncol=4)? What are the attributes of myNumericVector and how can you make sense of the attributes?

    8. -
    -
    -
    -
    -

    Vectors and matrixes

    +
    +

    Vectors and matrixes

    Why use vectors and matrices? They use less memory than lists and data.frames.

    -

    Efficient programming in R matrix vs data table vs data frame

    Get into a habit of using vectors and matrices / arrays as much as possible.
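A rough way to see the memory difference is to store the same numbers as a matrix and as a data.frame and compare `object.size()`. This is only a sketch: the dimensions are arbitrary, and the exact byte counts depend on the R version and platform.

```r
set.seed(1)
m  <- matrix(rnorm(1000 * 10), ncol = 10)  # 1000 x 10 numeric matrix
df <- as.data.frame(m)                     # same values as a data.frame

object.size(m)
object.size(df)   # larger: one vector per column plus names, row.names, class
```

For data of this shape the gap in size is modest; the extra attributes and per-column storage of a data.frame are what account for it.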

    -
    -
    -

    Creating vectors

    +
    +

    Creating vectors

      -
    • To initialize an empty vector, use vector, rep(NA,<length>), numeric(<length>), or character(<length>). (JC) I commonly use rep(NA,<length>). +
    • To initialize an empty vector, use vector, rep(NA,<length>), numeric(<length>), or character(<length>).
        +
      • (JC aside) I commonly use rep(NA,<length>) to ‘pre-allocate’ a vector for future results.
      • A question to keep in the back of your mind (we will revisit it when talking about loops) … Why would you want to initialize an empty vector?
    +
    # vector(< mode >, < length >)
    +vector("character",2)
    [1] "" ""
    +
    vector("numeric",2)
    [1] 0 0
    +
    vector("logical",2)
    [1] FALSE FALSE
    +
    rep(NA,2)
    [1] NA NA
    +
    numeric(2)
    [1] 0 0
    +
    character(2)
    [1] "" ""
    @@ -4555,27 +3609,54 @@

    Creating vectors

  • Other common methods to create numeric vectors include combine function, c(), :, seq, and rep:
  • -
    > [1] 1 2 3 4
    ->  [1]  1  2  3  4  5  6  7  8  9 10
    ->  [1]  1  2  3  4  5  6  7  8  9 10
    -> [1] 1 3 5 7 9
    -> [1]  1.0  5.5 10.0
    ->  [1] 1 1 1 1 1 2 2 2 2 2
    ->  [1] 1 2 1 2 1 2 1 2 1 2
    +
    # Combine 4 numeric vectors each with length 1
    +c(1,2,3,4)
    +> [1] 1 2 3 4
    +
    +# Vector of sequential numerics
    +x <- 1:10
    +x
    +>  [1]  1  2  3  4  5  6  7  8  9 10
    +
    +# Vector of sequential numerics from 1 to length(x)
    +seq(x)
    +>  [1]  1  2  3  4  5  6  7  8  9 10
    +
    +# Vector of sequential numerics from 1 to 10 by 2
    +seq(1,10,by=2)
    +> [1] 1 3 5 7 9
    +
    +# Vector of sequential numerics from 1 to 10 divided equally into 3 elements
    +seq(1,10,length.out=3)
    +> [1]  1.0  5.5 10.0
    +
    +# Vector of repeated numerics
    +rep(1:2,each=5)
    +>  [1] 1 1 1 1 1 2 2 2 2 2
    +
    +# Vector of repeated numerics
    +rep(1:2,times=5)
    +>  [1] 1 2 1 2 1 2 1 2 1 2
    +
    +# Initialize an empty vector to 'pre-allocate' memory for future results
    +out <- rep(NA,10)
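To illustrate what pre-allocation buys, the sketch below fills a pre-allocated vector in a loop and contrasts it with growing a vector one element at a time. The squared values are an arbitrary placeholder computation.

```r
n <- 10

# Pre-allocated container, filled in place
out <- rep(NA, n)
for (i in seq_len(n)) {
  out[i] <- i^2
}
out

# Growing the result instead (discouraged for long loops):
# the vector is copied each time an element is appended
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

identical(out, grown)
```

Both approaches give the same values; the pre-allocated version avoids the repeated copying, which matters as `n` grows.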
    -
    -

    Creating matrices

    +
    +

    Creating matrices

    • Three common ways to create matrices: matrix, cbind, rbind.
    +
    x <- 1:10
    +rbind(x,x,x)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
     x    1    2    3    4    5    6    7    8    9    10
     x    1    2    3    4    5    6    7    8    9    10
     x    1    2    3    4    5    6    7    8    9    10
    +
    cbind(x,x,x)
           x  x  x
      [1,]  1  1  1
    @@ -4589,87 +3670,117 @@ 

    Creating matrices

    [9,] 9 9 9 [10,] 10 10 10
    +
    matrix(x,nrow=2)
         [,1] [,2] [,3] [,4] [,5]
     [1,]    1    3    5    7    9
     [2,]    2    4    6    8   10
    - -
    -
    -

    Naming and indexing

    +
    +

    Naming and indexing vectors and matrices

    Elements of vectors and matrices can be extracted through [<position(s)>] or [<name(s)>].

    +
    # Vector[k] pulls out the element(s) indexed by k
    +x <- 1:10
    +x[3]
    [1] 3
    +
    x[c(5:3)]
    [1] 5 4 3
    +
    names(x) <- 2011:2020
    +x
    2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 
        1    2    3    4    5    6    7    8    9   10 
    +
    x[c("2015","2017")]
    2015 2017 
        5    7 
    +
    # Matrix[r,c] pulls out the element(s) indexed by row(s) r and column(s) c
    +x <- matrix(1:10,nrow=2)
    +x
         [,1] [,2] [,3] [,4] [,5]
     [1,]    1    3    5    7    9
     [2,]    2    4    6    8   10
    +
    x[1,c(2,5)]
    [1] 3 9
    +
    rownames(x) <- 2010:2011
    +colnames(x) <- c("SLC","Murray","Bountiful","Milcreek","Sandy")
    +x
         SLC Murray Bountiful Milcreek Sandy
     2010   1      3         5        7     9
     2011   2      4         6        8    10
    +
    x[,c("SLC","Milcreek")]
         SLC Milcreek
     2010   1        7
     2011   2        8
    +
    # Beware of dimension reduction
    +dim(x[,c("SLC","Milcreek")])
    [1] 2 2
    +
    dim(x["2010",c("SLC","Milcreek")])
    NULL

    In the last example, why is the dimension NULL? (Hint, what is the class of the last two examples?)
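If the matrix structure needs to be kept when a single row or column is selected, the `drop = FALSE` argument of `[` can be used. A minimal sketch that rebuilds the matrix from the example above:

```r
x <- matrix(1:10, nrow = 2,
            dimnames = list(as.character(2010:2011),
                            c("SLC", "Murray", "Bountiful", "Milcreek", "Sandy")))

# Default behaviour: a single row is reduced to a named vector
one_row <- x["2010", c("SLC", "Milcreek")]
dim(one_row)

# drop = FALSE keeps the result as a 1 x 2 matrix
kept <- x["2010", c("SLC", "Milcreek"), drop = FALSE]
dim(kept)
```

Keeping the dimensions is useful when later code expects a matrix (for example, when calling rownames() or binding rows).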

    -
    -

    Questions 2

    +
    +
    +

    Questions 2

    1. What is the number of the alphabet for each letter of your name? Use a vector with names (try the LETTERS object). For example, if the letters to the name JONATHAN were mapped to integers, the result would be: 10, 15, 14, 1, 20, 8, 1, 14.

    2. Why is it important to extract elements through naming conventions?

    -
    -
    -
    -

    Lists

    +
    +
    +

    Lists

    Why use lists?

    • Lists are useful for returning output from functions. For example, the output of lm is a list. (Aside, it is also of the class lm which has a specific behavior when calling “generic” functions such as print and summary).
    +
    mode(f)
    +
    +
    [1] "list"
    +
    +
    names(f)
     [1] "coefficients"  "residuals"     "effects"       "rank"         
      [5] "fitted.values" "assign"        "qr"            "df.residual"  
    - [9] "call"          "terms"         "model"        
    + [9] "xlevels" "call" "terms" "model"
    +
    # Two equivalent summary statements
    +# summary(f)          
    +# summary.lm(f)
    +
    +# Two equivalent statements to get predicted values
    +# predict(f)
    +# predict.lm(f)
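Because a fitted model is a list with a class attribute, its pieces can be pulled out by name like any other list element. The sketch below uses made-up data and a hypothetical object name `fit`:

```r
# Illustrative data (not from the slides)
y <- c(1.1, 1.9, 3.2, 3.8, 5.1)
x <- 1:5
fit <- lm(y ~ x)

mode(fit)     # "list": how the object is stored
class(fit)    # "lm": what generic functions dispatch on
is.list(fit)

# Extract elements by name
fit$coefficients
fit[["df.residual"]]

# The accessor function returns the same coefficients
all.equal(coef(fit), fit$coefficients)
```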
    -
    -
    -

    Creating lists

    +
    +

    Creating lists

    +
    x <- list(1:2,"A",NULL,c(TRUE,FALSE),list(1:10,"B"))
    +x
    [[1]]
     [1] 1 2
    @@ -4690,6 +3801,8 @@ 

    Creating lists

    [[5]][[2]] [1] "B"
    +
    x <- list("2010"=1,"2011"="A","2012"=NULL,"2013"=TRUE)
    +x
    $`2010`
     [1] 1
    @@ -4705,10 +3818,13 @@ 

    Creating lists

    -
    -

    Naming and indexing

    +
    +

    Naming and indexing lists

    Names can be set as in the example above, or through the names() function.

    +
    x <- list(1,2,3)
    +names(x) <- c("A","B","C")
    +x
    $A
     [1] 1
    @@ -4722,6 +3838,12 @@ 

    Naming and indexing

    List elements can be extracted using [], [[]], or $. See Subsetting Lists.

    +
    x <- list("2010"=1:2,
    +          "2011"="A",
    +          "2012"=NULL,
    +          "2013"=c(TRUE,FALSE),
    +          "2014"=list("H1"=1:10,"H2"="B"))
    +x
    $`2010`
     [1] 1 2
    @@ -4742,6 +3864,8 @@ 

    Naming and indexing

    $`2014`$H2 [1] "B"
    +
    # Extract multiple elements
    +x[as.character(2010:2012)]
    $`2010`
     [1] 1 2
    @@ -4752,16 +3876,21 @@ 

    Naming and indexing

    $`2012` NULL
    +
    # Extract single elements
    +x[["2010"]]
    [1] 1 2
    +
    # Another option to extract by name
    +x$"2010"
    [1] 1 2
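The practical difference between `[ ]` and `[[ ]]` is what comes back: a single bracket returns a (shorter) list, while a double bracket returns the element itself. A minimal sketch with a reduced version of the list above:

```r
x <- list("2010" = 1:2, "2011" = "A", "2013" = c(TRUE, FALSE))

# Single bracket: still a list, here of length 1
sub <- x["2010"]
class(sub)
length(sub)

# Double bracket: the stored element (an integer vector)
el <- x[["2010"]]
class(el)

# Arithmetic works on the element, not on the one-element list
sum(x[["2010"]])
```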
    -
    -
    -

    data.frames

    +
    +
    +
    +

    data.frames

    Why use data.frames?

    • data.frames are useful in data analysis.
      @@ -4770,6 +3899,7 @@

      data.frames

    • A helpful data.frame comes from expand.grid. How can this be helpful in simulations?
    +
    expand.grid("Param 1"=1:2,"Param 2"=letters[1:3])
      Param 1 Param 2
     1       1       a
    @@ -4782,9 +3912,9 @@ 

    data.frames

    data.frames can be named and indexed similarly as above for lists. (A data.frame is a list.)
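A short sketch of both points: a data.frame behaves like a list of columns, and expand.grid can lay out every combination of simulation settings. The parameter names `n` and `p` are hypothetical.

```r
# Every combination of two hypothetical simulation parameters
grid <- expand.grid(n = c(10, 100), p = c(0.5, 0.7))
grid

is.list(grid)   # a data.frame is a list of equal-length columns
names(grid)

# List-style extraction of a column
grid$n
grid[["p"]]

# Matrix-style filtering of rows by a named column
grid[grid$n == 100, ]
```

In a simulation, each row of `grid` can serve as one scenario to loop over or pass to a function.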

    - -
    -

    Naming and indexing

    +
    +
    +

    Naming, indexing, and filtering

    Rule of thumb: When you want to extract information from a data structure, use names (rather than position indexes).

    • Transparent in code and output
    • @@ -4800,56 +3930,13 @@

      Naming and indexing

    • UpperCamelCase: e.g. SignatureMethod

    Strive for names that are concise and meaningful

    -

    Naming conventions are personal preference. My (JC) inclination is the following:

      -
    • I use lowerCamelCase for all objects and files. (Python uses the period to access functions and properties of objects).
    • Avoid naming an object that would overwrite an existing object. (Example, c <- 0 overwrites the c() function).
    • -
    • Clarity is better than brevity. (Example, s in s <- 0 could stand for a method name, simulation, etc.).
    • -
    -
    - -
    -

    Vectorizing

    -

    R simultaneously performs the same operation across vector elements.

    -
      -
    • Behind the scenes R calls C code to do one-at-a-time calculation, but this is faster than doing one-at-a-time calculation in R.
    • +
    • Clarity is better than brevity unless well-documented. (Example, s in s <- 0 could stand for a method name, simulation, etc.).
    -

    Element-wise addition in one call

    -
    -
    -
     [1] 102 104 106 108 110 112 114 116 118 120
    -
    -
    -

    Element-wise logic assessments in one call

    -
    -
    -
     [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
    -
    -
    -

    The above is an example of R ‘re-cycling’ 5 to have the same length as x1. It is a strength and a caution.

    -
    -
    -
           x y
    - [1,]  1 2
    - [2,]  2 3
    - [3,]  3 4
    - [4,]  4 2
    - [5,]  5 3
    - [6,]  6 4
    - [7,]  7 2
    - [8,]  8 3
    - [9,]  9 4
    -[10,] 10 2
    -
    -
    -
     [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    -
    -
    -

    Key point: Loops (to be discussed further) carry out one-at-a-time operations. Usually, an alternative to explicit loops is to use the *apply functions. This will be discussed in more detail later in the class.

    +

    Naming conventions are personal preference. JC and GVY use underscore_separated.

    - -
    -
    +

    Filtering vectors

    Elements of a vector can be filtered through:

      @@ -4864,14 +3951,20 @@

      Filtering vectors

    Example: 10 patients were randomly assigned to control (0) or treatment (1). Suppose tx is the treatment assignment.

    +
    tx <- rep(0:1,times=5)
    +names(tx) <- 1:10
    +
    +tx[tx==1]
     2  4  6  8 10 
      1  1  1  1  1 
    +
    tx[c(1,3,7)]
    1 3 7 
     0 0 0 
    +
    tx["3"]
    3 
     0 
    @@ -4885,29 +3978,37 @@

    Filtering vectors

    +
    which(tx==1)
     2  4  6  8 10 
      2  4  6  8 10 
    +
    tx[which(tx==1)]
     2  4  6  8 10 
      1  1  1  1  1 
    +
    any(tx==1)
    [1] TRUE
    +
    all(tx==1)
    [1] FALSE
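Logical comparisons can also be combined with names and with summary functions. The sketch below reuses the `tx` vector from the example; the object names `treated_ids`, `n_treated`, and `prop_treated` are illustrative.

```r
tx <- rep(0:1, times = 5)
names(tx) <- 1:10

# Names (patient IDs) of the treated patients
treated_ids <- names(tx)[tx == 1]
treated_ids

# TRUE counts as 1, so sum() and mean() give a count and a proportion
n_treated    <- sum(tx == 1)
prop_treated <- mean(tx == 1)

# Exclude specific patients by name
tx[!(names(tx) %in% c("2", "4"))]
```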
    -
    -
    -

    Questions 3

    +
    +

    Questions 3

    Suppose x is the number of vacations in the years 2010 - 2019

    +
    +
    set.seed(1)
    +x <- sample.int(n=7,size=10,replace=TRUE)
    +names(x) <- 2010:2019
    +
    1. How many vacations were taken in the first and ninth years?

    2. How many total vacations were taken on odd years? Use element names for solution.

    3. -
    4. How many total vacations were taken on even years? Use the seq function for solution.

    5. +
    6. How many total vacations were taken on even years? Use the seq function for solution. (seq(2010,2018,by=2))

    7. Why would you want to use element names whenever possible?

    @@ -4931,8 +4032,8 @@

    Questions 3

    - -


    + + @@ -4942,25 +4043,32 @@

    Questions 3

    -
    -
    -
    -

    Control Statements

    -

    “R is a block-structured language … delineated by braces, though braces are optional if the block consists of just a single statement. Statements are separated by newline characters or, optionally, by semicolons.” (Matloff, page 139)

    -
    -

    If-then statements

    +
    +
    +

    If/Then Statements, Loops, and Functions (Control Statements)

    +

    “R is a block-structured language … delineated by braces, though braces are optional if the block consists of just a single statement. Statements are separated by newline characters or, optionally, by semicolons.” (Matloff, page 139)

    +
    +

    If-then statements

    Conditional logic evaluates a single TRUE/FALSE condition and runs the matching block of code.

    -
    > [1] "I'll take on the project"
    +
    x <- "yes"
    +if(x=="yes") {
    +  print("I'll take on the project") 
    +} else {
    +  print("Sorry, I can't take on the project")
    +}
    +> [1] "I'll take on the project"
    -

    The above if-else statement requires a single TRUE/FALSE evaluation. ifelse vectorizes conditional logic.

    +

    The above if-else statement requires a single TRUE/FALSE evaluation. ifelse “vectorizes” conditional logic.

    -
    >  [1] 0 0 0 0 0 1 1 1 1 1
    +
    x <- 1:10
    +ifelse(test = x>5, yes = 1, no = 0)
    +>  [1] 0 0 0 0 0 1 1 1 1 1
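ifelse can return any vector values, and calls can be nested for more than two categories. The labels below are arbitrary, and the last lines show an equivalent approach using logical indexing, which some find easier to read.

```r
x <- 1:10

# Two categories
two_level <- ifelse(x > 5, "high", "low")
two_level

# Three categories via a nested call
three_level <- ifelse(x <= 3, "low", ifelse(x <= 7, "mid", "high"))
three_level

# Equivalent two-category result with logical indexing
lab <- rep("low", length(x))
lab[x > 5] <- "high"
identical(lab, two_level)
```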
    -
    -

    Loops

    +
    +

    Loops

    Loops iterate operations through a parameter saved in a vector. Possible loops include:

    • for: iterate through each value/element of a vector
    • @@ -4968,42 +4076,79 @@

      Loops

    • repeat: continue loop until a return or break statement
    -
    > [1] 1
    -> [1] 2
    -> [1] 3
    -> [1] 4
    -> [1] 5
    -> [1] 6
    -> [1] 7
    -> [1] 8
    -> [1] 9
    -> [1] 10
    -> [1] 1
    -> [1] 2
    -> [1] 3
    -> [1] 4
    -> [1] 5
    -> [1] 6
    -> [1] 7
    -> [1] 8
    -> [1] 9
    -> [1] 10
    +
    x <- 1:10
    +for (n in x){
    +  print(n)
    +}
    +> [1] 1
    +> [1] 2
    +> [1] 3
    +> [1] 4
    +> [1] 5
    +> [1] 6
    +> [1] 7
    +> [1] 8
    +> [1] 9
    +> [1] 10
    +
    +for(n in seq_along(x)){
    +  print(x[n])
    +}
    +> [1] 1
    +> [1] 2
    +> [1] 3
    +> [1] 4
    +> [1] 5
    +> [1] 6
    +> [1] 7
    +> [1] 8
    +> [1] 9
    +> [1] 10

    What will the following return:

    -
    > [1] 13
    -> [1] 13
    -> [1] 13
    +
    x <- seq(1,10,by=3)
    +for(i in x){
    +  print(i)
    +}
    +
    +
    +
    i <- 1
    +while (i <= 10){
    +  i <- i + 4
    +}
    +i
    +> [1] 13
    +
    +i <- 1
    +while(TRUE){
    +i <- i + 4
    +if(i > 10) break
    +}
    +i
    +> [1] 13
    +
    +i <- 1
    +repeat{
    +  i <- i + 4
    +  if (i > 10) break
    +}
    +i
    +> [1] 13

    The next statement stops the current iteration and continues to the next iteration.
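A minimal sketch of next, using a made-up task: add up only the even numbers from 1 to 10 by skipping the odd ones.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 1) next   # odd number: skip the rest of this iteration
  total <- total + i      # reached only for even i
}
total
```

Unlike break, which ends the loop entirely, next only skips the remainder of the current pass through the loop body.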

    -
    -

    Questions 4

    +
    +

    Questions 4

    From Jan 1 - Jan 9, 2023, it snowed 6 days atop Snowbasin. Snow days can be represented each day in a vector as 1 (snow) and 0 (no snow).

    +
    +
    snow <- c(1,1,1,1,0,1,1,0,0)
    +names(snow) <- 1:9
    +

    Suppose you are interested in the first day it consecutively snowed three days (i.e. snowed the given day and two previous days). What is this day? Solve using a loop with conditional logic.

    -
    -

    Functions

    +
    +

    Functions

    • Repeating the strengths of functions … Using functions is a major theme of good R programming. Avoid explicit iteration (loops and copy-paste) as much as possible. [Matloff (pg xxii)]:

        @@ -5013,25 +4158,45 @@

        Functions

      • Easier transition to parallel programming
    • In general terms, R functions are structured as follows:

      -
      < name of your function > <- function(
      -    < argument 1>,
      -    < argument 2> = < default value >,
      -    ...,
      -    < argument n>
      -) {
      -
      -  < some R code here >
      -
      -  return(< some result >)
      -
      -}
    • +
      < name of your function > <- function(
      +    < argument 1>,
      +    < argument 2> = < default value >,
      +    ...,
      +    < argument n>
      +) {
      +
      +  < some R code here >
      +
      +  return(< some result >)
      +
      +}
    • For example, if we want to create a function to run the “unfair coin experiment” we could do it in the following way:

      +
      # Function definition
      +
      +# unfairCoin
      +# n: number of tosses
      +# p: biased coin (default = 0.7)
      +unfairCoin <- function(n, p = 0.7) {
      +
      +  # Sampling from the coin dist
      +  ans <- sample(c("H", "T"), n, replace = TRUE, prob = c(p, 1-p))
      +
      +  # Returning
      +  ans
      +
      +}
      +
      +# Testing it
      +set.seed(1)
      +tosses <- unfairCoin(20)
      +table(tosses)
      tosses
        H  T 
       13  7 
      +
      prop.table(table(tosses))
      tosses
          H    T 
      @@ -5040,1384 +4205,1638 @@ 

      Functions

    -
    -

    Questions 5

    -

    Generalize your code as a function so that for any string of days you can find the first day it consecutively snowed a given number of days. Return NA if no day meets this criterion.

    -
    -
    -

    Session info

    +
    +

    Questions 5

    +

    Generalize your code to Question 4 as a function so that for any string of days you can find the first day it consecutively snowed a given number of days. Return NA if no day meets this criterion.

    +
    +
    +
    +

    Vectorizing and replicate

    +

    R simultaneously performs the same operation across vector elements.

    +
      +
    • Behind the scenes R calls C code to do one-at-a-time calculation, but this is faster than doing one-at-a-time calculation in R.
    • +
    +
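To make the contrast concrete, the sketch below computes the same result twice: once with an explicit R loop and once with a single vectorized call. The input values are arbitrary.

```r
x <- c(1, 4, 9, 16)

# One-at-a-time calculation in R
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized: one call, with the iteration done in compiled code
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)
```

The results are identical; the vectorized form is shorter and, on long vectors, typically much faster.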

    Common vectorized operations and functions:

    +
    Category | Function | Description | Example Code | Result
    Arithmetic | +, -, *, /, ^ | Basic arithmetic operations | c(1, 2, 3) + 1 | 2 3 4
    Mathematical | abs() | Absolute value | abs(c(-1, -2, 3)) | 1 2 3
     | sqrt() | Square root | sqrt(c(1, 4, 9)) | 1 2 3
     | log() | Natural logarithm | log(c(1, exp(1), exp(2))) | 0 1 2
     | exp() | Exponential function | exp(c(0, 1, 2)) | 1 2.718 7.389
     | sin(), cos(), tan() | Trigonometric functions | sin(c(0, pi/2, pi)) | 0 1 0 (up to floating-point error)
     | log10(), log2() | Base-10 and base-2 logarithms | log10(c(1, 10, 100)) | 0 1 2
    Statistical | mean() | Mean of a vector | mean(c(1, 2, 3, 4, 5)) | 3
     | median() | Median of a vector | median(c(1, 2, 3, 4, 5)) | 3
     | sd() | Standard deviation | sd(c(1, 2, 3, 4, 5)) | 1.581
     | var() | Variance | var(c(1, 2, 3, 4, 5)) | 2.5
     | sum() | Sum of elements | sum(c(1, 2, 3, 4, 5)) | 15
     | prod() | Product of elements | prod(c(1, 2, 3, 4, 5)) | 120
     | range() | Range (minimum and maximum) | range(c(1, 2, 3, 4, 5)) | 1 5
    Cumulative Functions | cumsum() | Cumulative sum of elements | cumsum(c(1, 2, 3, 4)) | 1 3 6 10
     | cumprod() | Cumulative product of elements | cumprod(c(1, 2, 3, 4)) | 1 2 6 24
     | cummin() | Cumulative minimum | cummin(c(5, 2, 8, 1)) | 5 2 2 1
     | cummax() | Cumulative maximum | cummax(c(1, 3, 2, 4)) | 1 3 3 4
    Comparison | ==, !=, <, >, <=, >= | Comparison operators | c(1, 2, 3) > 2 | FALSE FALSE TRUE
     | which() | Indices of true values | which(c(FALSE, TRUE, TRUE)) | 2 3
    Logical | any() | Check if any elements are true | any(c(FALSE, TRUE, FALSE)) | TRUE
     | all() | Check if all elements are true | all(c(TRUE, TRUE, TRUE)) | TRUE
     | is.na() | Check for NA values | is.na(c(1, NA, 3)) | FALSE TRUE FALSE
     | is.finite() | Check for finite values | is.finite(c(1, Inf, NA)) | TRUE FALSE FALSE
    String (stringr package) | str_length() | Length of strings | str_length(c("a", "ab", "abc")) | 1 2 3
     | str_sub() | Substrings | str_sub(c("apple", "banana"), 1, 3) | app ban
     | str_detect() | Detect patterns | str_detect(c("apple", "banana"), "a") | TRUE TRUE
    Data Manipulation | ifelse() | Vectorized conditional statements | ifelse(c(1, 2, 3) > 2, "yes", "no") | "no" "no" "yes"
    Apply Functions | sapply() | Apply function over a list or vector | sapply(c(1, 2, 3), function(x) x^2) | 1 4 9
     | lapply() | Apply function over a list | lapply(list(c(1, 2), c(3, 4)), sum) | 3 7 (returned as a list)
     | vapply() | Apply function with a specified output type | vapply(c(1, 2, 3), function(x) x^2, numeric(1)) | 1 4 9
     | mapply() | Apply function to multiple arguments | mapply(function(x, y) x + y, c(1, 2), c(3, 4)) | 4 6
     | apply() | Apply function to rows or columns of a matrix | apply(matrix(c(1, 2, 3, 4), nrow=2), 1, sum) | 4 6
    +

    Element-wise addition in one call

    +
    x1 <- 1:10
    +x2 <- 101:110
    +x1 + x2
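When the two vectors differ in length, R "recycles" the shorter one to match the longer one. This is convenient but can hide mistakes. A minimal sketch, with illustrative object names:

```r
x1 <- 1:10

# A length-1 vector is recycled across all of x1
shifted <- x1 + 100
above_5 <- x1 > 5

# A length-2 vector is recycled five times: 0, 100, 0, 100, ...
recycled <- x1 + c(0, 100)
recycled
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is worth treating as a sign of a possible bug.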
    -
    ─ Session info ───────────────────────────────────────────────────────────────
    - setting  value
    - version  R version 4.4.1 (2024-06-14)
    - os       macOS Sonoma 14.6.1
    - system   aarch64, darwin23.4.0
    - ui       unknown
    - language (EN)
    - collate  en_US.UTF-8
    - ctype    en_US.UTF-8
    - tz       America/Denver
    - date     2024-08-19
    - pandoc   3.2.1 @ /opt/homebrew/bin/ (via rmarkdown)
    -
    -─ Packages ───────────────────────────────────────────────────────────────────
    - package     * version date (UTC) lib source
    - cachem        1.1.0   2024-05-16 [1] CRAN (R 4.4.1)
    - cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.1)
    - devtools      2.4.5   2022-10-11 [1] CRAN (R 4.4.1)
    - digest        0.6.36  2024-06-23 [1] CRAN (R 4.4.1)
    - ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.4.1)
    - evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.4.1)
    - fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.1)
    - fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.1)
    - glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.1)
    - htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
    - htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.4.1)
    - httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.4.1)
    - jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.4.1)
    - knitr         1.47    2024-05-29 [1] CRAN (R 4.4.1)
    - later         1.3.2   2023-12-06 [1] CRAN (R 4.4.1)
    - lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.1)
    - magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.1)
    - memoise       2.0.1   2021-11-26 [1] CRAN (R 4.4.1)
    - mime          0.12    2021-09-28 [1] CRAN (R 4.4.1)
    - miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.4.1)
    - pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.4.1)
    - pkgload       1.4.0   2024-06-28 [1] CRAN (R 4.4.1)
    - profvis       0.3.8   2023-05-02 [1] CRAN (R 4.4.1)
    - promises      1.3.0   2024-04-05 [1] CRAN (R 4.4.1)
    - purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.1)
    - R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.1)
    - Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.4.1)
    - remotes       2.5.0   2024-03-17 [1] CRAN (R 4.4.1)
    - rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.1)
    - rmarkdown     2.27    2024-05-17 [1] CRAN (R 4.4.1)
    - sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.1)
    - shiny         1.8.1.1 2024-04-02 [1] CRAN (R 4.4.1)
    - stringi       1.8.4   2024-05-06 [1] CRAN (R 4.4.1)
    - stringr       1.5.1   2023-11-14 [1] CRAN (R 4.4.1)
    - urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.4.1)
    - usethis       2.2.3   2024-02-19 [1] CRAN (R 4.4.1)
    - vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.1)
    - xfun          0.45    2024-06-16 [1] CRAN (R 4.4.1)
    - xtable        1.8-4   2019-04-21 [1] CRAN (R 4.4.1)
    - yaml          2.3.8   2023-12-11 [1] CRAN (R 4.4.1)
    -
    - [1] /opt/homebrew/lib/R/4.4/site-library
    - [2] /opt/homebrew/Cellar/r/4.4.1/lib/R/library
    -
    -──────────────────────────────────────────────────────────────────────────────
    -
    -
    -
    -
    -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - + return popup.innerHTML; + }); + } + } +}); + + + + \ No newline at end of file