Getting and cleaning data course project


Data source is here:

Data description is here:


  1. Unzip the input file.
  2. Install the 'dplyr' package.
  3. Within the extracted directory, execute the run_analysis.R script.
  4. Read the tidy data from 'tidydataset.txt', which is written to the same extracted directory.
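
Before running the script, it can help to confirm that the working directory really is the extracted data set. The following is a minimal sketch of such a check; `required_files` and `missing_inputs` are hypothetical names introduced here for illustration and are not part of run_analysis.R.

```r
# Input files this README lists, relative to the extracted directory.
required_files <- c(
  "features.txt", "activity_labels.txt",
  "test/X_test.txt", "test/y_test.txt", "test/subject_test.txt",
  "train/X_train.txt", "train/y_train.txt", "train/subject_train.txt"
)

# Hypothetical helper: return the listed input files that are missing from `dir`.
missing_inputs <- function(dir = ".") {
  required_files[!file.exists(file.path(dir, required_files))]
}

# A typical session, run from inside the extracted directory, might look like:
#   if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
#   stopifnot(length(missing_inputs()) == 0)
#   source("run_analysis.R")
```

An empty result from `missing_inputs()` suggests the working directory is set correctly.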

## Data Files

The following data files are read:

  • features.txt: the list of feature / variable names
  • test/X_test.txt: the measurements taken [Test set]
  • test/y_test.txt: the list of activities being done for each measurement [Test set]
  • test/subject_test.txt: the id of the subject (person) performing each activity [Test set]
  • train/X_train.txt: the measurements taken [Training set]
  • train/y_train.txt: the list of activities being done for each measurement [Training set]
  • train/subject_train.txt: the id of the subject (person) performing each activity [Training set]
  • activity_labels.txt: the list of activity descriptions

The working directory is assumed to be the directory into which the data set was extracted.

For example, if the zip file is downloaded and extracted to ~/home/code/UCI HAR Dataset/, then the working directory is assumed to be ~/home/code/UCI HAR Dataset/

NB: The files under the Inertial* directories are not read, as they are not needed for the problem statement.
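
As an illustration of how the three files of one set fit together, here is a minimal sketch that reads a split and binds its columns side by side. `read_split` is a hypothetical helper, not a function from run_analysis.R, and it assumes the files are whitespace-separated with no header row.

```r
# Hypothetical helper: read one split ("test" or "train") from the extracted
# directory and bind subject id, activity code, and measurements column-wise.
read_split <- function(dir, split) {
  path <- function(prefix) {
    file.path(dir, split, paste0(prefix, "_", split, ".txt"))
  }
  subject      <- read.table(path("subject"), col.names = "subject")
  activity     <- read.table(path("y"), col.names = "activity")
  measurements <- read.table(path("X"))  # one numeric column per feature
  cbind(subject, activity, measurements)
}

# Example, assuming the working directory is the extracted data set:
#   test_set <- read_split(".", "test")
```

Each row of the result is one measurement window, with the subject and activity code in the first two columns.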

## Process

The code does the following:

  1. Merges the training and the test sets to create one data set.
  2. Extracts only the measurements on the mean and standard deviation for each measurement.
  3. Uses descriptive activity names to name the activities in the data set
  4. Appropriately labels the data set with descriptive variable names.
  5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
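
The five steps can be sketched on a tiny synthetic data set. This is a base-R illustration only, not the contents of run_analysis.R (which uses dplyr). The values are made up, and both the `mean()`/`std()` name pattern and the name-cleaning rule are assumptions.

```r
# Synthetic stand-ins for features.txt and activity_labels.txt.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(code = 1:2, activity = c("WALKING", "SITTING"))

# Synthetic stand-ins for the train and test sets (subject, activity code, X).
make_split <- function(subject, code, x) data.frame(subject, code, x)
train <- make_split(c(1, 1), c(1, 2), matrix(c(1, 2, 3, 4, 5, 6), nrow = 2))
test  <- make_split(c(2, 2), c(1, 1), matrix(c(7, 9, 8, 10, 0, 0), nrow = 2))

# Step 1: merge the training and test sets.
merged <- rbind(train, test)
names(merged) <- c("subject", "code", features)

# Step 2: keep only mean() and std() measurements (assumed name pattern).
keep <- grepl("mean\\(\\)|std\\(\\)", features)
selected <- merged[, c("subject", "code", features[keep])]

# Step 3: replace activity codes with descriptive activity names.
selected$activity <- activity_labels$activity[
  match(selected$code, activity_labels$code)
]
selected$code <- NULL

# Step 4: descriptive variable names (assumed rule: drop "()", "-" to "_").
names(selected) <- gsub("-", "_", gsub("\\(\\)", "", names(selected)))

# Step 5: average of each variable for each activity and each subject.
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)
```

In this toy example subject 2 has two WALKING rows, so step 5 collapses them into one row holding their averages.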


The tidy data set is stored in a file called 'tidydataset.txt'.

This data set meets the principles of tidy data by storing the variables in wide form, with one column per variable; please see discussion for details.
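
To make the wide form concrete, here is a small sketch contrasting it with the equivalent long form. The column names and values are invented for illustration and are not taken from 'tidydataset.txt'.

```r
# Wide form: one row per subject/activity pair, one column per variable.
wide <- data.frame(
  subject  = c(1, 2),
  activity = c("WALKING", "WALKING"),
  mean_x   = c(0.1, 0.2),
  std_x    = c(0.3, 0.4)
)

# Long form: one row per subject/activity/variable combination.
measure_cols <- c("mean_x", "std_x")
stacked <- stack(wide[measure_cols])  # columns: values, ind
long <- cbind(
  wide[rep(seq_len(nrow(wide)), times = length(measure_cols)),
       c("subject", "activity")],
  variable = as.character(stacked$ind),
  value    = stacked$values
)
```

Both layouts carry the same information. The wide layout keeps each subject/activity observation on a single row.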


  • The data files listed above
  • (code book)
  • run_analysis.R (single R script containing all the processing described in this README)
  • tidydataset.txt (tidy data set with the average of each variable for each activity and each subject)

Reading data

data <- read.table("tidydataset.txt", header = TRUE)

