-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCode Book
36 lines (28 loc) · 2.76 KB
/
Code Book
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
title: Codebook Samsung Data
author: sagazluiz
date: 05-27-2015
output: "data_means_and_std.txt"
## Project Description
The purpose of this project is to demonstrate your ability to collect, work with, and clean a data set.
##Study design and data processing
It is required to submit:
1) a tidy data set as described below;
2) a link to a Github repository with your script for performing the analysis,and;
3) a code book that describes the variables, the data, and any transformations or work that you performed to clean up the data called CodeBook.md. You should also include a README.md in the repo with your scripts. This repo explains how all of the scripts work and how they are connected.
###Collection of the raw data
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
###Cleaning of the data
The run_analysis.R script performs the following steps to clean the data:
The run_analysis.R script performs the following steps to clean the data:
1. Read X_train.txt, y_train.txt and subject_train.txt from the “./Dataset/UCI HAR Dataset/train" folder and store them in Xtrain and trainSubject variables.
2. Read X_test.txt, y_test.txt and subject_test.txt from the "./Dataset/UCI HAR Dataset/test" folder and store them in Xtest, ytest and testsubject variables.
3. Concatenate Xtrain to Xtest , concatenate ytrain to ytest, and join trainSubject to testSuject.
4. Read the features.txt file from the "“./Dataset/UCI HAR Dataset /" folder and store the data in a variable called features. We only extract the measurements on the mean and standard deviation.
5. Clean the column names of the subset. We remove the "()" and "-" symbols in the names, as well as make the first letter of "mean" and "std" a capital letter "M" and "S" respectively.
6. Read the activity_labels.txt file from the "“./Dataset/UCI HAR Dataset /" folder and store the data in a variable called activity.
7. Clean the activity names in the second column of activity. We first make all names to lower cases. If the name has an underscore between letters, we remove the underscore and capitalize the letter immediately after the underscore.
8. Transform the values of activity according to the activity data frame.
9. Combine the bindSubject, yData and XData by column to get a new cleaned 10299x68 data frame,cleanedData. Properly name the first two columns, "subject" and "activity". The "subject" .
10. Write the cleanedData out to "merged_data.txt" file in current working directory.
11. Finally, generate a second independent tidy data set with the average of each measurement for each activity and each subject the Write the result out to " data_means_and_std.txt " file in current working directory.