Skip to content

AnitaLiu98/MSBD5001

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MSBD5001 Personal Project: Kaggle in-class competition

Introduction

  1. The programming language used in this project is python 3.5.
  2. The packages used in this project including: pandas, numpy, sklearn, scipy, matplotlib, seaborn.

Project Details

Input

Input files are stored under rawdata folder, including samplesubmission.csv, test.csv and train.csv.

Data processing and Feature Engineering

To run this project, firstly you have to perform data pre-process and feature engineering.

  1. Under dataprocessiing folder, run process.py, perform data pre-process, you can get two CSV files named prefeatures_dropold.csv and test_feature.csv under dataprocessiing/processed data folder.
  2. Run feature ranking.py under dataprocessiing/FeatureEngineering folder, you can get a CSV file named slctdfeature .csv under dataprocessiing/processed data folder. But because of the randomness of running result of feature ranking.py, if you want to generate my final submission, I have uploaded the slctdfeature .csv file that I used in my prediction models under dataprocessiing/processed data folder, plesase use this file directly.
  3. Run KFold.py, it will show cross validation result of several diffrent regression models.

Model 1

Run randomforest_selctdft.py under submit1 folder, it can generate the resulting test.csv file under submit1 folder.

Model 2

Run randomforest_selctdft_model2.py under submit2 folder, it can generate the resulting test.csv file under submit2 folder.

Results

The resulting test.csv file I submitted on kaggle website are stored under submit1 folder and submit2 folder.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages