muditbac/Kaggle_HomeDepot

Kaggle_HomeDepot

110th Place Solution for Home Depot Product Search Relevance

Dependencies

  • scikit-learn
  • pandas
  • numpy
  • pyenchant
  • keras
  • xgboost

FlowChart

Flow Chart Image

Configuration

The project configuration can be changed in configs.py.

Note: Use python3 as the default interpreter.

Regenerating Results

Before running any of the scripts, copy the competition data into the input/ folder. The project structure should then look like this:

Kaggle_HomeDepot
├───input
│   │   train.csv
│   │   test.csv
│   │   attributes.csv
│   │   product_descriptions.csv
│   │   sample_submission.csv
└───scripts
    │   README.md
    │   ...
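
Before starting a long run, it can help to confirm that the data files are where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name `missing_input_files` and the `project_root` parameter are hypothetical, and the file list is copied from the layout above.

```python
from pathlib import Path

# File names copied from the project layout shown above.
REQUIRED_INPUT_FILES = [
    "train.csv",
    "test.csv",
    "attributes.csv",
    "product_descriptions.csv",
    "sample_submission.csv",
]


def missing_input_files(project_root="."):
    """Return the required data files absent from <project_root>/input.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    input_dir = Path(project_root) / "input"
    return [name for name in REQUIRED_INPUT_FILES
            if not (input_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_input_files()
    if missing:
        print("Missing from input/:", ", ".join(missing))
    else:
        print("All input files are in place.")
```

Run it from the Kaggle_HomeDepot folder; an empty result means all five CSV files were found.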

To regenerate the results, run the files below in the order listed.

Data Pre-Processing and Feature Extraction

  • generate_settings.py - Generates settings for the project.
  • preprocess.py - Performs the initial cleaning of the data.
  • feature_generater.py - Cleans the data and generates TF-IDF features.
  • features_distance.py - Generates distance and counting features.
  • generate_dataset_svd50x3_distance.py - Combines all the individual features into a single dataset.
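
To give a feel for what counting and TF-IDF distance features between a search term and a product title can look like, here is a small standard-library sketch. It is an illustration under stated assumptions and not the code in features_distance.py or feature_generater.py: the function names, the whitespace tokenizer, and the smoothed IDF formula are all choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_common_words(query, title):
    """Counting feature: number of query tokens that also occur in the title."""
    title_tokens = set(tokenize(title))
    return sum(1 for tok in tokenize(query) if tok in title_tokens)


def idf_weights(documents):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector as a dict; tokens unseen in the corpus get weight 0."""
    counts = Counter(tokenize(text))
    return {tok: tf * idf.get(tok, 0.0) for tok, tf in counts.items()}


def cosine_distance(a, b):
    """Distance feature: 1 - cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Made-up example rows, not taken from the competition data.
titles = ["steel angle bracket", "cordless drill kit", "wood deck screws"]
idf = idf_weights(titles)
query_vec = tfidf_vector("angle bracket", idf)
distances = [cosine_distance(query_vec, tfidf_vector(t, idf)) for t in titles]
```

In this toy example the first title shares two tokens with the query and therefore ends up closer to it than the other two, which share none.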

Machine Learning and Stacked Generalization

  • stacked_generalization.py - Trains all the machine learning models and stacks their results to create the submission.
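
The six steps listed above can also be chained from a single driver script. The sketch below is not part of the repository. It assumes that the scripts live in the scripts/ folder, that they are launched from the project root, and that they take no command-line arguments; none of this is confirmed by the README, so adjust the paths if your checkout differs. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Script names in the order given in the README.
PIPELINE = [
    "generate_settings.py",
    "preprocess.py",
    "feature_generater.py",
    "features_distance.py",
    "generate_dataset_svd50x3_distance.py",
    "stacked_generalization.py",
]


def build_commands(scripts_dir="scripts", python=sys.executable):
    """Build one command per pipeline step (assumes scripts take no arguments)."""
    return [[python, str(Path(scripts_dir) / name)] for name in PIPELINE]


def run_pipeline(scripts_dir="scripts"):
    """Run every step in order and stop at the first one that fails."""
    for cmd in build_commands(scripts_dir):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete intermediate output.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the project root then executes the steps one after another with the same interpreter that runs the driver.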

About

Contains code for the Home Depot Product Search Relevance competition.
