Kaggle_HomeDepot

110th Place Solution for Home Depot Product Search Relevance

Dependencies

  • scikit-learn
  • pandas
  • numpy
  • pyenchant
  • keras
  • xgboost

FlowChart

Flow Chart Image

Configuration

Project settings live in configs.py; edit that file to change the configuration.

Note: Use Python 3 as the default interpreter.

Regenerating Results

Before running any of the files, copy the competition data into the input/ folder. The project structure should then look like this:

Kaggle_HomeDepot
├───input
    │   train.csv
    │   test.csv
    │   attributes.csv
    │   product_descriptions.csv
    │   sample_submission.csv
└───scripts
    │   README.md
    │   ...
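
A quick check that the data is in place can catch a missing file before the longer scripts run. The following is a minimal sketch, not part of the repository: the function names `missing_input_files` and `check_inputs` are hypothetical, and the default `input` path assumes the script is run from the project root.

```python
from pathlib import Path

# File names taken from the project structure shown above.
REQUIRED_FILES = [
    "train.csv",
    "test.csv",
    "attributes.csv",
    "product_descriptions.csv",
    "sample_submission.csv",
]


def missing_input_files(input_dir="input"):
    """Return the required file names that are not present in input_dir."""
    root = Path(input_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def check_inputs(input_dir="input"):
    """Print a short report and return True when every file is present."""
    missing = missing_input_files(input_dir)
    if missing:
        print(f"Missing from {input_dir}: {', '.join(missing)}")
        return False
    print(f"All input files found in {input_dir}")
    return True
```

Calling `check_inputs()` from the project root (or `check_inputs("../input")` from scripts/) reports which files, if any, still need to be copied.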

To regenerate the results, run the files below in the order listed.

Data Pre-Processing and Feature Extraction

  • generate_settings.py - Generates the settings for the project.
  • preprocess.py - Performs the initial cleaning of the data.
  • feature_generater.py - Cleans the data and generates TF-IDF features.
  • features_distance.py - Generates distance and counting features.
  • generate_dataset_svd50x3_distance.py - Combines all the individual features and generates a dataset.
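
Because the steps above depend on each other, it can help to run them from one small driver that stops at the first failure. This is a sketch under assumptions rather than a script shipped with the project: the names `build_commands` and `run_steps` are hypothetical, and it assumes the scripts sit in scripts/ and are meant to be launched with that folder as the working directory.

```python
import subprocess
import sys

# Script names and order as listed above.
PREPROCESSING_STEPS = [
    "generate_settings.py",
    "preprocess.py",
    "feature_generater.py",
    "features_distance.py",
    "generate_dataset_svd50x3_distance.py",
]


def build_commands(python=sys.executable):
    """Return one command (argument list) per step, in run order."""
    return [[python, name] for name in PREPROCESSING_STEPS]


def run_steps(scripts_dir="scripts", runner=subprocess.run):
    """Run each step in order from scripts_dir; stop at the first failure."""
    for command in build_commands():
        print("Running", command[1])
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed one.
        runner(command, check=True, cwd=scripts_dir)
```

Calling `run_steps()` from the project root runs the five scripts in sequence. The `runner` parameter exists only so the function can be exercised without launching real processes.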

Machine Learning and Stacked Generalization

  • stacked_generalization.py - Trains all the machine learning models and stacks their results to create the submission.
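
For readers new to stacked generalization, the idea can be sketched in a few lines with scikit-learn. This is an illustration only and does not reproduce stacked_generalization.py: the choice of base models, the linear meta-model, the helper name `stack_predict`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_predict


def stack_predict(base_models, meta_model, X_train, y_train, X_test, n_splits=5):
    """Fit base models, stack their predictions, and predict on X_test."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    # Level 1: out-of-fold predictions, so the meta-model never sees a
    # prediction made by a model that was trained on that same row.
    oof = np.column_stack(
        [cross_val_predict(m, X_train, y_train, cv=folds) for m in base_models]
    )
    # Refit each base model on all training rows to predict the test set.
    test_preds = np.column_stack(
        [m.fit(X_train, y_train).predict(X_test) for m in base_models]
    )
    # Level 2: the meta-model learns how to combine the base predictions.
    meta_model.fit(oof, y_train)
    return meta_model.predict(test_preds)


# Synthetic stand-in data; the real features come from the scripts above.
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

base_models = [Ridge(alpha=1.0), RandomForestRegressor(n_estimators=50, random_state=0)]
predictions = stack_predict(base_models, LinearRegression(), X_train, y_train, X_test)
rmse = float(np.sqrt(np.mean((predictions - y_test) ** 2)))
print(f"Stacked RMSE on held-out synthetic data: {rmse:.3f}")
```

Training the meta-model on out-of-fold predictions rather than in-sample predictions is the key design choice: it keeps the second level from simply rewarding whichever base model overfits the training rows most.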