README.md

Experiment: CW09B

This experiment uses the Category B subset of the ClueWeb09 dataset (the first 50 million English web pages).

Our scripts assume the following resources:

To build the feature files for all experimental runs (may take a while):

make

To run the experiment:

1. Use the Python script make_kfold_split.py to partition the data into 10 folds.
2. Train a model for each fold with the RankLib options -ranker 4 -metric2t NDCG@20 -norm zscore (ranker 4 is Coordinate Ascent).
3. Make predictions and compile all results into a TREC run file.
4. Evaluate the run using trec_eval and the qrels file.
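
The fold-splitting and run-file steps can be sketched in Python as below. This is an illustrative sketch only, not the contents of make_kfold_split.py: the function names, the run tag, the fixed seed, and the choice to split by query id are all assumptions. The only fixed convention it relies on is the standard TREC run line layout (query id, the literal Q0, document id, rank, score, run tag).

```python
import random


def split_queries_into_folds(query_ids, k=10, seed=0):
    """Shuffle the unique query ids and deal them round-robin into k folds.

    Splitting by query id (an assumption here) keeps all documents of a
    query in the same fold, so no query is seen in both train and test.
    """
    unique = sorted(set(query_ids))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique)
    return [unique[i::k] for i in range(k)]


def train_test_pairs(folds):
    """Yield (train_queries, test_queries), holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [q for j, fold in enumerate(folds) if j != i for q in fold]
        yield train, test


def format_trec_run(scores, run_tag="cw09b"):
    """Turn {query_id: [(doc_id, score), ...]} into TREC run lines.

    Documents are ranked by descending score within each query.
    The default run_tag is a hypothetical label.
    """
    lines = []
    for qid in sorted(scores):
        ranked = sorted(scores[qid], key=lambda pair: -pair[1])
        for rank, (doc_id, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc_id} {rank} {score:.6f} {run_tag}")
    return lines
```

Each (train, test) pair would correspond to one RankLib training run. The predictions for the held-out queries of all folds can then be merged into one dictionary and passed to format_trec_run, so that the resulting file covers every query exactly once.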

To use the data: