glmnet work #7

madrury · 2014-07-05T18:02:23Z

I've been working on the Python glmnet for a couple of weeks now. Here's what I've accomplished:

Updated the fortran code to the most current from CRAN.
Added support for logistic regression.
Added cross validation code for choosing lambda.
Re-factored quite a lot.

I plan to keep expanding it; my next goal is to add support for Poisson regression.

As a side note, I had trouble getting the setup.py you provided to work, so I'll have to come back to this at some point.

madrury added 30 commits June 13, 2014 18:42

Basic functionality for ElasticNet and LogNet.

7365ef0

Documentation strings.

589ffb4

Added __str__ method. Added small example of use.

a3d5e22

Cleaned up plot_path a bit.

54ca103

Structured as package.

06d242e

Deviance calculations.

53b59d8

Refactoring. Move common functionality to glmnet.

85fddfc

Improved error checking in _validate_inputs.

0a61d8e

First attempt at cross validation.

a107067

More cv work.

e303ccf

Replaced fortran code with up to date version.

72316bc

Refactoring to give .fit a more sensible signature.

2034172

fit methods now take parameters such as weights and offsets that should only be known at fit time. The validation method was factored into multiple small methods to help with this.

Weights in elastic_net deviance.

9783a57

Update cross validation to take advantage of new fit signature.

5893cbe

Add weighted cross validation.

92f453c

Cross validation with two strategies: weighted and unweighted. Also refactored the cv code into multiple modules.

Normalize by weights in deviance.

15b54bc

Better cross validation strategy.

8b2e3f4

Cross validation folds all use the same values of lambda. Calculation of lambda max was added to elastic_net to facilitate this; the calculation for logistic_net needs deeper investigation before it is implemented.
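The idea of a shared lambda sequence can be illustrated with a minimal sketch. It assumes unweighted observations, no constant columns, alpha > 0, and the textbook expression for the smallest lambda that zeroes every coefficient of a standardized elastic net. The names `max_lambda` and `lambda_grid` are hypothetical and are not taken from this package.

```python
import numpy as np

def max_lambda(X, y, alpha=1.0):
    """Smallest lambda at which all elastic-net coefficients are zero.

    Assumes alpha > 0, equal observation weights, and no constant columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_obs = X.shape[0]
    # Standardize predictors (population std) and center the response.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    y_centered = y - y.mean()
    # Largest absolute inner product between a predictor and the response.
    return np.max(np.abs(X_std.T @ y_centered)) / (n_obs * alpha)

def lambda_grid(lam_max, n_lambdas=100, min_ratio=1e-4):
    """Log-spaced, decreasing lambdas from lam_max down to lam_max * min_ratio."""
    return np.geomspace(lam_max, lam_max * min_ratio, n_lambdas)
```

Computing the grid once on the full data and passing the same array to every fold keeps the per-fold deviance curves comparable point by point.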

Made examples directory.

ab1dd66

Plot cross validation deviance estimate.

d847d58

Example plots.

f708c54

Elastic net max lambda with weights.

38ce8f2

Implemented weight adjustment for the max lambda calculation. Also a strategy for the alpha = 0 (ridge) case. No idea what the fortran code does currently; the literature is silent on this point.
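A weighted variant might look like the sketch below. It assumes the weights are normalized to sum to one and used for the means, standard deviations, and inner products. The floor on alpha is only one possible strategy for the ridge case, where the exact expression divides by zero; it is an assumption here, not a statement of what this commit or the fortran code does. `weighted_max_lambda` and `alpha_floor` are hypothetical names.

```python
import numpy as np

def weighted_max_lambda(X, y, weights, alpha=1.0, alpha_floor=1e-3):
    """Max lambda with observation weights; alpha is floored to avoid 1/0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to one
    X_centered = X - w @ X               # subtract weighted column means
    x_sd = np.sqrt(w @ X_centered**2)    # weighted column standard deviations
    X_std = X_centered / x_sd
    y_centered = y - w @ y
    # Weighted inner product of each standardized column with the response.
    grad = np.abs(X_std.T @ (w * y_centered))
    return grad.max() / max(alpha, alpha_floor)
```

With equal weights this reduces to the unweighted expression, and with alpha = 0 it returns a large but finite starting value.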

Fix bug in LogisticNet.

91c7900

This bug was introduced upon updating the fortran code. The shape of the intercepts attribute changed, so a call to ravel was added to compensate.

LogisticNet example.

54d7de0

Max lambda calculation for logistic net.

4fbc931

Calculation researched and validated to give the same results as the fortran code. Cross validation for logistic models is done.

Example logistic cross validation.

f393c56

Clean up elastic net cv example.

0caf11e

Remove print statement from cv_glmnet.

df543da

Name change: plot_path -> plot_paths

4da6888

New README.

a5cd6ea

Debugging readme...

d785ba7

madrury and others added 30 commits November 30, 2014 13:47

Add col_names attribute to ElasticNet.

065e12a

Add get_coefficient methods to ElasticNet.

2bf9964

Add describe method to ElasticNet.

37ca803

Fix small bug in last commit.

3e33acc

Clean up LogisticNet a bit.

ff48161

Remove dependence on sklearn.standardize. NotImplementedErrors. Better handling of y in .fit.

Remove dependence on sklearn.standardize in enet.

8d8ee55

First tests on LogisticNet.

6cdcc1a

Add test of ridge regression case.

4786718

Refactor describe method.

8f30a3a

The describe method is now more robust. 1) Moved to glmnet.py, so it's available to all subclasses. 2) Factored into a few helper methods, allowing it to give different levels of detail depending on the state of the model when called.

Merging changes from madrury.

87b7295

Fixing a typo

5fd7d57

Merge pull request #1 from AlexeyG/master

beb7550

Bug fixes:
1. Lambda max would be computed as NaN when some variables have zero variance (commit 4407e8a)
2. The predict method would crash when all variables were eliminated due to a high lambda (commit 794a337)

Fix spelling of _coefficients.

9accd12

Add constant column fix in a few other places.

6e6460e

This is necessary any time a matrix is standardized.
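To illustrate why standardization needs this guard: dividing by a zero standard deviation produces NaN, which then propagates into anything derived from the matrix, such as lambda max. The sketch below shows one common way to handle it, not necessarily the fix used in these commits. `safe_standardize` is a hypothetical name.

```python
import numpy as np

def safe_standardize(X):
    """Center and scale columns, leaving constant columns as all zeros."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0)
    # Replace zero standard deviations with one so the division is defined;
    # a constant column minus its mean is already zero.
    sd_safe = np.where(sd == 0, 1.0, sd)
    return (X - mean) / sd_safe
```

A constant column then contributes zero to every inner product instead of NaN.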

Add check for non-zero relative penalty.

509e092

Add test case for zero rel_penalties.

dd07c8f

Add tests for edge cases:

2ed4f18

* zero variance predictors.
* no non-zero predictors.

Increase data volume for more stable tests.

598762b

Still not sure how to get perfectly stable tests. Maybe consider not generating random data and instead taking data from static files.

Update README.

df1b15e

Cleaned up max_lambda calc in ElasticNet.

cf4d55f

Add check for weights in LogisticNet.

5b34b61

Remove print statements in test_elastic_net.

b645253

Work on cv_glmnet

bed954d

* Improve documentation.
* Add describe method.
* Add error checking.

Allow passing an existing kfold object when creating a new CVGlmNet.

2eecdbc

Merge pull request #2 from nenorbot/master

48e6a29

Allow passing an existing kfold object when creating a new CVGlmNet.

Fold generators are now objects.

733a5f4

Implement KFold class, which can be used as a generator in much the same way as weighted_k_fold could before, but carries meta information so that the same object can be used multiple times.
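The difference between a one-shot generator and a reusable fold object can be sketched as follows. This is an illustrative stand-in, not the KFold class from this commit: the name `ReusableKFold`, its constructor arguments, and the omission of any weighting are all assumptions.

```python
import numpy as np

class ReusableKFold:
    """Fold object that fixes its split once and can be iterated repeatedly."""

    def __init__(self, n_samples, n_folds=3, shuffle=True, seed=None):
        if not 2 <= n_folds <= n_samples:
            raise ValueError("need 2 <= n_folds <= n_samples")
        # Meta information kept on the object.
        self.n_samples = n_samples
        self.n_folds = n_folds
        indices = np.arange(n_samples)
        if shuffle:
            np.random.default_rng(seed).shuffle(indices)
        # The assignment of rows to folds is decided once, at construction.
        self._folds = np.array_split(indices, n_folds)

    def __len__(self):
        return self.n_folds

    def __iter__(self):
        # Each pass yields the same (train, test) index pairs.
        for k, test in enumerate(self._folds):
            train = np.concatenate(
                [fold for j, fold in enumerate(self._folds) if j != k]
            )
            yield train, test
```

Because `__iter__` starts a fresh pass each time, the same object can drive several cross validation runs over identical folds, which a plain generator cannot do once exhausted.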

Fitting of full model in parallel with cvs.

b59c08a

This enhancement was suggested by Declan Groves. Allow the full model to be fit in parallel along with the sub models (fit on folds of the data during cross validation). This can potentially cut cross validation model fitting time in half.
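The scheduling idea is to submit the full-data fit as one more job alongside the per-fold fits rather than running it afterwards. The sketch below uses the standard library's `concurrent.futures` as a stand-in for whatever backend the package uses, and `fit_mean` is a toy placeholder for a real model fit. Both, along with the name `fit_full_and_folds`, are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_mean(rows):
    """Toy stand-in for an expensive model fit: returns the mean of rows."""
    return sum(rows) / len(rows)

def fit_full_and_folds(data, folds, fit=fit_mean, max_workers=None):
    """Fit on the full data and on each fold's training rows concurrently.

    `folds` is a sequence of (train_indices, test_indices) pairs.
    Returns (full_model, list_of_fold_models).
    """
    # Job 0 is the full data set; the rest are the training subsets.
    jobs = [list(data)] + [[data[i] for i in train] for train, _ in folds]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves job order, so results[0] is the full fit.
        results = list(pool.map(fit, jobs))
    return results[0], results[1:]
```

Any speedup depends on the fit releasing the interpreter lock or on using a process-based pool, so the gain should be measured rather than assumed.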

Refactor cv_glmnet a little.

32eb9aa

Move max lambda calculations outside of .fit

Remove unused glmnet._clone

fb397a6

Improve handling of optional dependencies.

e22ad5c

Added util/importers.py to handle importing of optional dependencies (joblib, pyplot).
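A helper of this kind usually wraps the import in a try/except so that missing extras degrade gracefully. The following is a minimal sketch of that pattern; the function name `import_optional` is hypothetical and the actual contents of util/importers.py are not shown in this pull request.

```python
import importlib

def import_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# Callers check for None before using an optional feature.
joblib = import_optional("joblib")
HAS_JOBLIB = joblib is not None
```

Code paths that need the dependency can then raise a clear error, or fall back to a serial implementation, when the flag is false.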
