glmnet work #7

Open
wants to merge 80 commits into
base: master


@madrury madrury commented Jul 5, 2014

I've been working on the Python glmnet for a couple of weeks now. Here's what I've accomplished:

  • Updated the Fortran code to the most current version from CRAN.
  • Added support for logistic regression.
  • Added cross validation code for choosing lambda.
  • Refactored quite a lot.

I plan to keep expanding this; my next goal is to add support for Poisson regression.

As a side note, I had trouble getting the setup.py you provided to work, so I'll have to come back to it at some point.

madrury added 30 commits June 13, 2014 18:42
  fit methods now take parameters such as weights and offsets that
should only be known at fit time.  Validation method factored into
multiple small methods to help with this.
    Cross validation with two strategies: weighted and
unweighted.  Also refactored the cv code into multiple
modules.
  Cross validation folds all use the same values of lambda.
Calculation of lambda max added to elastic_net to
facilitate this; the calculation for logistic_net needs
deeper investigation before it is implemented.
  Implemented weight adjustment for the max lambda calculation.
Also a strategy for the alpha = 0 (ridge) case.  No idea what
the Fortran code does currently; the literature is silent on
this point.
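As a rough illustration of a weighted max lambda calculation, the sketch below assumes the Gaussian elastic net objective with observation weights normalised to sum to one, and a design matrix whose columns are already centred and scaled under those weights. The function name and the `alpha_floor` fallback for the ridge case are assumptions made for illustration; they are not taken from the code in this PR.

```python
import numpy as np

def elastic_net_lambda_max(X, y, alpha, weights=None, alpha_floor=1e-3):
    """Smallest lambda at which every penalised coefficient is zero.

    Assumes the columns of X are already centred and scaled under `weights`.
    `alpha_floor` is a hypothetical guard: at alpha = 0 the bound is infinite,
    so alpha is floored at a small positive value instead.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalise weights to sum to one
    y_centred = y - np.dot(w, y)         # remove the weighted mean (intercept)
    # Absolute weighted inner product of each column with the centred response.
    grad = np.abs(np.dot(X.T, w * y_centred))
    return grad.max() / max(alpha, alpha_floor)
```

With uniform weights this reduces to max_j |x_j . y| / (n * alpha), the usual starting point of the lambda path.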
  This bug was introduced upon updating the Fortran code.
The intercepts attribute shape changed; a call to ravel was
added to compensate.
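For readers unfamiliar with ravel, a minimal sketch of the kind of fix described: the `(1, n_lambdas)` shape below is an assumed example, since the commit message does not say what the new shape was.

```python
import numpy as np

# Hypothetical: intercepts come back as a 2-D row after the update.
intercepts = np.zeros((1, 5))

# ravel flattens to 1-D, so downstream code can index one value per lambda.
flat_intercepts = intercepts.ravel()
```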
   Calculation researched and validated to give the same
results as the Fortran code.  Cross validation for logistic
models is done.
madrury and others added 30 commits November 30, 2014 13:47
  Remove dependence on sklearn.standardize.
  NotImplementedErrors.
  Better handling of y in .fit.
The describe method is now more robust.
  1) Moved to glmnet.py, so it's available to all
     subclasses.
  2) Factored into a few helper methods, allowing it
     to give different levels of detail depending
     on the state of the model when called.
Bug fixes:
   1. Lambda max would be computed as NaN when some variables have zero variance (commit 4407e8a)
   2. The predict method would crash when all variables were eliminated due to high lambda (commit 794a337)
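The zero variance case can be sketched as follows. This is an assumed illustration of how a standardization step can avoid the 0/0 division that produces NaN; the function name and return values are hypothetical, not those of the commits above.

```python
import numpy as np

def standardize(X, weights=None):
    """Centre and scale columns, leaving constant columns at zero."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    means = np.dot(w, X)
    centred = X - means
    scales = np.sqrt(np.dot(w, centred ** 2))
    # A constant column has scale 0; dividing by it would give 0/0 = NaN,
    # so such columns are divided by 1 and stay identically zero.
    safe_scales = np.where(scales > 0, scales, 1.0)
    return centred / safe_scales, means, scales
```

A constant column then contributes nothing to the max lambda calculation instead of poisoning it with NaN.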
    This is necessary any time a matrix is
standardized.
   * zero variance predictors.
   * non non-zero predictors.
  Still not sure how to get perfectly stable
tests.  Maybe consider not generating random data
and instead taking data from static files.
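One middle ground between fully random data and static files is to seed the generator so every run sees the same data. The helper below is a hypothetical sketch; its name and parameters are not part of this PR.

```python
import numpy as np

def make_regression_data(n_samples=50, n_features=5, seed=0):
    """Generate the same synthetic regression problem on every call."""
    rng = np.random.RandomState(seed)   # private generator, fixed seed
    X = rng.normal(size=(n_samples, n_features))
    coefs = rng.normal(size=n_features)
    y = X.dot(coefs) + 0.1 * rng.normal(size=n_samples)
    return X, y
```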
    * Improve documentation.
    * Add describe method.
    * Add error checking.
Allow passing an existing kfold object when creating a new CVGlmNet.
  Implement KFold class, which can be used as a
generator in much the same way as weighted_k_fold
could before, but carries meta information so that
the same object can be used multiple times.
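A minimal sketch of the idea, assuming a plain (unweighted) split: the folds are computed once and stored, and `__iter__` hands out a fresh generator each time, so the same object can be iterated repeatedly. The class below is illustrative only and is not the KFold class from this PR.

```python
import random

class KFold:
    """Reusable k-fold splitter (hypothetical sketch)."""

    def __init__(self, n_samples, n_folds=3, shuffle=False, seed=None):
        if not 2 <= n_folds <= n_samples:
            raise ValueError("need 2 <= n_folds <= n_samples")
        self.n_samples = n_samples
        self.n_folds = n_folds
        indices = list(range(n_samples))
        if shuffle:
            random.Random(seed).shuffle(indices)
        # Round-robin assignment: fold sizes differ by at most one.
        self._folds = [indices[i::n_folds] for i in range(n_folds)]

    def __len__(self):
        return self.n_folds

    def __iter__(self):
        # A new generator per call, so iteration never exhausts the object.
        for i, test in enumerate(self._folds):
            train = [idx for j, fold in enumerate(self._folds)
                     if j != i for idx in fold]
            yield train, test
```

Because the fold assignment lives on the object, passing one instance to several cross validation runs guarantees they all see identical splits.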
  This enhancement was suggested by Declan
Groves.  Allow the full model to be fit in
parallel along with the sub models (fit on folds
of the data during cross validation).  This can
potentially cut cross validation model fitting
time in half.
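The scheduling idea can be sketched as below: the full data fit is submitted to the same worker pool as the per-fold fits rather than running before or after them. This sketch uses the standard library's `concurrent.futures` so it runs without extra dependencies; `fit_all` and its arguments are hypothetical names, and the PR's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_all(fit, data, folds, max_workers=None):
    """Fit the full model and one model per fold in a single pool.

    `fit` is any callable taking a list of rows and returning a model;
    `folds` is an iterable of (train_indices, test_indices) pairs.
    """
    # Task 0 is the full data set; the rest are each fold's training rows.
    tasks = [list(data)] + [[data[i] for i in train] for train, _ in folds]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        models = list(pool.map(fit, tasks))   # map preserves task order
    return models[0], models[1:]
```

Threads only give a real speedup when `fit` releases the GIL (for example inside compiled code); a process based pool is the usual alternative otherwise.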
   Move max lambda calculations outside of .fit
  Added util/importers.py to handle importing of
optional dependencies (joblib, pyplot).
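A common pattern for this kind of module, shown here as an assumed sketch rather than the actual contents of util/importers.py: try the import once, return `None` on failure, and raise a clear error only when a feature that needs the package is actually used. Both helper names are hypothetical.

```python
import importlib

def import_optional(module_name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

def require(module, name, feature):
    """Raise a readable error if an optional module is missing."""
    if module is None:
        raise ImportError("%s is required for %s" % (name, feature))
    return module

# Resolved once at import time; None simply disables the optional feature.
joblib = import_optional("joblib")
```

Code that wants parallel fitting would then call `require(joblib, "joblib", "parallel fitting")` at the point of use, so a missing package never breaks a plain import of the library.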
3 participants