pip or conda install #11

Open
sgbaird opened this issue Sep 11, 2021 · 6 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@sgbaird
Collaborator

sgbaird commented Sep 11, 2021

How tough do you think it would be to package CrabNet into one of the package managers?

@anthony-wang
Owner

anthony-wang commented Sep 11, 2021

Well, I think it probably won't be too difficult to package it for a package manager per se, but there are some additional things that should be done so that this works well:

  • automated testing (including writing test cases)
  • proper cross-platform checks
  • automated building, packaging and uploading to the repository
  • having access to a server that can do the above
  • maintaining the repository on pip / conda

I don't think I have the capacity to work on this right now. If you or anyone else would like to give it a try, you are very welcome to do so!

With that said, I don't really see the difference between pip install crabnet and quickly cloning the repo and then running conda env create --file conda-env.yml; they both do the same thing with one command. What would be the advantage of making CrabNet its own package?

@anthony-wang anthony-wang added enhancement New feature or request help wanted Extra attention is needed labels Sep 11, 2021
@sgbaird
Collaborator Author

sgbaird commented Sep 12, 2021

When using CrabNet standalone, at least for me, it doesn't make much difference whether I pip install crabnet or clone the repo. The issue is more about incorporating CrabNet with several other packages and workflows. You might have another workaround for this; I'm interested to hear your thoughts. I've spent a good portion of the day trying to get ElM2D to play nicely with Automatminer and then CrabNet, without much luck. Manually switching between envs and using separate scripts seems fine a few times, but for me that might eventually mean switching between ~5-7 envs, and I've had projects where I've needed to rerun HPC workflows 5-20 times before they were ready for a paper (which is arguably its own issue to address). In your experience, is it generally worth it to try to merge things, or is it better to just deal with them in separate envs?

Other comments on package managers

Aside from my personal interests, I think having a PyPI or conda version increases visibility and reduces the background knowledge required to use CrabNet. While it's pretty easy for you or me to clone the repo and set up an environment, quite a few people who might be interested in using it are new to Python, coding in general, command/shell terminals, environments, git and/or GitHub, etc. I think it's well worth the investment to learn these things, but everyone has to make that judgment call for themselves. Ironically, it usually takes significant effort to make things easier for others. That's clear from the bullets you gave, and also from the Numba repo, which I've made some contributions to: setting up its workflow must have taken a decent amount of time, and maintaining it takes a lot more.

Edit

One other thing about project integration is that CrabNet seems to be fairly dependent on file paths (for example, storing a train.csv file in a specific folder), whereas pip and conda installations more or less force a workflow that doesn't depend on file paths (which obviously has to be set up by the devs). I like Automatminer in this regard (if only it would work for me 😛): because it takes a DataFrame, you can be in any folder, with the right environment active, and still use the functions. I'm also running into an issue where from crabnet.kingcrab import CrabNet doesn't work because I'm not in the right directory. When I get some time, and depending on my usage, I may try to work on this.
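One way to remove the dependence on the working directory is to let the loading code accept either a file path or data that is already in memory, and to resolve relative paths against an explicit base directory instead of wherever the script happens to run. The sketch below is hypothetical: load_records, the base_dir parameter, and the two-column formula,target layout are assumptions for illustration, not CrabNet's actual API. It sticks to the standard library; a packaged version would more likely accept a pandas DataFrame directly, as Automatminer does.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Optional, Tuple, Union

# Hypothetical record type: (chemical formula, target value)
Record = Tuple[str, float]


def load_records(
    data: Union[str, Path, Iterable[Record]],
    base_dir: Optional[Union[str, Path]] = None,
) -> List[Record]:
    """Hypothetical loader: accept a CSV path or in-memory (formula, target) pairs.

    Relative paths are joined onto base_dir when it is given, so the result
    does not depend on the caller's current working directory.
    """
    # Check str/Path first, because a str is itself iterable
    if isinstance(data, (str, Path)):
        path = Path(data)
        if base_dir is not None and not path.is_absolute():
            path = Path(base_dir) / path
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            # Assumed column names: "formula" and "target"
            return [(row["formula"], float(row["target"])) for row in reader]
    # Already in memory: normalise to a list of (str, float) tuples
    return [(str(formula), float(target)) for formula, target in data]
```

With this shape, load_records("train.csv", base_dir="/path/to/project/data") and load_records([("Al2O3", 1.0)]) both work from any folder, which is the property described above.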

@sgbaird
Collaborator Author

sgbaird commented Sep 12, 2021

VSCode seems to be a lot more friendly with switching environments than Spyder. I'm doing a per-file switch of environments as a bit of a hacky workaround. Thanks for the suggestion about VSCode a while back!

@anthony-wang
Owner

You are right that some of the paths and loading functions are hard-coded; fixing that would certainly be required if we want to make this package publishable and more "pluggable" for people who intend to use it together with other packages rather than just standalone!

It's hard for me to recommend what you should do; I also run into environment conflicts between different packages, and often there is no one ideal way to resolve them. I think CrabNet will run with slightly newer/older versions of PyTorch/CUDA toolkit/Python, but that depends on the individual packages' compatibility with different versions, and I have not tested these combinations.
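Since only certain version combinations are tested, a cheap safeguard is to warn at runtime when the active environment drifts outside them. A minimal sketch: PINS, matches_pin, installed_version and check_versions are hypothetical names, and the pins (Python 3.8.*, torch 1.7.*) are illustrative assumptions to be replaced with whatever combination has actually been tested. Conda-style wildcards are approximated with fnmatch, which handles simple X.Y.* pins but is not a full implementation of conda's match specs, and the CUDA toolkit version is not checked here.

```python
import fnmatch
import platform
from importlib import metadata
from typing import Dict, List, Optional

# Hypothetical pins; "torch" is the PyPI distribution name for PyTorch.
PINS: Dict[str, str] = {"python": "3.8.*", "torch": "1.7.*"}


def matches_pin(version: str, pin: str) -> bool:
    """Approximate conda-style wildcard matching, e.g. '1.7.1' vs '1.7.*'."""
    return fnmatch.fnmatch(version, pin)


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    if name == "python":
        return platform.python_version()
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pins: Dict[str, str] = PINS) -> List[str]:
    """List human-readable mismatches between the environment and the pins."""
    problems = []
    for name, pin in pins.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {pin})")
        elif not matches_pin(found, pin):
            problems.append(f"{name}: found {found}, tested with {pin}")
    return problems
```

Calling check_versions() at the top of a workflow script and printing any returned messages gives an early hint when an untested combination is in use, without blocking the run.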

@sgbaird
Collaborator Author

sgbaird commented Sep 18, 2021

It got inconvenient pretty quickly to switch between conda envs (even in VSCode), so I took another look. I combined all of the dependencies I needed (pymatgen, ElM2D dependencies, and CrabNet dependencies) into the conda-env.yml file as follows:

name: elm2d-crabnet
channels:
  - pytorch
  - conda-forge
dependencies:
  - pymatgen
  - cython #ElM2D
  - numba #ElM2D
  - python=3.8.*
  - numpy
  - scipy #ElM2D
  - pandas
  - scikit-learn
  - plotly #ElM2D
  - setuptools #ElMD
  - umap-learn #ElM2D
  - hdbscan
  - matplotlib
  - seaborn
  - pytorch=1.7.*
  - cudatoolkit=11.*
  - tqdm

With that, conda was able to resolve the dependency structure. (For the repo that only offers a pip install, I looked at setup.py to get the dependencies.)
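The manual merge above can be scripted when more environments are involved. A minimal sketch, assuming each environment's dependencies are already available as lists of conda spec strings (for example read from each conda-env.yml with a YAML parser, omitted here to stay dependency-free); merge_dependencies and package_name are hypothetical helpers. The script keeps the first occurrence of each package, strips trailing # comments such as #ElM2D, and reports differing specs for the same package rather than resolving them, which is left to a human and to conda's solver. Note that a bare name and a pinned name for the same package (numpy vs numpy=1.20) are also reported as a conflict.

```python
import re
from typing import Dict, List, Tuple


def package_name(spec: str) -> str:
    """Extract the package name from a conda spec such as 'pytorch=1.7.*'."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()


def merge_dependencies(*envs: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Merge dependency lists, keeping first-seen order and flagging conflicts."""
    merged: Dict[str, str] = {}
    conflicts: Dict[str, List[str]] = {}
    for deps in envs:
        for raw in deps:
            spec = raw.split("#", 1)[0].strip()  # drop inline comments like '#ElM2D'
            if not spec:
                continue
            name = package_name(spec)
            if name not in merged:
                merged[name] = spec
            elif merged[name] != spec:
                # Two environments ask for the same package differently
                seen = conflicts.setdefault(name, [merged[name]])
                if spec not in seen:
                    seen.append(spec)
    return list(merged.values()), conflicts
```

For example, merging a CrabNet-style list with an ElM2D-style list yields one combined list in first-seen order plus an empty conflict dict when nothing clashes; the combined list can then be pasted under dependencies: in a single env file and handed to conda env create.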

@sgbaird
Collaborator Author

sgbaird commented Feb 5, 2022

For reference, there is now a pip/conda-installable version via my fork of CrabNet; I tried to keep the changes backwards-compatible.

conda install -c sgbaird crabnet

or

pip install crabnet

See also #18 and #16
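If an import such as from crabnet.kingcrab import CrabNet still behaves differently depending on the working directory after installing, it helps to check which copy of the package Python actually resolves. A small sketch using the standard library's importlib.util.find_spec; locate_package is a hypothetical helper name, and it is meant for top-level package names only.

```python
import importlib.util
from typing import Optional


def locate_package(name: str) -> Optional[str]:
    """Return the file Python would import for a top-level package, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from the active environment
    # For regular packages, origin points at __init__.py;
    # namespace packages have no single origin file (None).
    return spec.origin
```

For example, print(locate_package("crabnet")) in the active environment would typically point into site-packages for an installed copy, or into a cloned folder if the repo directory is shadowing the installed package.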
