
setup.py #12

Closed

wants to merge 17 commits into from

Conversation

hadim
Contributor

@hadim hadim commented Oct 27, 2016

Here is an example setup.py.

To install the package, people will have to run:

python setup.py install

That should compile all C extensions automatically under Linux, Windows and OSX (you can test it with python setup.py test).
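For context, here is a minimal sketch of this kind of setup.py (the extension name and C source below are hypothetical placeholders, not the actual content of this PR):

from setuptools import setup, find_packages, Extension

setup(
    name="storm_analysis",
    version="0.1",
    packages=find_packages(),
    ext_modules=[
        # Hypothetical example module; the real setup.py would list every C source in the project.
        Extension("storm_analysis.sa_library._example",
                  sources=["storm_analysis/sa_library/example.c"]),
    ],
)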

Once setup.py is done, it's very easy to build a conda-forge recipe to automatically build binary packages for Linux, Windows and OSX.

Once this is done, the end user won't need a compiler to install storm-analysis, only the Anaconda distribution and conda install storm-analysis.

What do you think?

@hadim
Contributor Author

hadim commented Oct 27, 2016

Ping @HazenBabcock

@HazenBabcock
Member

I think this is great, and I really appreciate your work on this! A few questions/comments..

(1) There is already a tests directory in the storm_analysis directory. If this is not the right place to have a tests directory then we should move its contents into the tests directory that you added. Or if this is an okay location for a tests folder then perhaps we could move test_c_extensions_import.py into this folder.

(2) Some of the tests in the storm_analysis/tests directory are now failing with these changes. I'm not sure if the problem is that the standalone C executables (apply-drift-correction, avemlist, fitz and tracker) are not getting built, or if the paths to these executables (in std_analysis.py) are no longer correct. It looks like they are not getting built?

Once these are resolved I'd be happy to merge this.

@HazenBabcock
Member

Is it hard to also include the dependencies (numpy, scipy, Pillow, matplotlib, etc.)?

@hadim
Contributor Author

hadim commented Oct 27, 2016

For (1), it's fine to move test_c_extensions_import.py in with the other files (both folder locations are standard).

For (3), it's not good practice to include dependencies in a Python project. The dependencies are going to be declared in the conda recipe and so will be installed automatically when using conda install storm-analysis. You could also add some instructions to the README for installing them easily, such as these command lines:

conda config --add channels conda-forge
conda install numpy scipy matplotlib pillow

(no compilation needed with conda)

For (2), I will add all the data files and rewrite the run_test.py script to use them correctly. Also, the C executables are very hard to include in the setup.py. Any chance you could convert them to libraries and create small Python scripts that call the appropriate functions after importing each library? That would make things much easier.
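To illustrate, here is a minimal sketch of the library-plus-wrapper pattern (the library name, entry point and argument types are hypothetical, not the actual storm-analysis code):

import ctypes

# The standalone executable becomes a shared library built by setup.py; a thin
# Python wrapper loads it and calls the entry point.
tracker = ctypes.cdll.LoadLibrary("./_tracker.so")            # hypothetical library name
tracker.track.argtypes = [ctypes.c_char_p, ctypes.c_double]   # hypothetical signature

def track(mlist_filename, radius):
    tracker.track(mlist_filename.encode(), radius)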

@HazenBabcock
Member

(3) My impression was that setup.py files included project dependencies so that, if they were not present, they also got downloaded and installed when you typed "python setup.py install"? If this is a bad idea then I will update the README instructions with a complete list of the required Python modules.

(2) Sure, if that is what is necessary to make this work then I can make this change. I think I would prefer to keep things separate though, and not roll all of these into a single C library.

@HazenBabcock
Member

Also, if you like, feel free to add your name to the header of any of the files that you wrote / modified such as setup.py and sa_library/loadclib.py.

@hadim
Contributor Author

hadim commented Oct 27, 2016

(3) setup.py is great for compiling extensions and installing your Python package, but it's the worst piece of software ever for installing/tracking/upgrading related dependencies... This is why most scientific Python libraries now use conda from the Anaconda Python distribution; it makes things much easier. That is why I didn't include dependencies in the setup.py: in the future conda will do that for you. So for the deps, I would suggest you write a small install document advising users to download Anaconda and install dependencies with conda: conda config --add channels conda-forge and conda install numpy scipy matplotlib pillow.

But before building a package for conda-forge let's focus on this setup.py file.

(2) Of course it's better to keep things separated, as you already did for the other C extensions. The idea would be to have one C library extension per executable and write the appropriate Python code that calls this library (this is exactly what has been done for the other C extensions). Give me a few days to figure out the best way to make these tests work.

@HazenBabcock
Member

(2) I just pushed a branch called "no_standalone_C" where all of these standalone C programs have been replaced by C libraries with Python interfaces. I've tested it some and I think it all works correctly.

@hadim
Contributor Author

hadim commented Oct 28, 2016

Very nice, it will make things much easier. I'll get back to you soon.

@HazenBabcock
Member

HazenBabcock commented Oct 28, 2016

(2) It would also be easy to change things like 3D-DAOSTORM so that they also work as Python modules, i.e. both of these would work:

from the command line:

python path/to/mufit_analysis.py movie.tif movie_mlist.bin settings.xml

in Python:

import storm_analysis.daostorm_3d.mufit_analysis as analysis

analysis.analyze("movie.tif", "movie_mlist.bin", "settings.xml")
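For reference, one common way to support both call styles is a module-level analyze() function plus a __main__ guard. A sketch only (not the actual mufit_analysis.py, whose analysis code would replace the print):

import sys

def analyze(movie_name, mlist_name, settings_name):
    # The real 3D-DAOSTORM analysis would run here.
    print("analyzing", movie_name, "->", mlist_name, "using", settings_name)

if (__name__ == "__main__"):
    # python mufit_analysis.py movie.tif movie_mlist.bin settings.xml
    analyze(sys.argv[1], sys.argv[2], sys.argv[3])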

@hadim
Contributor Author

hadim commented Oct 28, 2016

I have integrated your changes from no_standalone_C in this PR so you can safely delete your branch.

I moved all the test data into a specific folder, storm_analysis/test/data.

I have started to move all the old tests from run_test.py into specific test files (one per project): look at test_frc.py and test_rcc.py for examples.

To make the tests easier to read and write, I had to modify the corresponding scripts (rcc/rcc_drift_correction.py and frc/frc_calc2d.py) so they can be called as functions (and not only from the command line).

Could I let you rewrite the scripts for the other modules? Follow the example I have done with test_frc.py and test_rcc.py. Ping me if you have any questions.
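For reference, here is a sketch of that test style (the module path is real, but the data file names and the function name/signature below are hypothetical):

import os

import storm_analysis.frc.frc_calc2d as frc_calc2d

def test_frc():
    data_dir = os.path.join(os.path.dirname(__file__), "data")
    mlist_name = os.path.join(data_dir, "test.bin")         # hypothetical input file
    results_name = os.path.join(data_dir, "test_frc.txt")   # hypothetical output file
    frc_calc2d.frcCalc2d(mlist_name, results_name)          # hypothetical function name/signature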

Note: to run all the tests at once you can use python setup.py test or pytest.

@HazenBabcock
Member

Great, thanks!

Sure I will write the scripts for the other modules. Once I have completed everything I will push up another branch for testing and if it all looks good I'll merge everything into master.

@hadim
Contributor Author

hadim commented Oct 28, 2016

OK, so maybe you could merge this branch into master? Then you can work starting from the code included in this PR.

@HazenBabcock
Member

No problem, I will work off this branch. Apologies for not doing that with my last set of changes.

@hadim
Contributor Author

hadim commented Oct 28, 2016

The easiest way to do that is to click the "Merge Pull Request" button at the bottom of this page. Then you will have to pull it onto your computer with git pull.

@HazenBabcock
Member

HazenBabcock commented Oct 29, 2016

I pulled the build-system branch off your repository and have been working off that. I pushed my current version of this branch to this repo. Now I've run into some issues and I would appreciate your suggestions.

(1) Some of the parameter files refer to other files in the same directory, and this breaks in the tests. For example, all of the sCMOS settings files tell the analysis to load a file called "calib.npy", which is expected to be in the same directory as the settings file. Since this is not the case in the tests, none of the sCMOS tests pass. This will be a problem for sCMOS, spliner and L1H, as they all have parameter files that refer to other files.

(2) There seems to be some issue with the DBSCAN C library, specifically the kdtree aspect. If you set USEKD to 0 in dbscan.c then the clustering works properly, but if you set it to 1 then it does not. I feel like I have run into this before and there is some subtlety to getting the kdtree library compiled properly. Not sure what it is, though. The compile_linux.sh file would suggest this is easy.

(3) I've been running the tests with nose2; when I try to use python setup.py test I get a long error message, the most important part of which I think is "ValueError: Plugin already registered:". I'm trying to do all this in a virtual environment, if that makes any difference.

Overall though I think we are pretty close to getting this all working.

@hadim
Contributor Author

hadim commented Oct 29, 2016

For (1), you could add a new parameter type called filename and resolve it relative to the directory of the parameter file. Something like this:

import os
from xml.dom import minidom, Node


class Parameters:
    # Dynamically create the class by processing the
    # parameters xml file.
    def __init__(self, parameters_file):
        xml = minidom.parse(parameters_file)
        settings = xml.getElementsByTagName("settings").item(0)
        for node in settings.childNodes:
            if node.nodeType == Node.ELEMENT_NODE:
                # single parameter setting
                if len(node.childNodes) == 1:
                    slot = node.nodeName
                    value = node.firstChild.nodeValue
                    ntype = node.attributes.item(0).value
                    if (ntype == "int"):
                        setattr(self, slot, int(value))
                    elif (ntype == "int-array"):
                        text_array = value.split(",")
                        int_array = []
                        for elt in text_array:
                            int_array.append(int(elt))
                        setattr(self, slot, int_array)
                    elif (ntype == "float"):
                        setattr(self, slot, float(value))
                    elif (ntype == "float-array"):
                        text_array = value.split(",")
                        float_array = []
                        for elt in text_array:
                            float_array.append(float(elt))
                        setattr(self, slot, float_array)
                    elif (ntype == "string-array"):
                        setattr(self, slot, value.split(","))
                    # New "filename" type: resolve the value relative to the
                    # directory that contains the parameters file.
                    elif (ntype == "filename"):
                        dirname = os.path.dirname(os.path.abspath(parameters_file))
                        fname = os.path.join(dirname, value)
                        setattr(self, slot, fname)
                    else: # everything else is assumed to be a string
                        setattr(self, slot, value)
                # multiple parameter settings.
                else:
                    print("multi parameter setting unimplemented.")

        self.parameters_file = parameters_file

Then you would have to modify all your parameter files from this

<camera_calibration type="string">calib.npy</camera_calibration>

to this

<camera_calibration type="filename">calib.npy</camera_calibration>

For (2) I'll try to have a look.

For (3), I used pytest instead of nose, which might explain the error... python setup.py test and pytest should do exactly the same thing.

It's very good practice to work in a virtual env, but I would suggest you use conda environments since they are very powerful. You can create a new env like this: conda create -n my_env_name python numpy scipy pytest.

@hadim
Contributor Author

hadim commented Oct 29, 2016

I'm closing this PR since you have already integrated the code into your repo. Feel free to keep the discussion here even though it's closed.

@hadim hadim closed this Oct 29, 2016
@hadim hadim deleted the build-system branch October 29, 2016 04:20
@hadim
Contributor Author

hadim commented Oct 29, 2016

Also there is some code incompatible with Python 3 (the version I am using). You should try to run the tests with Python 3 to figure out where.

So far I have noticed print issues in storm_analysis/dbscan/cluster_size.py and storm_analysis/dbscan/find_clusters.py, and also some str/bytes issues in storm_analysis/sa_utilities/apply_drift_correction_c.py.

@hadim
Contributor Author

hadim commented Oct 29, 2016

I have also noticed a bug at line 119 of storm_analysis/dbscan/cluster_images.py:

Traceback (most recent call last):
  File "storm_analysis/test/test_track_cluster.py", line 48, in <module>
    test_voronoi()
  File "storm_analysis/test/test_track_cluster.py", line 42, in test_voronoi
    clusterImages(clist_name, "Voronoi Clustering", 50, 20, image_name)
  File "/home/hadim/Insync/Documents/Code/postdoc/contrib/storm-analysis/storm_analysis/dbscan/cluster_images.py", line 119, in clusterImages
    draw2.text((2,2), title, fill=(255,255,255,0), font = font2)
  File "/home/hadim/local/conda/envs/test2/lib/python2.7/site-packages/PIL/ImageDraw.py", line 226, in text
    ink, fill = self._getink(fill)
  File "/home/hadim/local/conda/envs/test2/lib/python2.7/site-packages/PIL/ImageDraw.py", line 124, in _getink
    ink = self.draw.draw_ink(ink, self.mode)
TypeError: function takes exactly 1 argument (4 given)
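My guess (only a guess) is that the fill tuple does not match the image mode: with the Pillow version I am using, the fill seems to need as many values as the image has bands. A minimal illustration:

# Illustration with Pillow: a 4-tuple fill works on an RGBA image, while an RGB
# image takes 3 values (and a grayscale "L" image just one).
from PIL import Image, ImageDraw

rgba = ImageDraw.Draw(Image.new("RGBA", (200, 50)))
rgba.text((2, 2), "title", fill=(255, 255, 255, 0))   # OK

rgb = ImageDraw.Draw(Image.new("RGB", (200, 50)))
rgb.text((2, 2), "title", fill=(255, 255, 255))       # OK; (255, 255, 255, 0) may raise here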

@hadim
Contributor Author

hadim commented Oct 29, 2016

Here are some instructions you could add to the main README:


Install

Use the Anaconda Python distribution, which makes installation and dependency management very easy: https://www.continuum.io/downloads

Optionally, you can create an environment to keep your main Python installation clean:

conda create -n my_env python
source activate my_env  # or activate my_env under Windows

Install dependencies:

# Linux / OSX
conda config --add channels conda-forge 
conda install numpy fftw lapack pytest pytest-runner gcc tifffile matplotlib pillow shapely randomcolor pywavelets

# Windows
conda config --add channels conda-forge 
conda install numpy fftw lapack pytest pytest-runner m2w64-toolchain tifffile matplotlib pillow shapely randomcolor pywavelets

Get the storm-analysis source code (for example with git):

git clone https://github.com/ZhuangLab/storm-analysis.git
cd storm-analysis
python setup.py install

Test the installation:

python -c "import storm_analysis"

@HazenBabcock
Member

HazenBabcock commented Oct 31, 2016

Ok, I think I have everything fixed now, except for the dbscan/kdtree problem. All the tests in the current version of the "build-system" branch should pass with either Python 2 or Python 3. I've also tried to update all the documentation, including adding your installation instructions. The plan is to remove all the Windows .exe and .dll binaries and the Linux .sh compilation scripts once this has been in master for a few weeks.

I plan to test this on a Windows computer and hopefully solve the dbscan/kdtree problem. Once this has been done I'll merge this branch into master.

If you could also run the tests that would be great.

@hadim
Contributor Author

hadim commented Oct 31, 2016

Very nice work @HazenBabcock! A few comments below.

1

The fftw and lapack packages are not yet available for Windows. The lapack package will come this week (see conda-forge/lapack-feedstock#3 for tracking). The fftw package should take more time to become available (see conda-forge/fftw-feedstock#17 for tracking).

Because Windows users won't be able to use the current code without these packages (even with the DLLs in the repo, I think), I suggest you wait for the packages to become available.

I'll keep you posted here about that.

2

When running the tests, a bunch of files are created outside of storm_analysis/test/output (the folder that is not tracked by git). Ideally you want those files to go under storm_analysis/test/output. To do that, you should modify the appropriate code so that the output file paths can be controlled (that is always good practice anyway); see the sketch after the file list below.

The files are:

    fista_decon_psf.dax
    fista_decon_psf.inf
    ia_matrix.dax
    ia_matrix.inf
    psf.dax
    psf.inf
    spline.dax
    spline.inf
    storm_analysis/test/data/test_drift_drift.txt
    storm_analysis/test/data/test_l1h_a7_k5_i8_o8_p8_4.amat
    storm_analysis/test/data/test_spliner_psf.spline
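One possible pattern (the helper below is hypothetical, just to illustrate letting callers control where output files go):

import os

def output_path(filename, output_dir="storm_analysis/test/output"):
    # Map a bare output filename into the untracked test output folder.
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    return os.path.join(output_dir, filename)

# e.g. write "psf.dax" into storm_analysis/test/output instead of the current directory:
psf_name = output_path("psf.dax")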

3

I confirm that all the tests pass in clean Python 2 and Python 3 conda envs.

I found storm_analysis/test/test_spliner.py and storm_analysis/test/test_track_cluster.py a bit long to run. Maybe you could shorten them? (Not a big deal...)

4

You should add *__pycache__* to .gitignore.

@hadim
Contributor Author

hadim commented Nov 1, 2016

Your last commits address all my comments. Congrats! I think storm-analysis is now in pretty good shape to be used by "standard" Python users.

I'll let you know once lapack and fftw are available for Windows. Then you should be OK to merge this branch into master.

@HazenBabcock
Member

Hopefully pretty close anyway. I think the tests still need some work as I agree that they take a bit long to run. Also some of the tests depend on the outputs from other tests.

And there is still the compilation issue with the DBSCAN C library that I need to resolve. It looks like compile_linux.sh and Python are compiling the library the same way, but they are using different compilation flags. So now I'll need to figure out which flag is needed and/or which one is causing this trouble.

Once I have figured out how to get DBSCAN working again I will probably rename this branch to development and add a suggestion in the README that linux users consider using it even before the merge.

@hadim
Contributor Author

hadim commented Nov 1, 2016

What issue do you have with DBSCAN? It compiles on my computer.

@HazenBabcock
Member

The problem is that it does not work properly with #define USEKD 1. It does not find any clusters, which you can verify by checking that the cluster_stats.txt file contains only a header and no clusters.

@HazenBabcock
Member

Hopefully this issue (with DBSCAN) is fixed now. This was a learning experience for me about the dangers of blithely ignoring warning messages from the gcc compiler :).

@HazenBabcock
Member

HazenBabcock commented Nov 2, 2016

Heh, wrong again, sigh. It looks like I was fooled into thinking this worked by a stale version of the library, apologies for the noise.

At this point all I can say is that if I use the same commands that setup.py uses to compile DBSCAN but remove the -DNDEBUG flag then this library works as expected.

@HazenBabcock
Member

HazenBabcock commented Nov 2, 2016

It looks like -DNDEBUG and the C assert statement don't play nicely, and hopefully this is actually fixed this time. I added a test for this that should fail more obviously if something is wrong.

Quoting gnu.org

"Warning: Even the argument expression expression is not evaluated if NDEBUG is in effect. So never use assert with arguments that involve side effects. For example, assert (++i > 0); is a bad idea, because i will not be incremented if NDEBUG is defined."

@Dschoni

Dschoni commented Nov 17, 2016

I just want to ask why you are focusing so much on conda for dependency management. Most users in my environment do that with pip, which is easily integrated into a setup.py file. Or am I missing something?

@hadim
Contributor Author

hadim commented Nov 17, 2016

Using conda-forge you can easily generate binaries for your package. But pip should be fine too.
