FPSim2 is a small, NumPy-centric Python/C++ package, built on RDKit, for running fast compound similarity searches. It performs better at high search thresholds (>=0.7). It is currently used in the ChEMBL and SureChEMBL interfaces.
Highlights:
- Uses CPU POPCNT instruction for fast bit counting
- Bit-count bounds for sublinear search speedups, from DOI 10.1021/ci600358f
- A compressed file format with optimised read speed, based on PyTables and BLOSC2
- Fast multicore CPU and GPU similarity searches
- In-memory and on-disk search modes
- Distance matrix calculation
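
To illustrate the first two highlights, here is a minimal pure-Python sketch of a popcount-based Tanimoto search with bit-count bounds. It is not FPSim2's implementation or API: the function names (`popcount`, `tanimoto`, `popcount_bounds`, `search`) are hypothetical, fingerprints are stored as plain Python integers, and the database is assumed to be sorted by popcount. The bound relies on the fact that the Tanimoto score of two fingerprints with A and B bits set can never exceed min(A, B) / max(A, B), so for a threshold t only fingerprints with a bit count between A*t and A/t need to be scored.

```python
import bisect
import math


def popcount(fp: int) -> int:
    """Count the set bits of a fingerprint stored as a Python int."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: bits in common divided by bits in either."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def popcount_bounds(query_count: int, threshold: float) -> tuple[int, int]:
    """Popcount range a candidate must fall in to possibly reach `threshold`.

    Since Tanimoto <= min(A, B) / max(A, B), a candidate with B set bits
    can only reach the threshold t if A * t <= B <= A / t.
    """
    lower = math.ceil(query_count * threshold)
    upper = math.floor(query_count / threshold)
    return lower, upper


def search(query: int, sorted_fps: list[int], threshold: float):
    """Score only the slice of `sorted_fps` inside the popcount bounds.

    Assumes `sorted_fps` is sorted by popcount (an assumption of this sketch).
    Returns the hits and the number of fingerprints actually scored.
    """
    # A real implementation would store these counts instead of recomputing.
    counts = [popcount(fp) for fp in sorted_fps]
    lower, upper = popcount_bounds(popcount(query), threshold)
    start = bisect.bisect_left(counts, lower)
    end = bisect.bisect_right(counts, upper)

    hits = []
    for fp in sorted_fps[start:end]:
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((fp, score))
    return hits, end - start


# Toy database with popcounts 1, 4, 6, 7, 8 and 16 (already sorted).
TOY_DB = [0b1, 0b1111, 0b1111110000, 0b1111111, 0b11111111, 0xFFFF]
```

With the query `0b11111111` (8 bits set), `search(0b11111111, TOY_DB, 0.7)` scores only 3 of the 6 fingerprints, while a threshold of 0.3 scores 5 of them. The admissible popcount window narrows as the threshold rises, which is consistent with searches being faster at high thresholds.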
With pip:
pip install fpsim2
With conda:
conda install conda-forge::fpsim2
With SBGrid:
sbgrid-cli install fpsim2
Documentation is available at https://chembl.github.io/FPSim2/
To try out FPSim2 interactively in your web browser, check out this Google Colab notebook