Avoid identical computation in self tanimoto similarity #117

JochenSiegWork · 2025-01-31T09:39:06Z

The `self_tanimoto_similarity` function sets `matrix_b` equal to `matrix_a` and then calls `tanimoto_similarity_sparse`. In this case `norm_2` is computed a second time on the same matrix, which is unnecessarily costly for large arrays. See

MolPipeline/molpipeline/utils/kernel.py

Line 29 in 8190785

norm_2 = np.array(matrix_b.multiply(matrix_b).sum(axis=1))

We can add a simple check for identity of the two matrices to avoid redundant computation.
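A minimal sketch of what such a check could look like. Only the `norm_2` line is taken from `kernel.py`; the function name `tanimoto_similarity_sparse_sketch`, the `norm_1` variable, and the rest of the body are assumptions for illustration and may not match the actual implementation.

```python
import numpy as np
from scipy import sparse


def tanimoto_similarity_sparse_sketch(matrix_a, matrix_b):
    """Hypothetical stand-in for tanimoto_similarity_sparse with an identity check."""
    # Pairwise dot products between the rows of both matrices.
    intersection = matrix_a.dot(matrix_b.transpose()).toarray()
    # Squared row norms of matrix_a, shaped as a column vector.
    norm_1 = np.asarray(matrix_a.multiply(matrix_a).sum(axis=1)).reshape(-1, 1)
    if matrix_b is matrix_a:
        # Same object passed twice: reuse norm_1 instead of recomputing.
        norm_2 = norm_1
    else:
        norm_2 = np.asarray(matrix_b.multiply(matrix_b).sum(axis=1)).reshape(-1, 1)
    # Union via inclusion-exclusion; norm_2 is broadcast as a row vector.
    union = norm_1 + norm_2.T - intersection
    return intersection / union


fingerprints = sparse.csr_matrix(np.array([[1, 1, 0], [0, 1, 1]]))
self_similarity = tanimoto_similarity_sparse_sketch(fingerprints, fingerprints)
```

Note that the `is` check only detects the same object being passed twice. Two separate matrices with equal values still take the regular path, which stays correct but is not optimized.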

Thanks to Afnan for bringing this to our attention!

JochenSiegWork added the `type: maintenance` label (Improvement of code or keeping the code up to date) Jan 31, 2025

JochenSiegWork self-assigned this Jan 31, 2025