This is the GitHub repository for the paper "Harmonic Loss Trains Interpretable AI Models" [arXiv] [Twitter] [Github].
- The harmonic logit $d_i$ is defined as the $\ell_2$ distance between the weight vector $\mathbf{w}_i$ and the input (query) $\mathbf{x}$: $d_i = \|\mathbf{w}_i - \mathbf{x}\|_2$.
- The probability $p_i$ is computed using the harmonic max function:

$$p_i = \frac{1/d_i^n}{\sum_j 1/d_j^n},$$

where $n$ is the harmonic exponent.
- Harmonic loss achieves (1) nonlinear separability, (2) fast convergence, (3) scale invariance, and (4) interpretability by design, properties that cross-entropy loss does not provide.
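The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation: the function name `harmonic_max` and the `eps` guard against division by zero are our additions.

```python
import numpy as np

def harmonic_max(x, W, n=2.0, eps=1e-12):
    """Harmonic max: p_i = d_i^{-n} / sum_j d_j^{-n}, where
    d_i = ||w_i - x||_2 is the harmonic logit and n is the
    harmonic exponent. `eps` (our addition) avoids dividing by zero
    when the query coincides with a weight vector."""
    d = np.linalg.norm(W - x, axis=1)   # harmonic logits d_i
    inv = 1.0 / (d ** n + eps)          # d_i^{-n}
    return inv / inv.sum()              # normalize to probabilities

# Toy usage: three class weight vectors in 2-D and one query point.
W = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.1, 0.0])
p = harmonic_max(x, W)
# The class whose weight vector is nearest to x gets the highest probability.
```

Note that, unlike softmax over dot-product logits, smaller distance means higher probability, which is what makes the weight vectors interpretable as class centers.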
Download the results from the following link: Link
- Figure 1: toy_points.ipynb
- Figures 2, 3, 7: notebooks/final_figures.ipynb
- Figure 4: notebooks/case_study_circle.ipynb
- Figure 5: notebooks/mnist.ipynb
- Figure 6: GPT2/function_vectors.ipynb