Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCA power-iteration does one more iteration than asked (off-by-1) #1910

Open
jucor opened this issue Feb 9, 2025 · 0 comments
Open

PCA power-iteration does one more iteration than asked (off-by-1) #1910

jucor opened this issue Feb 9, 2025 · 0 comments

Comments

@jucor
Copy link
Contributor

jucor commented Feb 9, 2025

Reporting here a subtle internal math bug for tracking only.
To be clear: this does not affect the convergence of the PCA: the PCA is still valid at the end once it converges :)

But, if you ask for N power iterations, the current clojure code does N+1 iterations instead. This is due to the way the loop is coded in clojure in:

(loop [iters iters start-vector start-vector last-eigval 0]
(let [product-vector (xtxr data start-vector)
eigval (matrix/length product-vector)
normed (matrix/normalise product-vector)]
(if (or (= iters 0) (= eigval last-eigval))
normed
(recur (dec iters) normed eigval))))))

The let statement is evaluated even when iters is 0.
And on line 55 we return normed upon termination, therefore the result of evaluating the let block.

If we wanted to fix it to do exactly iters iteration, and not iters+1, line 55 should return start-vector instead of normed.

I suggest we do NOT fix this this for now, until I have full clarity of how partial-pca is computed. It will most matter much when we use a small number of iteration.

I see that

(defn partial-pca
"This function takes in the rating matrix, the current pca and a set of row indices and
computes the partial pca off of those, returning a lambda that will take the latest PCA
and make the update on that in case there have been other mini batch updates since started"
[mat pca indices & {:keys [n-comps iters learning-rate]
:or {n-comps 2 iters 10 learning-rate 0.01}}]

defaults to 10 iterations, but I am unclear how often that default is used, knowing that
(def base-conv-update-graph
"Base of all conversation updates; handles default update opts and does named matrix updating"
{:opts' (plmb/fnk [opts]
"Merge in opts with the following defaults"
;; TODO Answer and resolve this question:
;; QUESTION Does it make senes to have the defaults here or in the config.edn or both duplicated?
(merge {:n-comps 2 ; does our code even generalize to others?
:pca-iters 100
:base-iters 100
:base-k 100
:max-k 5
:group-iters 100
;; These three in particular we should be able to tune quickly
:max-ptpts 100000
:max-cmts 10000
:group-k-buffer 4}

defines a default of 100.

As to the python port in #1893 , for now I am reproducing the behaviour bug-for-bug, see 47c217b .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant