difference in n_eff of population mean depending on model specification #405
Hi Chris. This is a major topic in my book and course, the centered vs non-centered parameterizations of multilevel priors. In the book, see Chapter 13. In the most recent lectures, lecture 13. Here's a link set to the beginning of the relevant section: https://www.youtube.com/watch?v=sgqMkZeslxA&list=PLDcUM9US4XdPz-KxHM4XHt7uUVGWWVSus&index=13&t=2432s Lecture 14 has more content on the topic, including some technical aspects with multivariate priors.
Thank you! These are exactly the bits of your lecture series I watched last night which led to my edit - which has taught me a valuable lesson about not taking shortcuts 😅. Because of various factors, including brms using the non-centered parameterization as the default, I assumed it was strictly better than the alternative. I understand now that my type of data (many trials, approximately equal trial numbers for each condition for each participant) might be an example of a case where it is not. Out of curiosity, is there any intuition for why the non-centered parameterization does worse in such data-dominant cases? I mean to say: I understand the opposite, where unnesting allows for more efficient exploration of e.g. Neal's-funnel-type posteriors, but how can unnesting the parameters make the posterior harder for the MCMC chains to explore? Is it because the chains spend a lot of time in low-probability areas when the exploration of e.g. a_bar is made independent from the lower-level parameters? I clearly have some more reading to do, so I shall return to the relevant chapters of your book. Thanks again!
Michael Betancourt has a long post about this: https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html I didn't review it this morning, but if I remember right, you get an inverted funnel problem when the likelihood is strong and you use the non-centered parameterization.
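In rough terms (a gloss on that case study, not a quote from it): the non-centered form writes each participant mean as `a[j] = abar + asigma*z[j]` with `z[j] ~ Normal(0, 1)`. When every participant contributes many trials, the likelihood pins each `a[j]` down almost exactly, so the posterior concentrates around `z[j] ≈ (a[j] - abar)/asigma`; any move in `abar` or `asigma` then has to be compensated by a coordinated shift in all the `z[j]`, and those strong correlations (the "inverted funnel") are what slow the chains. In the centered form the `a[j]` are sampled directly, and that dependence largely disappears.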
EDIT: After playing around with this and reading up a bit more, I understand now that non-centering (which I suppose `m2` is a clumsy attempt at, but the same thing happens with `transpars> vector[I]:a <<- abar + z*asigma`) isn't always good for your sampling, which is the mistaken idea I was laboring under. Feel free to disregard the question below if it seems silly or trivial.

I'll admit upfront that I ask this question mostly out of curiosity. I'm not entirely sure it's appropriate to post this here, since it doesn't concern a code issue, but I don't really know of a more appropriate place, so I figured I'd drop the question here:
I'm using ulam alongside some other Stan frontends (e.g. brms) to model data obtained from multiple participants in a hierarchical way, with observations nested in participants. What I noticed is that, depending on the model specification, the effective sample size (`n_eff`) for the population mean can vary widely.
For a reproducible example, I coded up a rudimentary simulation of participants drawn from a population to illustrate my point:
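The simulation code itself doesn't appear in the thread; a minimal sketch of that kind of setup (the participant count, trial count, and parameter values below are placeholders, not the ones from the issue) could look like this:

```r
set.seed(1)

# placeholder settings -- not the values used in the original issue
I  <- 30     # number of participants
Tr <- 200    # trials per participant
abar_true   <- 0.5   # population mean
asigma_true <- 0.3   # between-participant SD
sigma_true  <- 1.0   # trial-level (residual) SD

a_true <- rnorm( I , abar_true , asigma_true )     # one latent mean per participant
pid    <- rep( 1:I , each = Tr )                   # participant index for every trial
y      <- rnorm( I * Tr , a_true[pid] , sigma_true )

dat <- list( y = y , pid = pid , I = I )
```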
I can model this data in (at least) two equivalent ways:
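The model definitions are likewise missing here; going by the `transpars>` line quoted in the edit above, the two versions were presumably something along these lines (a sketch only -- the Gaussian likelihood and the priors are assumptions, not taken from the issue):

```r
library(rethinking)

# m1: centered parameterization -- participant means drawn directly from the population distribution
m1 <- ulam(
    alist(
        y ~ dnorm( a[pid] , sigma ),
        a[pid] ~ dnorm( abar , asigma ),
        abar ~ dnorm( 0 , 1 ),
        asigma ~ dexp( 1 ),
        sigma ~ dexp( 1 )
    ) , data = dat , chains = 4 , cores = 4 )

# m2: non-centered parameterization -- standardized offsets z, with a rebuilt as a transformed parameter
m2 <- ulam(
    alist(
        y ~ dnorm( a[pid] , sigma ),
        transpars> vector[I]:a <<- abar + z*asigma,
        vector[I]:z ~ normal( 0 , 1 ),
        abar ~ dnorm( 0 , 1 ),
        asigma ~ dexp( 1 ),
        sigma ~ dexp( 1 )
    ) , data = dat , chains = 4 , cores = 4 )

precis( m1 )   # n_eff for abar is typically healthy here
precis( m2 )   # n_eff for abar can collapse when each participant has many trials
```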
These two models have the same parameters and reach the same estimates for all coefficients, including `abar`. However, in my testing the `n_eff` for `abar` in model 1 is ~8000, while it only reaches 15(!) in model 2. While they reach the same conclusions numerically, I worry about what this means for the stability of posterior predictive checks etc.

I'm sure there's something obvious I'm missing, but I'm curious whether there are known reasons for this discrepancy. Of course, in general I would be happy to always specify my model as in m1, but for trickier models with multiple varying effects, or non-centered parametrizations, as I understand it, I have no choice but to 0-center the effects, which in my experience always leads to low sampling efficiency for these group means. Any advice would be appreciated!