Package Design #1
Comments
Maybe one could make use of the general interface in AbstractMCMC for MCMC inference instead of targeting a specific backend. The implementation in EllipticalSliceSampling is already quite general and uses it.

I did some (preliminary) work with log Gaussian Cox processes a while ago and used EllipticalSliceSampling for some toy inference problems, so I would be very interested if the API could support such models! It would definitely be good to design the API in a way that makes it easy to use different inference algorithms (e.g., I used ESS with circulant matrices (which was a major annoyance...), but it would be very nice if one could easily switch to VI or INLA instead). I'll try to add some more thorough comments in the next days.
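To illustrate the kind of workflow this would enable, here is a rough sketch of ESS via the AbstractMCMC interface for a latent GP with a Poisson (log Gaussian Cox-style) likelihood. The wiring below is an assumption about how this package could be used, not an existing API; `ESSModel`/`ESS` come from EllipticalSliceSampling, the rest from AbstractGPs and Distributions:

```julia
import AbstractMCMC
using AbstractGPs, Distributions, EllipticalSliceSampling, KernelFunctions

x = range(0.0, 10.0; length=50)
f = GP(Matern52Kernel())
fx = f(x, 1e-6)                       # prior over f at x (small jitter for stability)

y = rand.(Poisson.(exp.(rand(fx))))   # synthetic counts drawn from the model

prior = MvNormal(mean(fx), cov(fx))   # the Gaussian prior that ESS samples from
loglik(v) = sum(logpdf.(Poisson.(exp.(v)), y))

samples = AbstractMCMC.sample(ESSModel(prior, loglik), ESS(), 1_000)
```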
One possible issue with your API is that it will be incompatible with likelihoods requiring more than one GP, e.g. softmax, heteroscedastic regression, etc. From my experience it's better to generalize from the beginning to accept multiple GPs.

For the VI part, if you would like to incorporate the augmentation methods, the best interface for that would be to get a
Hmm yeah, that's a good point. I suppose I had imagined that this would be best handled by viewing multi-output GPs as a single GP with a particular kernel structure and a particular input vector type. That might not be the case, though. I'll open an issue on AbstractGPs about this, as whatever choice we make there will propagate through to this package.
Is it not sufficient to just implement a custom

edit: additionally, does AugmentedGPs have support for allowing dependencies between the approximate posteriors over likelihood functions that accept multiple GPs?
I am not sure. What I get via the augmentation is the analytical (stochastic if needed) gradient given the variational parameters, and in the non-stochastic case this translates into block coordinate ascent updates.
No, it does not. It assumes a mean-field factorisation between all GPs.
Okay. Well it sounds like there's quite a bit more going on in your
Ah okay. I would quite like to avoid making this assumption by default if possible. So the advantage of the everything-is-a-single-output-GP approach is that you get dense-by-default, with mean field as a special case.
That was not my main point though; it was more about the other way around. For example, for heteroscedastic regression you will want two GPs (correlated or not, with different means or not) but only one output.
Do you think it's a good idea to start off by making this compatible with elliptical slice sampling, as @devmotion suggested? We can probably fine-tune the different aspects of the API while we do this. I think having a common interface for as many inference schemes as possible would help make this much more structured. Regarding
Oh, I see. Well, you can express those kinds of models in this framework by making the likelihood depend on more than one location in input space.
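To illustrate, here is a rough sketch of a heteroscedastic Gaussian likelihood written as a single `ϕ` that consumes the stacked latent values of two GPs (one for the mean, one for the log noise scale) and returns a distribution over one output vector; the names and the stacking convention are assumptions, not an agreed API:

```julia
using Distributions

# v holds the latent values of both GPs at the n inputs, stacked as [f_mean; f_lognoise].
function heteroscedastic_ϕ(v::AbstractVector{<:Real})
    n = length(v) ÷ 2
    f_mean, f_lognoise = v[1:n], v[(n + 1):end]
    return Product(Normal.(f_mean, exp.(f_lognoise)))  # one output per input location
end

y_dist = heteroscedastic_ϕ(randn(6))  # e.g. n = 3 observations
rand(y_dist)                          # a single length-3 output vector
```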
I'm totally on board with this, and it's straightforward to do this if you don't want to tune hyperparameters. I was definitely not thinking we would include stuff for doing inference in the kernel parameters in this package though -- that's something I had envisaged stitching together in GPML.jl, when we would also bring in the notion of priors over kernel parameters etc. My aim for the separation of concerns between this package and GPML.jl is for this package to know nothing about kernel parameters. You just give it a kernel and a likelihood function, and it defines all of the functionality.
This seems sensible, as long as we're clear about the scope of what we're trying to do. I'm pretty sure that all we really need to be able to do is sample from the prior over `f(x)`.

edit:
If it doesn't seem right to do inference of kernel parameters in this package, we could just define the abstractions.

Another alternative would be to design simple inference in a notebook for now and later make it a part of
@sharanry's initial attempt at the above in #3 is great, but it and a comment from him on slack have got me wondering about what I proposed above, in particular whether just specifying the likelihood really makes sense. It's not really clear what the generative interpretation is, and it requires us to move away from the

Proposal v2.0

A `LatentGP` would have a `rand` that looks like:

```julia
function rand(rng::AbstractRNG, d::LatentGP)
    v = rand(rng, d.fx)
    y = rand(rng, d.ϕ(v))
    return (v=v, y=y)
end
```

Note that we make no assumptions about the particular distribution that `d.ϕ(v)` returns. In the special case that

More generally we can introduce particular types for `ϕ`.

It's also really clear what the `logpdf` should be:

```julia
function logpdf(d::LatentGP, y::NamedTuple{(:v, :y)})
    return logpdf(d.fx, y.v) + logpdf(d.ϕ(y.v), y.y)
end
```

Most approximate inference schemes will be in the business of finding an approximation to the posterior over `v`. I think this solution is much cleaner anyway, as it doesn't introduce any new API components over what we've already got from

@sharanry @devmotion @theogf what do you make of this?

edit: For example, a manual way to implement a diagonal Gaussian likelihood (I'm not sure why you would ever need to do this in practice, but it's a good example) would be

```julia
using Distributions, AbstractGPs

noise_var = 0.1  # example value for the observation noise parameter
f = GP(Matern52Kernel())
x = range(-5.0, 5.0; length=100)
latent_gp = LatentGP(f(x), v -> Product(Normal.(v, noise_var)))
y = rand(latent_gp)
y.y # length 100 vector
y.v # length 100 vector
logpdf(latent_gp, y) # this works and does what was described above
```

We could also implement a `GaussianConditional` type, so that this becomes

```julia
latent_gp = LatentGP(f(x), GaussianConditional(noise_var))
```

Then you could e.g. dispatch on the
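As a rough sketch of the kind of dispatch that would enable (the `GaussianConditional` layout and the `exact_posterior` helper below are hypothetical; only the `AbstractGPs.posterior` call is an existing API):

```julia
using AbstractGPs, Distributions

# Hypothetical likelihood wrapper carrying the observation noise variance.
struct GaussianConditional{T<:Real}
    noise_var::T
end

# Used as the ϕ in LatentGP(f(x), ϕ): maps latent values to an observation distribution.
# Note that Normal takes a standard deviation, hence the sqrt.
(lik::GaussianConditional)(v::AbstractVector{<:Real}) = Product(Normal.(v, sqrt(lik.noise_var)))

# Because the likelihood's type is known, methods can special-case the conjugate
# Gaussian setting, e.g. by falling back to AbstractGPs' exact posterior:
function exact_posterior(f::AbstractGPs.AbstractGP, x, lik::GaussianConditional, y)
    return posterior(f(x, lik.noise_var), y)
end
```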
It became apparent when discussing approximate inference with pseudo-points that the above design can be a little annoying -- see here for details. @sharanry what are your thoughts on this? I think I've become even more convinced over time that it's a good idea. It would only be a small refactor, but I think it would make the user experience way better.
This is intended as a discussion issue where we can hash out an initial design for the package. The goal is to
None of this is set in stone, so please feel free to chime in with any thoughts you might have on the matter. In particular if you think that I've missed something obvious from the design that could restrict us down the line, now would be a good time to bring it up.
Background
In an ideal world, the API for GPs with non-Gaussian likelihoods would be "Turing" or "Soss", in the sense that we would just put a GP into a probabilistic programme and figure everything out from there. This package, however, is not aiming for that level of generality. Rather, it is aiming for the tried-and-tested GP + likelihood function API, providing a robust, well-defined API and a collection of approximate inference algorithms to deal with this.
API
Taking a bottom-up approach to design, my thinking is that the following basic structure should be sufficient for our needs:
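Roughly, something along the lines of the following (the type name and field layout here are illustrative assumptions, not a settled design):

```julia
struct LatentGP{Tf, Tx, Tϕ}
    f::Tf   # a GP (e.g. from AbstractGPs) whose inputs are of type Tx
    x::Tx   # the input locations, an AbstractVector{Tx}
    ϕ::Tϕ   # maps a sample of f at x (an AbstractVector{<:Real}) to a log likelihood (Real)
end
```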
where

- `f` is some GP whose inputs are of type `Tx`,
- `x` is some subtype of `AbstractVector{Tx}`,
- `ϕ` is a function from `AbstractVector{<:Real}` to `Real` that computes the log likelihood of a particular sample from `f` at `x`, and
- `log_density(fx, f) := logpdf(fx, f) + ϕ(f)` (it's not clear to me whether this function is ever non-trivial).

This structure encompasses all of the standard things that you'll see in ML, but is a little more general, as the likelihood function isn't restricted to be independent over outputs. To make things convenient for users, we can set up a couple of common cases of `ϕ`, such as factorised likelihoods: a type that specifies that `ϕ(f) = sum(n -> ϕ[n](f[n]), eachindex(x))`, and special cases of likelihoods for classification etc. (the various things implemented in GPML). I've not figured out exactly what special cases we want here, so we need to put some thought into that.

This interface obviously precludes expressing that the likelihood is a function of entire sample paths from `f` -- see e.g. [1] for an instance of this kind of thing. I can't imagine this being too much of an issue, as all of the techniques for actually working with such likelihoods necessarily involve discretising the function, which we can handle. This means that they can still be implemented in an only slightly more ugly manner. If this does turn out to be an actual issue for a number of users, we can always generalise the likelihood a bit.

Note that this approach feels quite Stan-like, in that it just requires the user to specify a likelihood function.
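To make the factorised case concrete, here is a rough sketch of what such a convenience type could look like (the `Factorised` name and layout are assumptions, not part of any existing API):

```julia
using Distributions

# One per-datum log-likelihood function per observation; calling the wrapper on a
# sample of f at x sums the per-datum contributions, as in ϕ(f) = sum(n -> ϕ[n](f[n]), ...).
struct Factorised{Tϕs}
    ϕs::Tϕs
end

(ϕ::Factorised)(f::AbstractVector{<:Real}) = sum(n -> ϕ.ϕs[n](f[n]), eachindex(f))

# Example: Bernoulli classification with observations y and a logistic link.
logistic(fn) = 1 / (1 + exp(-fn))
y = [1, 0, 1]
ϕ = Factorised([fn -> logpdf(Bernoulli(logistic(fn)), y[n]) for n in eachindex(y)])
ϕ(randn(3))  # a scalar log likelihood
```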
Approximate Inference + Approximate Inference Interface
This is the bit of the design that I'm least comfortable with. I think that we should focus on getting NUTS / ESS working in the first instance, but it's not at all clear to me what the appropriate interface is for approximate inference with MCMC, given that we're working outside of a PPL. To begin with, I would propose to simply provide well-documented examples that show how to leverage the above structure in conjunction with e.g. AdvancedHMC to perform approximate inference. It's possible that we really only want to provide this functionality at the GPML.jl level, since you really need to include all of the parameters of the model, both the function `f` and any kernel parameters, to do anything meaningful.

The variational inference setting is probably a bit clearer, because you can meaningfully talk about ELBOs etc. without talking too much about any kernel parameters. E.g. we might implement a function along the lines of `elbo(fx, q)`, where `q` is some approximate posterior over `f(x)`.
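A rough sketch of the kind of computation such a function might do, assuming both the prior marginals and `q` are `MvNormal`s over `f(x)` and the likelihood enters through `ϕ` (the name, signature, and Monte Carlo/analytic-KL split below are illustrative assumptions, not a proposed API):

```julia
using Distributions, LinearAlgebra, Random

# ELBO ≈ E_q[ϕ(f)] - KL(q || p): a Monte Carlo estimate of the expected log likelihood
# plus an analytic KL between the two multivariate Gaussians.
function elbo_sketch(rng::AbstractRNG, p::MvNormal, q::MvNormal, ϕ; n_samples=10)
    expected_loglik = sum(ϕ(rand(rng, q)) for _ in 1:n_samples) / n_samples
    Σp, Σq = Matrix(cov(p)), Matrix(cov(q))
    Δμ = mean(p) - mean(q)
    kl = (tr(Σp \ Σq) + Δμ' * (Σp \ Δμ) - length(Δμ) + logdet(Σp) - logdet(Σq)) / 2
    return expected_loglik - kl
end
```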
It's going to be a little bit down the line before we start looking at this though; possibly we won't get to it at all over the summer, although it would definitely be good to look at how to get some of the stuff from AugmentedGaussianProcesses into this package. @theogf do you have any thoughts on the kinds of things that would be necessary from an interface perspective to make this feasible?

Summary
In short, this package is likely to be quite small for a while -- more or less just a single new type and some corresponding documentation while we consider MCMC. I would envisage that this package will come into its own when we really start going for variational inference a little bit further down the line.
@yebai @sharanry @devmotion @theogf -- I would appreciate your input.
[1] Cotter, Simon L., et al. "MCMC methods for functions: modifying old algorithms to make them faster." Statistical Science (2013): 424-446.