
Terminology: Anchors, Trackables, etc #4

Open
speigg opened this issue Apr 3, 2018 · 7 comments

@speigg

speigg commented Apr 3, 2018

I'm creating this issue to document the differences in terminology/semantics of things like Anchors & Trackables between various platforms, and to start a discussion on what terminology/semantics we ultimately want to adopt here. The main platforms I am looking at are ARCore, ARKit, Windows Mixed Reality (WMR), and Vuforia (which attempts to provide a cross platform API that supports the previous three platforms):

ARCore

  • Trackables represent objects that can be individually tracked and which Anchors can be attached to
  • Anchors represent fixed locations relative to a particular Trackable
  • Both Anchors and Trackables have a separate TrackingState (paused, stopped, or tracking)
  • Plane/Point tracking is represented as subclasses of Trackable
  • Each hitTest() result includes any Trackable that was hit
  • Anchors can be created directly from hits, allowing, for example, an Anchor that is attached to a Trackable at the point of contact (with the Anchor independently being updated by the system)
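
As a rough sketch, these relationships might be transliterated into TypeScript like this (the names mirror ARCore's Java API, but the shapes here are illustrative only):

```ts
// Illustrative transliteration of ARCore's Java API; not a real binding.
type Pose = Float32Array; // placeholder for a position + orientation

enum TrackingState { Tracking, Paused, Stopped }

interface Anchor {
  pose: Pose;
  trackingState: TrackingState; // separate from the parent Trackable's state
}

interface Trackable {
  trackingState: TrackingState;
  createAnchor(pose: Pose): Anchor; // fixes a location relative to this Trackable
}

interface Plane extends Trackable { /* polygon extents, etc. */ }
interface Point extends Trackable { /* a feature point */ }

interface HitResult {
  trackable: Trackable;   // the Trackable that was hit
  hitPose: Pose;
  createAnchor(): Anchor; // anchor attached at the point of contact
}
```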

ARKit

  • Anchors can be any location relative to the session coordinate system
  • Anchor subclasses may (or may not) conform to a Trackable protocol, which exposes:
    • "isTracked": if true, the pose is valid, meaning the Anchor is actively being tracked
  • Presumably, if an Anchor subclass does not implement the Trackable protocol, one can assume that the Anchor represents a fixed location in the world
  • Plane/Image/Face recognition/tracking are represented as subclasses of Anchor, though not all of these are necessarily Trackable:
    • Plane/Image Anchors are not Trackable
    • Face Anchors are Trackable
    • Image Anchors could easily become Trackable in future ARKit updates, simply by adding an “isTracked” property
  • Each hitTest() result includes any Anchor that was hit (whether or not it is Trackable)
  • There is no way to create an Anchor that is attached to another Anchor
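
A comparable sketch of the ARKit model (again illustrative only; the real API is Swift/Objective-C):

```ts
// Illustrative rendering of ARKit's protocol pattern; not a real binding.
type Matrix4 = Float32Array; // placeholder 4x4 transform

interface Anchor {
  transform: Matrix4; // pose relative to the session coordinate system
}

// The Trackable "protocol": only some Anchor subclasses adopt it.
interface Trackable {
  isTracked: boolean; // when false, the pose should not be trusted
}

interface PlaneAnchor extends Anchor {}           // not Trackable
interface ImageAnchor extends Anchor {}           // not Trackable today
interface FaceAnchor extends Anchor, Trackable {} // Trackable

// Note there is no equivalent of ARCore's trackable.createAnchor(pose):
// an Anchor cannot be attached to another Anchor.
```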

WMR

  • Anchors are fixed locations in the world which can be persisted between sessions, even after the device has been shut down. Anchors can also be shared with other devices.
  • Tracked objects, surface geometry, and hit tests are handled completely separately from any Anchor concept
    • Anchor and surface mesh types are unified through a "CoordinateSystem" property

Vuforia

  • Anchors can be created from a pose (world coordinates) or from a hit test result
  • Anchors are a subclass of Trackable
  • Image/Object/Model recognition/tracking are handled as subclasses of Trackable (not as subclasses of Anchor)

It seems that there are basically three ways of defining anchors (sketched in code after this list):

  1. fixed location in the world, period (WMR)
  2. fixed location relative to something else in the world (which might not be fixed) (ARCore, Vuforia)
  3. any location in the world that is maintained by the system, fixed or not (ARKit)
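
To make the contrast concrete, here are hypothetical TypeScript shapes for the three models (every name here is invented for illustration):

```ts
// Hypothetical shapes for the three anchor models; all names are invented.
type Pose = Float32Array;
interface Trackable { pose: Pose; }

// 1. WMR-style: a fixed location in the world, period.
interface FixedWorldAnchor {
  pose: Pose; // adjusted only as the system's world estimate improves
}

// 2. ARCore/Vuforia-style: a fixed offset from something else in the world,
//    which may itself move.
interface RelativeAnchor {
  parent: Trackable;
  offset: Pose; // fixed transform relative to the parent
}

// 3. ARKit-style: any system-maintained location, fixed or not.
interface MaintainedAnchor {
  pose: Pose;
  isTracked?: boolean; // present only on trackable subclasses
}
```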

So given these differences, here are some things to discuss:

  • Do we want to consider object recognition/tracking and surface geometry as being potential anchors?
  • If anchors are fixed locations, do we want a separate XRTrackable? Should XRAnchors be a special type of XRTrackable?
  • How do we distinguish between trackable and non-trackable things (things that can be recognized but not continually tracked)? Does it make sense for an XRTrackable to represent something that can only be recognized once but isn't actually trackable (e.g., setting its "trackingState" to "paused" or "unknown")?
  • Do we want to support the use-case of having anchors be relative to movable things (e.g., other anchors/trackables)? This is useful for placing anchors at hit test intersections on tracked/movable objects.
  • Do we want to consider an API for persisting/restoring anchors? If so, it may be confusing if only some types of anchors can be persisted while others cannot.
@speigg changed the title from "Terminology: What is an XRAnchor?" to "Terminology: Anchors, Trackables, etc" on Apr 3, 2018
@toji
Member

toji commented Apr 4, 2018

Thanks @speigg! This kind of per-platform breakdown is extremely valuable for individuals like myself who haven't been actively working with the full spectrum of AR devices.

@blairmacintyre
Contributor

Thanks @speigg, that's super useful.

Of this list:

  1. fixed location in the world, period (WMR)
  2. fixed location relative to something else in the world (which might not be fixed) (ARCore, Vuforia)
  3. any location in the world that is maintained by the system, fixed or not (ARKit)

I think my mental model has been some mix of 2 and 3, in that I think of anchors as being

(a) locations relative to something in the world,
(b) have assumed those "somethings" include "the world" (i.e., can just be a fixed location in space relative to the currently known world), and
(c) the "somethings" will eventually include Trackables

I am comfortable with either the ARCore (all anchors are relative to something "trackable") or ARKit (some anchors are "trackable") models. I think I prefer the latter, though, since it feels simpler to me.

For WebXR, I've been assuming that we will implement something like these models on top of the underlying system, and will thus be able to provide a common semantics on all platforms, such as

(a) there is a world/session set of coordinates on any frame that the view is expressed relative to, but which isn't guaranteed to be valid outside that one rAF
(b) anchors are locations relative to something (trackables, the world). Is there anything anchors can be relative to that isn't "the world" or something that is tracked relative to the world?
(c) anchor relationships can be nested and will update appropriately (internally, for example, this can be implemented in the obvious way with a platform anchor at the base); it may be that we only want a single level of nesting -- e.g., an anchor relative to another anchor that is relative to something in the world.
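
As a minimal sketch of (c), assuming a hypothetical parent/offset shape for anchors (multiply is a stand-in matrix helper, not proposed API):

```ts
// Minimal sketch of nested-anchor resolution; all names are hypothetical.
type Matrix4 = Float32Array;
declare function multiply(a: Matrix4, b: Matrix4): Matrix4; // compose transforms

interface NestedAnchor {
  parent: NestedAnchor | null; // null = relative to the world/session
  offset: Matrix4;             // fixed transform relative to the parent
}

// Walk up the chain, composing offsets until we reach a world-level anchor.
// Internally this could bottom out at a single platform anchor at the base.
function worldPose(anchor: NestedAnchor): Matrix4 {
  let pose = anchor.offset;
  for (let p = anchor.parent; p !== null; p = p.parent) {
    pose = multiply(p.offset, pose);
  }
  return pose;
}
```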

In light of that, your questions:

Do we want to consider object recognition/tracking and surface geometry as being potential anchors?

I think we want to assume that will happen eventually. It will be much easier for the programmer if all "locations that the system keeps track of and informs you about" are one "thing", and we will want to support "tracked stuff" eventually (even if through a separate API).

"Anchors" may be the base concept that allows another API to integrate with WebXR efficiently and cleanly.

If anchors are fixed locations, do we want a separate XRTrackable? Should XRAnchors be a special type of XRTrackable?

My personal preference is to have "trackables" as a subclass of "anchor", but I don't feel strongly one way or the other.

The way this would manifest itself to programmers is the question "Is this anchor currently valid?" (i.e., if it's a tracked thing, do I have a current value for it?). I could imagine this being implemented in different ways on different platforms, but for any of them, I'd hope there would be a way to know "what was the last valid location, and what was the timestamp for that location?".

So, the big manifestation will be that either a programmer says
if (anchor1.isTracked && anchor1.isValid) {...} (assume all things trackable)
or
if (anchor1 instanceof XRTrackable && anchor1.isValid) {...} (some things trackable)

How do we distinguish between trackable and non-trackable things (things that can be recognized but not continually tracked)? Does it make sense for an XRTrackable to represent something that can only be recognized once but isn't actually trackable (e.g., setting its "trackingState" to "paused" or "unknown")?

Property or subclass, I think.

On one hand, the programmer should be assuming that all anchors can change every frame, and programming appropriately.

There seems to be a difference between something that is actively tracked when sensed, and things like hunks of the world that the system has inferred and updates (i.e., planes and meshes).

I think right now, these systems don't expose any notion of "I can see that hunk of the world, or I can't see it" to the programmer; the nature of SLAM etc. makes that a weird thing to do. So, anchors attached to visible or hidden parts of the world are still reported "to the best of the system's ability", or they are removed.

Trackable things, like a hand or marker or object or face, are different, since they are inherently "unknown" as soon as they aren't seen by the device cameras. How often this happens may vary per platform (e.g., an HMD with cameras pointing in all directions may track things much more robustly than a phone with two narrow-FOV front/back cameras).

For these things, I don't think we want to destroy the anchors when objects are lost. We want to indicate they aren't tracked, and maintain some state (last known location, etc).

BUT, this raises an interesting question (that I've been dealing with while playing with computer vision in WebXR): what is that last known location relative TO? It is expressed at any point relative to the current world coordinates, but as the SLAM/VIO model updates, that coordinate system may change. So if we do this, we either make that location "queryable" and maintain some internal state (e.g., another system anchor we create) to provide a hook for determining where this "last known location" is relative to the current state, OR we mark the location as invalid when it's not tracked and tell the programmer it's probably invalid.

The latter seems problematic (i.e., programmers are likely to use the stale location anyway if they want to indicate a last known location, independent of what you tell them), but it's not terrible, in that programmers who want to know where something was last seen (as opposed, for example, to knowing it's simply "not seen") can also drop an anchor themselves near where the object was.
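
Concretely, an app could keep that "last seen" state itself; a sketch, assuming a trackable thing that exposes isTracked and a world-relative pose (all names hypothetical):

```ts
// App-side "last known location" cache; all names here are hypothetical.
type Matrix4 = Float32Array;

interface TrackedThing {
  isTracked: boolean;
  pose: Matrix4; // world-relative pose, only meaningful while isTracked
}

interface LastKnown {
  pose: Matrix4;
  timestamp: DOMHighResTimeStamp;
}

const lastKnown = new Map<TrackedThing, LastKnown>();

function onFrame(thing: TrackedThing, now: DOMHighResTimeStamp): void {
  if (thing.isTracked) {
    // Cache while valid; per the caveat above, this pose is relative to the
    // world coordinates *at this time*, which the SLAM/VIO model may revise.
    lastKnown.set(thing, { pose: thing.pose, timestamp: now });
  }
}
```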

Do we want to support the use-case of having anchors be relative to movable things (e.g., other anchors/trackables)? This is useful for placing anchors at hit test intersections on tracked/movable objects.

I think so. If the system doesn't support it directly, it's easy to implement as a fixed transform relative to the one trackable thing.

Do we want to consider an API for persisting/restoring anchors? If so, it may be confusing if only some types of anchors can be persisted while others cannot.

Eventually, perhaps, but I suspect this will be a separate API. It would require a lot of infrastructure that's not available yet (e.g., either the ability to share these across platforms, or some 3rd party "ARCloud" thing, that opens a whole different can of worms).

I think that initially we won't support trackables or persistence, but need to design such that we don't close those off.

@judax
Contributor

judax commented Apr 10, 2018

Thank you @speigg, a really great breakdown of the state of Anchors and Trackables in some of the most used platforms.

First of all, one question to everyone in this thread: do we all agree that arbitrary 3D anchors are a necessary component of world-understanding-based tracking systems, no matter what kinds of trackable objects can be identified?

My point is that if that is the case, then we can at least start by agreeing that there is a need, whatever the underlying system's world understanding, to create world-level anchors that are updated by the system with no relation to any trackable (unless we treat the user's POV or camera as a trackable).

On the more forward-looking matter of Trackables, I have a slightly different POV. I see anchors as always being world-level, arbitrary data structures that represent a pose. Where a system is able to understand the world in more detail than just tracking the user's POV/camera, an anchor is created "relative" to a Trackable object. And that, to me, is a composition relationship: an Anchor knows the trackable it is relative to, and the trackable influences the pose represented by the anchor (the same way the camera influences an arbitrary anchor). This makes it easy to know whether an anchor is relative to a Trackable: does it include one? The relationship stays very clean and is easy to expand in the future, where more types of trackables can be created completely independently of the concept of an anchor.
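
A sketch of that composition relationship (names are placeholders, not proposed API):

```ts
// Composition ("has-a") sketch; names are placeholders.
type Pose = Float32Array;

interface Trackable {
  pose: Pose; // updated by the system as the object is tracked
}

interface Anchor {
  pose: Pose;                  // always a world-level pose
  trackable: Trackable | null; // the trackable influencing this anchor, if any
}

// "Is this anchor relative to a Trackable?" reduces to a null check.
function isRelativeToTrackable(a: Anchor): boolean {
  return a.trackable !== null;
}
```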

@thetuvix
Contributor

This is great! Thanks for pulling together this breakdown!

To help fill out the WMR section, I'd add an extra bullet around WMR's CoordinateSystem concept (the SpatialCoordinateSystem type). As you point out, the Anchor pin-in-the-world concept (SpatialAnchor type) is treated separately from the SurfaceMesh world-understanding concept (SpatialSurfaceMesh type), but they all unify through a "has-a" relationship with the SpatialCoordinateSystem, by exposing a CoordinateSystem property. That CoordinateSystem concept serves the same unifying purpose as ARKit's Anchor concept.

We have a similar unifying basis already in WebXR through the XRCoordinateSystem type, which today is the common currency for every API that returns or accepts coordinate systems. The WebXR design thus far to represent more specific entities like stages has been through "is-a" relationships instead of "has-a", for example the XRFrameOfReference subtype of XRCoordinateSystem that establishes the stage bounds.

In that world, a strawman "is-a" type breakdown for some of our known objects could be:

  • XRCoordinateSystem (perhaps with a more evocative name like XRSpace, etc.)
    • XRStage (with bounds attribute, etc.)
    • XRAnchor
      • XRPlaneAnchor (with extents attribute, etc.)
      • XRFaceAnchor (with getGeometry() method, etc.)
      • XRMeshAnchor (with getVertexBuffer() method, etc.)

If we wanted to pursue a "has-a" model, we could do something like this:

  • XRCoordinateSystem
    • XRStage (with bounds attribute, etc.)
  • XRAnchor (or XRTrackable, etc. - each of these types has attributes of type XRCoordinateSystem as needed)
    • XRPlaneAnchor (with extents attribute, etc.)
    • XRFaceAnchor (with getGeometry() method, etc.)
    • XRMeshAnchor (with getVertexBuffer() method, etc.)
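
In TypeScript terms, the two strawmen might look like this (member types are placeholders, not proposed IDL):

```ts
// Strawman shapes only; member types are placeholders.
type Bounds = unknown;

// "is-a": specific entities subclass XRCoordinateSystem directly.
interface XRCoordinateSystem {}
interface XRStage extends XRCoordinateSystem { bounds: Bounds; }
interface XRAnchor extends XRCoordinateSystem {}
interface XRPlaneAnchor extends XRAnchor { extents: Bounds; }

// "has-a": entities instead expose coordinate systems as attributes, as WMR's
// SpatialAnchor and SpatialSurfaceMesh do with SpatialCoordinateSystem.
interface XRAnchorHasA {
  coordinateSystem: XRCoordinateSystem;
}
interface XRPlaneAnchorHasA extends XRAnchorHasA { extents: Bounds; }
```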

Around the notion of how we represent tracked objects or not, the current WebXR API design assumes that each XRCoordinateSystem represents a stationary object, where at most the system will adjust its location over time as tracking improves. This is because you must relate coordinate systems by calling XRCoordinateSystem.getTransformTo(XRCoordinateSystem other), which does not accept an XRPresentationFrame or other mechanism for time-indexing. Since you can't reliably index for this frame's photon time or an input event's timestamp, you cannot use XRCoordinateSystems to productively relate moving objects.

If we augment XRCoordinateSystem.getTransformTo to accept an XRPresentationFrame for time-indexing (and give out interpolated XRPresentationFrames on historical input events), an XRCoordinateSystem can then track a dynamically-moving object, such as an ARKit face anchor or motion controller.
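
A sketch of that augmentation (the optional-frame signature is the proposal under discussion, not shipped API):

```ts
// Sketch of time-indexed getTransformTo; the frame parameter is proposed.
type Matrix4 = Float32Array;

interface XRPresentationFrame {
  // predicted display ("photon") time, views, etc.
}

interface XRCoordinateSystem {
  // Today this implicitly assumes both coordinate systems are stationary.
  // With the optional frame, two *moving* coordinate systems can be related
  // at the same instant (this frame's photon time, an input timestamp, ...).
  getTransformTo(other: XRCoordinateSystem, frame?: XRPresentationFrame): Matrix4 | null;
}
```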

@machenmusik

Even before we settle on nomenclature etc., I'd suggest that we modify the explainer to either mention other AR frameworks besides ARKit, or remove specific references until we sort things out.

@blairmacintyre recently pointed me at this repo discussion, and upon first reading the explainer, I was going to immediately create an issue much like this one; it would be better not to give the wrong first impression.

@thetuvix
Contributor

See webxr #384 for a proposed tweak to XRCoordinateSystem that would enable either the is-a or has-a model I describe above.

With that change, XRCoordinateSystem itself (perhaps with a rename to something concise like XRSpace) can serve as the unifying currency underlying the poses of stages, freespace anchors, plane anchors, face anchors, mesh anchors, heads, hands, etc., whether those objects require time-indexing to align with photon time or not.

@thetuvix
Contributor

See #34 for bikeshedding around XRAnchor.detach() specifically.
