-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto-pause for Captured Surface Switching (2nd edition) #15
Comments
Clarification - the |
Yes, the event will always fire. I just added it to the list of steps for auto-pause to show when it is supposed to happen in that case. |
A few questions:
I would tend to go with the following approach:
Wdyt? |
FWIW, I'd like to go to a place where when sink and track are synchronised (MSTP and MST in a worker), we would guarantee that frames emition and events are synchronised. |
Youenn, something to consider - if a track is in
|
To clarify the second bullet in my previous comment with an example - assume the app wants to show the user a dialog reading "are you sure?" You don't want to do it for every clone. So you need to catch the first event, whichever track it fires on, and then know to ignore subsequent events on other tracks, while not accidentally missing truly new events on any of these tracks. In that case, having an event handler on the controller is easier. |
Say X is used for local playback and Y is used for sending to the network, we will want X to continue displaying (ignore the event) and Y to stop sending (register the event and call replaceTrack to Say now we only have X which is transfered to a worker. Wrt dealing with several events, this is not too difficult to manage:
|
Let me comment on the OP first to narrow down what is being proposed.
Do you mean addEventListener? (EventHandlers are attributes the guidelines recommend adding as well,
This seems like a foot gun, and violates Media Capture and Streams which designates Having the browser set this attribute can race with the application. Worst case, mistakenly inverted application logic might cause others to see the user when they shouldn't. Other ideas we discussed (and I expressed support for), like postSurfaceSwitchingHandler does not have this problem. In that model, no state is touched, the default behavior is to inject (which I prefer), with late JS action required to pause. Was it considered?
Step 2 and 3 seem to invite confusion over whether a source or a track feature is being proposed. |
Also, what happened to avoiding black frames? |
Replying to @youennf
I put it on the controller as the default place for screen-capture-related functionality, but the track might be a better place for it. I think it should be manageable for applications centralize the control of when tracks should be re-enabled to one of the event-listeners as you propose when it is needed.
Yes, that might be better. That would also allow applications to not disable tracks where it is not needed.
Yes, that might be better. If going with
I like the more explicit
Yes, I think this should work. Replying to @jan-ivar
Yes, sorry, I meant
What do you think about Youenn’s suggestion of letting the application set
Yes, it was considered but as I replied to you in that comment thread, I think an event-based model using the enabled-state might be a better option:
Yes, it might be better and more consistent to have the event on the MediaStreamTrack as Youenn suggests. |
postSurfaceSwitchingHandler exposed new functionality to temporarily halt frame delivery. It also allowed asynchronous delay, e.g. to support awaiting Should we perhaps nail down desired behaviors first? |
If we go down with asynchronicity, I would go with a callback at capture controller level. I described some variants at: #4 (comment). If we prefer synchronicity, I would go with configurationchange on the track, which seems more convenient since using getSettings() might be helpful for the web application to decide what to do: continue, stop, prompt user... |
I think this specificity is great; otherwise, one might expect it to trigger on navigation too.
Why is it a footgun?
I don't think that's a contradiction here. If the UA were to set |
The One or more handlers can be added to the
If there is at least one handler registered, a surface switch goes through the following steps:
@jan-ivar,
|
I think this proposal is ok. No need for a boolean flag in this case.
We would expose My assessment is that both approaches solve the issue.
|
I think I like the However, I do have concerns about black frames being emitted when using |
Let's discuss this in this thread. I do not think we would want to pause frame generation globally until user decides, as web page might want to have a live preview for user to decide what to do. There might be a use for asynchronously pausing on a per track's scope but this would not work with the current For pausing RTCRtpSender without black frames in a synchronous way, I think the principle would be the following:
|
Ah, that seems to work! To pause frame delivery on the track-level, there is also the option of using |
Yes, but is it a bug? See w3c/webrtc-pc#3014. This is why I recommended we nail down requirements first. Use cases > desired behavior > API |
I think there is consensus on the requirement to control when new surface frames are sent via a peer connection. One discussion point is whether the synchronous event model is sufficient or not to support this requirement (this is what we are debating here). A second discussion point is whether it is better to pause at track level or to pause globally. Wrt pausing with the synchronous model, two solutions have been found so far (MSTP/VTG and track.ended). A third solution is to use We could check how implementations are doing by racing setParameters and postMessaging to a worker to enqueue a VideoFrame in a VTG stream. It seems getting interop there might be a good idea. I would also think that calling MediaRecorder.pause() synchronously would work. FWIW, choosing the
AFAIK, all browsers behave the same here. |
This issue is in screen capture so there are 4 sinks to consider: HTMLVideoElement, MediaRecorder, RTCRtpSender, and MediaStreamTrackProcessor. Is the requirement the same for these?
We should to decide which of these two behaviors we want. Tracks can't pause today. It might conceptually generalize to e.g. readwrite What we seem to be talking about as the synchronous model is ways for web developers to make due without these. But the workarounds web developers would need to use on the regular seem borderline unreasonable (clone, stop, replaceTrack-hop, or MSTP+VTG worker roundtrip). So far no other sources need pausing. So the only benefit would be to pause some tracks longer than others. This benefit seems marginal to me. Extending pausing for a subset of tracks also seems possible through the same workarounds floated. Captured Surface Switching happens at the source and is unique to this source. So pausing where the switch happens seems inherently simpler for web developers to understand. |
I would think so. Asynchronicity is needed for MediaStreamTrackProcessor with a CaptureController callback. There is no such requirement if the notification happens at the track level.
If you want to freeze instead of mute, there are solutions as illustrated above, some of which are only a few lines of code.
MSTP is one example that shows that notification at the track level is actually simpler. Also, |
If we allowed
There's a dependency here on w3c/webrtc-pc#3014.
There's a dependency here on (a new issue that needs to be filed following) w3c/mediacapture-screen-share#306.
That might work. |
For ending track, not for using setParameters to set
|
It seems we could try presenting the result of this discussion (the two options) at next F2F.
|
I have a slot booked at the next working group meeting for this topic. |
After further deliberations, I believe that it would be easier for applications to be notified by the CaptureController and to allow them to pause the capture at the source level. Ensuring that all tracks are paused individually puts an extra burden on application developers when tracks are cloned. Furthermore, the setup an application needs to do may have a wider scope than individual tracks, e.g., updates to the UI. From the discussion in this thread, it also seems more inline with existing semantics to pause on the source level. We could model this on
Besides being easier to understand and use than returning a promise, this approach would also give applications the flexibility to use other methods for pausing if they so wish, e.g., setting |
One usecase is to protect user from oversharing the display stream during a VC. The granularity in that use case is track, not source.
All tracks paused individually offers more flexibility as well which is useful for the above usecase. Please also note that if track is in a worker, it is more convenient to be notified in the worker. Thoughts? |
We should prioritize simple, fool-proof solutions to real problems that have been highlighted by Web developers as pressing, while still leaving ourselves the flexibility to address more nuanced issues in the future, if the need arises. Fool-proof: Real: Simple: By way of illustration of the coordination issue, suppose a track
Applications that need to only take action once per change, as the hypothetical app that wants to show a dialog, would find it non-trivial and error-prone to catch the first For that reason, I think the MVP is:
|
Wouldn't it be convenient for users to see a preview of what they have picked to answer the "are you sure?" question in an informed manner? Exposing an API to allow CaptureController to mute all tracks it is related to is orthogonal to the discussion we have on the API design.
This pattern is being used by real web sites using data channel and WebCodecs for video conferencing.
This has been discussed earlier in the thread, a simple solution was illustrated at #15 (comment).
Within |
The idea is that you'd get all tracks, video and audio, auto-paused, and then selectively unpause just the track used for the preview.
Sorry, that's not my proposal. I am not (currently) proposing a knob that the app can manipulate. Rather, I am saying:
Could you name a few such sites? I'd love to get their feedback.
Definitely. But future use-cases can be addressed by future extensions, while existing use-cases require more immediate solutions.
Note that I did NOT place
Yes. Good point. |
I can get behind this, if by "pause" you mean halting frame delivery rather than mute. Frame delivery is a property of the source today, so keeping a "pause" concept entirely contained there, and distinct from today's track states seems clean and appealing to me.
That seems wrong here. Muted is not unused in screen capture. This could work: controller.surfaceSwitchingCallback = () => {
controller.paused = true;
setup().then(controller.paused = false);
}); It would would NOT fire muted. @tovepet is that still close to what you proposed? My feedback then is that subtle details might be unobvious with this API:
Therefore, I think I'd still prefer the promise version which doesn't have these problems or subtleties: controller.surfaceSwitchingCallback = async () => {
await setup();
}); |
This seems solvable regardless of API. |
The purpose of this API is to ensure frames are not sent to a sink such as a PeerConnection between the time that the user changes the source, and the time that the application learns of the change and brings up a prompt. If there is a way to achieve this without this feature, I am not aware of it. Could you please explain? |
I think @jan-ivar meant that whatever API flavour, this seems doable. |
It does not seem that we are converging, we have more proposals than ever in this thread. Here are a bunch of questions that might be helpful for the above question and how it could interact with
|
It felt like we were quite close to converging 2-3 weeks ago, where everyone seemed to find the model with a callback on the CaptureController at least acceptable if maybe not the preferred model. After this, there has been a discussion about if If we start with the CaptureController-callback model and also emit I’ll need to get back to you on the other questions raised the last couple of days. |
Yes.
From an application point of view, I think the following order would be preferable for tracks on the main thread:
For tracks in worker threads, we might want to relax the ordering somewhat to avoid excessive synchronization.
If we want to support the distributed use case, we need to extend the auto-pause guarantee so that the
Maybe. Exposing deviceIds is one way of informing applications that a |
Yes. This is why I think a callback is appropriate. It's a synchronization window with two-way guarantees:
Events normally don't carry the second guarantee. The configurationchange event is also subject to "fuzzing", making it a questionable event to overload for timing sensitive information.
No, I'd leave configurationchange alone. It has none of these guarantees. Fires after. Anything else can be polyfilled: controller.surfaceSwitchingCallback = async () => {
await new Promise(r => referenceTrack.onconfigurationchange = r);
worker.postMessage({paused: referenceTrack.getSettings()});
await new Promise(r => worker.onmessage = r);
}); An upstream frame delay callback untangled from tracks. KISS |
The fuzzing note is related to device label change, which applies to all capture sessions using that device. Updating the settings before the callback makes sense to me, and the only way currently is for
I don't think the added delay of executing a few tasks will change much in practice (OS may take some time to switch capture, and web app wants user input which may take some time anyway).
There are plenty of events that carry a window of opportunity, click events to trigger As of the second guarantee, it is only useful for asynchronous pausing. Please also note that the use cases are all about pausing the sending of video frames to specific track sinks, not to pause the source. Conceptually, a callback on CaptureController (which represents the source) is misaligned with the presented use cases. |
My concern with this ordering is how applications know that the user is switching surfaces at the time of the Otherwise I think we should reverse the order of these tasks. |
The idea is to reuse |
A track's label is neither dynamic nor part of the "track's capabilities and settings" . AFAIK this event exists for downstream sinks to adapt their consumption of the track to its evolved properties (e.g. new dimensions or skip applying blur). The event won't even fire when users switch capture between tabs in the same window, because the configuration (e.g. resolution) will stay the same (w3c/mediacapture-screen-share#308 might change this). Nice separation of concerns or foot-gun? YMMV. I'm no fan of events that require websites to decode their cause, because some inevitably forget to anticipate all causes.
Agreed. Just unfortunate for the
Can you elaborate on why that makes sense to you? A singular upstream switch might cause a fan-out reaction of events on track clones potentially across multiple workers and main thread here. Seems simpler for websites to keep track of what's going on (no pun intended) if they're informed of the cause first. Doubly so if there are no events (no net configuration change). |
Fuzzing might be needed if two websites capture the same window (or different tabs of the same browser window) at the same time. Otherwise the user might be correlated across origins when they resize the window. If this fuzzing is significant, it might even interfere with firing the same event from a source switch. |
Are you merely proposing an order in which tasks are queued? If so, what "added delay of executing" are you referring to?
My understanding is both guarantees are needed to prevent frames resuming in the background media thread here: let {deviceId} = track.getSettings();
track.onconfigurationchange = e => {
if (deviceId != track.getSettings().deviceId) {
deviceId = track.getSettings().deviceId;
track.enabled = false;
}
} IOW, the UA MUST take JS state on callback return into account when deciding to resume frames in the background thread. I.e. it must block frames on main thread.
It's not a state as far as the web developer is concerned, merely a delay function. At least no more than a transform is a state.
The cause is at the source. Conceptually, it seems simpler for web developers to gate frames from a switch at the source. |
Sure, the question is about blocking for more than the current task, i.e. during a time that the web app is controlling.
It is a state if we introduce asynchronous pausing controlled by the web page.
In this particular case, user decided to switch one of the source, not the other. So the In general though, the fuzzing is not protecting the user since the two windows can easily correlate the user with the media content itself. This is the big difference with fuzzing
Having the old settings is a potential footgun and would probably come as a surprise to web developers. |
How so when |
This issue had an associated resolution in WebRTC December 2024 meeting – 10 December 2024 (Back to Auto-pause):
|
Note: This is a new issue separated out from #4
Some browsers allow users to switch an ongoing screen-capture to instead share another surface. When this happens, applications may need to react to the change of surface to, e.g.,
For some applications it is also important to separate frames delivered for one surface from frames delivered for another source, e.g., if recording different surfaces to different files.
To solve these issues, we propose the following:
CaptureController
that is emitted each time a user has switched to a new captured surface:autoPause
option is added togetDisplayMedia
:If
autoPause
istrue
, a surface switch goes through the following steps:enabled = false
for all tracks associated with the capture.surfaceswitch
event.The text was updated successfully, but these errors were encountered: