DirectSound full duplex produces a single glitch shortly after stream starts #770

dechamps · 2023-02-19T16:49:23Z

The DirectSound Host API is glitchy when used in full duplex mode (i.e. when both input and output devices are used in a single PortAudio stream) when using reasonable parameters (default frames per buffer, default high suggested latency).

Half duplex is fine.

Example output from paloopback -r48000 -s1024 --inputLatency 240 --outputLatency 240 -w:

************ Mode = Two Streams (Half Duplex) ************
|-   requested  -|-  stream info latency  -|- measured ------------------------------
|-sRate-|-fr/buf-|- in    - out   - total -|- over/under/calls for in, out -|- frm/buf -|-latency-|- channel results -
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  297.29 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  298.04 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  298.52 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  295.52 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  296.77 | OK - #errs = 0

************ Mode = One Stream (Full Duplex) ************
|-   requested  -|-  stream info latency  -|- measured ------------------------------
|-sRate-|-fr/buf-|- in    - out   - total -|- over/under/calls for in, out -|- frm/buf -|-latency-|- channel results -
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  295.40 | channel 0 POP 0.460 at 15220, ".\paloopback_0.wav", channel 1 POP 0.453 at 15220, ".\paloopback_1.wav",  - #errs = 0
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  299.40 | channel 0 POP 0.460 at 15410, ".\paloopback_2.wav", channel 1 POP 0.453 at 15410, ".\paloopback_3.wav",  - #errs = 1
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  299.44 | channel 0 POP 0.460 at 15412, ".\paloopback_4.wav", channel 1 POP 0.453 at 15412, ".\paloopback_5.wav",  - #errs = 2
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  298.60 | channel 0 POP 0.460 at 15373, ".\paloopback_6.wav", channel 1 POP 0.453 at 15373, ".\paloopback_7.wav",  - #errs = 3
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  296.06 | channel 0 POP 0.460 at 15252, ".\paloopback_8.wav", channel 1 POP 0.453 at 15252, ".\paloopback_9.wav",  - #errs = 4

A quick look at the recorded test signal confirms the issue:

MME passes the same test with flying colors.

PortAudio version: latest master (4bccdfb)
Windows 11 22H2 22621.1265
Loopback audio device: Virtual Audio Cable 4.67.0.12554

The text was updated successfully, but these errors were encountered:

dechamps · 2023-02-19T17:01:00Z

Actually, closer investigation of the recordings reveal that there is only a single glitch in each stream, happening exactly 20 ms after the start of the test signal. After that single glitch, a 10-second stream appears to be glitch-free:

This likely explains why no-one is complaining about this - a single glitch at the beginning of the stream is arguably benign, although this is still a bug that should be fixed (if only so that we can get proper results from paloopback).

dechamps · 2023-02-19T17:52:19Z

I was wondering if this could be related to the use of DirectSoundFullDuplexCreate8, but this doesn't seem to be the case: the behaviour is exactly the same if I undef PAWIN_USE_DIRECTSOUNDFULLDUPLEXCREATE.

dechamps · 2023-02-19T18:02:23Z

If I record the audio going through the loopback device using Audacity at the same time as the paloopback test is running, the glitch appears in the separately recorded audio. This indicates that the glitch is produced in the PortAudio → device (playback) direction.

RossBencina · 2023-02-20T23:09:54Z

Hi Etienne, please update your issue report to specify Windows version and hardware setup.

Can you consistently reproduce this on multiple systems with differing audio hardware?

I have never observed such behavior myself, nor do I recall anyone report in over the long history of PortAudio (DirectSound was one of the first host API implementations). This makes me think that it is possibly a regression in recent Windows versions, or with specific drivers.

dechamps · 2023-02-20T23:21:04Z

please update your issue report to specify Windows version and hardware setup.

I already did, see bottom of my first post.

Can you consistently reproduce this on multiple systems with differing audio hardware?

I only tried Virtual Audio Cable so far. I guess I could try other VAC KS modes or some real hardware. The fact that both MME and DS half duplex are working fine would seem to point away from the hardware, though.

I have never observed such behavior myself, nor do I recall anyone report in over the long history of PortAudio

It's possible it was always there but that no-one noticed it - a glitch happening in the first 20 ms could be quite hard to notice in real usage, especially given that the first 20 ms of streamed audio would probably be silence in many scenarios. Also an half-duplex application would not trigger it. I only noticed this problem when I saw paloopback fail.

RossBencina · 2023-02-20T23:40:34Z

I'm not saying it's not a bug, just that from my point of view PortAudio is not the most likely culprit.

Please at least test with real hardware.

dechamps · 2023-02-24T19:52:23Z

I'm still investigating but so far I am getting very suspicious there is a problem in the interaction between the DS host API code and the AdaptingProcess() buffer processor code.

portaudio/src/hostapi/dsound/pa_win_ds.c

Lines 2617 to 2626 in d07842c

    
           numFrames = PaUtil_EndBufferProcessing( &stream->bufferProcessor, &stream->callbackResult ); 
        
           stream->framesWritten += numFrames; 
        
           if( stream->bufferProcessor.outputChannelCount > 0 ) 
        
           { 
        
           /* FIXME: an underflow could happen here */ 
        
           /* Update our buffer offset and unlock sound buffer */ 
        
               bytesProcessed = numFrames * stream->outputFrameSizeBytes; 
        
               stream->outputBufferWriteOffsetBytes = (stream->outputBufferWriteOffsetBytes + bytesProcessed) % stream->outputBufferSizeBytes;

The return value of PaUtil_EndBufferProcessing() is somewhat vaguely documented as the "number of frames processed". The DS host code implicitly assumes that this is the number of frames that were read from the host input buffer, and also the number of frames that were written to the host output buffer. The DS host code increments the input and output buffer offsets accordingly, so that on the next TimeSlice() call we will process the next portions of both buffers.

The buffer processor business logic that is used here, specifically AdaptingProcess(), ultimately returns as the "number of frames processed" the number of input frames that were consumed. With regard to the DS caller code, this works as long as the number of output frames that were written is always exactly equal to the number of input frames read.

The problem is, I believe I have found a scenario in which that isn't the case. Here's how things actually unfold in reality, assuming framesPerBuffer 1024 and 48 kHz sample rate:

On initialization, the buffer processor initializes framesInTempOutputBuffer to 1024:

portaudio/src/common/pa_process.c

Lines 182 to 186 in d07842c

    
           else /* variable host buffer size, add framesPerUserBuffer latency */ 
        
           { 
        
               bp->initialFramesInTempInputBuffer = 0; 
        
               bp->initialFramesInTempOutputBuffer = framesPerUserBuffer; 
        
           }

Initially the DS cursors are at 0 and the DS host code polls repeatedly, waiting for cursors to move.
Eventually, the DS read position suddenly moves from 0 to 1500 frames. Meanwhile, the play cursor is at 960 frames.
The DS code sees that 960 frames can be processed on both sides, and moves to lock the buffers and call the buffer processor.
In CopyTempOutputBuffersToHostOutputBuffers() the buffer processor transfers 960 frames of silence to the host output buffer. framesInTempOutputBuffer is now 1024-960=64.
AdaptingProcess() does not read any input frames and does not call the stream callback because there are not enough frames available (960 < 1024).
AdaptingProcess() returns 0 "frames processed", since it did not process any input frames. (It did write some silence into the output host buffer without telling the host code, but that's benign - it just overwrote silence with silence at this point.)
Since PaUtil_EndBufferProcessing() returned 0, the DS code does not move the buffer offsets and everything stays as it is.
The DS code is back to polling. On a subsequent poll, the read position is still 1500 frames, but the play cursor is now at 1440 frames.
The DS code sees 1440 frames are available. It locks the buffers and calls the buffer processor again.
In CopyTempOutputBuffersToHostOutputBuffers() the buffer processor transfers the remaining 64 frames of silence to the beginning of the host output buffer (remember the DS buffer offsets haven't moved), overwriting some of the silence it already wrote in step 5. (This is starting to feel wrong, and things only go downhill from there.) framesInTempOutputBuffer is now 0, and hostOutputChannels[i].data pointers are incremented by 64 frames.
AdaptingProcess() sees that more than 1024 input frames are available. It calls the stream callback on the first 1024 input frames. framesInTempOutputBuffer is now 1024.
AdaptingProcess() calls CopyTempOutputBuffersToHostOutputBuffers() again. CopyTempOutputBuffersToHostOutputBuffers() copies the 1024 frames to the host output buffer after the 64 frames of silence it wrote in step 11 (this is because in step 11 the internal buffer processor hostOutputChannels[i].data pointers were incremented).
AdaptingProcess() returns the number of input frames processed, which is 1024.
The DS code moves the buffer offsets accordingly. It increments the input buffer offset by 1024, which is correct. It also increments the output buffer offset by 1024, which is incorrect because in reality the buffer processor wrote 64+1024=1088 frames into that buffer!
The DS code goes back to polling.
Because the DS write offset is wrong, the next buffer processor call will write the next frames at DS offset 1024, instead of resuming from the correct 1088 offset. This results in the last 64 frames of the previous buffer being trampled on. Hilarity ensues.

I believe this explains the symptoms. I believe the reason why this only happens at the end of the first user buffer likely has to do with the fact that this initial priming phase where framesInTempOutputBuffer is initialized to 1024 in step 1 is (presumably) the only scenario where output frames are written without any input frames being read. Because this has to do with initialization/priming, this would explain why there is only a single glitch and the rest of the stream is fine.

Next step is to come up with a fix. For that I need to understand how that code was originally intended to work. This buffer adaptation stuff is proving to be quite headache-inducing!

Currently the DirectSound host API uses the buffer processor in the "paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode. What's special about this mode is that it allows the buffer processor to leave input frames in the host buffer if there are not enough frames available to fire the stream callback. In practice the buffer processor will only do this in full duplex mode. However, just because we *can* leave these frames in the host buffer, doesn't mean we *should*. It is not clear to me why the DS code was set up to leave data in the host input buffer. The reason stated in a code comment, "because DS can split the host buffer when it wraps around" doesn't make sense to me - the buffer processor seems perfectly capable of handling buffer split and wraparound (through functions such as PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode. While it is not clear what could be the upside of leaving data in the host buffer, the downsides seem evident: the longer data stays in the input buffer, the more likely it is to get trampled on by the capture cursor (buffer overflow). Copying the data out of the host input buffer and into the buffer processor temp buffer at the earliest opportunity seems like a safer option with no downside. Another downside of this mode is that currently, the behavior with respect to the host *output* buffer seems... confused, with the buffer processor writing a number of frames to the output host buffer that is inconsistent with the number of "frames processed" it returns to the DS code, leading to PortAudio#770. Arguably we could try to fix the buffer processor, but why fix a mode whose reason to exist is unclear in the first place? This commit changes the mode to paUtilBoundedHostBufferSize. Not only does this fix PortAudio#770, it also has the nice side effect of having the buffer processor provision a temp buffer to match the host buffer size (if paFramesPerBufferUnspecified is used), avoiding unnecessary user buffer splitting. This change was tested using paloopback to test for glitches with various framesPerBuffer and suggested latency parameters, both in half duplex and full duplex mode. No regressions were observed, and paloopback results show that PortAudio#770 is fixed by this change. Fixes PortAudio#770

DirectSound was the only host API using this mode, and its usage was removed in the previous commit, making this mode dead code. As discussed in that commit the value proposition of this mode is unclear, and it does not behave correctly around stream start (priming) as shown in PortAudio#770.

Currently the DirectSound host API uses the buffer processor in the "paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode. What's special about this mode is that it allows the buffer processor to leave input frames in the host buffer if there are not enough frames available to fire the stream callback. In practice the buffer processor will only do this in full duplex mode. However, just because we *can* leave these frames in the host buffer, doesn't mean we *should*. It is not clear to me why the DS code was set up to leave data in the host input buffer. The reason stated in a code comment, "because DS can split the host buffer when it wraps around" doesn't make sense to me - the buffer processor seems perfectly capable of handling buffer split and wraparound (through functions such as PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode. While it is not clear what could be the upside of leaving data in the host buffer, the downsides seem evident: the longer data stays in the input buffer, the more likely it is to get trampled on by the capture cursor (buffer overflow). Copying the data out of the host input buffer and into the buffer processor temp buffer at the earliest opportunity seems like a safer option with no downside. Another downside of this mode is that currently, the behavior with respect to the host *output* buffer seems... confused, with the buffer processor writing a number of frames to the output host buffer that is inconsistent with the number of "frames processed" it returns to the DS code, leading to PortAudio#770. Arguably we could try to fix the buffer processor, but why fix a mode whose reason to exist is unclear in the first place? Another oddity of this mode is that it leads to the DS code locking the DS input and output buffers for read/write, but the buffer processor might then decide to do nothing with them. These DS Lock/Unlock calls are therefore wasted, reducing efficiency. This commit changes the mode to paUtilBoundedHostBufferSize. Not only does this fix PortAudio#770, it also has the nice side effect of having the buffer processor provision a temp buffer to match the host buffer size (if paFramesPerBufferUnspecified is used), avoiding unnecessary user buffer splitting. This change was tested using paloopback to test for glitches with various framesPerBuffer and suggested latency parameters, both in half duplex and full duplex mode. No regressions were observed, and paloopback results show that PortAudio#770 is fixed by this change. Fixes PortAudio#770

DirectSound was the only host API using this mode, and its usage was removed in the previous commit, making this mode dead code. As discussed in that commit the value proposition of this mode is unclear, and it does not behave correctly around stream start (priming) as shown in PortAudio#770.

Currently the DirectSound host API uses the buffer processor in the "paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode. What's special about this mode is that it allows the buffer processor to leave input frames in the host buffer if there are not enough frames available to fire the stream callback. In practice the buffer processor will only do this in full duplex mode. However, just because we *can* leave these frames in the host buffer, doesn't mean we *should*. It is not clear to me why the DS code was set up to leave data in the host input buffer. The reason stated in a code comment, "because DS can split the host buffer when it wraps around" doesn't make sense to me - the buffer processor seems perfectly capable of handling buffer split and wraparound (through functions such as PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode. While it is not clear what could be the upside of leaving data in the host buffer, the downsides seem evident: the longer data stays in the input buffer, the more likely it is to get trampled on by the capture cursor (buffer overflow). Copying the data out of the host input buffer and into the buffer processor temp buffer at the earliest opportunity seems like a safer option with no downside. Another downside of this mode is that currently, the behavior with respect to the host *output* buffer seems... confused, with the buffer processor writing a number of frames to the output host buffer that is inconsistent with the number of "frames processed" it returns to the DS code, leading to PortAudio#770. Arguably we could try to fix the buffer processor, but why fix a mode whose reason to exist is unclear in the first place? Another oddity of this mode is that it leads to the DS code locking the DS input and output buffers for read/write, but the buffer processor might then decide to do nothing with them. These DS Lock/Unlock calls are therefore wasted, reducing efficiency. This commit changes the mode to paUtilBoundedHostBufferSize. Not only does this fix PortAudio#770, it also has the nice side effect of having the buffer processor provision a temp buffer to match the host buffer size (if paFramesPerBufferUnspecified is used), avoiding unnecessary user buffer splitting. This change was tested using paloopback to test for glitches with various framesPerBuffer and suggested latency parameters, both in half duplex and full duplex mode. No regressions were observed, and paloopback results show that PortAudio#770 is fixed by this change. Fixes PortAudio#770

DirectSound was the only host API using this mode, and its usage was removed in the previous commit, making this mode dead code. As discussed in that commit the value proposition of this mode is unclear, and it does not behave correctly around stream start (priming) as shown in PortAudio#770.

dechamps changed the title ~~DirectSound full duplex is glitchy~~ DirectSound full duplex produces a single glitch shortly after stream starts Feb 19, 2023

RossBencina added src-dsound MS DirectSound Host API /src/hostapi/dsound P3 Priority: Normal labels Feb 20, 2023

This was referenced Feb 25, 2023

Stop using paUtilVariableHostBufferSizePartialUsageAllowed in DirectSound #772

Open

Add unit tests for buffer processor (pa_process) code #779

Open

dechamps mentioned this issue Mar 21, 2023

mme: don't restrict the host buffer to 16-bit #796

Merged

dechamps mentioned this issue May 25, 2024

dsound: clamp input buffer size at 62.5 ms #782

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DirectSound full duplex produces a single glitch shortly after stream starts #770

DirectSound full duplex produces a single glitch shortly after stream starts #770

dechamps commented Feb 19, 2023 •

edited

Loading

dechamps commented Feb 19, 2023

dechamps commented Feb 19, 2023

dechamps commented Feb 19, 2023

RossBencina commented Feb 20, 2023

dechamps commented Feb 20, 2023

RossBencina commented Feb 20, 2023

dechamps commented Feb 24, 2023

DirectSound full duplex produces a single glitch shortly after stream starts #770

DirectSound full duplex produces a single glitch shortly after stream starts #770

Comments

dechamps commented Feb 19, 2023 • edited Loading

dechamps commented Feb 19, 2023

dechamps commented Feb 19, 2023

dechamps commented Feb 19, 2023

RossBencina commented Feb 20, 2023

dechamps commented Feb 20, 2023

RossBencina commented Feb 20, 2023

dechamps commented Feb 24, 2023

dechamps commented Feb 19, 2023 •

edited

Loading