Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DirectSound full duplex produces a single glitch shortly after stream starts #770

Open
dechamps opened this issue Feb 19, 2023 · 7 comments · May be fixed by #772
Open

DirectSound full duplex produces a single glitch shortly after stream starts #770

dechamps opened this issue Feb 19, 2023 · 7 comments · May be fixed by #772
Labels
P3 Priority: Normal src-dsound MS DirectSound Host API /src/hostapi/dsound

Comments

@dechamps
Copy link
Contributor

dechamps commented Feb 19, 2023

The DirectSound Host API is glitchy when used in full duplex mode (i.e. when both input and output devices are used in a single PortAudio stream) when using reasonable parameters (default frames per buffer, default high suggested latency).

Half duplex is fine.

Example output from paloopback -r48000 -s1024 --inputLatency 240 --outputLatency 240 -w:

************ Mode = Two Streams (Half Duplex) ************
|-   requested  -|-  stream info latency  -|- measured ------------------------------
|-sRate-|-fr/buf-|- in    - out   - total -|- over/under/calls for in, out -|- frm/buf -|-latency-|- channel results -
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  297.29 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  298.04 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  298.52 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  295.52 | OK - #errs = 0
| 48000 |   1024 |   21.33  240.00  261.33 |    0/   0/ 469,    0/   0/ 470 | 1024-1024 |  296.77 | OK - #errs = 0

************ Mode = One Stream (Full Duplex) ************
|-   requested  -|-  stream info latency  -|- measured ------------------------------
|-sRate-|-fr/buf-|- in    - out   - total -|- over/under/calls for in, out -|- frm/buf -|-latency-|- channel results -
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  295.40 | channel 0 POP 0.460 at 15220, ".\paloopback_0.wav", channel 1 POP 0.453 at 15220, ".\paloopback_1.wav",  - #errs = 0
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  299.40 | channel 0 POP 0.460 at 15410, ".\paloopback_2.wav", channel 1 POP 0.453 at 15410, ".\paloopback_3.wav",  - #errs = 1
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  299.44 | channel 0 POP 0.460 at 15412, ".\paloopback_4.wav", channel 1 POP 0.453 at 15412, ".\paloopback_5.wav",  - #errs = 2
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  298.60 | channel 0 POP 0.460 at 15373, ".\paloopback_6.wav", channel 1 POP 0.453 at 15373, ".\paloopback_7.wav",  - #errs = 3
| 48000 |   1024 |   21.33  261.33  282.67 |    0/   0/ 469,    0/   0/ 469 | 1024-1024 |  296.06 | channel 0 POP 0.460 at 15252, ".\paloopback_8.wav", channel 1 POP 0.453 at 15252, ".\paloopback_9.wav",  - #errs = 4

A quick look at the recorded test signal confirms the issue:

image

MME passes the same test with flying colors.

@dechamps
Copy link
Contributor Author

Actually, closer investigation of the recordings reveal that there is only a single glitch in each stream, happening exactly 20 ms after the start of the test signal. After that single glitch, a 10-second stream appears to be glitch-free:

image

This likely explains why no-one is complaining about this - a single glitch at the beginning of the stream is arguably benign, although this is still a bug that should be fixed (if only so that we can get proper results from paloopback).

@dechamps dechamps changed the title DirectSound full duplex is glitchy DirectSound full duplex produces a single glitch shortly after stream starts Feb 19, 2023
@dechamps
Copy link
Contributor Author

I was wondering if this could be related to the use of DirectSoundFullDuplexCreate8, but this doesn't seem to be the case: the behaviour is exactly the same if I undef PAWIN_USE_DIRECTSOUNDFULLDUPLEXCREATE.

@dechamps
Copy link
Contributor Author

If I record the audio going through the loopback device using Audacity at the same time as the paloopback test is running, the glitch appears in the separately recorded audio. This indicates that the glitch is produced in the PortAudio → device (playback) direction.

@RossBencina
Copy link
Collaborator

Hi Etienne, please update your issue report to specify Windows version and hardware setup.

Can you consistently reproduce this on multiple systems with differing audio hardware?

I have never observed such behavior myself, nor do I recall anyone report in over the long history of PortAudio (DirectSound was one of the first host API implementations). This makes me think that it is possibly a regression in recent Windows versions, or with specific drivers.

@dechamps
Copy link
Contributor Author

please update your issue report to specify Windows version and hardware setup.

I already did, see bottom of my first post.

Can you consistently reproduce this on multiple systems with differing audio hardware?

I only tried Virtual Audio Cable so far. I guess I could try other VAC KS modes or some real hardware. The fact that both MME and DS half duplex are working fine would seem to point away from the hardware, though.

I have never observed such behavior myself, nor do I recall anyone report in over the long history of PortAudio

It's possible it was always there but that no-one noticed it - a glitch happening in the first 20 ms could be quite hard to notice in real usage, especially given that the first 20 ms of streamed audio would probably be silence in many scenarios. Also an half-duplex application would not trigger it. I only noticed this problem when I saw paloopback fail.

@RossBencina
Copy link
Collaborator

I'm not saying it's not a bug, just that from my point of view PortAudio is not the most likely culprit.

Please at least test with real hardware.

@RossBencina RossBencina added src-dsound MS DirectSound Host API /src/hostapi/dsound P3 Priority: Normal labels Feb 20, 2023
@dechamps
Copy link
Contributor Author

I'm still investigating but so far I am getting very suspicious there is a problem in the interaction between the DS host API code and the AdaptingProcess() buffer processor code.

numFrames = PaUtil_EndBufferProcessing( &stream->bufferProcessor, &stream->callbackResult );
stream->framesWritten += numFrames;
if( stream->bufferProcessor.outputChannelCount > 0 )
{
/* FIXME: an underflow could happen here */
/* Update our buffer offset and unlock sound buffer */
bytesProcessed = numFrames * stream->outputFrameSizeBytes;
stream->outputBufferWriteOffsetBytes = (stream->outputBufferWriteOffsetBytes + bytesProcessed) % stream->outputBufferSizeBytes;

The return value of PaUtil_EndBufferProcessing() is somewhat vaguely documented as the "number of frames processed". The DS host code implicitly assumes that this is the number of frames that were read from the host input buffer, and also the number of frames that were written to the host output buffer. The DS host code increments the input and output buffer offsets accordingly, so that on the next TimeSlice() call we will process the next portions of both buffers.

The buffer processor business logic that is used here, specifically AdaptingProcess(), ultimately returns as the "number of frames processed" the number of input frames that were consumed. With regard to the DS caller code, this works as long as the number of output frames that were written is always exactly equal to the number of input frames read.

The problem is, I believe I have found a scenario in which that isn't the case. Here's how things actually unfold in reality, assuming framesPerBuffer 1024 and 48 kHz sample rate:

  1. On initialization, the buffer processor initializes framesInTempOutputBuffer to 1024:
    else /* variable host buffer size, add framesPerUserBuffer latency */
    {
    bp->initialFramesInTempInputBuffer = 0;
    bp->initialFramesInTempOutputBuffer = framesPerUserBuffer;
    }
  2. Initially the DS cursors are at 0 and the DS host code polls repeatedly, waiting for cursors to move.
  3. Eventually, the DS read position suddenly moves from 0 to 1500 frames. Meanwhile, the play cursor is at 960 frames.
  4. The DS code sees that 960 frames can be processed on both sides, and moves to lock the buffers and call the buffer processor.
  5. In CopyTempOutputBuffersToHostOutputBuffers() the buffer processor transfers 960 frames of silence to the host output buffer. framesInTempOutputBuffer is now 1024-960=64.
  6. AdaptingProcess() does not read any input frames and does not call the stream callback because there are not enough frames available (960 < 1024).
  7. AdaptingProcess() returns 0 "frames processed", since it did not process any input frames. (It did write some silence into the output host buffer without telling the host code, but that's benign - it just overwrote silence with silence at this point.)
  8. Since PaUtil_EndBufferProcessing() returned 0, the DS code does not move the buffer offsets and everything stays as it is.
  9. The DS code is back to polling. On a subsequent poll, the read position is still 1500 frames, but the play cursor is now at 1440 frames.
  10. The DS code sees 1440 frames are available. It locks the buffers and calls the buffer processor again.
  11. In CopyTempOutputBuffersToHostOutputBuffers() the buffer processor transfers the remaining 64 frames of silence to the beginning of the host output buffer (remember the DS buffer offsets haven't moved), overwriting some of the silence it already wrote in step 5. (This is starting to feel wrong, and things only go downhill from there.) framesInTempOutputBuffer is now 0, and hostOutputChannels[i].data pointers are incremented by 64 frames.
  12. AdaptingProcess() sees that more than 1024 input frames are available. It calls the stream callback on the first 1024 input frames. framesInTempOutputBuffer is now 1024.
  13. AdaptingProcess() calls CopyTempOutputBuffersToHostOutputBuffers() again. CopyTempOutputBuffersToHostOutputBuffers() copies the 1024 frames to the host output buffer after the 64 frames of silence it wrote in step 11 (this is because in step 11 the internal buffer processor hostOutputChannels[i].data pointers were incremented).
  14. AdaptingProcess() returns the number of input frames processed, which is 1024.
  15. The DS code moves the buffer offsets accordingly. It increments the input buffer offset by 1024, which is correct. It also increments the output buffer offset by 1024, which is incorrect because in reality the buffer processor wrote 64+1024=1088 frames into that buffer!
  16. The DS code goes back to polling.
  17. Because the DS write offset is wrong, the next buffer processor call will write the next frames at DS offset 1024, instead of resuming from the correct 1088 offset. This results in the last 64 frames of the previous buffer being trampled on. Hilarity ensues.

I believe this explains the symptoms. I believe the reason why this only happens at the end of the first user buffer likely has to do with the fact that this initial priming phase where framesInTempOutputBuffer is initialized to 1024 in step 1 is (presumably) the only scenario where output frames are written without any input frames being read. Because this has to do with initialization/priming, this would explain why there is only a single glitch and the rest of the stream is fine.

Next step is to come up with a fix. For that I need to understand how that code was originally intended to work. This buffer adaptation stuff is proving to be quite headache-inducing!

dechamps added a commit to dechamps/portaudio that referenced this issue Feb 25, 2023
Currently the DirectSound host API uses the buffer processor in the
"paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode.
What's special about this mode is that it allows the buffer processor to
leave input frames in the host buffer if there are not enough frames
available to fire the stream callback. In practice the buffer processor
will only do this in full duplex mode.

However, just because we *can* leave these frames in the host buffer,
doesn't mean we *should*.

It is not clear to me why the DS code was set up to leave data in the
host input buffer. The reason stated in a code comment, "because DS can
split the host buffer when it wraps around" doesn't make sense to me -
the buffer processor seems perfectly capable of handling buffer split
and wraparound (through functions such as
PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode.

While it is not clear what could be the upside of leaving data in the
host buffer, the downsides seem evident: the longer data stays in the
input buffer, the more likely it is to get trampled on by the capture
cursor (buffer overflow). Copying the data out of the host input buffer
and into the buffer processor temp buffer at the earliest opportunity
seems like a safer option with no downside.

Another downside of this mode is that currently, the behavior with
respect to the host *output* buffer seems... confused, with the buffer
processor writing a number of frames to the output host buffer that is
inconsistent with the number of "frames processed" it returns to the DS
code, leading to PortAudio#770. Arguably we could try to fix the buffer
processor, but why fix a mode whose reason to exist is unclear in the
first place?

This commit changes the mode to paUtilBoundedHostBufferSize. Not only
does this fix PortAudio#770, it also has the nice side effect of having the
buffer processor provision a temp buffer to match the host buffer size
(if paFramesPerBufferUnspecified is used), avoiding unnecessary user
buffer splitting.

This change was tested using paloopback to test for glitches with
various framesPerBuffer and suggested latency parameters, both in half
duplex and full duplex mode. No regressions were observed, and
paloopback results show that PortAudio#770 is fixed by this change.

Fixes PortAudio#770
dechamps added a commit to dechamps/portaudio that referenced this issue Feb 25, 2023
DirectSound was the only host API using this mode, and its usage was
removed in the previous commit, making this mode dead code. As discussed
in that commit the value proposition of this mode is unclear, and it
does not behave correctly around stream start (priming) as shown in
PortAudio#770.
dechamps added a commit to dechamps/portaudio that referenced this issue Feb 25, 2023
Currently the DirectSound host API uses the buffer processor in the
"paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode.
What's special about this mode is that it allows the buffer processor to
leave input frames in the host buffer if there are not enough frames
available to fire the stream callback. In practice the buffer processor
will only do this in full duplex mode.

However, just because we *can* leave these frames in the host buffer,
doesn't mean we *should*.

It is not clear to me why the DS code was set up to leave data in the
host input buffer. The reason stated in a code comment, "because DS can
split the host buffer when it wraps around" doesn't make sense to me -
the buffer processor seems perfectly capable of handling buffer split
and wraparound (through functions such as
PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode.

While it is not clear what could be the upside of leaving data in the
host buffer, the downsides seem evident: the longer data stays in the
input buffer, the more likely it is to get trampled on by the capture
cursor (buffer overflow). Copying the data out of the host input buffer
and into the buffer processor temp buffer at the earliest opportunity
seems like a safer option with no downside.

Another downside of this mode is that currently, the behavior with
respect to the host *output* buffer seems... confused, with the buffer
processor writing a number of frames to the output host buffer that is
inconsistent with the number of "frames processed" it returns to the DS
code, leading to PortAudio#770. Arguably we could try to fix the buffer
processor, but why fix a mode whose reason to exist is unclear in the
first place?

Another oddity of this mode is that it leads to the DS code locking the
DS input and output buffers for read/write, but the buffer processor
might then decide to do nothing with them. These DS Lock/Unlock calls
are therefore wasted, reducing efficiency.

This commit changes the mode to paUtilBoundedHostBufferSize. Not only
does this fix PortAudio#770, it also has the nice side effect of having the
buffer processor provision a temp buffer to match the host buffer size
(if paFramesPerBufferUnspecified is used), avoiding unnecessary user
buffer splitting.

This change was tested using paloopback to test for glitches with
various framesPerBuffer and suggested latency parameters, both in half
duplex and full duplex mode. No regressions were observed, and
paloopback results show that PortAudio#770 is fixed by this change.

Fixes PortAudio#770
dechamps added a commit to dechamps/portaudio that referenced this issue Feb 25, 2023
DirectSound was the only host API using this mode, and its usage was
removed in the previous commit, making this mode dead code. As discussed
in that commit the value proposition of this mode is unclear, and it
does not behave correctly around stream start (priming) as shown in
PortAudio#770.
dechamps added a commit to dechamps/portaudio that referenced this issue May 25, 2024
Currently the DirectSound host API uses the buffer processor in the
"paUtilVariableHostBufferSizePartialUsageAllowed" buffer size mode.
What's special about this mode is that it allows the buffer processor to
leave input frames in the host buffer if there are not enough frames
available to fire the stream callback. In practice the buffer processor
will only do this in full duplex mode.

However, just because we *can* leave these frames in the host buffer,
doesn't mean we *should*.

It is not clear to me why the DS code was set up to leave data in the
host input buffer. The reason stated in a code comment, "because DS can
split the host buffer when it wraps around" doesn't make sense to me -
the buffer processor seems perfectly capable of handling buffer split
and wraparound (through functions such as
PaUtil_Set2ndInputFrameCount()) regardless of the buffer size mode.

While it is not clear what could be the upside of leaving data in the
host buffer, the downsides seem evident: the longer data stays in the
input buffer, the more likely it is to get trampled on by the capture
cursor (buffer overflow). Copying the data out of the host input buffer
and into the buffer processor temp buffer at the earliest opportunity
seems like a safer option with no downside.

Another downside of this mode is that currently, the behavior with
respect to the host *output* buffer seems... confused, with the buffer
processor writing a number of frames to the output host buffer that is
inconsistent with the number of "frames processed" it returns to the DS
code, leading to PortAudio#770. Arguably we could try to fix the buffer
processor, but why fix a mode whose reason to exist is unclear in the
first place?

Another oddity of this mode is that it leads to the DS code locking the
DS input and output buffers for read/write, but the buffer processor
might then decide to do nothing with them. These DS Lock/Unlock calls
are therefore wasted, reducing efficiency.

This commit changes the mode to paUtilBoundedHostBufferSize. Not only
does this fix PortAudio#770, it also has the nice side effect of having the
buffer processor provision a temp buffer to match the host buffer size
(if paFramesPerBufferUnspecified is used), avoiding unnecessary user
buffer splitting.

This change was tested using paloopback to test for glitches with
various framesPerBuffer and suggested latency parameters, both in half
duplex and full duplex mode. No regressions were observed, and
paloopback results show that PortAudio#770 is fixed by this change.

Fixes PortAudio#770
dechamps added a commit to dechamps/portaudio that referenced this issue May 25, 2024
DirectSound was the only host API using this mode, and its usage was
removed in the previous commit, making this mode dead code. As discussed
in that commit the value proposition of this mode is unclear, and it
does not behave correctly around stream start (priming) as shown in
PortAudio#770.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 Priority: Normal src-dsound MS DirectSound Host API /src/hostapi/dsound
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants