Initial audio ducking implementation #380

Open
tylxr59 wants to merge 1 commit into Stypox:master from tylxr59:add-audio-ducking

tylxr59 (Contributor) commented on Dec 17, 2025

Initial implementation of audio ducking via AudioFocusManager. Looking for feedback on how it is structured and how it behaves.

This works by requesting AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK, which lowers the volume of ("ducks") any background audio while the user is interacting with Dicio. Audio focus is requested in SttInputDeviceWrapper when it detects a listening state and is held until TTS finishes in AndroidTtsSpeechDevice.onDone().

I've also added fallback releases of audio focus in VoskInputDevice.stopListening() (when the user taps the mic button to cancel an interaction), in MainActivity.onStop() (when the user leaves the app), and for when a skill errors out.
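The flow described above (request focus when listening starts, hold it until TTS finishes, release it on every fallback path) can be modeled in plain Kotlin so it can be unit-tested without a device. This is a minimal sketch, not the PR's code: FocusBackend and DuckingSession are hypothetical names, and on a device the backend would wrap AudioManager's focus request and abandon calls.

```kotlin
// Hypothetical abstraction over AudioManager's focus calls, so the logic runs off-device.
interface FocusBackend {
    fun request(): Boolean // true if focus was granted
    fun abandon()
}

// Hypothetical model of the flow in this PR: focus is requested when STT starts
// listening, held until TTS finishes, and released on every fallback path too.
class DuckingSession(private val backend: FocusBackend) {
    var hasFocus = false
        private set

    // SttInputDeviceWrapper detects a listening state.
    @Synchronized
    fun onListeningStarted() {
        if (!hasFocus) hasFocus = backend.request()
    }

    // Normal end of an interaction: AndroidTtsSpeechDevice.onDone().
    fun onTtsDone() = release()

    // Fallback release paths listed in the PR description.
    fun onListeningCancelled() = release() // VoskInputDevice.stopListening()
    fun onActivityStopped() = release()    // MainActivity.onStop()
    fun onSkillError() = release()

    // Abandons unconditionally, so a stale hasFocus flag cannot block the release.
    @Synchronized
    private fun release() {
        backend.abandon()
        hasFocus = false
    }
}
```

Funnelling every exit path through a single release() keeps the "all cases covered" question down to checking that each path calls one method.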

I believe this covers all cases, but I could definitely use some help testing it.

I've merged this into my test build - https://github.com/tylxr59/dicio-android/tree/tylxrs-build

Resolves #363

Stypox (Owner) left a comment

Good idea, this looks simple enough, thanks!

I found this bug while playing music in the background with NewPipe: if Dicio does not understand what I said, it says "Could you repeat", releases audio focus, and starts STT again, but when STT restarts it does not hold audio focus anymore. Note, though, that NewPipe's reaction to ducking may itself be buggy, so I'm not sure. Which app would you suggest testing this with?

To avoid having to worry too much about when to release audio focus and when not to (e.g. because the next step of the workflow might need it), I would suggest debouncing the transition from focused to not focused by, say, 100 ms. This might also fix the bug above, and would avoid needlessly holding audio focus for a long time when a skill takes long to produce output.

This can be achieved by adding a shouldRequestFocus: Flow<Boolean> property to AudioFocusManager and then processing it in a coroutine as follows:

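import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.mapLatest
import kotlin.time.Duration.Companion.milliseconds

// Inside a coroutine; mapLatest needs @OptIn(ExperimentalCoroutinesApi::class)
shouldRequestFocus.mapLatest { shouldBeFocused ->
    if (!shouldBeFocused) {
        // debounce only focused -> unfocused; a newer value cancels this delay
        delay(100.milliseconds)
    }
    shouldBeFocused
}.collect { shouldBeFocused ->
    if (shouldBeFocused) requestFocus() else releaseFocus()
}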
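The effect of this debounce can be checked without a device or coroutines by replaying a timeline of shouldRequestFocus values. This is a plain-Kotlin sketch of the same rule, not kotlinx.coroutines itself: FocusChange and replayDebounce are hypothetical names, and the timestamps in the test are made-up values shaped like the "Could you repeat" bug described above.

```kotlin
// A change of the desired focus state at virtual time atMs (hypothetical type).
data class FocusChange(val atMs: Long, val focused: Boolean)

/**
 * Replays the suggested debounce on a virtual timeline instead of real coroutines:
 * `true` calls requestFocus() immediately, while `false` calls releaseFocus() only
 * if no newer value arrives within [releaseDelayMs], because mapLatest would cancel
 * the pending delay. Assumes [changes] is sorted by time.
 */
fun replayDebounce(changes: List<FocusChange>, releaseDelayMs: Long = 100): List<String> {
    val calls = mutableListOf<String>()
    for ((i, change) in changes.withIndex()) {
        if (change.focused) {
            calls += "requestFocus"
        } else {
            val next = changes.getOrNull(i + 1)
            val cancelled = next != null && next.atMs - change.atMs < releaseDelayMs
            if (!cancelled) calls += "releaseFocus"
        }
    }
    return calls
}
```

With a hypothetical 20 ms gap between "Could you repeat" finishing and STT restarting, the debounced replay never releases focus between the two listening sessions, whereas with releaseDelayMs = 0 focus is dropped and immediately requested again. Like the suggested code, the replay calls requestFocus() for every `true` without de-duplicating.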

class AndroidTtsSpeechDevice(
    private var context: Context,
    locale: Locale,
    private val audioFocusManager: AudioFocusManager

Stypox (Owner):

Can't you use runnablesWhenFinished instead? Then I guess you should also run those in onError.
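A minimal sketch of that suggestion, under stated assumptions: runnablesWhenFinished is taken to be a list of callbacks the TTS device runs once when speech ends (its exact type in Dicio is not shown here, so MutableList<Runnable> is an assumption), and TtsDeviceSketch is a hypothetical plain-Kotlin stand-in whose onDone/onError mirror Android's UtteranceProgressListener callbacks. The focus release would then be registered as one of these callbacks instead of being hard-coded in onDone(), and a TTS error would release focus too.

```kotlin
// Hypothetical plain-Kotlin stand-in for the TTS device; not Dicio's actual class.
class TtsDeviceSketch {
    // Assumed shape of runnablesWhenFinished: callbacks run once when speech ends.
    private val runnablesWhenFinished = mutableListOf<Runnable>()

    @Synchronized
    fun runWhenFinished(runnable: Runnable) {
        runnablesWhenFinished += runnable
    }

    // Mirrors UtteranceProgressListener.onDone(utteranceId).
    fun onDone(utteranceId: String) = runFinishedCallbacks()

    // Mirrors UtteranceProgressListener.onError(utteranceId, errorCode):
    // a failed utterance runs the same callbacks, so focus is released either way.
    fun onError(utteranceId: String, errorCode: Int) = runFinishedCallbacks()

    private fun runFinishedCallbacks() {
        // Take and clear the list under the lock, then run the callbacks outside it.
        val toRun = synchronized(this) {
            runnablesWhenFinished.toList().also { runnablesWhenFinished.clear() }
        }
        toRun.forEach { it.run() }
    }
}
```

In a real UtteranceProgressListener subclass, the deprecated single-argument onError(String) is abstract, so it has to be overridden as well; routing both overloads to the same helper keeps the release path in one place.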

Comment on lines +89 to +91
if (!hasFocus) {
    return
}

Stypox (Owner):

What if you remove this check? It would guard against hasFocus getting out of sync with Android's state, in which case audio focus would never be released again.
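The failure mode described here can be shown with a small simulation: if the local hasFocus flag drifts from Android's real state, a release guarded by it silently does nothing and focus is never abandoned. SystemFocus and FocusManagerSketch are hypothetical stand-ins for AudioManager and the PR's AudioFocusManager, and the sketch assumes, as the suggestion implies, that abandoning focus which is not held is harmless.

```kotlin
// Hypothetical stand-in for the focus state Android keeps in AudioManager.
class SystemFocus {
    var held = false
        private set
    fun request() { held = true }
    fun abandon() { held = false } // assumed harmless when focus is not held
}

// Hypothetical manager; `guarded` toggles the early return under discussion.
class FocusManagerSketch(private val system: SystemFocus, private val guarded: Boolean) {
    // Local mirror of the system state; public only so a drift can be simulated.
    var hasFocus = false

    @Synchronized
    fun requestFocus() {
        system.request()
        hasFocus = true
    }

    @Synchronized
    fun releaseFocus() {
        if (guarded && !hasFocus) return // the `if (!hasFocus) return` check
        system.abandon()
        hasFocus = false
    }
}
```

After a simulated drift, only the unguarded release actually returns focus to the system.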

@Synchronized
fun onTtsStarted() {
    if (!hasFocus) {
        Log.d(TAG, "TTS started without audio focus, requesting now")

Stypox (Owner):

Suggested change:
- Log.d(TAG, "TTS started without audio focus, requesting now")
+ Log.w(TAG, "TTS started without audio focus, requesting now")

        return
    }

if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {

Stypox (Owner):

Where did you find this code? Can you add a comment with a link to documentation?
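A sketch of what that version split could look like, with references in comments as requested. It is not the PR's code: TransientDuckFocus is a hypothetical name, and the USAGE_ASSISTANT/CONTENT_TYPE_SPEECH attributes and the STREAM_MUSIC stream for the legacy path are assumptions. The branching itself follows the framework: AudioFocusRequest, requestAudioFocus(AudioFocusRequest) and abandonAudioFocusRequest() were added in API 26 (Build.VERSION_CODES.O), while older releases only have the now-deprecated requestAudioFocus(listener, streamType, durationHint) and abandonAudioFocus(listener).

```kotlin
import android.media.AudioAttributes
import android.media.AudioFocusRequest
import android.media.AudioManager
import android.os.Build

// Hypothetical wrapper. References: Android developer guide "Manage audio focus",
// and the AudioManager / AudioFocusRequest API reference.
class TransientDuckFocus(private val audioManager: AudioManager) {
    // This sketch does not react to focus changes caused by other apps.
    private val listener = AudioManager.OnAudioFocusChangeListener { }

    // AudioFocusRequest only exists on API 26+ (Build.VERSION_CODES.O).
    private val request: AudioFocusRequest? =
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK)
                .setAudioAttributes(
                    AudioAttributes.Builder()
                        .setUsage(AudioAttributes.USAGE_ASSISTANT) // assumption
                        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                        .build()
                )
                .setOnAudioFocusChangeListener(listener)
                .build()
        } else {
            null
        }

    // Returns true if the system granted transient, duckable focus.
    fun requestFocus(): Boolean {
        val result = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            audioManager.requestAudioFocus(request!!)
        } else {
            @Suppress("DEPRECATION") // pre-O API; STREAM_MUSIC is an assumption
            audioManager.requestAudioFocus(
                listener,
                AudioManager.STREAM_MUSIC,
                AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK
            )
        }
        return result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED
    }

    fun releaseFocus() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            audioManager.abandonAudioFocusRequest(request!!)
        } else {
            @Suppress("DEPRECATION")
            audioManager.abandonAudioFocus(listener)
        }
    }
}
```

Keeping the same listener instance for both calls matters on the legacy path, since abandonAudioFocus identifies the request by its listener.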

Stypox linked an issue on Feb 23, 2026 that may be closed by this pull request


Successfully merging this pull request may close these issues:
Wake word recognition fails during music playback / hard to give additional commands
Stop playback while listening
