Stypox
left a comment
Good idea, this looks simple enough, thanks!
I found this bug while playing music in the background with NewPipe. If Dicio does not understand what I said, it says "Could you repeat", releases audio focus, and starts the STT again. When the STT restarts, there is no audio focus. Note, though, that NewPipe's reaction to ducking may be buggy itself, soooo idk. Which app would you suggest testing this with?
One thing I would suggest, to avoid having to worry too much about when to release audio focus and when not to (e.g. because the next part of the workflow might need it), is to debounce the transition from focused to not focused by, say, 100 ms. It might also help solve the bug above, and it would avoid holding the audio focus needlessly for a long time in case a skill takes long to produce output.
This can be achieved with a new variable shouldRequestFocus: Flow<Boolean> in AudioFocusManager, and then:
shouldRequestFocus.mapLatest { shouldBeFocused ->
    if (!shouldBeFocused) {
        // wait before releasing; a newer value arriving in the meantime cancels this block
        delay(100)
    }
    shouldBeFocused
}.collect { shouldBeFocused ->
    if (shouldBeFocused) requestFocus() else releaseFocus()
}

class AndroidTtsSpeechDevice(
    private var context: Context,
    locale: Locale,
    private val audioFocusManager: AudioFocusManager
Can't you use runnablesWhenFinished instead? And I guess also call those in onError then
if (!hasFocus) {
    return
}
What if you remove this? Just in case it becomes out of sync with Android's state and then the audio focus never gets released anymore.
@Synchronized
fun onTtsStarted() {
    if (!hasFocus) {
        Log.d(TAG, "TTS started without audio focus, requesting now")
- Log.d(TAG, "TTS started without audio focus, requesting now")
+ Log.w(TAG, "TTS started without audio focus, requesting now")
        return
    }

    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
Where did you find this code? Can you add a comment with a link to documentation?
Initial implementation of audio ducking via AudioFocusManager. Looking for feedback on how it is structured and how it functions.
This works through AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK, which lowers any background audio during the user's interaction with Dicio. Audio ducking starts in SttInputDeviceWrapper when it detects a listening state and is held until TTS finishes in AndroidTtsSpeechDevice.onDone().
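For reference, a minimal sketch of how a transient may-duck focus request and its release can be written against the Android AudioManager API. This is not the code in this PR: the class name DuckingFocusSketch is hypothetical, the USAGE_ASSISTANT and STREAM_MUSIC choices are assumptions, and it only compiles inside an Android project.

```kotlin
import android.content.Context
import android.media.AudioAttributes
import android.media.AudioFocusRequest
import android.media.AudioManager
import android.os.Build

// Hypothetical helper for illustration; not the PR's AudioFocusManager.
class DuckingFocusSketch(context: Context) {
    private val audioManager =
        context.getSystemService(Context.AUDIO_SERVICE) as AudioManager

    // No-op listener: this sketch does not react to focus changes.
    private val listener = AudioManager.OnAudioFocusChangeListener { }

    // Only populated on API 26+, where AudioFocusRequest exists.
    private var request: AudioFocusRequest? = null

    /** Returns true if the system granted focus. */
    @Suppress("DEPRECATION")
    fun requestFocus(): Boolean {
        val result = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            // Reuse the request object so the later abandon call matches it.
            val focusRequest = request ?: AudioFocusRequest
                .Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK)
                .setAudioAttributes(
                    AudioAttributes.Builder()
                        .setUsage(AudioAttributes.USAGE_ASSISTANT) // assumed usage
                        .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                        .build()
                )
                .setOnAudioFocusChangeListener(listener)
                .build()
                .also { request = it }
            audioManager.requestAudioFocus(focusRequest)
        } else {
            // Pre-O API: stream type plus duration hint (stream type is assumed).
            audioManager.requestAudioFocus(
                listener,
                AudioManager.STREAM_MUSIC,
                AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_MAY_DUCK
            )
        }
        return result == AudioManager.AUDIOFOCUS_REQUEST_GRANTED
    }

    /** Abandons focus unconditionally; does nothing on API 26+ if no request was made. */
    @Suppress("DEPRECATION")
    fun releaseFocus() {
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            request?.let { audioManager.abandonAudioFocusRequest(it) }
        } else {
            audioManager.abandonAudioFocus(listener)
        }
    }
}
```

In this sketch releaseFocus() does not consult a local hasFocus flag, so a flag that drifts out of sync with Android's state cannot prevent the release.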
I've added some fallback releases in VoskInputDevice.stopListening() (when the user taps the mic button to cancel an interaction), MainActivity.onStop() (when the user leaves the app), and when a skill errors out.
I believe this covers all cases but could definitely use some help testing this implementation.
I've merged this into my test build - https://github.com/tylxr59/dicio-android/tree/tylxrs-build
Resolves #363