Commit 52204c12 authored by David Tseng, committed by Commit Bot

Pre-fetch the first audio buffer within TtsService

On lower-end devices, given a very long, complex utterance, the first call to
GoogleTtsReadBuffered can take long enough to underflow the SyncReader and cause
an underrun.

This manifests as the first few chunks of the TTS playback being dropped
entirely. No stuttering occurs, though; playback is smooth the rest of the way.

Therefore, a reasonable fix that works on-device is to pre-fetch the first
buffer prior to starting audio playback for the utterance.

Within the critical path (TtsService::Render), we can simply use the cached
buffer and only call GoogleTtsReadBuffered for subsequent chunks, which is fast.

R=dmazzoni@chromium.org

Change-Id: I6bb049d3e8b487af11ca7f34d7ef7b7f82627792
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464067
Commit-Queue: David Tseng <dtseng@chromium.org>
Reviewed-by: Dominic Mazzoni <dmazzoni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#815828}
parent 94f1e67d
@@ -125,6 +125,15 @@ void TtsService::Speak(const std::vector<uint8_t>& text_jspb,
     return;
   }
+  // For lower end devices, pre-fetching the first buffer on the main thread is
+  // important. Not doing so can cause us to not respond quickly enough in the
+  // audio rendering thread/callback below.
+  size_t frames = 0;
+  first_buf_.first.clear();
+  first_buf_.first.resize(libchrometts_.GoogleTtsGetFramesInAudioBuffer());
+  first_buf_.second =
+      libchrometts_.GoogleTtsReadBuffered(&first_buf_.first[0], &frames);
   output_device_->Play();
 }
@@ -147,13 +156,21 @@ int TtsService::Render(base::TimeDelta delay,
   // can be extremely important if there's a long queue of pending Speak/Stop
   // pairs being processed on the main thread. This can occur if the tts api
   // receives lots of tts requests.
-  if (!state_lock_.Try()) {
+  if (!state_lock_.Try())
     return 0;
-  }
   size_t frames = 0;
-  int32_t status =
-      libchrometts_.GoogleTtsReadBuffered(dest->channel(0), &frames);
+  float* channel = dest->channel(0);
+  int32_t status = -1;
+  if (got_first_buffer_) {
+    status = libchrometts_.GoogleTtsReadBuffered(channel, &frames);
+  } else {
+    status = first_buf_.second;
+    float* buf = &first_buf_.first[0];
+    frames = first_buf_.first.size();
+    for (size_t i = 0; i < frames; i++)
+      channel[i] = buf[i];
+  }
   if (status <= 0) {
     // -1 means an error, 0 means done.
......
@@ -76,6 +76,11 @@ class TtsService : public mojom::TtsService,
   // Tracks whether any audio data came as a result of |Speak|. Reset for every
   // call to |Speak|.
   bool got_first_buffer_ GUARDED_BY(state_lock_);
+  // The first buffer; used for prefetching/warming up the engine for a new
+  // utterance. The first item is the audio data, the second is the status
+  // returned by a call to GoogleTtsReadBuffered.
+  std::pair<std::vector<float>, size_t> first_buf_;
 };
 } // namespace tts
......
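The pre-fetch pattern described in the commit message can be sketched in isolation roughly as follows. This is a minimal illustration, not the Chromium code: FakeEngine and PrefetchingPlayer are hypothetical names, std::mutex stands in for the lock used in TtsService, and the engine here only mimics the status convention from the diff (-1 error, 0 done, positive otherwise). Unlike the diff, the sketch keeps the cached status as a signed value and remembers the frame count reported by the pre-fetch read; both are assumptions made for clarity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the TTS engine; NOT the real libchrometts API.
// ReadBuffered returns 1 while audio remains and 0 once the utterance is done.
class FakeEngine {
 public:
  explicit FakeEngine(int chunks) : chunks_left_(chunks) {}

  size_t FramesPerBuffer() const { return 4; }

  int32_t ReadBuffered(float* out, size_t* frames) {
    if (chunks_left_ == 0) {
      *frames = 0;
      return 0;
    }
    for (size_t i = 0; i < FramesPerBuffer(); ++i)
      out[i] = next_sample_++;
    *frames = FramesPerBuffer();
    --chunks_left_;
    return 1;
  }

 private:
  int chunks_left_;
  float next_sample_ = 1.0f;
};

// Hypothetical player showing the pre-fetch idea from the commit message.
class PrefetchingPlayer {
 public:
  explicit PrefetchingPlayer(FakeEngine* engine) : engine_(engine) {}

  // Main thread: do the (potentially slow) first read before playback starts.
  void Speak() {
    std::lock_guard<std::mutex> lock(lock_);
    first_buf_.assign(engine_->FramesPerBuffer(), 0.0f);
    first_frames_ = 0;
    first_status_ = engine_->ReadBuffered(first_buf_.data(), &first_frames_);
    got_first_buffer_ = false;
  }

  // Audio thread: never block. |dest| must hold FramesPerBuffer() floats.
  // Returns the number of frames written.
  size_t Render(float* dest) {
    std::unique_lock<std::mutex> lock(lock_, std::try_to_lock);
    if (!lock.owns_lock())
      return 0;

    size_t frames = 0;
    int32_t status = -1;
    if (got_first_buffer_) {
      // Subsequent chunks: read directly from the engine.
      status = engine_->ReadBuffered(dest, &frames);
    } else {
      // First chunk: serve the cached buffer instead of calling the engine.
      status = first_status_;
      frames = first_frames_;
      std::copy_n(first_buf_.begin(), frames, dest);
      got_first_buffer_ = true;
    }
    return status < 0 ? 0 : frames;
  }

 private:
  FakeEngine* engine_;
  std::mutex lock_;
  std::vector<float> first_buf_;
  size_t first_frames_ = 0;
  int32_t first_status_ = -1;
  bool got_first_buffer_ = false;
};
```

In this sketch the first Render call only copies cached samples, so its cost does not depend on how long the engine took to produce the first chunk; the expensive read happens in Speak, before playback begins.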