Pre-fetch the first audio buffer within TtsService

On lower-end devices, given a very long, complex utterance, the first call to GoogleTtsReadBuffered can take long enough to underflow the SyncReader and cause an underrun. This manifests as the first few chunks of the TTS playback being dropped entirely. No stuttering occurs, though; playback is smooth the rest of the way.

Therefore, a reasonable fix that works on-device is to pre-fetch the first buffer prior to starting audio playback for the utterance. Within the critical path (TtsService::Render), we can simply use the cached buffer and only call into GoogleTtsReadBuffered for subsequent chunks, which is fast.

R=dmazzoni@chromium.org

Change-Id: I6bb049d3e8b487af11ca7f34d7ef7b7f82627792
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2464067
Commit-Queue: David Tseng <dtseng@chromium.org>
Reviewed-by: Dominic Mazzoni <dmazzoni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#815828}

Commit 52204c12 authored Oct 09, 2020 by David Tseng Committed by Commit Bot Oct 09, 2020

Showing with 26 additions and 4 deletions

chromeos/services/tts/tts_service.cc +21 -4

chromeos/services/tts/tts_service.h +5 -0

--- a/chromeos/services/tts/tts_service.cc
+++ b/chromeos/services/tts/tts_service.cc
@@ -125,6 +125,15 @@ void TtsService::Speak(const std::vector<uint8_t>& text_jspb,
    return;
  }

+  // For lower end devices, pre-fetching the first buffer on the main thread is
+  // important. Not doing so can cause us to not respond quickly enough in the
+  // audio rendering thread/callback below.
+  size_t frames = 0;
+  first_buf_.first.clear();
+  first_buf_.first.resize(libchrometts_.GoogleTtsGetFramesInAudioBuffer());
+  first_buf_.second =
+      libchrometts_.GoogleTtsReadBuffered(&first_buf_.first[0], &frames);
+
  output_device_->Play();
 }

@@ -147,13 +156,21 @@ int TtsService::Render(base::TimeDelta delay,
  // can be extremely important if there's a long queue of pending Speak/Stop
  // pairs being processed on the main thread. This can occur if the tts api
  // receives lots of tts requests.
-  if (!state_lock_.Try()) {
+  if (!state_lock_.Try())
    return 0;
-  }

  size_t frames = 0;
-  int32_t status =
-      libchrometts_.GoogleTtsReadBuffered(dest->channel(0), &frames);
+  float* channel = dest->channel(0);
+  int32_t status = -1;
+  if (got_first_buffer_) {
+    status = libchrometts_.GoogleTtsReadBuffered(channel, &frames);
+  } else {
+    status = first_buf_.second;
+    float* buf = &first_buf_.first[0];
+    frames = first_buf_.first.size();
+    for (size_t i = 0; i < frames; i++)
+      channel[i] = buf[i];
+  }

  if (status <= 0) {
    // -1 means an error, 0 means done.

--- a/chromeos/services/tts/tts_service.h
+++ b/chromeos/services/tts/tts_service.h
@@ -76,6 +76,11 @@ class TtsService : public mojom::TtsService,
  // Tracks whether any audio data came as a result of |Speak|. Reset for every
  // call to |Speak|.
  bool got_first_buffer_ GUARDED_BY(state_lock_);
+
+  // The first buffer; used for prefetching/warming up the engine for a new
+  // utterance. The first item is the audio data, the second is the status
+  // returned by a call to GoogleTtsReadBuffered.
+  std::pair<std::vector<float>, size_t> first_buf_;
 };

 }  // namespace tts
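
The pre-fetch pattern in this change can be illustrated in isolation. The following is a minimal sketch, not the Chromium code: FakeEngine and PrefetchingRenderer are hypothetical names, and the fake engine's status convention (1 = more audio, 0 = done) is an assumption that only loosely mirrors the "status <= 0" check in TtsService::Render. The sketch also records the frame count reported by the pre-fetch read and copies only that many samples, whereas the diff above takes the count from the vector size; that is a choice made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the TTS engine. This is NOT the libchrometts API.
class FakeEngine {
 public:
  FakeEngine(size_t frames_per_buffer, int total_buffers)
      : frames_per_buffer_(frames_per_buffer), total_(total_buffers) {}

  size_t FramesInAudioBuffer() const { return frames_per_buffer_; }
  int reads() const { return reads_; }

  // Assumed convention: returns 1 and fills |dest| while audio remains,
  // returns 0 with zero frames once the utterance is finished.
  int32_t ReadBuffered(float* dest, size_t* frames) {
    ++reads_;
    if (next_ >= total_) {
      *frames = 0;
      return 0;
    }
    ++next_;
    // Fill the buffer with its 1-based index so ordering can be checked.
    std::fill_n(dest, frames_per_buffer_, static_cast<float>(next_));
    *frames = frames_per_buffer_;
    return 1;
  }

 private:
  size_t frames_per_buffer_;
  int total_;
  int next_ = 0;
  int reads_ = 0;
};

class PrefetchingRenderer {
 public:
  explicit PrefetchingRenderer(FakeEngine* engine) : engine_(engine) {}

  // Non-real-time path: pay for the potentially slow first read up front,
  // before playback would be started.
  void Speak() {
    first_buf_.assign(engine_->FramesInAudioBuffer(), 0.0f);
    first_frames_ = 0;
    first_status_ = engine_->ReadBuffered(first_buf_.data(), &first_frames_);
    used_first_buffer_ = false;
  }

  // Real-time path: the first call copies the cached buffer; later calls
  // read from the engine directly. Returns the number of frames written.
  size_t Render(std::vector<float>* dest) {
    size_t frames = 0;
    int32_t status = -1;
    if (!used_first_buffer_) {
      used_first_buffer_ = true;
      status = first_status_;
      frames = first_frames_;
      assert(frames <= dest->size());
      std::copy_n(first_buf_.begin(), frames, dest->begin());
    } else {
      assert(dest->size() >= engine_->FramesInAudioBuffer());
      status = engine_->ReadBuffered(dest->data(), &frames);
    }
    return status > 0 ? frames : 0;
  }

 private:
  FakeEngine* engine_;
  std::vector<float> first_buf_;
  size_t first_frames_ = 0;
  int32_t first_status_ = -1;
  bool used_first_buffer_ = true;
};
```

In this sketch, calling Speak() performs exactly one engine read, and the first Render() call that follows performs none, which is the property the commit relies on to keep the audio callback fast.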