點燈坊

Detecting Speech with AnalyserNode

Sam Xiao 2024-02-09

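With the Web Audio API's AnalyserNode, we can read the average magnitude of the audio in the frequency domain and compare it with a threshold to decide whether someone is speaking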

Version

Vue 3.4
extendable-media-recorder 9.1.7
extendable-media-recorder-wav-encoder 7.0.102

Install Package

$ npm install extendable-media-recorder
$ npm install extendable-media-recorder-wav-encoder
  • extendable-media-recorder: a new MediaRecorder that exposes the same interface as the native MediaRecorder
  • extendable-media-recorder-wav-encoder: adds support for the WAV format

Architecture

[Figure speech001: architecture diagram]

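  • MediaStream: the stream obtained from the microphone
  • AudioContext: resamples the stream to 16 kHz
  • AudioContext creates SrcNode, AnalyserNode, and DestNode, which must be linked together with connect()
  • AnalyserNode: converts the signal to the frequency domain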

Speech Detection

main.js

import { createApp } from 'vue'
import App from './App.vue'
import { register } from 'extendable-media-recorder'
import { connect } from 'extendable-media-recorder-wav-encoder'

await register(await connect())

createApp(App).mount('#app')

Line 6

await register(await connect())
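  • Unlike the native MediaRecorder, the MediaRecorder from extendable-media-recorder must have its encoder registered before use
  • Do not call register() in each page's mounted(): whenever the route changes and the page is entered again, the encoder would be registered a second time and throw an error, so call it once in main.js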
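
If the registration ever has to live outside main.js, one possible guard is to cache the result of the first call. This is only a sketch: `once` and `registerWavEncoder` are hypothetical names, not part of extendable-media-recorder, and keeping the call in main.js remains the simplest option.

```javascript
// Hypothetical helper, not part of extendable-media-recorder:
// wraps a setup function so that it runs at most once.
function once(setup) {
  let called = false
  let result = null
  return () => {
    if (!called) {
      called = true
      result = setup() // the first result (e.g. a promise) is cached
    }
    return result
  }
}

// Assumed usage, with register/connect imported as in main.js:
//   const registerWavEncoder = once(async () => register(await connect()))
//   await registerWavEncoder()
```

Every later call returns the cached promise, so re-entering a page does not register the encoder again. Note that a rejected promise is cached as well.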

App.vue

<template>
  <div>Speech Detection Demo</div>
  <div>
    <button @click="onStart">Start</button>
    <button @click="onStop">Stop</button>
  </div>
  <div>
    Threshold:
    <input type="text" v-model.number="threshold" />
  </div>
  <div>avgVolume: {{ avgVolume }}</div>
  <div>isSpeaking: {{ isSpeaking }}</div>
</template>

<script setup>
import { ref } from 'vue'
import { MediaRecorder } from 'extendable-media-recorder'

const SAMPLE_RATE = 16000
const SAMPLE_INTERVAL = 250 // 250 ms = 0.25 s

let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)

let mediaRecorder = null

let onStart = async () => {
  try {
    let mediaStream = await navigator.mediaDevices.getUserMedia({
      audio: true
    })

    let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
    let analyserNode = audioContext.createAnalyser()
    analyserNode.fftSize = 256

    let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
    let destNode = new MediaStreamAudioDestinationNode(audioContext)

    srcNode.connect(analyserNode)
    analyserNode.connect(destNode)

    mediaRecorder = new MediaRecorder(destNode.stream, {
      mimeType: 'audio/wav'
    })

    mediaRecorder.ondataavailable = () => {
      let bufferLength = analyserNode.frequencyBinCount
      let bufferArray = new Uint8Array(bufferLength)
      analyserNode.getByteFrequencyData(bufferArray)

      avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
      isSpeaking.value = avgVolume.value > threshold.value
    }

    mediaRecorder.start(SAMPLE_INTERVAL)
  } catch (err) {
    console.warn(err)
  }
}

let onStop = () => {
  mediaRecorder.stop()
}
</script>

Line 19

const SAMPLE_RATE = 16000
const SAMPLE_INTERVAL = 250 // 250 ms = 0.25 s

Configurable constants:

  • SAMPLE_RATE: sampling rate in Hz
  • SAMPLE_INTERVAL: time between samples in milliseconds
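
To get a feel for these two values, the arithmetic below works out how much audio one interval covers. The names `samplesPerInterval` and `readingsPerSecond` are introduced here for illustration only.

```javascript
const SAMPLE_RATE = 16000   // samples per second (Hz)
const SAMPLE_INTERVAL = 250 // milliseconds between dataavailable events

// Number of audio samples captured during one interval.
const samplesPerInterval = SAMPLE_RATE * (SAMPLE_INTERVAL / 1000)

// Number of volume readings taken per second.
const readingsPerSecond = 1000 / SAMPLE_INTERVAL

console.log(samplesPerInterval) // 4000
console.log(readingsPerSecond)  // 4
```

A smaller SAMPLE_INTERVAL makes isSpeaking react faster, at the cost of more frequent events.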

Line 22

let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)
  • threshold state: the threshold used to judge the volume
  • avgVolume state: shows the current average volume
  • isSpeaking state: shows whether someone is currently speaking

Line 30

let mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true })
  • Use the built-in Web API navigator.mediaDevices.getUserMedia() to obtain a MediaStream
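
getUserMedia() rejects when the microphone cannot be opened, which is why onStart wraps it in try/catch. As a rough sketch, the rejection could be turned into a readable message instead of a bare console.warn(). `describeMicError` is a hypothetical helper and the messages are placeholders; the error names are the standard ones getUserMedia() rejects with.

```javascript
// Hypothetical helper: maps a getUserMedia() rejection to a readable message.
function describeMicError(err) {
  switch (err.name) {
    case 'NotAllowedError':
      return 'Microphone permission was denied.'
    case 'NotFoundError':
      return 'No microphone was found.'
    case 'NotReadableError':
      return 'The microphone could not be read (it may be in use).'
    default:
      return `Could not open the microphone: ${err.message}`
  }
}

// Assumed usage inside the catch block of onStart:
//   console.warn(describeMicError(err))
```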

Line 34

let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
  • Changing the sampling rate requires an AudioContext

Line 35

let analyserNode = audioContext.createAnalyser()
analyserNode.fftSize = 256
  • AudioContext works through a graph of nodes
  • AudioContext creates analyserNode, which performs the conversion to the frequency domain
  • Set its fftSize to 256
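
The fftSize and the sample rate together decide how the frequency data is laid out. The sketch below works this out for the values used here; `describeBins` is a hypothetical helper, while the relations themselves (frequencyBinCount is half of fftSize, and the bins span 0 Hz up to half the sample rate) come from the Web Audio API.

```javascript
// Describes the frequency bins an AnalyserNode produces for a given setup.
function describeBins(sampleRate, fftSize) {
  const binCount = fftSize / 2          // same value as analyserNode.frequencyBinCount
  const binWidth = sampleRate / fftSize // Hz covered by each bin
  const maxFrequency = sampleRate / 2   // Nyquist frequency, upper edge of the last bin
  return { binCount, binWidth, maxFrequency }
}

console.log(describeBins(16000, 256))
// { binCount: 128, binWidth: 62.5, maxFrequency: 8000 }
```

So with a 16 kHz context and fftSize 256, the array holds 128 values, each covering 62.5 Hz. fftSize must be a power of two between 32 and 32768.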

Line 38

let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
let destNode = new MediaStreamAudioDestinationNode(audioContext)
  • AudioContext creates srcNode from the MediaStream
  • AudioContext creates destNode

Line 41

srcNode.connect(analyserNode)
analyserNode.connect(destNode)
  • Use connect() to link the nodes together
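
When the graph grows beyond three nodes, the connect() calls can be folded into a small helper. This is a sketch only: `connectChain` is a hypothetical name and it assumes at least one node is passed in.

```javascript
// Hypothetical helper: connects each node to the next one and returns the last node.
function connectChain(...nodes) {
  return nodes.reduce((previous, next) => {
    previous.connect(next)
    return next
  })
}

// Assumed usage with the nodes from App.vue:
//   connectChain(srcNode, analyserNode, destNode)
```

Since AudioNode.connect() returns the node it was connected to, srcNode.connect(analyserNode).connect(destNode) is an equivalent one-liner for this case.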

Line 44

mediaRecorder = new MediaRecorder(destNode.stream, { mimeType: 'audio/wav' })
  • Use the MediaRecorder provided by extendable-media-recorder, and feed it the MediaStream processed by destNode instead of the raw microphone stream
  • Set mimeType to audio/wav

Line 57

mediaRecorder.start(SAMPLE_INTERVAL)
  • Passing SAMPLE_INTERVAL as the timeslice makes the dataavailable event fire about once every SAMPLE_INTERVAL milliseconds

Line 63

let onStop = () => {
  mediaRecorder.stop()
}
  • Stop the MediaRecorder to end the recording
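
Stopping the recorder alone leaves the microphone and the AudioContext open. A fuller cleanup might look like the sketch below. `releaseAudio` is a hypothetical helper, and it assumes mediaStream and audioContext are kept in outer variables like mediaRecorder, which the code above does not do yet.

```javascript
// Hypothetical cleanup helper; the arguments are the objects created in onStart.
function releaseAudio(mediaRecorder, mediaStream, audioContext) {
  // The native stop() throws on an inactive recorder, so check the state first.
  if (mediaRecorder && mediaRecorder.state !== 'inactive') mediaRecorder.stop()

  // Stopping every track releases the microphone.
  if (mediaStream) mediaStream.getTracks().forEach((track) => track.stop())

  // close() releases the audio resources held by the context and returns a promise.
  if (audioContext && audioContext.state !== 'closed') return audioContext.close()
  return Promise.resolve()
}
```

The null checks also make it safe to press Stop before Start.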

Line 48

mediaRecorder.ondataavailable = () => {
  let bufferLength = analyserNode.frequencyBinCount
  let bufferArray = new Uint8Array(bufferLength)
  analyserNode.getByteFrequencyData(bufferArray)

  avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
  isSpeaking.value = avgVolume.value > threshold.value
}
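  • Use analyserNode's frequencyBinCount property and getByteFrequencyData() to obtain the frequency-domain data as an array
  • Compute the average magnitude across the array
  • If the average exceeds threshold, the input is treated as speech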
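
The averaging and the threshold check can be pulled out into plain functions, which makes them easy to try without a microphone. This is a sketch with hypothetical names (`averageLevel`, `isAboveThreshold`) and made-up sample data; each value written by getByteFrequencyData() is a byte between 0 and 255.

```javascript
// Mean of the byte magnitudes (each 0-255) returned by getByteFrequencyData().
function averageLevel(bins) {
  if (bins.length === 0) return 0
  let sum = 0
  for (const value of bins) sum += value
  return sum / bins.length
}

// Hypothetical wrapper: true when the average level exceeds the threshold.
function isAboveThreshold(bins, threshold) {
  return averageLevel(bins) > threshold
}

const quiet = new Uint8Array([0, 10, 20, 10])  // average 10
const loud = new Uint8Array([80, 120, 60, 20]) // average 70
console.log(isAboveThreshold(quiet, 40)) // false
console.log(isAboveThreshold(loud, 40))  // true
```

The check measures loudness only, so any sufficiently loud sound passes it, and the threshold usually needs tuning for each microphone and room.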

Conclusion

  • AnalyserNode is part of the Web Audio API, so no extra package is needed for it

Reference

MDN, AnalyserNode getByteFrequencyData() method