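With the AnalyserNode of the Web Audio API, we can obtain the average level of the audio in the frequency domain and compare it against a threshold to decide whether someone is speaking.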
Version
Vue 3.4
extendable-media-recorder 9.1.7
extendable-media-recorder-wav-encoder 7.0.102
Install Package
$ npm install extendable-media-recorder
$ npm install extendable-media-recorder-wav-encoder
extendable-media-recorder: provides a new MediaRecorder with the same interface as the native MediaRecorder
extendable-media-recorder-wav-encoder: adds support for the WAV format
Architecture
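- MediaStream: the stream obtained from the microphone
- AudioContext: processes the stream at the configured sample rate (16 kHz in this demo)
- AudioContext creates the SrcNode, AnalyserNode, and DestNode; connect() must be used to link the nodes together
- AnalyserNode: converts the signal to the frequency domain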
Speech Detection
main.js
import { createApp } from 'vue'
import App from './App.vue'
import { register } from 'extendable-media-recorder'
import { connect } from 'extendable-media-recorder-wav-encoder'
await register(await connect())
createApp(App).mount('#app')
Line 7
await register(await connect())
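- Unlike the native MediaRecorder, the MediaRecorder provided by extendable-media-recorder must have its encoder registered before it can be used
- Do not call register() in each page's mounted(): whenever the route changes and the page is entered again, the encoder is registered a second time and an error is raised. Call it in main.js so that it runs only once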
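If registration cannot live in main.js, one possible guard is to cache the registration promise so that repeated calls do nothing. This is only a sketch: createRegisterOnce is a hypothetical helper, not part of extendable-media-recorder, and it receives register and connect as parameters so that it assumes nothing about them beyond the await register(await connect()) call shown above.

```javascript
// Hypothetical helper (not part of extendable-media-recorder).
// register / connect are passed in, e.g. the two functions imported in main.js.
function createRegisterOnce(register, connect) {
  let pending = null // cached promise of the first registration attempt

  return function registerOnce() {
    if (pending === null) {
      pending = (async () => register(await connect()))().catch((err) => {
        pending = null // allow a retry after a failed registration
        throw err
      })
    }
    return pending // every caller shares the same promise
  }
}
```

A component could then call await registerOnce() in mounted() any number of times, and only the first call would reach register().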
App.vue
<template>
  <div>Speech Detection Demo</div>
  <div>
    <button @click="onStart">Start</button>
    <button @click="onStop">Stop</button>
  </div>
  <div>
    Threshold:
    <input type="text" v-model.number="threshold" />
  </div>
  <div>avgVolume: {{ avgVolume }}</div>
  <div>isSpeaking: {{ isSpeaking }}</div>
</template>
<script setup>
import { ref } from 'vue'
import { MediaRecorder } from 'extendable-media-recorder'
const SAMPLE_RATE = 16000
const SAMPLE_INTERVAL = 250 // 0.25s
let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)
let mediaRecorder = null
let onStart = async () => {
  try {
    let mediaStream = await navigator.mediaDevices.getUserMedia({
      audio: true
    })
    let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
    let analyserNode = audioContext.createAnalyser()
    analyserNode.fftSize = 256
    let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
    let destNode = new MediaStreamAudioDestinationNode(audioContext)
    srcNode.connect(analyserNode)
    analyserNode.connect(destNode)
    mediaRecorder = new MediaRecorder(destNode.stream, {
      mimeType: 'audio/wav'
    })
    mediaRecorder.ondataavailable = () => {
      let bufferLength = analyserNode.frequencyBinCount
      let bufferArray = new Uint8Array(bufferLength)
      analyserNode.getByteFrequencyData(bufferArray)
      avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
      isSpeaking.value = avgVolume.value > threshold.value
    }
    mediaRecorder.start(SAMPLE_INTERVAL)
  } catch (err) {
    console.warn(err)
  }
}
let onStop = () => {
  mediaRecorder.stop()
}
</script>
Line 19
const SAMPLE_RATE = 16000
const SAMPLE_INTERVAL = 250 // 0.25s
Tunable constants:
- SAMPLE_RATE: the sample rate, in Hz
- SAMPLE_INTERVAL: the time between samples, in milliseconds
Line 22
let threshold = ref(40)
let avgVolume = ref(0)
let isSpeaking = ref(false)
- threshold state: the threshold that the volume is compared against
- avgVolume state: the current average volume
- isSpeaking state: whether someone is currently speaking
Line 30
let mediaStream = await navigator.mediaDevices.getUserMedia({ audio: true })
- Use the built-in Web API navigator.mediaDevices.getUserMedia() to obtain a MediaStream
Line 34
let audioContext = new AudioContext({ sampleRate: SAMPLE_RATE })
- An AudioContext is required in order to change the sample rate
Line 35
let analyserNode = audioContext.createAnalyser()
analyserNode.fftSize = 256
- An AudioContext processes audio through a graph of nodes
- Create analyserNode from the AudioContext; this node converts the signal to the frequency domain
- Set its fftSize to 256
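To see what fftSize = 256 means at a 16000 Hz sample rate, the arithmetic can be sketched as follows. frequencyBinCount is always half of fftSize, and the bins are spread linearly from 0 Hz up to half the sample rate. The helper name describeBins is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: derives the analyser's bin layout from the two settings.
function describeBins(sampleRate, fftSize) {
  const binCount = fftSize / 2             // same value as analyserNode.frequencyBinCount
  const maxFrequency = sampleRate / 2      // Nyquist frequency, upper edge of the last bin
  const binWidth = maxFrequency / binCount // Hz covered by each bin
  return { binCount, maxFrequency, binWidth }
}

console.log(describeBins(16000, 256))
// { binCount: 128, maxFrequency: 8000, binWidth: 62.5 }
```

With these settings the average is therefore taken over 128 bins of 62.5 Hz each, covering 0 to 8000 Hz.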
Line 38
let srcNode = new MediaStreamAudioSourceNode(audioContext, { mediaStream })
let destNode = new MediaStreamAudioDestinationNode(audioContext)
- Create srcNode from the AudioContext and the MediaStream
- Create destNode from the AudioContext
Line 41
srcNode.connect(analyserNode)
analyserNode.connect(destNode)
- Use connect() to link the nodes together
Line 44
mediaRecorder = new MediaRecorder(destNode.stream, { mimeType: 'audio/wav' })
- Use the MediaRecorder provided by extendable-media-recorder, and give it the MediaStream processed by destNode instead of the raw microphone stream
- Set mimeType to audio/wav
Line 60
mediaRecorder.start(SAMPLE_INTERVAL)
- With SAMPLE_INTERVAL passed to start() as the timeslice, the dataavailable event fires about once every SAMPLE_INTERVAL milliseconds
Line 66
let onStop = () => {
mediaRecorder.stop()
}
- Stop the MediaRecorder to end the recording
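stop() only ends the recording; the microphone track and the AudioContext stay alive afterwards. A possible cleanup is sketched below. It assumes onStart is changed to keep mediaStream and audioContext in outer variables, the way it already does for mediaRecorder, and the function name releaseAudio is hypothetical.

```javascript
// Hypothetical cleanup helper; each argument may be null if Start was never pressed.
function releaseAudio(mediaRecorder, mediaStream, audioContext) {
  // skip stop() when there is no recorder or it is already inactive
  if (mediaRecorder && mediaRecorder.state !== 'inactive') {
    mediaRecorder.stop()
  }
  // stopping every track releases the microphone
  if (mediaStream) {
    mediaStream.getTracks().forEach((track) => track.stop())
  }
  // close() frees the audio resources held by the context
  if (audioContext && audioContext.state !== 'closed') {
    return audioContext.close()
  }
  return Promise.resolve()
}
```

onStop could then call releaseAudio(mediaRecorder, mediaStream, audioContext) instead of calling mediaRecorder.stop() directly, which also makes pressing Stop before Start harmless.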
Line 48
mediaRecorder.ondataavailable = () => {
let bufferLength = analyserNode.frequencyBinCount
let bufferArray = new Uint8Array(bufferLength)
analyserNode.getByteFrequencyData(bufferArray)
avgVolume.value = bufferArray.reduce((sum, value) => sum + value, 0) / bufferLength
isSpeaking.value = avgVolume.value > threshold.value
}
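- Use analyserNode's frequencyBinCount property and getByteFrequencyData() to obtain the frequency-domain data as an array
- Compute the average frequency-domain level from the array
- If the average is greater than threshold, treat the sound as speech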
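The averaging and threshold check above can also be pulled out of the handler into plain functions, which makes them easy to try without a microphone. This is a minimal sketch; averageLevel and isSpeech are hypothetical names, and the sample arrays are made-up values standing in for what getByteFrequencyData() would write.

```javascript
// Hypothetical helpers mirroring the ondataavailable handler.
// bins: Uint8Array filled by analyserNode.getByteFrequencyData(), each value 0 to 255
function averageLevel(bins) {
  if (bins.length === 0) return 0 // avoid dividing by zero
  let sum = 0
  for (const value of bins) sum += value
  return sum / bins.length
}

// true when the average level is above the threshold
function isSpeech(bins, threshold) {
  return averageLevel(bins) > threshold
}

const quiet = Uint8Array.of(10, 20, 30) // average 20
const loud = Uint8Array.of(60, 90, 120) // average 90
console.log(isSpeech(quiet, 40), isSpeech(loud, 40)) // false true
```

Inside the handler, avgVolume.value would then be averageLevel(bufferArray) and isSpeaking.value would be isSpeech(bufferArray, threshold.value).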
Conclusion
- AnalyserNode is part of the Web Audio API, so no additional package has to be installed for it