Integrating speech services for voice conversations with ChatGPT
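ChatGPT has been hugely popular lately, and it really does outclass the traditional chatbots that came before it. I wanted to bring ChatGPT to my Xiao Ai speaker (Xiaomi's smart speaker), so as a first step I implemented voice conversation. Later I plan to see whether Home Assistant can be used to connect it to the speaker.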
It uses the following services:
- Wake word, Picovoice: Train, develop and deploy custom voice features - Picovoice
- Azure text-to-speech: Text-to-speech documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn
- Azure speech-to-text: Speech-to-text documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn
- OpenAI GPT3 API: Overview - OpenAI API
Wake-word detection
Wake word, Picovoice: Train, develop and deploy custom voice features - Picovoice
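For wake-word detection I used Picovoice, because it is free and works quite well; the only catch is that it only supports training English wake words.
After signing up, go to the Porcupine Wake Word Detection & Keyword Spotting - Picovoice page to start building a wake word:
choose the language and type in the wake phrase.
After a short wait the training finishes; download the model (.ppn format) and it is ready to use.
The code below listens for the wake word and, once it is detected, uses the macOS say command to speak "a".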
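```python
import os
import struct

import pvcobra
import pvporcupine
import pyaudio


def picovoice():
    picovoice_access_key = 'YOUR_PICOVOICE_ACCESS_KEY'
    # Load the custom wake-word model (.ppn) trained on the Picovoice Console
    porcupine = pvporcupine.create(
        access_key=picovoice_access_key,
        keyword_paths=['hi-chat_en_mac_v2_1_0.ppn']
    )
    pa = pyaudio.PyAudio()
    # Cobra voice activity detection engine (created here but not used in this snippet)
    cobra = pvcobra.create(picovoice_access_key)
    # Mono 16-bit microphone stream matching Porcupine's sample rate and frame size
    audio_stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length)
    try:
        while True:
            pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow=False)
            # Convert the raw bytes into 16-bit signed samples for Porcupine
            _pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
            # process() returns the index of the detected keyword, or -1 if none
            keyword_index = porcupine.process(_pcm)
            if keyword_index >= 0:
                os.system('say -v "Mei-Jia" "a"')
    finally:
        # Release the audio device and the Picovoice engines
        audio_stream.close()
        pa.terminate()
        cobra.delete()
        porcupine.delete()
```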
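The struct.unpack_from("h" * porcupine.frame_length, pcm) line is easy to skim past: PyAudio hands back raw bytes, while porcupine.process() expects a sequence of 16-bit signed integers, one per sample. The sketch below shows that conversion on a synthetic buffer so it runs without a microphone. bytes_to_samples is a hypothetical helper name, and the frame length of 4 is only an illustrative value (the real one comes from porcupine.frame_length).

```python
import struct


def bytes_to_samples(pcm: bytes, frame_length: int) -> tuple:
    """Unpack frame_length native-endian 16-bit signed samples from raw PCM bytes.

    Mirrors struct.unpack_from("h" * porcupine.frame_length, pcm) in the wake-word
    loop. "h" is a signed short (2 bytes), so pcm must hold at least
    2 * frame_length bytes.
    """
    return struct.unpack_from("h" * frame_length, pcm)


if __name__ == "__main__":
    frame_length = 4  # illustrative only; Porcupine reports its own frame_length
    # Fake a paInt16 buffer: four samples packed as native 16-bit integers
    raw = struct.pack("h" * frame_length, 0, 1000, -1000, 32767)
    print(len(raw))                             # 8 bytes for 4 samples
    print(bytes_to_samples(raw, frame_length))  # (0, 1000, -1000, 32767)
```

Because paInt16 with channels=1 yields exactly two bytes per sample, reading frame_length frames gives a buffer of the right size for this unpack.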
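Speech-to-text and text-to-speech
Once speech is captured, it has to be converted to text and passed to the ChatGPT API. I used Azure for both speech-to-text and text-to-speech; both are simple to set up by following the documentation, so I won't go into detail here.
Azure text-to-speech: Text-to-speech documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn
Azure speech-to-text: Speech-to-text documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn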
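The flow described above (wake word, then speech-to-text, then the ChatGPT API, then text-to-speech) can be wired into a single conversation turn. A minimal sketch, assuming the caller passes in three functions: listen (for example recognize_from_microphone), ask (a hypothetical wrapper around the OpenAI call that returns the reply text) and speak (for example tts). run_turn is an illustrative name, not part of the original project.

```python
from typing import Callable, Optional


def run_turn(
    listen: Callable[[], str],
    ask: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one turn: speech -> text -> model reply -> speech.

    Returns the reply that was spoken, or None if nothing was recognized.
    """
    question = listen().strip()    # speech-to-text; "" means nothing was recognized
    if not question:
        return None                # skip the API call for empty input
    reply = ask(question).strip()  # send the recognized text to the model
    speak(reply)                   # text-to-speech on the default speaker
    return reply


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without a microphone or API keys
    spoken = []
    reply = run_turn(
        listen=lambda: "hello",
        ask=lambda q: f"You said: {q}",
        speak=spoken.append,
    )
    print(reply)   # You said: hello
    print(spoken)  # ['You said: hello']
```

In the wake-word loop, the say call could then be replaced by something like run_turn(recognize_from_microphone, ask, tts). Passing the functions in keeps the turn logic testable without audio hardware.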
Code snippet for the two Azure services:
```python
import os

import azure.cognitiveservices.speech as speechsdk

# Azure Speech resource key and region, read from environment variables
# (the region defaults to eastasia, as in the original snippet)
SPEECH_KEY = os.environ.get("SPEECH_KEY", "")
SPEECH_REGION = os.environ.get("SPEECH_REGION", "eastasia")


def recognize_from_microphone():
    """Recognize a single Mandarin utterance from the default microphone.

    Returns the recognized text, or "" if nothing was recognized or the request failed.
    """
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    speech_config.speech_recognition_language = "zh-CN"
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    # recognize_once_async() stops after one utterance; .get() blocks until the result is ready
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return speech_recognition_result.text
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
    return ""


def tts(text):
    """Speak text through the default speaker using an Azure neural voice."""
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)
    # The voice that speaks (this also determines the language)
    speech_config.speech_synthesis_voice_name = "zh-CN-XiaoxuanNeural"
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
    # Only failures are reported; a successful synthesis needs no handling here
    if speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
```
OpenAI API
OpenAI GPT3 API: Overview - OpenAI API
For the API I used OpenAI's Chat example, which runs a conversation through the completions endpoint.
Sample code:
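```python
import os

import openai

# Uses the pre-1.0 openai Python library; openai.Completion was removed in openai>=1.0,
# and the text-davinci-003 model has since been retired by OpenAI
openai.api_key = os.getenv("OPENAI_API_KEY")

# The conversation so far is sent as one prompt that ends with "AI:",
# so the completion is the assistant's next reply
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: I'd like to cancel my subscription.\nAI:",
    temperature=0.9,
    max_tokens=150,
    top_p=1,
    frequency_penalty=0.0,
    presence_penalty=0.6,
    stop=[" Human:", " AI:"]
)
# The assistant's reply is the generated text
print(response.choices[0].text.strip())
```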
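The Chat example keeps context by sending the whole conversation as a single text prompt in Human:/AI: form, with the stop sequences there to end generation before the model starts writing the next turn itself. For a multi-turn voice conversation, that prompt has to be rebuilt on every request. A minimal sketch of the bookkeeping, assuming the same prompt format as the example above; build_prompt and MAX_TURNS are hypothetical names, and keeping only the last few exchanges is just one simple way to stop the prompt from growing without bound.

```python
from typing import List, Tuple

# Preamble copied from the Chat example's prompt
PREAMBLE = (
    "The following is a conversation with an AI assistant. "
    "The assistant is helpful, creative, clever, and very friendly.\n\n"
)
MAX_TURNS = 5  # hypothetical limit on how many past exchanges to resend


def build_prompt(history: List[Tuple[str, str]], user_text: str) -> str:
    """Build a Human:/AI: prompt from past (question, answer) pairs plus a new question.

    Only the last MAX_TURNS exchanges are kept. The prompt ends with "AI:" so the
    model's completion is the assistant's next reply.
    """
    lines = []
    for question, answer in history[-MAX_TURNS:]:
        lines.append(f"Human: {question}")
        lines.append(f"AI: {answer}")
    lines.append(f"Human: {user_text}")
    lines.append("AI:")
    return PREAMBLE + "\n".join(lines)


if __name__ == "__main__":
    history = [("Hello, who are you?",
                "I am an AI created by OpenAI. How can I help you today?")]
    # Reproduces the prompt string used in the sample code above
    print(build_prompt(history, "I'd like to cancel my subscription."))
```

After each API call, appending (user_text, response.choices[0].text.strip()) to history carries the exchange into the next prompt.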