Integrating Speech Services for Voice Conversations with ChatGPT - 小米笔记

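ChatGPT has been getting a lot of attention lately, and it really does feel far ahead of traditional chatbots. I want to hook ChatGPT up to my Xiao Ai (小爱) smart speaker, so as a first step I built a standalone voice conversation loop. Later I plan to see whether homeassistant can be used to connect it to the speaker.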

The following services are used:

Picovoice wake word detection: Train, develop and deploy custom voice features - Picovoice
Azure text to speech: Text to speech documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn
Azure speech to text: Speech to text documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn
OpenAI GPT-3 API: Overview - OpenAI API

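Demo

Wake word detection

Picovoice wake word detection: Train, develop and deploy custom voice features - Picovoice

For wake word detection I use Picovoice, because it is free and works reasonably well. The one drawback is that I could only train English wake words.

After signing up, open Porcupine Wake Word Detection & Keyword Spotting - Picovoice to start building a wake word.

Choose a language and type in the wake word.

Training finishes after a short wait. Download the model (a .ppn file) and it is ready to use.

The code below listens for the wake word and, once it is detected, uses the say command to speak "a".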

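import os
import struct

import pvcobra
import pvporcupine
import pyaudio


def picovoice():
    picovoice_access_key = 'your Picovoice AccessKey'
    # Wake word engine loaded with the custom keyword model trained in the console.
    porcupine = pvporcupine.create(
        access_key=picovoice_access_key,
        keyword_paths=['hi-chat_en_mac_v2_1_0.ppn']
    )
    pa = pyaudio.PyAudio()
    # Cobra voice activity detector; created here but not used in this snippet.
    cobra = pvcobra.create(picovoice_access_key)
    # Mono 16-bit input stream that delivers one Porcupine frame per read.
    audio_stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length)
    while True:
        pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow=False)
        # Convert the raw bytes into a tuple of 16-bit samples.
        _pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)
        # process() returns the index of the detected keyword, or -1 if none was heard.
        keyword_index = porcupine.process(_pcm)
        if keyword_index >= 0:
            # `say` is the macOS text-to-speech command.
            os.system('say -v "Mei-Jia" "a"')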
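
The unpacking step in the loop above is easy to get wrong, so here is a minimal sketch of it in isolation. It assumes the stream delivers 16-bit mono samples (as with paInt16 above); the helper name bytes_to_samples and the fake 4-sample frame are made up for illustration.

```python
import struct


def bytes_to_samples(pcm_bytes: bytes, frame_length: int) -> tuple:
    """Decode one frame of raw 16-bit PCM into a tuple of signed integers."""
    # "h" is a signed 16-bit integer in the machine's native byte order,
    # so frame_length samples need 2 * frame_length bytes.
    return struct.unpack_from("h" * frame_length, pcm_bytes)


# Build a fake 4-sample frame the same way, to show the round trip
# without needing a microphone.
frame = struct.pack("4h", 0, 1, -1, 32767)
samples = bytes_to_samples(frame, 4)
print(samples)  # (0, 1, -1, 32767)
```

The wake word engine then receives plain integers rather than the byte string returned by the audio stream.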

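Speech to text and text to speech

Once speech has been captured, it has to be converted to text before it can be passed to the ChatGPT API. I use Azure for both speech to text and text to speech. Both are simple and the documentation covers everything, so I won't go into detail.

Azure text to speech: Text to speech documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn

Azure speech to text: Speech to text documentation - Tutorials and API reference - Azure Cognitive Services | Microsoft Learn

Code snippet: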

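import azure.cognitiveservices.speech as speechsdk

# Fill in the key and region of your Azure Speech resource.
SPEECH_KEY = ""
SPEECH_REGION = "eastasia"

def recognize_from_microphone():
    # Uses the module-level SPEECH_KEY and SPEECH_REGION constants defined above.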
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    speech_config.speech_recognition_language="zh-CN"

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # print("Speak into your microphone.")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()

    if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # print("Recognized: {}".format(speech_recognition_result.text))
        return speech_recognition_result.text
    elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
    elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_recognition_result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
            print("Did you set the speech resource key and region values?")
    return ""

def tts(text):
    # Uses the module-level SPEECH_KEY and SPEECH_REGION constants defined above.
    speech_config = speechsdk.SpeechConfig(subscription=SPEECH_KEY, region=SPEECH_REGION)
    audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=True)

    # The language of the voice that speaks.
    speech_config.speech_synthesis_voice_name='zh-CN-XiaoxuanNeural'

    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

    speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

    # Successful synthesis needs no handling; only cancellations are reported.
    if speech_synthesis_result.reason == speechsdk.ResultReason.Canceled:
        cancellation_details = speech_synthesis_result.cancellation_details
        print("Speech synthesis canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            if cancellation_details.error_details:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
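
To show how these pieces fit into one conversation round, here is a minimal sketch under some assumptions: handle_turn and its three parameters are hypothetical names, and the lambdas are stand-ins so the sketch runs without a microphone or any API key. It is one possible way to wire the flow, not necessarily how the published code does it.

```python
from typing import Callable


def handle_turn(listen: Callable[[], str],
                ask: Callable[[str], str],
                speak: Callable[[str], None]) -> str:
    """Run one listen -> ask -> speak round; return the reply ("" if nothing was heard)."""
    question = listen().strip()
    if not question:
        # Recognition failed or the user said nothing, so skip the API call.
        return ""
    reply = ask(question).strip()
    if reply:
        speak(reply)
    return reply


# Stand-ins (assumptions for illustration only).
spoken = []
reply = handle_turn(
    listen=lambda: "你好",
    ask=lambda q: f"echo: {q}",
    speak=spoken.append,
)
print(reply)
```

In the real script, listen would presumably be recognize_from_microphone, speak would be tts, and ask would wrap the language model call covered in the next section. Passing them in as arguments keeps the flow testable without any cloud service.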

OpenAI API

OpenAI GPT-3 API: Overview - OpenAI API

On the API side I use the Chat setup: a conversation-style prompt sent to the completions endpoint.

Sample code:

# Written for the pre-1.0 `openai` package, where `openai.Completion` is available.
import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-003",
  prompt="The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\nHuman: Hello, who are you?\nAI: I am an AI created by OpenAI. How can I help you today?\nHuman: I'd like to cancel my subscription.\nAI:",
  temperature=0.9,
  max_tokens=150,
  top_p=1,
  frequency_penalty=0.0,
  presence_penalty=0.6,
  stop=[" Human:", " AI:"]
)
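
The prompt in the sample is a fixed string. For a real conversation it has to be rebuilt every turn from what has been said so far. The sketch below shows one way to do that; build_prompt, PREAMBLE, and the history format are assumptions made for illustration, not part of the OpenAI library.

```python
PREAMBLE = ("The following is a conversation with an AI assistant. "
            "The assistant is helpful, creative, clever, and very friendly.")


def build_prompt(history, question):
    """Build a completion prompt from earlier (human_text, ai_text) pairs plus a new question."""
    lines = [PREAMBLE, ""]
    for human, ai in history:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    lines.append(f"Human: {question}")
    # Left open so the model completes the assistant's line.
    lines.append("AI:")
    return "\n".join(lines)


history = [("Hello, who are you?",
            "I am an AI created by OpenAI. How can I help you today?")]
prompt = build_prompt(history, "I'd like to cancel my subscription.")
print(prompt)
```

With this history the function reproduces the prompt string from the sample above. After each call, the recognized question and the returned text (response.choices[0].text in the legacy client) can be appended to history so the model keeps context across turns.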

The code is open source

https://github.com/FlickerMi/hello-chatgpt
