Discord, a popular instant messaging and social platform, is widely used by online communities, streamers, and gamers. One of its best-known features is voice channels, which let members talk over voice and video. For developers, Discord is also highly customizable: bots can add new functionality to a server. This tutorial, based on a guide from AssemblyAI, shows how to build a Discord bot that joins a voice channel, transcribes audio, generates a response with ChatGPT, and converts that response back to speech.
Set Up the Bot
To build the Discord bot, you will use Node.js along with three third-party services: AssemblyAI for speech-to-text, OpenAI for generating responses, and ElevenLabs for text-to-speech. This tutorial assumes you are familiar with JavaScript and Node.js, including setting up a project, installing dependencies, and writing basic asynchronous code.
First, ensure you have Node.js (version 18 or higher) installed and access to a Discord server with administrator rights. Create a project directory and initialize a Node.js project:
mkdir discord-voice-bot && cd discord-voice-bot
npm init -y
Install the required dependencies:
npm install discord.js libsodium-wrappers ffmpeg-static @discordjs/opus @discordjs/voice prism-media dotenv assemblyai elevenlabs-node openai
Store API keys in a .env file for security:
OPENAI_API_KEY=
ASSEMBLYAI_API_KEY=
ELEVENLABS_API_KEY=
DISCORD_TOKEN=
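A missing key tends to surface later as an opaque authentication error, so it can help to check the environment once at startup. The following is a minimal sketch, not part of the original tutorial: the helper name `findMissingKeys` is hypothetical, and it assumes `dotenv` has already loaded the .env file into `process.env`.

```javascript
// Names of the variables defined in the .env file above.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "ELEVENLABS_API_KEY",
  "DISCORD_TOKEN",
];

// Hypothetical helper: returns the required keys that are unset or blank.
function findMissingKeys(env = process.env) {
  return REQUIRED_KEYS.filter((key) => !env[key] || env[key].trim() === "");
}

// Example usage near the top of index.js, after require("dotenv").config():
// const missing = findMissingKeys();
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(", ")}`);
// }
```

Failing fast this way keeps configuration mistakes separate from problems in the voice pipeline itself.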
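Set up a Discord developer account, create an application, and add a bot user to it. Enable the Message Content intent so the bot can read the !join command, grant the permissions it needs to send messages and to connect and speak in voice channels, and save the bot token in the .env file. Add the bot to your server using the generated URL.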
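The Developer Portal generates the invite URL for you, but it may help to see what that link encodes. The sketch below builds an equivalent URL by hand. The helper name `buildInviteUrl` is hypothetical, and the permission set (view channel, send messages, connect, speak) is an assumption about what this bot needs rather than a list taken from the original tutorial.

```javascript
// Permission bit flags as defined in Discord's permission table.
const PERMISSIONS = {
  VIEW_CHANNEL: 1 << 10,
  SEND_MESSAGES: 1 << 11,
  CONNECT: 1 << 20,
  SPEAK: 1 << 21,
};

// Hypothetical helper: builds a bot invite URL for the given application ID.
function buildInviteUrl(clientId, permissionNames = Object.keys(PERMISSIONS)) {
  // Combine the requested flags into a single permissions integer.
  const bits = permissionNames.reduce((acc, name) => {
    if (!Object.hasOwn(PERMISSIONS, name)) {
      throw new Error(`Unknown permission: ${name}`);
    }
    return acc | PERMISSIONS[name];
  }, 0);
  const params = new URLSearchParams({
    client_id: clientId,
    permissions: String(bits),
    scope: "bot",
  });
  return `https://discord.com/oauth2/authorize?${params}`;
}
```

Calling `buildInviteUrl` with your application ID returns a link you can open in a browser to choose the target server.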
Develop the Discord Voice Bot Functions
The bot will join a voice channel, record audio, transcribe it using AssemblyAI, generate responses via ChatGPT, and convert these responses to speech using ElevenLabs.
Join the Voice Channel
To make the bot respond to the !join command and enter a voice channel, update the index.js file:
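// Assumes index.js already creates and logs in `client`, a discord.js Client
// with the Guilds, GuildMessages, MessageContent, and GuildVoiceStates intents.
const { Events } = require("discord.js");
const { joinVoiceChannel, VoiceConnectionStatus } = require("@discordjs/voice");

client.on(Events.MessageCreate, async (message) => {
  if (message.content.toLowerCase() === "!join") {
    // member is null for direct messages, so guard the lookup.
    const channel = message.member?.voice.channel;
    if (channel) {
      const connection = joinVoiceChannel({
        channelId: channel.id,
        guildId: message.guild.id,
        adapterCreator: message.guild.voiceAdapterCreator,
        selfDeaf: false, // stay undeafened so the bot can receive audio
      });
      // Use once() so a reconnect does not start a second listener.
      connection.once(VoiceConnectionStatus.Ready, () => {
        message.reply(`Joined voice channel: ${channel.name}!`);
        listenAndRespond(connection, message);
      });
    } else {
      message.reply("You need to join a voice channel first!");
    }
  }
});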
Record and Transcribe Audio
Capture audio streams from voice channels and transcribe them using AssemblyAI:
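const { EndBehaviorType } = require("@discordjs/voice");
const { AssemblyAI } = require("assemblyai");
const prism = require("prism-media");

const assemblyAI = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

async function listenAndRespond(connection, message) {
  // Use a fresh transcriber and transcript for each utterance.
  const transcriber = assemblyAI.realtime.transcriber({ sampleRate: 48000 });
  let transcription = "";
  transcriber.on("transcript", (transcript) => {
    // Keep only final transcripts; partial ones are revised as speech continues.
    if (transcript.message_type === "FinalTranscript") {
      transcription += transcript.text + " ";
    }
  });
  transcriber.on("error", (error) => console.error("Transcription error:", error));
  // Open the connection to AssemblyAI before any audio is sent.
  await transcriber.connect();

  // End the stream after one second of silence so the "end" event fires.
  const audioStream = connection.receiver.subscribe(message.author.id, {
    end: { behavior: EndBehaviorType.AfterSilence, duration: 1000 },
  });
  // Discord delivers Opus packets; decode them to 48 kHz mono PCM.
  const opusDecoder = new prism.opus.Decoder({ rate: 48000, channels: 1 });
  audioStream.pipe(opusDecoder).on("data", (chunk) => {
    transcriber.sendAudio(chunk);
  });

  audioStream.on("end", async () => {
    await transcriber.close();
    if (!transcription.trim()) return; // nothing was said
    const chatGPTResponse = await getChatGPTResponse(transcription.trim());
    const audioPath = await convertTextToSpeech(chatGPTResponse);
    if (audioPath) playAudio(connection, audioPath);
  });
}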
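The transcript handling can also be pulled out of the event handler so that it is easy to reset between utterances. The sketch below shows one way to do that. `createTranscriptBuffer` is a hypothetical helper, not part of the AssemblyAI SDK, and it assumes each event carries the `message_type` and `text` fields used in the listing above.

```javascript
// Hypothetical helper: accumulates final transcripts for one utterance.
function createTranscriptBuffer() {
  const parts = [];
  return {
    // Pass every "transcript" event here; only non-empty final ones are kept.
    handle(transcript) {
      if (transcript.message_type !== "FinalTranscript") return;
      const text = (transcript.text || "").trim();
      if (text) parts.push(text);
    },
    // Returns the collected text and resets the buffer for the next utterance.
    flush() {
      const text = parts.join(" ");
      parts.length = 0;
      return text;
    },
  };
}
```

With this in place, the handler would become `transcriber.on("transcript", (t) => buffer.handle(t))`, and the "end" callback would call `buffer.flush()` to get the text to send to ChatGPT.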
Generate Responses with ChatGPT
Use OpenAI’s GPT-3.5 Turbo model to generate intelligent responses:
const { OpenAI } = require("openai");
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function getChatGPTResponse(text) {
  // gpt-3.5-turbo is a chat model, so it is called through the Chat Completions endpoint.
  const response = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: text }],
    max_tokens: 100,
  });
  return response.choices[0].message.content.trim();
}
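Chat models such as gpt-3.5-turbo take a list of role-tagged messages, which makes it possible to set a persona and carry a little context between turns. The sketch below builds such a list. The helper name `buildMessages`, the system prompt text, and the `maxTurns` limit are all illustrative assumptions, not part of the original tutorial.

```javascript
// Assumed persona text; adjust to taste.
const SYSTEM_PROMPT =
  "You are a friendly voice assistant in a Discord call. Keep answers short.";

// Hypothetical helper: builds the `messages` array for a chat completion.
// `history` holds earlier { role, content } turns; only the last `maxTurns` are kept.
function buildMessages(history, userText, maxTurns = 6) {
  // slice(-0) would return the whole array, so handle zero explicitly.
  const recent = maxTurns > 0 ? history.slice(-maxTurns) : [];
  return [
    { role: "system", content: SYSTEM_PROMPT },
    ...recent,
    { role: "user", content: userText },
  ];
}
```

Trimming the history keeps each request small, which matters for a voice bot where response latency is noticeable.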
Convert Text to Speech with ElevenLabs
Convert ChatGPT responses to speech using ElevenLabs:
const ElevenLabs = require("elevenlabs-node");
const voice = new ElevenLabs({ apiKey: process.env.ELEVENLABS_API_KEY });

// Writes the synthesized speech to an MP3 file.
// Returns the file name on success, or null otherwise.
async function convertTextToSpeech(text) {
  const fileName = `${Date.now()}.mp3`;
  const response = await voice.textToSpeech({ fileName, textInput: text });
  return response?.status === "ok" ? fileName : null;
}
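The `listenAndRespond` function calls `playAudio`, which this section does not define. One possible shape, using the audio player API from `@discordjs/voice`, is sketched below. It is an assumption about what the helper does, not the original tutorial's code. The `voiceLib` parameter is a hypothetical addition that exists only so the function can be exercised with a stub; in the bot it defaults to the real library.

```javascript
const fs = require("node:fs");

// Hypothetical helper: plays an audio file on an existing voice connection.
function playAudio(connection, audioPath, voiceLib = require("@discordjs/voice")) {
  const { createAudioPlayer, createAudioResource, AudioPlayerStatus } = voiceLib;
  const player = createAudioPlayer();
  const resource = createAudioResource(audioPath);

  // Route the player's output to the voice channel, then start playback.
  connection.subscribe(player);
  player.play(resource);

  // When playback finishes, delete the temporary file (errors are ignored).
  player.once(AudioPlayerStatus.Idle, () => {
    fs.unlink(audioPath, () => {});
  });
  player.on("error", (error) => {
    console.error("Playback failed:", error.message);
  });
  return player;
}
```

Playing an MP3 file this way relies on FFmpeg being available for transcoding, which is presumably why `ffmpeg-static` is in the dependency list.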
Conclusion
This tutorial demonstrated how to create a sophisticated Discord voice bot integrating AssemblyAI for speech transcription, OpenAI’s GPT-3.5 Turbo model for intelligent responses, and ElevenLabs for speech synthesis. This project showcases the potential of modern AI and voice technologies for creating interactive, accessible, and engaging applications.