Using the ChatGPT Python library to make a language-learning tool

2024-04-23 · ChatGPT · AI · Python

I’m learning German. Plenty of AI-enabled language-learning apps with a multitude of features have appeared in the past few years, but sometimes I want just one simple thing.

German has a very different word order from English, and it is also much stricter about word choice. It matters whether you translate “to change” as wechseln, verwechseln, umstellen, ändern, verändern, etc.

ChatGPT is great at writing fluent, simple text in multiple languages and at choosing words that fit the full context. So I decided to use the ChatGPT API, through the OpenAI Python library, to build a simple single-purpose app: it listens to me speak a couple of sentences in English, translates them into fluent, natural-sounding German, and then reads them back to me in a nice voice.

Program outline

We need five functions to make this app:

A function to record our audio message
A function to transcribe the audio into text, using the Whisper API
A function that passes the text into ChatGPT and asks for the German translation
A function that turns our translated text back into audio
A function that puts all the bits together into a program

To make the program, install the following packages with pip (the package names match the module names) and import them:

import sounddevice as sd
import wavio
import keyboard
import requests
from openai import OpenAI
from playsound import playsound

You also need your own API key. For this, you must set up an account with OpenAI. The API is pay-as-you-go, with the price determined by the length of your requests and the model you choose; here, each request costs at most a few cents/pence/Rappen.

api_key = 'YOUR_KEY_HERE'
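Hard-coding the key is fine for a quick experiment, but it is easy to push it to GitHub by accident. A safer pattern is to keep it in an environment variable. The OpenAI client already falls back to OPENAI_API_KEY when no key is passed; the helper below (a hypothetical load_api_key, not part of any library) just makes the lookup explicit and fails early with a clear message.

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable.

    Hypothetical helper: raises RuntimeError if the variable is unset or
    empty, so the program stops before making any API call.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage: replace the hard-coded line with
# api_key = load_api_key()
```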

First, the function for recording our message. The first time you run this, your OS may ask you to give permission to use the microphone. The recording is saved as recording.wav.

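def record_message():
    import time  # used to measure how long the recording actually ran

    # Ask user if they want to record a message; return False if not
    if input("Record prompt? (y/n) ").lower() != 'y':
        print("Recording aborted.")
        return False

    # Define the audio parameters
    fs = 44100  # Sample rate in Hz
    seconds = 20  # Maximum recording duration in seconds (for safety)

    print("Recording... Press 'q' to stop.")

    # Start recording; sd.rec returns immediately and records in the background
    start = time.time()
    recording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)

    # Wait until 'q' is pressed (the keyboard module may need
    # administrator/root privileges on macOS and Linux)
    keyboard.wait('q')

    # Stop recording and keep only the frames captured before 'q'
    sd.stop()
    elapsed = time.time() - start
    recording = recording[:int(elapsed * fs)]

    # Save the recording as a 16-bit wav file
    wavio.write("recording.wav", recording, fs, sampwidth=2)
    return True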
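To get a feel for the numbers: sd.rec preallocates seconds × fs frames, so the 20-second safety limit at 44,100 Hz is 882,000 samples, and at sampwidth=2 (16-bit) that is 1,764,000 bytes of audio. The sketch below uses only the standard library's wave module to show what writing such a 16-bit mono WAV involves. It assumes float samples in [-1, 1], which is what sounddevice records by default (float32); frames_for and write_wav_16bit are hypothetical helpers, and this is not how wavio works internally.

```python
import struct
import wave


def frames_for(seconds, fs):
    """Number of frames in a recording of the given length at sample rate fs."""
    return int(seconds * fs)


def write_wav_16bit(path, samples, fs):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file.

    Values outside [-1, 1] are clipped before scaling to the int16 range.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono, like channels=1 in sd.rec
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(fs)
        wf.writeframes(pcm)


print(frames_for(20, 44100))      # 882000
print(frames_for(20, 44100) * 2)  # 1764000 bytes at 16 bits per sample
```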

Next, the function to turn this recording into text. It also prints the transcription to the terminal so we can keep an eye on its accuracy.

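def transcribe_text(api_key):
    client = OpenAI(api_key=api_key)

    # Open the recording in binary mode; the with-block closes the file afterwards
    with open("recording.wav", "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            response_format="text",  # return plain text instead of JSON
        )

    # Print the transcription so we can check its accuracy
    print(transcription)
    return transcription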
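Whisper is billed per minute of audio, and OpenAI's audio endpoint rejects uploads over 25 MB (the documented limit at the time of writing; check the current docs). A quick local check of the file before sending it can save a wasted request. This is a minimal standard-library sketch; describe_wav, check_upload and MAX_UPLOAD_BYTES are hypothetical names.

```python
import os
import wave

# Assumed: OpenAI's documented 25 MB upload limit, read conservatively as 25,000,000 bytes
MAX_UPLOAD_BYTES = 25_000_000


def describe_wav(path):
    """Return (duration_seconds, size_bytes) for a WAV file."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    return duration, os.path.getsize(path)


def check_upload(path, limit=MAX_UPLOAD_BYTES):
    """Raise ValueError if the file is too large to send for transcription."""
    duration, size = describe_wav(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, over the {limit}-byte limit")
    print(f"{path}: {duration:.1f} s, {size / 1e6:.2f} MB")
    return duration, size
```

Calling check_upload("recording.wav") just before transcribe_text would print the length and size of the recording.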

Now we can pass the text to ChatGPT using chat.completions.create(). For simple messages like these, we can choose GPT-3.5 Turbo (gpt-3.5-turbo), which is much cheaper per request than GPT-4. We ask the model to translate our prompt by sending a system message, “Translate the user prompt into fluent and natural German”, ahead of our text. You can change ‘German’ to any language you want, or change the prompt entirely, for example to ‘Rewrite the following prompt as a limerick’.

def translate_text(text):

  client = OpenAI(api_key = api_key)

  response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
      {"role": "system", "content": "Translate the user prompt into fluent and natural German"},
      {"role": "user", "content": text},
    ]
  )
  return response.choices[0].message.content
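Because the whole instruction lives in the system message, switching the target language (or the task) only means changing that string. Here is a minimal sketch of a hypothetical build_messages helper. It produces the same messages list that translate_text sends, but takes the language as a parameter and can also accept a custom instruction. Its result could be passed as messages= to client.chat.completions.create.

```python
def build_messages(text, language="German", instruction=None):
    """Build the chat messages list for a translation request.

    instruction: optional custom system prompt (for example a limerick
    request) that replaces the default translation instruction.
    """
    system_prompt = instruction or (
        f"Translate the user prompt into fluent and natural {language}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


print(build_messages("Where is the station?", language="French")[0]["content"])
# Translate the user prompt into fluent and natural French
```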

Because we want to feel some human connection, we ask OpenAI to turn the translated text back into audio using the ‘tts-1’ model, which gives us ‘podcast quality’ audio.

def create_audio(text):
    client = OpenAI(api_key=api_key)

    response = client.audio.speech.create(
        model="tts-1",
        voice="onyx",  # The voice/accent to read our message
        input=text,    # The translated text to be spoken
    )

    # Save the generated speech as an MP3 file
    response.stream_to_file("output.mp3")

Last of all, we create the main() function to bring everything together.

def main():
    # Stop if the user chose not to record
    if record_message() is False:
        return
    transcription = transcribe_text(api_key)
    translation = translate_text(transcription)
    create_audio(translation)
    playsound("output.mp3")

if __name__ == "__main__":
    main()
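Every run of main() needs a microphone and makes three paid API calls. To check the control flow offline, one option is to pass the stages in as plain functions and swap in stubs. This is a hypothetical restructuring: run_pipeline and the stubs below are not part of the original script.

```python
def run_pipeline(record, transcribe, translate, speak, play):
    """Run the record -> transcribe -> translate -> speak -> play chain.

    Returns the translated text, or None if record() returns False
    (the user aborted). Each argument is a callable.
    """
    if record() is False:
        return None
    text = transcribe()
    translation = translate(text)
    speak(translation)
    play()
    return translation


# Offline stubs standing in for the microphone and the OpenAI calls
played = []
result = run_pipeline(
    record=lambda: True,
    transcribe=lambda: "Good morning",
    translate=lambda text: {"Good morning": "Guten Morgen"}[text],
    speak=lambda text: None,
    play=lambda: played.append("played"),
)
print(result, played)  # Guten Morgen ['played']
```

With the real functions, the same call would be run_pipeline(record_message, lambda: transcribe_text(api_key), translate_text, create_audio, lambda: playsound("output.mp3")).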

And that’s all there is to it. By adjusting the framework and the prompt, you can create whatever language-learning tool you wish; let me know how it goes.

You can download the entire script from my GitHub (https://github.com/danielgreenwood/python_examples/), and find the full API documentation on the OpenAI website.