Audio-to-Text Transcription

In this tutorial, we demonstrate deploying Whisper, an audio-to-text model for speech recognition, in LLMBoost.

Step 0: Before you start

Enter the following to set up the environment variables and start an LLMBoost container on the host node where you want to run the Whisper inference service.

export MODEL_PATH=<absolute_path_to_model_directory>
export LICENSE_FILE=<absolute_path_to_license_file>
export HUGGING_FACE_HUB_TOKEN=<your_huggingface_token>

💡 These variables are used when launching the Docker container to ensure correct model loading and authentication.

  • Set MODEL_PATH to the absolute path of the directory on your host file system where your local models are stored.
  • Set LICENSE_FILE to the absolute path of your license file. Please contact us at contact@mangoboost.io if you don't have an LLMBoost license.
  • Set HUGGING_FACE_HUB_TOKEN to a Hugging Face access token obtained from huggingface.co/settings/tokens.

docker run -it --rm \
--network host \
--gpus all \
--pid=host \
--group-add video \
--ipc host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v $MODEL_PATH:/workspace/models \
-v $LICENSE_FILE:/workspace/llmboost_license.skm \
-w /workspace \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
<llmboost-docker-image-name>:prod-cuda \
bash

Note: Replace <llmboost-docker-image-name> with the image name provided by MangoBoost.

Note: Audio models on AMD environments require a different Docker tag.

An LLMBoost container should start in an interactive command shell session. The commands in the rest of this tutorial should be entered into the command shell prompt within the container.
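
💡 Optionally, you can confirm from a python3 session inside the container that the mounts and token from the docker run command are in place. The short check below uses only the Python standard library; the paths correspond to the -v mounts above, and the check itself is just a convenience, not part of LLMBoost.

import os

# Paths correspond to the -v mounts in the docker run command above.
for path in ("/workspace/models", "/workspace/llmboost_license.skm"):
    print(path, "->", "found" if os.path.exists(path) else "MISSING")

# The token is passed in with -e HUGGING_FACE_HUB_TOKEN=...
print("HUGGING_FACE_HUB_TOKEN set:", bool(os.environ.get("HUGGING_FACE_HUB_TOKEN")))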

Step 1: Starting an audio-to-text inference service

Starting an inference service with an audio-to-text model like Whisper is not very different from serving text-to-text models. Type the following command into the container command shell.

llmboost serve --model_name openai/whisper-large-v3 --query_type=audio 

After starting, the service listens for inference requests on port 8011 by default. The Whisper model takes audio input and returns its transcription as text.

It will take a few minutes for the service to be ready. Wait until the service status message reports that it is ready before proceeding with the rest of the steps.
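
💡 If you prefer to wait from a script rather than watching the log, the sketch below polls port 8011 until it accepts TCP connections. This is only a convenience check that the listener is up, not an official LLMBoost readiness probe, so the service status message in the log remains the authoritative signal.

import socket
import time

# Poll until something accepts TCP connections on the service port.
def wait_for_port(host="localhost", port=8011, timeout_s=600):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(5)
    return False

print("port open:", wait_for_port())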

Step 2: Submit audio query using curl

When the inference service is ready, you can reach it on port 8011 of the host node. The service can be accessed from any node that can reach the host node over the network. To easily try out the Whisper service in this tutorial, start another LLMBoost container in interactive mode on the same host node following the instructions in Step 0.

💡 Instead of starting a new container, you can use the commands below to attach to the LLMBoost container you already started. Use docker ps to find your container's DOCKER_ID.

docker ps
docker exec -it <DOCKER_ID> bash

From the second LLMBoost container's shell prompt, type the following curl command to submit an audio input query for the audio file /workspace/apps/demo/audio_transcription_client/example_audio.wav. You are welcome to record and submit your own .wav files.

curl localhost:8011/v1/audio/transcriptions \
-X POST \
-H "Content-Type: multipart/form-data" \
-F file="@/workspace/apps/demo/audio_transcription_client/example_audio.wav" \
-F model="whisper-large-v3"

The command should return the following transcription result.

{"text":"<|en|><|transcribe|><|notimestamps|> \"Most wonderful."}

Step 3: Submit audio query using LLMBoost Python API

The Whisper inference service can also be accessed through the LLMBoost Python client API. Start a Python interpreter session and paste the following into it to transcribe the example audio input file.

from llmboost.entrypoints.client import send_prompt

response = send_prompt(
    host="localhost",
    port=8011,
    model_path="openai/whisper-large-v3",
    role="user",
    user_input="/workspace/apps/demo/audio_transcription_client/example_audio.wav"
)

print(response)

The OpenAI client is also supported:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")

audio_file = open("/workspace/apps/demo/audio_transcription_client/example_audio.wav", "rb")
transcript = client.audio.transcriptions.create(model="openai/whisper-large-v3", file=audio_file)

print(transcript.text)

These example files can be found in /workspace/apps/demo/audio_transcription_client.
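
If you have several recordings to transcribe, the same OpenAI client can be reused in a loop. The sketch below assumes a hypothetical directory of .wav files at /workspace/my_audio; it uses only the client calls shown earlier in this step.

import glob

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")

# /workspace/my_audio is a placeholder; point it at your own recordings.
for path in sorted(glob.glob("/workspace/my_audio/*.wav")):
    with open(path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="openai/whisper-large-v3", file=audio_file
        )
    print(path, "->", transcript.text)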

To end this first part of the tutorial, press Ctrl-C in the first container window to terminate the Whisper inference service. You can also use llmboost shutdown --port XXXX to terminate the service associated with the specified port, or llmboost shutdown --all to shut down all LLMBoost inference service instances on the host node.

Step 4: Offline inference example

We provide an example Python script to demonstrate how to carry out audio inference offline in LLMBoost. Browse /workspace/apps/demo/benchmark_audio.py to see the example program that loads an audio test dataset from HuggingFace, starts an LLMBoost inference engine for the Whisper model, and submits the test inputs to the inference engine for transcription.

To run the program, enter the following into an LLMBoost container.

python3 /workspace/apps/demo/benchmark_audio.py --num_prompts=10

After the LLMBoost deployment message, you should see the transcription results.