Audio-to-Text Transcription
In this tutorial, we demonstrate deploying Whisper, an audio-to-text model for speech recognition, in LLMBoost.
Step 0: Before you start
Enter the following to set up the environment variables and start an LLMBoost container on the host node where you want to run the Whisper inference service.
export MODEL_PATH=<absolute_path_to_model_directory>
export LICENSE_FILE=<absolute_path_to_license_file>
export HUGGING_FACE_HUB_TOKEN=<your_huggingface_token>
💡 These variables are used when launching the Docker container to ensure correct model loading and authentication.
- Set the model directory MODEL_PATH to the absolute path of the directory on your host file system where your local models are stored.
- Set the license file path LICENSE_FILE to your license file location. Please contact us through contact@mangoboost.io if you don't have an LLMBoost license.
- Set the HuggingFace token HUGGING_FACE_HUB_TOKEN by obtaining a Hugging Face token from huggingface.co/settings/tokens.
On NVIDIA:
docker run -it --rm \
--network host \
--gpus all \
--pid=host \
--group-add video \
--ipc host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v $MODEL_PATH:/workspace/models \
-v $LICENSE_FILE:/workspace/llmboost_license.skm \
-w /workspace \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
<llmboost-docker-image-name>:prod-cuda \
bash
On AMD:
docker run -it --rm \
--network host \
--group-add video \
--ipc host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--device=/dev/dri:/dev/dri \
--device=/dev/kfd:/dev/kfd \
-v $MODEL_PATH:/workspace/models \
-v $LICENSE_FILE:/workspace/llmboost_license.skm \
-w /workspace \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
<llmboost-docker-image-name>:prod-rocm-audio \
bash
Note: Replace <llmboost-docker-image-name> with the image name provided by MangoBoost.
Note: Audio models in the AMD environment require a different Docker tag (prod-rocm-audio instead of prod-cuda).
An LLMBoost container should start in an interactive command shell session. The commands in the rest of this tutorial should be entered into the command shell prompt within the container.
Step 1: Starting an audio-to-text inference service
Starting an inference service with an audio-to-text model like Whisper is not very different from serving text-to-text models. Type the following command into the container command shell.
llmboost serve --model_name openai/whisper-large-v3 --query_type=audio
After starting, the service will by default listen for inference requests on port 8011. The Whisper model takes audio inputs and returns their transcriptions as text.
It will take a few minutes for the service to be ready. Wait until the service status message reports that it is ready before proceeding with the rest of the steps.
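If you prefer to script the wait instead of watching the log, the following minimal sketch polls the server until it answers. It assumes the service exposes an OpenAI-compatible /v1/models listing alongside the transcription endpoint used below; adjust the URL if your deployment differs.
import time
import urllib.request

# Poll until the server accepts connections and responds on the
# OpenAI-compatible model listing route (an assumption; the transcription
# endpoint used later in this tutorial follows the same /v1 convention).
while True:
    try:
        with urllib.request.urlopen("http://localhost:8011/v1/models", timeout=5) as r:
            if r.status == 200:
                print("Service is ready.")
                break
    except OSError:
        pass  # not accepting connections yet
    time.sleep(10)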
Step 2: Submit an audio query using curl
When the inference service is ready, you can reach it on port 8011 of the host node. The service can be accessed from any node with network access to the host node. To easily try out the Whisper service in this tutorial, start another LLMBoost container in interactive mode on the same host node following the instructions in Step 0.
💡 Instead of starting a new container, you can attach to the LLMBoost container you already started. Use docker ps to find your container's DOCKER_ID.
docker ps
docker exec -it <DOCKER_ID> bash
From the second LLMBoost container's shell prompt, type the following curl command to submit an audio query for the file /workspace/apps/demo/audio_transcription_client/example_audio.wav. You are welcome to record and submit your own .wav files.
curl localhost:8011/v1/audio/transcriptions \
-X POST \
-H "Content-Type: multipart/form-data" \
-F file="@/workspace/apps/demo/audio_transcription_client/example_audio.wav" \
-F model="whisper-large-v3"
The command should return the following transcription result.
{"text":"<|en|><|transcribe|><|notimestamps|> \"Most wonderful."}
Step 3: Submit an audio query using the LLMBoost Python API
The Whisper inference service can also be accessed through the LLMBoost Python client API. Start a Python interpreter session and paste the following into it to transcribe the example audio input file.
from llmboost.entrypoints.client import send_prompt
response = send_prompt(
host="localhost",
port=8011,
model_path="openai/whisper-large-v3",
role="user",
user_input="/workspace/apps/demo/audio_transcription_client/example_audio.wav"
)
print(response)
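Because send_prompt is an ordinary function call, batch transcription is just a loop. The sketch below reuses the exact call shown above on every .wav file in a directory; the /workspace/my_audio path is a placeholder for wherever your recordings live.
from pathlib import Path
from llmboost.entrypoints.client import send_prompt

# Transcribe every .wav file in a directory, one request per file.
for wav in sorted(Path("/workspace/my_audio").glob("*.wav")):
    response = send_prompt(
        host="localhost",
        port=8011,
        model_path="openai/whisper-large-v3",
        role="user",
        user_input=str(wav),
    )
    print(f"{wav.name}: {response}")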
The OpenAI client is also supported:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")
audio_file = open("/workspace/apps/demo/audio_transcription_client/example_audio.wav", "rb")
transcript = client.audio.transcriptions.create(model="openai/whisper-large-v3", file=audio_file)
print(transcript.text)
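For repeated use, you may prefer to wrap the call in a small helper that opens and closes the audio file for each request; this is just a convenience wrapper around the same client call shown above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8011/v1", api_key="-")

def transcribe(path: str) -> str:
    # Open the audio file in binary mode and close it as soon as
    # the request has been sent.
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(
            model="openai/whisper-large-v3", file=f
        ).text

print(transcribe("/workspace/apps/demo/audio_transcription_client/example_audio.wav"))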
These example files can be found at /workspace/apps/demo/audio_transcription_client.
To end this first part of the tutorial, press Ctrl-C in the first container window to terminate the Whisper inference service. You can also use llmboost shutdown --port XXXX to terminate the service associated with the specified port, or llmboost shutdown --all to shut down all LLMBoost inference service instances on the host node.
Step 4: Offline inference example
We provide an example Python script to demonstrate how to carry out offline audio inference in LLMBoost.
Browse /workspace/apps/demo/benchmark_audio.py to see the example program, which loads an audio test dataset from HuggingFace, starts an LLMBoost inference engine for the Whisper model, and submits the test inputs to the engine for transcription.
To run the program, enter the following into an LLMBoost container.
python3 /workspace/apps/demo/benchmark_audio.py --num_prompts=10
After the LLMBoost deployment message, you should see the transcription results.
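If you want a rough sense of how the script behaves under increasing load, a simple sweep over --num_prompts is enough. This sketch just re-invokes the same command line shown above with growing request counts; the counts themselves are arbitrary.
import subprocess

# Re-run the offline benchmark with increasing request counts.
for n in (10, 50, 100):
    print(f"--- num_prompts={n} ---")
    subprocess.run(
        ["python3", "/workspace/apps/demo/benchmark_audio.py", f"--num_prompts={n}"],
        check=True,
    )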