📄️ Using LLMBoost Interactively
In this introductory tutorial, we will launch the LLMBoost container into an interactive command-line session to use a large language model for chat. We will demonstrate how the llmboost command shields the user from the details of running different models and inference engines across different hardware platforms. Later tutorials will showcase more powerful and flexible ways to use LLMBoost through its Python programming API and its server deployment options.
📄️ Using LLMBoost Python API
In the last tutorial, you used the llmboost command to run LLMBoost interactively from the command line. This tutorial shows how to drive the same capabilities programmatically through the LLMBoost Python API.
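As a taste of what the API-driven workflow looks like, here is a minimal sketch of a single chat turn. Note that the module name `llmboost`, the `LLMBoost` class, and the `generate` method are illustrative placeholders, not the documented interface; the tutorial itself walks through the real API.

```python
# Minimal sketch of one chat turn through a Python API.
# NOTE: `llmboost`, `LLMBoost`, and `generate` are hypothetical
# placeholders used for illustration; see the tutorial for the
# actual interface.
from llmboost import LLMBoost  # hypothetical import

lb = LLMBoost(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical constructor
reply = lb.generate("Summarize the benefits of batched inference.")
print(reply)
```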
📄️ Deploying an Inference Service
One of the most powerful uses of the LLMBoost container is to deploy a containerized inference service compatible with the Kubernetes framework.
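Once such a service is running, clients typically talk to it over HTTP. The sketch below assumes the deployed service exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8000; the host, port, route, and model name are all assumptions for illustration, and the tutorial covers the actual deployment and endpoint details.

```python
# Querying a deployed inference service over HTTP.
# ASSUMPTION: the service exposes an OpenAI-compatible
# /v1/chat/completions route on localhost:8000; adjust to
# match your actual deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])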
📄️ Deploying Scalable Inference on a Cluster
In this tutorial, we will demonstrate using LLMBoost to deploy a scalable inference service across a multi-node cluster.
📄️ Deploying Retrieval-Augmented Generation
LLMBoost supports Retrieval-Augmented Generation (RAG) on top of its standard LLM inference endpoint.
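To make the RAG pattern concrete, here is a conceptual sketch: retrieve passages relevant to the question, prepend them to the prompt, then call the inference endpoint. The endpoint URL and payload shape are assumptions (OpenAI-compatible), and the keyword-overlap retriever is a toy stand-in for a real vector store; the tutorial describes how LLMBoost actually wires this up.

```python
# Conceptual RAG sketch: retrieve context, augment the prompt, query the
# model. ASSUMPTIONS: an OpenAI-compatible endpoint at localhost:8000 and
# a toy keyword retriever in place of a real vector store.
import requests

DOCS = [
    "LLMBoost ships as a container image.",
    "RAG augments prompts with retrieved passages.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

question = "How does RAG work?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint
    json={"model": "my-model", "messages": [{"role": "user", "content": prompt}]},
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```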