📄️ Using LLMBoost Interactively
In this introductory tutorial, we will launch the LLMBoost container into an interactive command-line session to use a large language model for chat. We will demonstrate how the llmboost command shields the user from the details of running different models and inference engines across different hardware platforms. Later tutorials will showcase more powerful and flexible ways to use LLMBoost through its Python programming API and its server deployment options.
📄️ Using LLMBoost Python API
In the last tutorial, you used the llmboost command to run LLMBoost interactively from the command line. This tutorial shows how to drive the same capabilities programmatically through the LLMBoost Python API.
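As a taste of what the API-driven workflow looks like, here is a minimal sketch of a single chat turn. Note that the module name `llmboost`, the `LLMBoost` class, and the `generate` method are illustrative placeholders, not the documented interface; the tutorial itself walks through the real API.

```python
# Minimal sketch of one chat turn through a Python API.
# NOTE: `llmboost`, `LLMBoost`, and `generate` are hypothetical
# placeholders used for illustration; see the tutorial for the
# actual interface.
from llmboost import LLMBoost  # hypothetical import

lb = LLMBoost(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical constructor
reply = lb.generate("Summarize the benefits of batched inference.")
print(reply)
```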
📄️ Deploying an Inference Service
One of the most powerful uses of the LLMBoost container is to deploy a containerized inference service compatible with the Kubernetes framework.
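Once such a service is running, clients typically talk to it over HTTP. The sketch below assumes the deployed service exposes an OpenAI-compatible `/v1/chat/completions` endpoint on port 8000; the host, port, route, and model name are all assumptions for illustration, and the tutorial covers the actual deployment and endpoint details.

```python
# Querying a deployed inference service over HTTP.
# ASSUMPTION: the service exposes an OpenAI-compatible
# /v1/chat/completions route on localhost:8000; adjust to
# match your actual deployment.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])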
📄️ Deploying Scalable Inference on a Cluster
In this tutorial, we will demonstrate using LLMBoost to deploy a scalable inference service across a multi-node cluster.
📄️ Deploying Retrieval-Augmented Generation
LLMBoost supports Retrieval-Augmented Generation (RAG) on top of its standard LLM inference endpoint.
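To make the RAG pattern concrete, here is a conceptual sketch: retrieve passages relevant to the question, prepend them to the prompt, then call the inference endpoint. The endpoint URL and payload shape are assumptions (OpenAI-compatible), and the keyword-overlap retriever is a toy stand-in for a real vector store; the tutorial describes how LLMBoost actually wires this up.

```python
# Conceptual RAG sketch: retrieve context, augment the prompt, query the
# model. ASSUMPTIONS: an OpenAI-compatible endpoint at localhost:8000 and
# a toy keyword retriever in place of a real vector store.
import requests

DOCS = [
    "LLMBoost ships as a container image.",
    "RAG augments prompts with retrieved passages.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return sorted(DOCS, key=lambda d: -len(words & set(d.lower().split())))[:k]

question = "How does RAG work?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed endpoint
    json={"model": "my-model", "messages": [{"role": "user", "content": prompt}]},
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```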