Install OramaCore
Downloading, building, and running OramaCore on your machine or in production.
Using Docker Compose
The absolute easiest way to get started with OramaCore is by using Docker Compose. While we discourage using Docker Compose in production, it can be a great way to:
- Test OramaCore locally
- Understand how the system works and how the components interact
If you're using OramaCore on a GPU, ensure that you have the NVIDIA Container Toolkit installed. If you're using a CPU, you can skip this step.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Remember to restart Docker if it's already installed.
sudo systemctl restart docker
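Before bringing up the stack, it's worth confirming that Docker can actually see the GPU. A quick check (the CUDA image tag below is just an example; any CUDA-enabled image will do):
# Should print the same table as running nvidia-smi directly on the host.
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi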
First things first, let's create a docker-compose.yml file:
version: "3.8"

services:
  oramacore:
    image: oramasearch/oramacore:latest
    environment:
      - RUST_LOG=oramacore=trace,warn
    volumes:
      - ./config.yaml:/app/config.yaml
    ports:
      - "8080:8080"
    depends_on:
      - python-ai-server
      - vllm
    restart: unless-stopped

  python-ai-server:
    image: oramasearch/oramacore-ai-server:latest
    volumes:
      - ./config.yaml:/config.yaml
    ports:
      - "50051:50051"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    restart: unless-stopped

  vllm:
    image: vllm/vllm-openai:v0.7.3
    command: --model Qwen/Qwen2.5-3B-Instruct --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    restart: unless-stopped

  envoy:
    image: envoyproxy/envoy:v1.26-latest
    ports:
      - "80:80"
      - "9901:9901"
    volumes:
      - ./envoy/envoy.yaml:/etc/envoy/envoy.yaml
    depends_on:
      - oramacore
      - python-ai-server
      - vllm
    restart: unless-stopped
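With config.yaml (and, if you keep the Envoy service, envoy/envoy.yaml) sitting next to the Compose file, you can bring everything up with the standard Docker Compose commands:
# Start the stack in the background and follow the core's logs.
docker compose up -d
docker compose logs -f oramacore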
One thing to remember when using Docker is network management. When configuring the OramaCore configuration file (config.yaml), ensure that ai_server.host is set to python-ai-server, and that ai_server.llm.host is set to vllm.
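As a rough sketch, the relevant part of config.yaml could look like the fragment below. The host values come from the Compose service names above, while the port and model values are assumptions based on the ports exposed in the Compose file, so double-check them against your own configuration:
ai_server:
  # Use the Compose service names, not "localhost".
  host: "python-ai-server"
  port: 50051
  llm:
    host: "http://vllm"
    port: 8000
    model: "Qwen/Qwen2.5-3B-Instruct"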
Also, we recommend exposing the services through Envoy. You can find an example configuration file here.
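If you prefer to start from something smaller than the full example, a minimal envoy.yaml that forwards HTTP traffic on port 80 to the oramacore service could look roughly like this (the names, timeout, and single-route setup are illustrative, not the official configuration):
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
    - name: http_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 80 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: oramacore
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: oramacore }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: oramacore
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: oramacore
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: oramacore, port_value: 8080 }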
Building from source
You can also build OramaCore from source by cloning the repository from GitHub:
git clone https://github.com/oramasearch/orama-core
The project consists of two parts: a Rust core and a Python server.
The Python server is responsible for generating embeddings, and it communicates with the Rust core using gRPC.
To build the entire system, ensure that you have Rust installed (installation guide) and Python (recommended version: 3.11).
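A quick way to confirm both toolchains are in place before building:
rustc --version      # Rust toolchain
cargo --version
python3.11 --version # Recommended Python version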
Building Rust
Simply run the following command from the root directory:
cargo build --release
This will generate a binary located at /target/release/oramacore.
Building Python
Navigate to the src/ai_server directory and install the required dependencies. You'll find two distinct requirements files:
- requirements.txt
- requirements-cpu.txt
The first contains the dependencies for GPU usage, which we highly recommend for production on an NVIDIA GPU. If you are running OramaCore on a system without an NVIDIA GPU (e.g., a Mac), use requirements-cpu.txt instead.
Before installing, create a virtual environment:
python3.11 -m venv .venv
source .venv/bin/activate
Then, install the dependencies:
pip install -r requirements.txt # or pip install -r requirements-cpu.txt
When you run the server, OramaCore will automatically download the required models specified in the configuration file.
The download time will depend on your internet connection.
Large Language Models
OramaCore uses vLLM to provide access to local LLMs. You can follow the installation guide here: vLLM Installation.
Since OramaCore interacts with vLLM through an OpenAI-compatible API, you can choose to use Ollama, OpenAI, or any other LLM provider that supports the OpenAI API.
Just set the host, port, and API key in the config.yaml file:
...
ai_server:
  ...
  llm:
    port: 11434 # In this example, that's the default Ollama port.
    host: "http://localhost"
    model: "Qwen/Qwen2.5-3B-Instruct"
  ...
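As another sketch, pointing OramaCore at a hosted OpenAI-compatible provider might look like the fragment below. The api_key field name and the model name are assumptions based on the "host, port, and API key" note above, so verify them against your config.yaml reference:
...
ai_server:
  ...
  llm:
    host: "https://api.openai.com" # any OpenAI-compatible endpoint
    port: 443
    model: "gpt-4o-mini"           # example model name
    api_key: "${OPENAI_API_KEY}"   # assumed field name; see the note above
  ...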
Starting the OramaCore server
After installing the dependencies and compiling the binaries, you'll need to start two separate services.
In one terminal tab, run the Python server from the src/ai_server directory:
python server.py
Once that process has started, run the Rust core binary:
./target/release/oramacore
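To check that the core is up, you can hit its HTTP port (8080 here, assuming the default port used in the Docker example above; adjust it to whatever your config.yaml specifies):
# Any HTTP status code in the output means the server is accepting connections.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/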
In future versions of OramaCore, we plan to unify everything into a single binary, so you won't need to run two separate processes manually.