Install OramaCore
Downloading, building, and running OramaCore on your machine or in production.
Using Docker Compose
The absolute easiest way to get started with OramaCore is by using Docker Compose. While we discourage using Docker Compose in production, it can be a great way to:
- Test OramaCore locally
- Understand how the system works and how the components interact
If you're using OramaCore on a GPU, ensure that you have the NVIDIA Container Toolkit installed. If you're using a CPU, you can skip this step.
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Remember to restart Docker if it's already installed.
sudo systemctl restart docker
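Before bringing up the stack, it's worth confirming that Docker can actually see the GPU. A quick check (the CUDA image tag below is just an example; any CUDA-enabled image will do):
# Should print the same table as running nvidia-smi directly on the host.
sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi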
First things first, let's create a docker-compose.yml file:
version: "3.8"

services:
  oramacore:
    image: oramasearch/oramacore:latest
    environment:
      - RUST_LOG=oramacore=trace,warn
    volumes:
      - ./config.yaml:/app/config.yaml
    ports:
      - "8080:8080"
    depends_on:
      - python-ai-server
      - vllm
    restart: unless-stopped

  python-ai-server:
    image: oramasearch/oramacore-ai-server:latest
    volumes:
      - ./config.yaml:/config.yaml
    ports:
      - "50051:50051"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    restart: unless-stopped

  vllm:
    image: vllm/vllm-openai:v0.7.3
    command: --model Qwen/Qwen2.5-3B-Instruct --host 0.0.0.0 --port 8000
    ports:
      - "8000:8000"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu ]
    restart: unless-stopped

  envoy:
    image: envoyproxy/envoy:v1.26-latest
    ports:
      - "80:80"
      - "9901:9901"
    volumes:
      - ./envoy/envoy.yaml:/etc/envoy/envoy.yaml
    depends_on:
      - oramacore
      - python-ai-server
      - vllm
    restart: unless-stopped
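With config.yaml (and, if you keep the Envoy service, envoy/envoy.yaml) sitting next to the Compose file, you can bring everything up with the standard Docker Compose commands:
# Start the stack in the background and follow the core's logs.
docker compose up -d
docker compose logs -f oramacore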
One thing to remember when using Docker is network management. When configuring the OramaCore configuration file (config.yaml), ensure that ai_server.host is set to python-ai-server, and that ai_server.llm.host is set to vllm.
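As a rough sketch, the relevant part of config.yaml could look like the fragment below. The host values come from the Compose service names above, while the port and model values are assumptions based on the ports exposed in the Compose file, so double-check them against your own configuration:
ai_server:
  # Use the Compose service names, not "localhost".
  host: "python-ai-server"
  port: 50051
  llm:
    host: "http://vllm"
    port: 8000
    model: "Qwen/Qwen2.5-3B-Instruct"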
Also, we recommend exposing the services through Envoy. You can find an example configuration file here.
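If you prefer to start from something smaller than the full example, a minimal envoy.yaml that forwards HTTP traffic on port 80 to the oramacore service could look roughly like this (the names, timeout, and single-route setup are illustrative, not the official configuration):
admin:
  address:
    socket_address: { address: 0.0.0.0, port_value: 9901 }

static_resources:
  listeners:
    - name: http_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 80 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: oramacore
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/" }
                          route: { cluster: oramacore }
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: oramacore
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: oramacore
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: oramacore, port_value: 8080 }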
Building from source
You can also build OramaCore from source by cloning the repository from GitHub:
git clone https://github.com/oramasearch/orama-core
The project consists of two parts: a Rust core and a Python server.
The Python server is responsible for generating embeddings, and it communicates with the Rust core using gRPC.
To build the entire system, ensure that you have Rust installed (installation guide) and Python (recommended version: 3.11).
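A quick way to confirm both toolchains are in place before building:
rustc --version      # Rust toolchain
cargo --version
python3.11 --version # Recommended Python version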
Building Rust
Simply run the following command from the root directory:
cargo build --release
This will generate a binary located at /target/release/oramacore.
Building Python
Navigate to the src/ai_server directory and install the required dependencies. You'll find two distinct requirements files:
- requirements.txt
- requirements-cpu.txt
The first contains the dependencies for GPU usage, which we highly recommend for production on an NVIDIA GPU. If you are running OramaCore on a system without an NVIDIA GPU (e.g., a Mac), use requirements-cpu.txt instead.
Before installing, create a virtual environment:
python3.11 -m venv .venv
source .venv/bin/activate
Then, install the dependencies:
pip install -r requirements.txt # or pip install -r requirements-cpu.txt
When you run the server, OramaCore will automatically download the required models specified in the configuration file.
The download time will depend on your internet connection.
Large Language Models
OramaCore uses vLLM to provide access to local LLMs. You can follow the installation guide here: vLLM Installation.
Since OramaCore interacts with vLLM through an OpenAI-compatible API, you can choose to use Ollama, OpenAI, or any other LLM provider that supports the OpenAI API.
Just set the host, port, and API key in the config.yaml file:
...
ai_server:
  ...
  llm:
    port: 11434 # In this example, that's the default Ollama port.
    host: "http://localhost"
    model: "Qwen/Qwen2.5-3B-Instruct"
  ...
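As another sketch, pointing OramaCore at a hosted OpenAI-compatible provider might look like the fragment below. The api_key field name and the model name are assumptions based on the "host, port, and API key" note above, so verify them against your config.yaml reference:
...
ai_server:
  ...
  llm:
    host: "https://api.openai.com" # any OpenAI-compatible endpoint
    port: 443
    model: "gpt-4o-mini"           # example model name
    api_key: "${OPENAI_API_KEY}"   # assumed field name; see the note above
  ...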
Starting the OramaCore server
After installing the dependencies and compiling the binaries, you'll need to start two separate services.
In one terminal tab, run the Python server from the src/ai_server directory:
python server.py
Once that process has started, run the Rust core binary:
./target/release/oramacore
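To check that the core is up, you can hit its HTTP port (8080 here, assuming the default port used in the Docker example above; adjust it to whatever your config.yaml specifies):
# Any HTTP status code in the output means the server is accepting connections.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/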
In future versions of OramaCore, we plan to unify everything into a single binary, so you won't need to run two separate processes manually.