
OramaCore Architecture

A deep dive into the OramaCore architecture.

As we saw in the introduction, OramaCore enables you to create complex, interaction-rich, and highly customizable AI and discovery experiences with remarkable simplicity.

This section compares OramaCore to a typical production-ready architecture, addressing all the key components required to build scalable, production-grade solutions.

High-Level Architecture

At a very high level, we can identify the following components as the core of OramaCore:

[Figure: High-level OramaCore architecture]

Let's examine these components in more detail.

Frontend

The OramaCore application communicates with the outside world through an API layer, which we call the application frontend.

Rust Webserver

The front-facing webserver is built with Rust, like approximately 90% of OramaCore. We chose Rust for its performance, reliability, and robust type system.

Powered by Axum, all communications between OramaCore and the outside world pass through this webserver.

Application Layer

The application layer is the core of all operations performed by OramaCore. It manages data, performs CRUD operations, handles persistence to disk, and more. It communicates with the outside world exclusively through the frontend layer.

Full-text Search Engine

Our expertise in search enabled us to build a highly performant and scalable full-text search engine that allows users to perform fast lookups and filtering on data.

It's built on an FST (Finite State Transducer): a space-efficient, automaton-based data structure that maps input terms (words, prefixes, or n-grams) to output values (such as document IDs, term frequencies, or other metadata). This enables fast prefix searches, fuzzy matching, and compact storage of large lexicons.
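To make the idea concrete, here is a deliberately simplified TypeScript sketch of the term-to-document-IDs mapping an FST provides. The real structure is a compressed automaton that shares both prefixes and suffixes and is far more compact than this sorted list; only the lookup behavior is analogous.

// Toy illustration of the term -> postings mapping an FST provides.
// This is NOT how OramaCore implements it; a real FST is a compacted
// automaton, not a linear scan over a sorted array.
const index: [string, number[]][] = [
  ["golf", [1, 4]],
  ["golfer", [2]],
  ["shoe", [1, 3]],
  ["shoes", [1]],
].sort((a, b) => a[0].localeCompare(b[0]));

// Collect all document IDs whose terms start with the given prefix.
function prefixSearch(prefix: string): number[] {
  const ids = new Set<number>();
  for (const [term, docs] of index) {
    if (term.startsWith(prefix)) docs.forEach((d) => ids.add(d));
  }
  return [...ids];
}

console.log(prefixSearch("golf")); // [1, 4, 2]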

Users can ask broad questions like:

"I just started playing golf, I'd need a pair of shoes that costs less than $100"

OramaCore automatically translates this into an optimized full-text search query to retrieve exactly what the user is asking for:

{
    "term": "Golf Shoes",
    "where": {
        "price": {
            "lt": 100
        }
    }
}

This approach delivers the fastest and most accurate results possible.
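For a sense of what issuing this query from a client might look like, here is a minimal sketch. The endpoint path, port, and auth header below are assumptions for illustration, not the documented API surface:

// Hypothetical client call; endpoint, port, and auth scheme are
// illustrative placeholders, not OramaCore's documented API.
const response = await fetch(
  "http://localhost:8080/v1/collections/products/search",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <read-api-key>", // placeholder credential
    },
    body: JSON.stringify({
      term: "Golf Shoes",
      where: { price: { lt: 100 } },
    }),
  },
);

const results = await response.json();
console.log(results);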

Vector Database

Vector databases have become increasingly popular in modern, large-scale applications, and for good reason.

We've implemented a fully-fledged vector database powered by an HNSW (Hierarchical Navigable Small World) graph, a highly scalable data structure that enables vector similarity searches across billions of vectors in milliseconds.

While other excellent vector databases allow you to choose the index type (HNSW, IVF_FLAT, LSH, and more), we decided to provide a single, simple, yet extremely powerful index type.

We handle the optimization for you, allowing you to focus on building your applications.
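Conceptually, a vector similarity search returns the stored vectors closest to a query vector. The brute-force sketch below shows what is computed; HNSW's contribution is producing the same top-k result without scanning every vector, by greedily navigating a layered proximity graph. The function names here are ours, for illustration only.

// What a vector similarity search computes, shown as brute force.
// HNSW finds the same nearest neighbors without a full scan.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank every stored vector against the query and keep the k best.
function topK(query: number[], vectors: Map<string, number[]>, k: number) {
  return [...vectors.entries()]
    .map(([id, v]) => ({ id, score: cosine(query, v) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}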

Answer Engine

Products like SearchGPT and Perplexity demonstrate how people are increasingly using search in new ways.

OramaCore provides a single, simple API to achieve similar results with your data, enabling you to create SearchGPT or Perplexity-like experiences. It streams data to the client using SSE (Server-Sent Events).
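A client might consume that stream roughly like this. The endpoint and request shape are assumptions for illustration; the point is that tokens arrive incrementally as SSE frames rather than as one blocking response:

// Hypothetical answer endpoint; real paths and payloads may differ.
const res = await fetch("http://localhost:8080/v1/answer", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ query: "How do I clean golf shoes?" }),
});

// Read the SSE stream chunk by chunk as tokens arrive.
const reader = res.body!.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value, { stream: true })); // raw "data: ..." frames
}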

AI Layer

The AI Layer comprises OramaCore components responsible for providing AI capabilities and features, such as embeddings generation and LLM integrations.

Currently, this entire layer is implemented in Python and communicates with the Application Layer via a local gRPC server.

Future versions of OramaCore will move away from this approach by either integrating the Python interpreter inside the Rust Application Layer (via PyO3) or by using the ONNX runtime directly via Rust.

Embeddings Generation

OramaCore automatically generates embeddings for your data. You can configure which models to use via the configuration.
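For illustration only, such a configuration might look something like the object below. The keys and the model name are hypothetical; consult the configuration reference for the actual schema.

// Hypothetical configuration shape, not OramaCore's actual schema.
const config = {
  ai: {
    embeddings: {
      model: "BAAI/bge-small-en-v1.5", // example model name
      batch_size: 32,                  // vectors generated per batch
      device: "cuda",                  // or "cpu"
    },
  },
};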

Current benchmarks indicate this implementation can generate up to 1,200 embeddings per second on an RTX 4080 Super. We acknowledge this seems optimistic and will release reproducible benchmarks soon.

Party Planner

The Answer Engine can operate in two modes: performing classic RAG or planning a series of operations before providing an answer.

Party Planner orchestrates all the operations the Answer Engine must perform before generating an answer.

For example, if a user asks:

"I just installed a Ryzen 9 9900X CPU but I fear I bent some pins, as the PC doesn't turn on. What should I do?"

Party Planner might determine that to provide the most accurate answer, it needs to:

  1. Split the query into multiple, optimized queries (e.g., "Ryzen 9 9900X bent pins", "AMD CPUs bent pins troubleshooting")
  2. Run these queries on OramaCore or external services (Google or other APIs)
  3. Generate the final response from the retrieved results

This process ensures high-quality output.
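The plan-then-answer flow described above can be sketched in a few lines of TypeScript. The function names below stand in for OramaCore internals and are not a real API:

// Minimal sketch of the plan-then-answer flow described above.
// splitQuery, runSearch, and generateAnswer are stand-ins for
// OramaCore internals, not a documented interface.
async function answerWithPlan(userQuery: string): Promise<string> {
  // 1. Split the broad question into focused sub-queries
  //    (in OramaCore this step is handled by a fine-tuned LLM).
  const subQueries = await splitQuery(userQuery);

  // 2. Run every sub-query and pool the retrieved context.
  const contexts = await Promise.all(subQueries.map(runSearch));

  // 3. Generate the final response grounded in the results.
  return generateAnswer(userQuery, contexts.flat());
}

// Stubs so the sketch type-checks; real implementations differ.
declare function splitQuery(q: string): Promise<string[]>;
declare function runSearch(q: string): Promise<string[]>;
declare function generateAnswer(q: string, ctx: string[]): Promise<string>;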

Fine-tuned LLMs

OramaCore uses several fine-tuned LLMs to power Party Planner and its actions (like splitting input into multiple queries). These models enable OramaCore to perform quick, high-quality inference sessions and provide users with optimal performance and quality.

Runtime Layer

The runtime layer is OramaCore's deepest layer. It supports key operations like embeddings generation, LLM orchestration, and JavaScript function calling.

ONNX Runtime

OramaCore uses the ONNX Runtime for CPU and GPU-accelerated operations on embeddings generated by the AI Layer.

We're working to move all LLM operations to the ONNX Runtime to provide a more stable, faster, and more reliable experience.

Deno

Since OramaCore treats JavaScript as a first-class citizen, we integrated Deno, a powerful, Rust-based, secure JavaScript runtime that allows you to write and execute custom business logic in JavaScript directly on OramaCore.
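As an example, one could imagine attaching custom logic to the indexing pipeline like this. The hook name and signature are made up for illustration; see the JavaScript integration docs for the real interface:

// Hypothetical user-defined hook; the name and signature are
// illustrative, not a documented OramaCore interface.
export function beforeInsert(document: Record<string, unknown>) {
  // Example business logic: normalize prices to cents before indexing.
  if (typeof document.price === "number") {
    document.price_cents = Math.round(document.price * 100);
  }
  return document;
}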

Final Considerations

OramaCore is in beta, and we aim to release the first stable version by February 28th, 2025. Some components described in this document are still in development and may change.

We will maintain this documentation and continue providing you with the best possible experience.
