OramaCore Architecture
A deep dive into the OramaCore architecture.
As we saw in the introduction, OramaCore enables you to create complex, interaction-rich, and highly customizable AI and discovery experiences with remarkable simplicity.
This section compares OramaCore to a typical production-ready architecture, addressing all the key components required to build scalable, production-grade solutions.
High-Level Architecture
At a very high level, we can identify four core layers in OramaCore: the frontend, the application layer, the AI layer, and the runtime layer.
Let's examine these components in more detail.
Frontend
The OramaCore application communicates with the outside world through an API layer, which we call the application frontend.
Rust Webserver
The front-facing webserver is built with Rust, like approximately 90% of OramaCore. We chose Rust for its performance, reliability, and robust type system.
Powered by Axum, all communications between OramaCore and the outside world pass through this webserver.
Application Layer
The application layer is the core of all operations performed by OramaCore. It manages data, performs CRUD operations, handles persistence to disk, and more. It communicates with the outside world exclusively through the frontend layer.
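For instance, inserting documents into a collection is a single request through the frontend API. Here is a minimal sketch in TypeScript; the endpoint path, auth header, and payload shape are assumptions for illustration, not the exact OramaCore API surface:

```typescript
// Hypothetical sketch: inserting documents through the frontend API.
// The endpoint path, auth header, and payload shape are assumptions,
// not the exact OramaCore API surface.
const response = await fetch("http://localhost:8080/v1/collections/products/insert", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: "Bearer <write-api-key>",
  },
  body: JSON.stringify([
    { id: "1", title: "Ryzen 9 9900X", description: "12-core AM5 CPU" },
  ]),
});

console.log(await response.json());
```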
Full-text Search Engine
Our expertise in search enabled us to build a highly performant and scalable full-text search engine that allows users to perform fast lookups and filtering on data.
It's based on an FST (Finite State Transducer): a space-efficient, automaton-based data structure that maps input terms (words, prefixes, or n-grams) to output values (such as document IDs, term frequencies, or other metadata), enabling fast prefix searches, fuzzy matching, and efficient storage of large lexicons.
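To build intuition for the mapping an FST provides, here is a deliberately naive TypeScript sketch: it keeps a sorted lexicon of terms and their document IDs and answers prefix queries by scanning it. A real FST encodes the same mapping in a compact automaton with shared prefixes and suffixes:

```typescript
// Conceptual sketch of the mapping an FST encodes: term -> document IDs.
// A real FST compresses shared prefixes/suffixes into an automaton; here we
// simply scan a sorted lexicon for illustration.
const lexicon: Array<[term: string, docIds: number[]]> = [
  ["rust", [1, 4]],
  ["ryzen", [2, 3]],
  ["search", [1, 2, 4]],
];

function prefixSearch(prefix: string): number[] {
  const matches = lexicon
    .filter(([term]) => term.startsWith(prefix))
    .flatMap(([, docIds]) => docIds);
  return [...new Set(matches)]; // deduplicate document IDs
}

console.log(prefixSearch("ry")); // [2, 3]
```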
Users can ask broad, natural-language questions, and Orama automatically translates them into an optimized full-text search query that retrieves exactly what the user is asking for. This approach delivers the fastest and most accurate results possible.
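As a purely illustrative example (the query format shown here is an assumption, not the actual OramaCore query DSL), a broad question could be rewritten into a structured request like this:

```typescript
// Illustrative only: a broad, natural-language question...
const userQuestion = "What's the best CPU for gaming under $400?";

// ...rewritten into a structured full-text query. The property names
// (term, where, limit) are assumptions for this sketch, not the actual
// OramaCore query DSL.
const searchQuery = {
  term: "best gaming CPU",
  where: { price: { lte: 400 } },
  limit: 10,
};
```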
Vector Database
Vector databases have become increasingly popular in modern, large-scale applications, and for good reason.
We've implemented a fully-fledged vector database powered by an HNSW (Hierarchical Navigable Small World) graph, a powerful and extremely scalable data structure that enables vector similarity searches through billions of vectors in just milliseconds.
While other excellent vector databases allow you to choose the index type (HNSW, IVF_FLAT, LSH, and more), we decided to provide a single, simple, yet extremely powerful index type.
We handle the optimization for you, allowing you to focus on building your applications.
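To make the trade-off concrete, the brute-force k-nearest-neighbors search below computes exactly what an HNSW index approximates; the index's job is to return the same neighbors without scoring every vector in the collection:

```typescript
// Brute-force cosine-similarity k-NN: the computation an HNSW index
// approximates in sub-linear time instead of scanning every vector.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function nearestNeighbors(
  query: number[],
  vectors: Map<string, number[]>,
  k: number,
): string[] {
  return [...vectors.entries()]
    .map(([id, vec]) => [id, cosineSimilarity(query, vec)] as const)
    .sort((a, b) => b[1] - a[1]) // highest similarity first
    .slice(0, k)
    .map(([id]) => id);
}
```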
Answer Engine
Products like SearchGPT and Perplexity demonstrate how people are increasingly using search in new ways.
OramaCore provides a single, simple API to achieve similar results with your data, enabling you to create SearchGPT or Perplexity-like experiences. It streams data to the client using SSE (server-sent events).
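On the client, an SSE stream can be consumed with the standard EventSource API. The endpoint below is a hypothetical placeholder, not the real OramaCore route:

```typescript
// Consuming a server-sent events stream with the standard EventSource API.
// The endpoint path and query parameter are hypothetical placeholders.
const stream = new EventSource("http://localhost:8080/v1/answer?query=hello");

stream.onmessage = (event: MessageEvent) => {
  // Each event carries the next chunk of the generated answer.
  console.log(event.data);
};

stream.onerror = () => stream.close();
```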
AI Layer
The AI Layer comprises OramaCore components responsible for providing AI capabilities and features, such as embeddings generation and LLM integrations.
Currently, this entire layer is written in Python and communicates with the Application Layer via a local gRPC server.
Future versions of OramaCore will move away from this approach by either integrating the Python interpreter inside the Rust Application Layer (via PyO3) or by using the ONNX runtime directly via Rust.
Embeddings Generation
OramaCore automatically generates embeddings for your data. You can configure which models to use via the configuration.
Current benchmarks indicate this implementation can generate up to 1,200 embeddings per second on an RTX 4080 Super. We acknowledge this seems optimistic and will release reproducible benchmarks soon.
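As a rough sketch of what model selection could look like, expressed here as a TypeScript object (the keys and model names are assumptions for illustration; consult the configuration reference for the real schema):

```typescript
// Hypothetical embeddings configuration, expressed as a TypeScript object.
// Keys and model names are assumptions, not the actual OramaCore schema.
const embeddingsConfig = {
  defaultModel: "bge-small",       // model used when none is specified
  availableModels: ["bge-small", "multilingual-e5-small"],
  automaticEmbeddings: true,       // generate embeddings on document insert
};
```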
Party Planner
The Answer Engine can operate in two modes: performing classic RAG or planning a series of operations before providing an answer.
Party Planner orchestrates all the operations the Answer Engine must perform before generating an answer.
For example, if a user asks a question about bent pins on their Ryzen 9 9900X CPU, Party Planner might determine that to provide the most accurate answer, it needs to:
- Split the query into multiple, optimized queries (e.g., "Ryzen 9 9900X bent pins", "AMD CPUs bent pins troubleshooting")
- Run these queries on OramaCore or external services (Google or other APIs)
- Provide the response
This process ensures high-quality output.
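Conceptually, the resulting plan is an ordered list of steps. The shape below is a simplified assumption of what such a plan might look like in TypeScript, not Party Planner's actual internal format:

```typescript
// Simplified, hypothetical representation of a Party Planner action plan.
// Step names and structure are assumptions, not the engine's internal format.
type PlanStep =
  | { action: "optimize_query"; queries: string[] }
  | { action: "search"; source: "oramacore" | "external" }
  | { action: "generate_answer" };

const plan: PlanStep[] = [
  {
    action: "optimize_query",
    queries: ["Ryzen 9 9900X bent pins", "AMD CPUs bent pins troubleshooting"],
  },
  { action: "search", source: "oramacore" },
  { action: "generate_answer" },
];
```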
Fine-tuned LLMs
OramaCore uses several fine-tuned LLMs to power Party Planner and its actions (like splitting input into multiple queries). These models enable OramaCore to perform quick, high-quality inference sessions and provide users with optimal performance and quality.
Runtime Layer
The runtime layer is OramaCore's deepest layer. It supports key operations like embeddings generation, LLM orchestration, and JavaScript function calling.
ONNX Runtime
OramaCore uses the ONNX Runtime for CPU and GPU-accelerated operations on embeddings generated by the AI Layer.
We're working to move all LLM operations to the ONNX Runtime to provide a more stable, faster, and more reliable experience.
Deno
Since OramaCore treats JavaScript as a first-class citizen, we integrated Deno, a powerful, Rust-based, secure JavaScript runtime that allows you to write and execute custom business logic in JavaScript directly on OramaCore.
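For example, a custom hook could normalize documents before they are indexed. The hook name and signature below are illustrative assumptions, not OramaCore's actual hook API:

```typescript
// Hypothetical business-logic hook running on OramaCore's Deno runtime.
// The hook name and signature are assumptions for illustration.
interface Document {
  title: string;
  description: string;
}

export function beforeInsert(doc: Document): Document {
  // Example transformation: normalize titles before indexing.
  return { ...doc, title: doc.title.trim().toLowerCase() };
}
```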
Final Considerations
OramaCore is in beta, and we aim to release the first stable version by February 28th, 2025. Some components described in this document are still in development and may change.
We will maintain this documentation and continue providing you with the best possible experience.