
Example: Hybrid High-Fidelity Architectures

In this guide, we'll build a Hybrid Context System. We will use a traditional Vector Database (Embeddings) for broad discovery, and CiteKit as the High-Fidelity Orchestrator to provide the exact visual or temporal evidence (diagrams, charts, or video segments) required for a reliable response.

Run the Complete Example

A fully implemented version of this example is available in the repository:

Location: examples/rag-fusion/

Quick Start:

bash
cd examples/rag-fusion
pip install -r requirements.txt
export GEMINI_API_KEY="your_key_here"  # or $env:GEMINI_API_KEY on Windows
# Add a PDF as sample_document.pdf
python hybrid_rag.py

See the README for full instructions including production integration patterns.

Get the Example (Git)

bash
git clone https://github.com/abdushakurob/citekit.git
cd citekit/examples/rag-fusion

CLI Options: While this example uses the Python SDK programmatically, you can also explore documents via the CLI with either:

  • python -m citekit <command> (Python SDK)
  • npx citekit <command> (JavaScript SDK, v0.1.8+)

Beyond "Dumb" Retrieval

In modern AI systems, retrieving a text chunk is often insufficient for reasoning:

  1. Semantic Orphanage: A retrieved text chunk might say "As shown in Figure 4...", but without the actual figure, the model is left to hallucinate details.
  2. Context Overload: Sending the entire document just to see one chart increases cost and reduces model focus (attention loss).
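To make the first problem concrete, the "orchestration" trigger can be as simple as scanning retrieved chunks for figure/table references. This is a minimal illustrative helper (not part of CiteKit; the regex and function name are assumptions for the sketch):

```python
import re

# Hypothetical helper: scan a retrieved text chunk for references to
# figures, tables, or charts that the text alone cannot show the model.
VISUAL_REF = re.compile(r"\b(Figure|Fig\.|Table|Chart)\s+(\d+)", re.IGNORECASE)

def find_visual_references(chunk: str) -> list[tuple[str, int]]:
    """Return (kind, number) pairs for every visual reference in the chunk."""
    return [(kind, int(num)) for kind, num in VISUAL_REF.findall(chunk)]

chunk = "Throughput scales linearly, as shown in Figure 4 and Table 2."
print(find_visual_references(chunk))  # [('Figure', 4), ('Table', 2)]
```

Any chunk that yields a non-empty list is a candidate for evidence resolution rather than text-only answering.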

The Orchestration Workflow

  1. Hybrid Indexing:
    • Text Part: Embed the document text into a Vector Database.
    • Structure Part: Ingest the document into CiteKit. This sends the file to the configured mapper (Gemini by default) one time to generate a structural map.
  2. Multimodal Retrieval:
    • A user query triggers a standard Vector Search to find the relevant text.
  3. Context Orchestration (CiteKit):
    • The agent detects that the text reference requires visual evidence.
    • CiteKit resolves the specific section/node as an image or mini-PDF instantly.
  4. High-Fidelity Reasoning:
    • The agent sends the highly focused "evidence package" (Text + High-Res Image) to a Multimodal LLM (e.g., Gemini 1.5 or GPT-4o).

Implementation (Python)

python
import asyncio
from citekit import CiteKitClient

# CiteKit acts as our high-fidelity orchestrator
citekit = CiteKitClient()

async def modern_hybrid_query(query: str, vector_db):
    # Discovery Phase: find relevant text via semantic similarity
    results = vector_db.search(query, k=1)
    if not results:
        return "No text found."

    text_chunk = results[0].content

    # Orchestration Phase: CiteKit provides the "evidence"
    # Imagine the text mentions: "Figure 7: System Topology"
    if "Figure 7" in text_chunk:
        print("Detected visual reference. Orchestrating evidence...")

        # Deterministic resolution of Figure 7 (diagram)
        evidence = await citekit.resolve(
            resource_id="internal_spec",
            node_id="fig_7_topology"
        )

        # call_multimodal_model is a placeholder for your own
        # Gemini / GPT-4o call that accepts text plus an image.
        return await call_multimodal_model(
            prompt=f"Based on the spec: {text_chunk}",
            image_path=evidence.output_path
        )

    return text_chunk
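The function above assumes a `vector_db` object exposing `search(query, k)`. For a quick local run, a toy keyword-overlap store can stand in for a real embedding index (illustration only; `ToyVectorDB` and `Hit` are not part of CiteKit):

```python
from dataclasses import dataclass

@dataclass
class Hit:
    content: str
    score: float

class ToyVectorDB:
    """Keyword-overlap stand-in for a real embedding index (illustration only)."""

    def __init__(self, chunks: list[str]):
        self.chunks = chunks

    def search(self, query: str, k: int = 1) -> list[Hit]:
        # Score each chunk by how many query words it shares (a crude
        # proxy for cosine similarity over embeddings).
        q = set(query.lower().split())
        scored = [
            Hit(content=c, score=len(q & set(c.lower().split())))
            for c in self.chunks
        ]
        scored.sort(key=lambda h: h.score, reverse=True)
        return [h for h in scored[:k] if h.score > 0]

db = ToyVectorDB([
    "Figure 7: System Topology shows the mesh layout.",
    "Licensing terms are described in Appendix A.",
])
print(db.search("system topology", k=1)[0].content)
```

In production you would swap `ToyVectorDB` for your actual vector store (Pinecone, pgvector, Chroma, etc.); only the `search` interface matters to the orchestration code.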

Performance & Reliability

| Strategy | Speed | Cost | Reliability |
| --- | --- | --- | --- |
| Discovery-Only (Text) | Very Fast | Low | Poor (visual hallucinations) |
| Full Vision RAG | Slow | Extreme | High (expensive tokens) |
| CiteKit Hybrid | Fast | Low | Highest (verified evidence) |
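The cost gap comes down to context size. A back-of-envelope comparison makes the point; every number below is an assumption for illustration, not a benchmark:

```python
# Illustrative token arithmetic (all figures are assumptions, not measurements):
# sending the whole document vs. one resolved figure section plus its text chunk.
full_doc_tokens = 150_000   # e.g. a large PDF rendered for a vision model
evidence_tokens = 1_500     # one text chunk plus one high-res figure

reduction = full_doc_tokens / evidence_tokens
print(f"Context reduction: {reduction:.0f}x")  # Context reduction: 100x
```

Even if your real documents differ by an order of magnitude, the ratio stays heavily in favor of targeted evidence over whole-document vision input.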

Why this is the Future

By treating CiteKit as a Context Orchestrator for your Vector Database, you enable High-Density Agents. These agents don't just "guess" based on text chunks; they "verify" by opening the exact part of the source file they need—exactly like a human engineer would.

Released under the MIT License.