JavaScript/TypeScript Client API Reference

The CiteKitClient is the primary interface for CiteKit in Node.js and TypeScript environments. It provides async/await-based ingestion, map management, and resolution.

Constructor

typescript

import { CiteKitClient } from 'citekit';
import { OllamaMapper } from './ollama-mapper';

// Using the default Gemini mapper
const geminiClient = new CiteKitClient({
    apiKey: "YOUR_GEMINI_API_KEY"
});

// Using a custom mapper
const localClient = new CiteKitClient({
    mapper: new OllamaMapper("llama3")
});

// Full options (every field is optional; apart from apiKey, values shown are the defaults)
const fullClient = new CiteKitClient({
    apiKey: "YOUR_GEMINI_API_KEY",
    baseDir: ".",
    storageDir: ".resource_maps",
    outputDir: ".citekit_output",
    model: "gemini-2.0-flash",
    maxRetries: 3,
    concurrencyLimit: 5,
    mapper: undefined
});

Constructor Options

typescript

interface CiteKitClientOptions {
    /** Gemini API Key. Falls back to GEMINI_API_KEY environment variable. */
    apiKey?: string;

    /** Gemini Model ID (default: "gemini-2.0-flash") */
    model?: string;

    /** Max retries for Gemini API calls (default: 3) */
    maxRetries?: number;

    /** Base directory for all CiteKit operations. Acts as the anchor for resolving relative source paths. */
    baseDir?: string;

    /** Directory where resource maps are stored (relative to baseDir). Default: ".resource_maps" */
    storageDir?: string;

    /** Directory for resolved output files (relative to baseDir). Default: ".citekit_output" */
    outputDir?: string;

    /** Max concurrent ingestion calls (default: 5). Prevents rate-limiting. */
    concurrencyLimit?: number;

    /** Custom mapper implementation. If provided, overrides Gemini mapper. */
    mapper?: MapperProvider;
}

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `apiKey` | `string \| undefined` | `GEMINI_API_KEY` env | Gemini API Key (only used if `mapper` is not provided). |
| `model` | `string` | `"gemini-2.0-flash"` | Gemini model ID (only used if `mapper` is not provided). |
| `maxRetries` | `number` | `3` | Retry attempts for failed Gemini API calls (only used if `mapper` is not provided). |
| `baseDir` | `string` | `"."` | Root directory for all operations (useful in serverless environments). |
| `storageDir` | `string` | `".resource_maps"` | Path, relative to `baseDir`, where resource maps are persisted as JSON. |
| `outputDir` | `string` | `".citekit_output"` | Path, relative to `baseDir`, where resolved clips/extracts are written. |
| `concurrencyLimit` | `number` | `5` | Maximum number of parallel mapper calls (ingestion). |
| `mapper` | `MapperProvider \| undefined` | `undefined` | Custom mapper instance. If provided, Gemini is not used. |

Throws

Error: If neither mapper nor apiKey (or GEMINI_API_KEY env) is provided when calling ingest().
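
The defaults and the fallback to GEMINI_API_KEY can be pictured as a small resolution step. The sketch below is only an assumption about how such a step might look: `resolveOptions`, `ResolvedOptions`, and `assertCanIngest` are hypothetical names that are not part of the CiteKit API, and the error text is illustrative.

```typescript
// Hypothetical sketch: none of these names belong to the CiteKit API.
interface ClientOptions {
    apiKey?: string;
    model?: string;
    maxRetries?: number;
    baseDir?: string;
    storageDir?: string;
    outputDir?: string;
    concurrencyLimit?: number;
    mapper?: unknown;
}

interface ResolvedOptions {
    apiKey: string | undefined;
    model: string;
    maxRetries: number;
    baseDir: string;
    storageDir: string;
    outputDir: string;
    concurrencyLimit: number;
    mapper: unknown;
}

// Fill in the documented defaults; the API key falls back to the environment.
function resolveOptions(
    options: ClientOptions = {},
    env: Record<string, string | undefined> = {}
): ResolvedOptions {
    return {
        apiKey: options.apiKey ?? env["GEMINI_API_KEY"],
        model: options.model ?? "gemini-2.0-flash",
        maxRetries: options.maxRetries ?? 3,
        baseDir: options.baseDir ?? ".",
        storageDir: options.storageDir ?? ".resource_maps",
        outputDir: options.outputDir ?? ".citekit_output",
        concurrencyLimit: options.concurrencyLimit ?? 5,
        mapper: options.mapper,
    };
}

// Mirrors the documented rule that the check applies when ingest() is called.
function assertCanIngest(resolved: ResolvedOptions): void {
    if (resolved.mapper === undefined && !resolved.apiKey) {
        throw new Error("No mapper or API key configured (illustrative message)");
    }
}

const resolved = resolveOptions({ maxRetries: 1 }, { GEMINI_API_KEY: "from-env" });
assertCanIngest(resolved);
console.log(resolved.apiKey, resolved.maxRetries, resolved.storageDir);
```

In a Node.js process, `process.env` would be the natural second argument; it is passed explicitly here so the sketch stays easy to test.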

Methods

`async ingest(resourcePath, resourceType, options?)`

Analyzes a file using the configured mapper and generates a ResourceMap. This is the primary entry point for structuring your resources.

typescript

async ingest(
    resourcePath: string,
    resourceType: 'video' | 'audio' | 'document' | 'image' | 'text',
    options?: { resourceId?: string }
): Promise<ResourceMap>

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `resourcePath` | `string` | Absolute or relative path to the resource file. |
| `resourceType` | `'video' \| 'audio' \| 'document' \| 'image' \| 'text'` | The resource modality. |
| `options?.resourceId` | `string \| undefined` | Optional custom ID. Defaults to the filename stem. |
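
To make "filename stem" concrete, here is a minimal sketch using Node's `path.parse`. `defaultResourceId` is a hypothetical helper, and CiteKit's own derivation may treat edge cases (such as multiple extensions) differently.

```typescript
import * as path from "node:path";

// Hypothetical helper: the stem is the base name without its final extension.
function defaultResourceId(resourcePath: string): string {
    return path.parse(resourcePath).name;
}

console.log(defaultResourceId("lecture_01.mp4")); // lecture_01
console.log(defaultResourceId("docs/README.md")); // README
console.log(defaultResourceId("archive.tar.gz")); // archive.tar
```

Under this assumption, `lecture.mp4` and `lecture.pdf` would share the ID `lecture`, so passing `resourceId` explicitly is the safer choice when file stems can collide.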

Returns

Promise<ResourceMap>: The generated resource structure containing nodes, metadata, and location data.

Ingestion Workflow

The ingestion process is atomic and idempotent:

1. Path Validation: Checks that the file exists.
2. SHA-256 Hashing: Computes a content hash for deduplication.
3. Cache Lookup: Scans storageDir for an existing map with the same hash (skips LLM call if found).
4. Concurrency Gate: Waits for a semaphore slot (respects concurrencyLimit).
5. Mapper Generation: Calls the configured MapperProvider.generateMap().
6. JSON Extraction: Automatically extracts JSON from the LLM response.
7. Persistence: Saves the map as <resourceId>.json in storageDir.
8. Metadata Injection: Adds source_hash and source_size to the map.
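
The hashing and cache-lookup steps above can be sketched with Node's standard library. This is an illustration, not CiteKit's implementation: `sha256OfFile` and `findCachedMap` are hypothetical names, and the sketch assumes the hash is stored at `metadata.source_hash` inside each saved map.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdtempSync, readFileSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: hex-encoded SHA-256 of a file's contents.
function sha256OfFile(filePath: string): string {
    return createHash("sha256").update(readFileSync(filePath)).digest("hex");
}

// Hypothetical helper: scan storageDir for a map whose stored hash matches.
// Assumes the hash lives at metadata.source_hash in each JSON map.
function findCachedMap(storageDir: string, sourceHash: string): Record<string, any> | undefined {
    if (!existsSync(storageDir)) return undefined;
    for (const file of readdirSync(storageDir)) {
        if (!file.endsWith(".json")) continue;
        try {
            const map = JSON.parse(readFileSync(join(storageDir, file), "utf8"));
            if (map?.metadata?.source_hash === sourceHash) return map;
        } catch {
            // Ignore files that are not valid JSON.
        }
    }
    return undefined;
}

// Demo in a throwaway directory: first lookup misses, second one hits.
const dir = mkdtempSync(join(tmpdir(), "citekit-sketch-"));
const source = join(dir, "notes.txt");
writeFileSync(source, "hello");

const hash = sha256OfFile(source);
const before = findCachedMap(dir, hash);
writeFileSync(
    join(dir, "notes.json"),
    JSON.stringify({ resource_id: "notes", metadata: { source_hash: hash } })
);
const after = findCachedMap(dir, hash);

console.log(before === undefined, after?.resource_id);
```

Because the lookup keys on content rather than on the path, renaming or moving an unchanged file would still hit the cache under this scheme.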

Examples

Basic ingestion:

typescript

import { CiteKitClient } from 'citekit';

const client = new CiteKitClient({
    apiKey: process.env.GEMINI_API_KEY
});

const resourceMap = await client.ingest('lecture_01.mp4', 'video');
console.log(`Mapped '${resourceMap.resource_id}' with ${resourceMap.nodes.length} nodes`);

Custom resource ID:

typescript

const resourceMap = await client.ingest(
    'src/main.ts',
    'text',
    { resourceId: 'codebase_v2' }
);
console.log(resourceMap.resource_id);  // "codebase_v2"

Using a custom mapper:

typescript

import { OllamaMapper } from './ollama-mapper';

const client = new CiteKitClient({
    mapper: new OllamaMapper('llama3')
});

// Ingest with a local LLM (no remote API calls)
const resourceMap = await client.ingest('docs/README.md', 'text');
console.log(`Mapped locally with ${resourceMap.nodes.length} sections`);

Concurrent ingestion:

typescript

// Multiple ingests run in parallel (respecting concurrencyLimit)
const maps = await Promise.all([
    client.ingest('video1.mp4', 'video'),
    client.ingest('document1.pdf', 'document'),
    client.ingest('code.ts', 'text')
]);

console.log(`Ingested ${maps.length} resources`);
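
How a `concurrencyLimit` can bound parallel work is easiest to see with a small counting semaphore. The class below is a hypothetical sketch of the general technique; CiteKit's internal gate may be implemented differently.

```typescript
// Hypothetical counting semaphore; not CiteKit's internal implementation.
class Semaphore {
    private readonly limit: number;
    private active = 0;
    private waiters: Array<() => void> = [];

    constructor(limit: number) {
        this.limit = limit;
    }

    async run<T>(task: () => Promise<T>): Promise<T> {
        if (this.active >= this.limit) {
            // No free slot: park until a finishing task hands its slot over.
            await new Promise<void>(resolve => this.waiters.push(resolve));
        } else {
            this.active++;
        }
        try {
            return await task();
        } finally {
            const next = this.waiters.shift();
            if (next) next();   // pass the slot straight to the next waiter
            else this.active--; // nobody waiting, so free the slot
        }
    }
}

// Demo: six tasks through a gate of two; record the peak number in flight.
async function demo(): Promise<number> {
    const gate = new Semaphore(2);
    let running = 0;
    let peak = 0;
    const task = async (): Promise<void> => {
        running++;
        peak = Math.max(peak, running);
        await new Promise<void>(resolve => setTimeout(resolve, 10));
        running--;
    };
    await Promise.all(Array.from({ length: 6 }, () => gate.run(task)));
    return peak;
}

demo().then(peak => console.log(`peak concurrency: ${peak}`));
```

With a gate like this, `Promise.all` can be handed any number of calls while only `limit` of them reach the mapper at once.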

Throws

Error: If resourcePath does not exist.
Error: If no mapper is configured.
Error: If resourceType is not recognized.

`async resolve(resourceId, nodeId, options?)`

Resolves a node to extracted evidence. Extracts the physical segment from the resource (video clip, PDF pages, image crop, etc.) or returns a metadata-only reference.

typescript

async resolve(
    resourceId: string,
    nodeId: string,
    options?: { virtual?: boolean; sourcePath?: string }
): Promise<ResolvedEvidence>

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `resourceId` | `string` | The resource ID (from `ingest()` or `listMaps()`). |
| `nodeId` | `string` | The node ID to resolve (e.g., `"chapter_1.scene_2"`). |
| `options?.virtual` | `boolean \| undefined` | If `true`, returns metadata without extracting files (no FFmpeg/PDF library calls). Defaults to `false`. |
| `options?.sourcePath` | `string \| undefined` | Optional override for the source file location. If provided, CiteKit uses this path instead of the one stored in the resource map. |

Returns

Promise<ResolvedEvidence>: An object containing:
- output_path (string or undefined): Path to the extracted file (undefined if virtual=true)
- address (string): CiteKit URI address (e.g., "video://lecture_01#t=10-20")
- modality (string): The node's modality (e.g., "video", "document")
- node (Node): The resolved node object
- resource_id (string): The resource ID
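
For callers that want to pull values back out of an address, the following sketch parses the shape seen in the example above (`modality://resource_id#key=value`). `parseAddress` is a hypothetical helper inferred only from the example addresses in this reference; it is not a CiteKit function, and other modalities may use fragment keys not covered here.

```typescript
interface ParsedAddress {
    modality: string;
    resourceId: string;
    fragment: Record<string, string>;
}

// Hypothetical parser based solely on the documented example addresses.
function parseAddress(address: string): ParsedAddress | undefined {
    const match = /^([a-z]+):\/\/([^#]+)(?:#(.*))?$/.exec(address);
    if (!match) return undefined;

    const modality = match[1] ?? "";
    const resourceId = match[2] ?? "";
    const rawFragment = match[3] ?? "";

    // Split "k1=v1&k2=v2" into a key/value record.
    const fragment: Record<string, string> = {};
    for (const pair of rawFragment.split("&")) {
        const eq = pair.indexOf("=");
        if (eq > 0) fragment[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
    return { modality, resourceId, fragment };
}

const parsed = parseAddress("video://lecture_01#t=10-20");
console.log(parsed?.modality, parsed?.resourceId, parsed?.fragment["t"]);
```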

Resolution Workflow

1. Map Lookup: Loads the resource map from storageDir.
2. Node Search: Recursively finds the node by ID in the hierarchical structure.
3. Smart Path Rebasing:
   - If sourcePath is provided in options, use it.
   - Otherwise, take the source_path from the map.
   - If it's a relative path, resolve it against the client's baseDir.
   - If it's an absolute path but doesn't exist, CiteKit attempts to find the file inside baseDir (handling WSL/Windows cross-platform migration).
4. Address Building: Generates a CiteKit URI based on the node's location.
5. Virtual Check: If virtual=true, returns the address without extraction.
6. Modality Dispatch: Selects the appropriate resolver (VideoResolver, DocumentResolver, etc.).
7. Physical Extraction: Resolver writes the extracted segment to outputDir (async I/O).
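
The path rebasing rules in the workflow above can be sketched as one function. This is an illustration under stated assumptions, not CiteKit's code: `rebaseSourcePath` is a hypothetical name, and "find the file inside baseDir" is approximated here by looking for the same file name directly under `baseDir`.

```typescript
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of the rebasing rules; the existence check is injectable for testing.
function rebaseSourcePath(
    mapSourcePath: string,
    baseDir: string,
    overridePath?: string,
    exists: (p: string) => boolean = existsSync
): string {
    // An explicit override always wins.
    if (overridePath !== undefined) return overridePath;

    // Treat both POSIX and Windows-style absolute paths as absolute,
    // so a map written on one platform is recognised on the other.
    const isAbsolute =
        path.posix.isAbsolute(mapSourcePath) || path.win32.isAbsolute(mapSourcePath);

    // Relative paths are anchored at baseDir.
    if (!isAbsolute) return path.resolve(baseDir, mapSourcePath);

    // An absolute path that still exists is used as is.
    if (exists(mapSourcePath)) return mapSourcePath;

    // Stale absolute path: assume the file now sits directly under baseDir.
    const fileName = mapSourcePath.split(/[\\/]/).pop() ?? mapSourcePath;
    const candidate = path.join(baseDir, fileName);
    return exists(candidate) ? candidate : mapSourcePath;
}

// Demo with a fake file system containing a single file under the base directory.
const onDisk = new Set([path.join("/srv/app", "lecture.mp4")]);
const fakeExists = (p: string): boolean => onDisk.has(p);

console.log(rebaseSourcePath("clips/a.mp4", "/srv/app", undefined, fakeExists));
console.log(rebaseSourcePath("C:\\Users\\me\\lecture.mp4", "/srv/app", undefined, fakeExists));
```

If neither the stored path nor the candidate exists, the sketch returns the stored path unchanged so that the caller's "file not found" error names the original location.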

Examples

Virtual resolution (metadata only):

typescript

const client = new CiteKitClient({
    apiKey: process.env.GEMINI_API_KEY
});

const evidence = await client.resolve(
    'lecture_01',
    'chapter_1.intro',
    { virtual: true }
);

console.log(evidence.address);     // e.g., "video://lecture_01#t=145-285"
console.log(evidence.output_path); // undefined

Physical resolution (extracts file):

typescript

const evidence = await client.resolve('lecture_01', 'chapter_1.intro');

console.log(evidence.output_path);  // e.g., ".citekit_output/lecture_01_chapter_1_intro.mp4"
console.log(evidence.modality);     // "video"

Document page extraction:

typescript

const evidence = await client.resolve('textbook', 'chapter_2.definition');
// Output: ".citekit_output/textbook_chapter_2_definition.pdf"
// Contains only pages 12-15 (as specified in the node's location)

Resolve multiple nodes in parallel:

typescript

const nodeIds = ['chapter_1.intro', 'chapter_1.body', 'chapter_1.conclusion'];

const allEvidence = await Promise.all(
    nodeIds.map(id => client.resolve('lecture_01', id))
);

allEvidence.forEach(ev => {
    console.log(`${ev.node.title}: ${ev.output_path}`);
});

Throws

Error: If the resource map doesn't exist.
Error: If the node ID is not found.
Error: If no resolver is available for the node's modality.

`getMap(resourceId)`

Loads a previously ingested resource map from local storage.

typescript

getMap(resourceId: string): ResourceMap

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `resourceId` | `string` | The resource ID to retrieve. |

Returns

ResourceMap: The deserialized resource structure.

Example

typescript

const client = new CiteKitClient({
    apiKey: process.env.GEMINI_API_KEY
});

const resourceMap = client.getMap('lecture_01');
console.log(`Resource: ${resourceMap.title}`);
console.log(`Nodes: ${resourceMap.nodes.length}`);

Throws

Error: If no map exists for the given resourceId.

`listMaps()`

Returns all resource IDs (ingested maps) currently stored locally.

typescript

listMaps(): string[]

Returns

string[]: Array of resource IDs.

Example

typescript

const client = new CiteKitClient({
    apiKey: process.env.GEMINI_API_KEY
});

const maps = client.listMaps();
console.log(`Available resources: ${maps.join(', ')}`);
// Output: Available resources: lecture_01, textbook, codebase_v2

`getStructure(resourceId)`

Retrieves a resource map as a plain JavaScript object (JSON-serializable).

typescript

getStructure(resourceId: string): ResourceMap

`search(query)`

Searches across all ingested resource maps for nodes matching the query in their title or summary.

typescript

search(query: string): Array<{ resourceId: string, node: Node }>
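
A title-or-summary search over hierarchical maps could look roughly like the sketch below. `searchMaps` and the trimmed-down `SketchNode`/`SketchMap` types are hypothetical, and case-insensitive substring matching is an assumption; the actual matching rules of `search()` are not specified here.

```typescript
interface SketchNode {
    id: string;
    title?: string;
    summary?: string;
    children?: SketchNode[];
}

interface SketchMap {
    resource_id: string;
    nodes: SketchNode[];
}

type SearchHit = { resourceId: string; node: SketchNode };

// Hypothetical search: assumes case-insensitive substring matching.
function searchMaps(maps: SketchMap[], query: string): SearchHit[] {
    const needle = query.toLowerCase();
    const hits: SearchHit[] = [];

    const visit = (resourceId: string, node: SketchNode): void => {
        const matches = [node.title, node.summary].some(
            text => text !== undefined && text.toLowerCase().includes(needle)
        );
        if (matches) hits.push({ resourceId, node });
        // Descend into nested nodes so deep matches are found too.
        for (const child of node.children ?? []) visit(resourceId, child);
    };

    for (const map of maps) {
        for (const node of map.nodes) visit(map.resource_id, node);
    }
    return hits;
}

const sampleMaps: SketchMap[] = [
    {
        resource_id: "lecture_01",
        nodes: [
            {
                id: "chapter_1",
                title: "Thermodynamics",
                children: [{ id: "chapter_1.intro", title: "Intro", summary: "Defines entropy." }],
            },
        ],
    },
    { resource_id: "textbook", nodes: [{ id: "chapter_2", title: "Entropy and Information" }] },
];

const searchHits = searchMaps(sampleMaps, "entropy");
console.log(searchHits.map(hit => `${hit.resourceId}:${hit.node.id}`));
```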

`resolveFromUrl(url)`

Helper to map a standard URL or CiteKit address back to evidence.

typescript

resolveFromUrl(url: string): ResolvedEvidence | undefined

`isVisited(nodeId)`

Checks if a node has been physically resolved/extracted recently.

typescript

isVisited(nodeId: string): boolean

`registerResolver(modality, resolver)`

Extensibility point: Register a custom resolver for a specific modality.

typescript

registerResolver(modality: string, resolver: Resolver): void

`registerAdapter(name, adapter)`

Extensibility point: Register a custom adapter for external data sources.

typescript

registerAdapter(name: string, adapter: any): void

Type Definitions

See Core Data Models for unified definitions across all implementations.

Quick Reference (TypeScript):

typescript

export interface ResourceMap {
    resource_id: string;    // Unique identifier
    type: "document" | "video" | "audio" | "image" | "text" | "virtual";
    title: string;          // Human-readable title
    source_path: string;    // Path to the source file (relative paths resolve against baseDir)
    nodes: Node[];          // Hierarchical nodes
    metadata?: Record<string, string | number | null>;  // Custom metadata
    created_at: string;     // ISO 8601 timestamp
}

export interface Node {
    id: string;             // Unique within resource
    title?: string;         // Display name
    type: string;           // "section", "scene", "chapter", "class", etc.
    location: Location;     // Temporal/spatial bounds
    summary?: string;       // Brief description
    // Root-level coordinate fields (for structural consistency)
    lines?: [number, number];
    pages?: number[];
    bbox?: [number, number, number, number];
    start?: number;
    end?: number;
    children?: Node[];      // Nested nodes (optional)
}

export interface Location {
    modality: "document" | "video" | "audio" | "image" | "text" | "virtual";
    start?: number;          // Video/Audio start (seconds)
    end?: number;            // Video/Audio end (seconds)
    pages?: number[];        // Document pages (1-indexed list)
    lines?: [number, number]; // Text lines (1-indexed)
    bbox?: [number, number, number, number];  // Image bbox [x1, y1, x2, y2] 0-1 normalized corners
    virtual_address?: string; // Virtual reference URI
}

export interface ResolvedEvidence {
    output_path?: string;   // Path to extracted file (undefined if virtual)
    modality: string;       // Node's modality
    address: string;        // CiteKit URI (e.g., "video://lecture_01#t=145.5-285.0")
    node: Node;             // The resolved node
    resource_id: string;    // The resource ID
}

All field names use snake_case (e.g., resource_id, not resourceId) for consistency with JSON serialization across all implementations (Python, JavaScript, MCP).
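
Because nodes nest through `children`, looking up a node by ID means walking the tree. The sketch below shows one way to do that with a depth-first search; `findNode` and the reduced `TreeNode` type are hypothetical, and the sketch assumes IDs are unique within a resource.

```typescript
// Reduced node shape: only the fields this sketch needs.
interface TreeNode {
    id: string;
    title?: string;
    children?: TreeNode[];
}

// Hypothetical depth-first lookup; returns undefined when the ID is absent.
function findNode(nodes: TreeNode[], nodeId: string): TreeNode | undefined {
    for (const node of nodes) {
        if (node.id === nodeId) return node;
        const hit = findNode(node.children ?? [], nodeId);
        if (hit) return hit;
    }
    return undefined;
}

const tree: TreeNode[] = [
    {
        id: "chapter_1",
        title: "Chapter 1",
        children: [
            { id: "chapter_1.intro", title: "Introduction" },
            { id: "chapter_1.scene_2", title: "Scene 2" },
        ],
    },
];

console.log(findNode(tree, "chapter_1.scene_2")?.title); // Scene 2
console.log(findNode(tree, "invalid.node.id"));          // undefined
```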

Error Handling

Common Errors

Missing mapper or API key:

typescript

try {
    const client = new CiteKitClient();  // No mapper, no apiKey
    await client.ingest('file.mp4', 'video');
} catch (error) {
    // Under strict TypeScript the caught value is `unknown`, so narrow it first
    console.error((error as Error).message);  // "GEMINI_API_KEY or custom 'mapper' required..."
}

Resource not found:

typescript

try {
    const map = client.getMap('nonexistent');
} catch (error) {
    console.error((error as Error).message);  // "No map found for resource 'nonexistent'..."
}

Node not found:

typescript

try {
    const evidence = await client.resolve('lecture_01', 'invalid.node.id');
} catch (error) {
    console.error((error as Error).message);  // "Node 'invalid.node.id' not found..."
}

Complete Example: Multi-Modal RAG Pipeline

typescript

import { CiteKitClient } from 'citekit';

async function ragPipeline() {
    // Initialize client
    const client = new CiteKitClient({
        apiKey: process.env.GEMINI_API_KEY
    });

    // 1. Ingest resources
    console.log('Ingesting lecture...');
    const videoMap = await client.ingest('lecture.mp4', 'video', { resourceId: 'lecture_01' });

    console.log('Ingesting textbook...');
    const docMap = await client.ingest('textbook.pdf', 'document', { resourceId: 'textbook' });

    // 2. List all resources
    const allResources = client.listMaps();
    console.log(`Mapped resources: ${allResources.join(', ')}`);

    // 3. Resolve specific nodes
    const nodeIds = ['chapter_1.intro', 'chapter_1.definition'];
    
    for (const nodeId of nodeIds) {
        console.log(`\nResolving ${nodeId}...`);

        // Virtual resolution (metadata only)
        const virtualEvidence = await client.resolve('lecture_01', nodeId, { virtual: true });
        console.log(`  Address: ${virtualEvidence.address}`);

        // Physical extraction
        const physicalEvidence = await client.resolve('lecture_01', nodeId);
        console.log(`  Extracted to: ${physicalEvidence.output_path}`);
    }

    // 4. Export a resource structure (e.g., for MCP)
    const structure = client.getStructure('lecture_01');
    console.log(`\nStructure JSON: ${JSON.stringify(structure, null, 2)}`);
}

ragPipeline().catch(console.error);
