Core Data Models

All CiteKit implementations (Python, JavaScript, MCP) share the data models described here. The type syntax differs by language (Pydantic `BaseModel`, TypeScript `interface`, or schema), but the structure and field names are identical.

`ResourceMap`

Represents the hierarchical structure of a resource. Generated by mappers, persisted as JSON.

Field Definitions:

| Field | Type | Description |
| --- | --- | --- |
| `resource_id` | string | Unique identifier (e.g., "paper_v1", filename stem) |
| `type` | string | Resource modality: `document`, `video`, `audio`, `image`, `text`, `virtual` |
| `title` | string | Human-readable title |
| `source_path` | string | Absolute path to the source file |
| `nodes` | Node[] | Hierarchical array of nodes |
| `metadata` | object \| null | Optional custom metadata (e.g., `source_hash`, `source_size`) |
| `created_at` | string | ISO 8601 timestamp |

JSON Example:

json

{
  "resource_id": "lecture_01",
  "type": "video",
  "title": "Introduction to Machine Learning",
  "source_path": "/home/user/lecture_01.mp4",
  "nodes": [
    {
      "id": "intro",
      "title": "Introduction",
      "type": "section",
      "summary": "Course overview and objectives",
      "location": {
        "modality": "video",
        "start": 0,
        "end": 60
      }
    }
  ],
  "metadata": {
    "source_hash": "abc123...",
    "source_size": 524288000
  },
  "created_at": "2024-01-15T10:30:00Z"
}
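
Because a map is persisted as plain JSON, it can be read back with nothing but the standard library. The following is a minimal sketch rather than CiteKit's own loader: `load_resource_map` is a hypothetical helper, and the inline JSON string stands in for a file on disk.

```python
import json
from datetime import datetime

# Stand-in for a persisted map file (trimmed from the example above).
raw = """
{
  "resource_id": "lecture_01",
  "type": "video",
  "title": "Introduction to Machine Learning",
  "source_path": "/home/user/lecture_01.mp4",
  "nodes": [
    {"id": "intro", "title": "Introduction", "type": "section",
     "location": {"modality": "video", "start": 0, "end": 60}}
  ],
  "metadata": {"source_hash": "abc123...", "source_size": 524288000},
  "created_at": "2024-01-15T10:30:00Z"
}
"""

def load_resource_map(text: str) -> dict:
    """Hypothetical helper: parse a ResourceMap and convert created_at to a datetime."""
    data = json.loads(text)
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it first.
    data["created_at"] = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return data

resource_map = load_resource_map(raw)
node_ids = [node["id"] for node in resource_map["nodes"]]
print(resource_map["title"], node_ids)
```

Working on plain dictionaries keeps the sketch dependency-free; the Pydantic models shown later on this page give the same fields with validation.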

`Node`

Represents a segment or logical unit within a resource. Hierarchical (supports children).

Field Definitions:

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | Unique within resource (e.g., "chapter_1.scene_2", no spaces) |
| `title` | string \| null | Display name (e.g., "Chapter 1: Fundamentals") |
| `type` | string | Node category (e.g., "section", "scene", "chapter", "class", "function") |
| `summary` | string \| null | Brief description of node content |
| `location` | Location | Full location metadata object |
| `lines` | [number, number] \| null | Text/Code lines (Root-level copy for structural consistency) |
| `pages` | number[] \| null | Document page numbers (Root-level copy) |
| `bbox` | [number, number, number, number] \| null | Image bounding box (Root-level copy) |
| `start` | number \| null | Video/Audio start time (Root-level copy) |
| `end` | number \| null | Video/Audio end time (Root-level copy) |
| `children` | Node[] | Nested child nodes (optional, defaults to empty list) |

Example:

json

{
  "id": "chapter_2.section_1",
  "title": "Chapter 2: Advanced Topics - Section 1",
  "type": "section",
  "summary": "Overview of neural networks",
  "location": {
    "modality": "document",
    "pages": [45, 67]
  },
  "pages": [45, 67],
  "children": [
    {
      "id": "chapter_2.section_1.subsection_1",
      "title": "Backpropagation",
      "type": "subsection",
      "location": {
        "modality": "document",
        "pages": [45, 52]
      }
    }
  ]
}
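
In the example, the parent node carries a root-level `pages` copy while the child does not. The page does not say whether mappers always fill in these copies, so the sketch below shows one way a custom adapter might do it. `mirror_coordinates` is a hypothetical helper operating on plain dictionaries, not part of CiteKit.

```python
# Coordinate fields that a Node may mirror from its Location.
COORDINATE_FIELDS = ("lines", "pages", "bbox", "start", "end")

def mirror_coordinates(node: dict) -> dict:
    """Return a copy of node with location coordinates copied to root level, recursively."""
    mirrored = dict(node)  # shallow copy so the input node is left untouched
    location = node.get("location", {})
    for field in COORDINATE_FIELDS:
        if location.get(field) is not None:
            mirrored[field] = location[field]
    mirrored["children"] = [mirror_coordinates(child) for child in node.get("children", [])]
    return mirrored

node = {
    "id": "chapter_2.section_1",
    "type": "section",
    "location": {"modality": "document", "pages": [45, 67]},
    "children": [
        {
            "id": "chapter_2.section_1.subsection_1",
            "type": "subsection",
            "location": {"modality": "document", "pages": [45, 52]},
        }
    ],
}

mirrored = mirror_coordinates(node)
print(mirrored["pages"], mirrored["children"][0]["pages"])
```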

`Location`

Specifies where a node is positioned within a resource. The modality field determines which modality-specific fields are populated.

Field Definitions:

| Field | Type | Modalities | Description |
| --- | --- | --- | --- |
| `modality` | string | All | Resource type: `document`, `video`, `audio`, `image`, `text`, `virtual` |
| `start` | number | video, audio | Start time in seconds (float, measured from 0) |
| `end` | number | video, audio | End time in seconds (float, measured from 0) |
| `pages` | number[] | document | List of page numbers (1-indexed, inclusive) |
| `lines` | [number, number] | text | Start and end line numbers (1-indexed, inclusive tuple) |
| `bbox` | [number, number, number, number] | image | Bounding box as `[x1, y1, x2, y2]` normalized 0-1 (top-left to bottom-right corners) |
| `virtual_address` | string | virtual | URI reference for virtual nodes |

Usage Rules:

Only the fields belonging to the Location's modality are populated (start and end together for video and audio, a single field for every other modality)
For virtual modality, use virtual_address (no extraction possible)
Coordinates are inclusive on both ends
Bbox coordinates are normalized 0-1 where (0,0) is top-left and (1,1) is bottom-right
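
These rules can be checked mechanically. The sketch below is an illustration for custom adapters, not CiteKit's validation code: `location_problems` is a hypothetical helper, and treating a missing modality field as a problem is an assumption the rules above do not state explicitly.

```python
# Which coordinate fields belong to which modality, taken from the table above.
MODALITY_FIELDS = {
    "video": {"start", "end"},
    "audio": {"start", "end"},
    "document": {"pages"},
    "text": {"lines"},
    "image": {"bbox"},
    "virtual": {"virtual_address"},
}
ALL_FIELDS = set().union(*MODALITY_FIELDS.values())

def location_problems(location: dict) -> list[str]:
    """Return human-readable rule violations; an empty list means the Location looks valid."""
    modality = location.get("modality")
    allowed = MODALITY_FIELDS.get(modality)
    if allowed is None:
        return [f"unknown modality: {modality!r}"]
    problems = []
    populated = {field for field in ALL_FIELDS if location.get(field) is not None}
    for field in sorted(populated - allowed):
        problems.append(f"{field} does not belong to modality {modality}")
    # Assumption: every field of the modality must be present.
    for field in sorted(allowed - populated):
        problems.append(f"{field} is missing for modality {modality}")
    bbox = location.get("bbox")
    if bbox is not None and not all(0 <= value <= 1 for value in bbox):
        problems.append("bbox values must be normalized to the 0-1 range")
    return problems

print(location_problems({"modality": "video", "start": 145.5, "end": 285.0}))
print(location_problems({"modality": "document", "pages": [12], "start": 0}))
```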

Examples by Modality:

json

// Video/Audio (start and end in seconds)
{
  "modality": "video",
  "start": 145.5,
  "end": 285.0
}

// Document/PDF (list of page numbers)
{
  "modality": "document",
  "pages": [12, 13, 14, 15]
}

// Image (bounding box with corners)
{
  "modality": "image",
  "bbox": [0.1, 0.2, 0.9, 0.8]
}

// Text (line range)
{
  "modality": "text",
  "lines": [5, 25]
}

// Virtual (metadata-only, no extraction)
{
  "modality": "virtual",
  "virtual_address": "graphrag://entity_123"
}

`ResolvedEvidence`

The result of calling `resolve()`: either extracted content or a virtual reference.

Field Definitions:

| Field | Type | Description |
| --- | --- | --- |
| `output_path` | string \| null | Path to extracted file (null if virtual modality) |
| `modality` | string | Node's modality |
| `address` | string | CiteKit URI (e.g., `video://lecture_01#t=145.5-285.0`) |
| `node` | Node | The resolved node object |
| `resource_id` | string | The resource ID |

Example (Physical Resolution):

json

{
  "output_path": ".citekit_output/lecture_01_intro.mp4",
  "modality": "video",
  "address": "video://lecture_01#t=0-60",
  "node": {
    "id": "intro",
    "title": "Introduction",
    "type": "section",
    "location": {
      "modality": "video",
      "start": 0,
      "end": 60
    }
  },
  "resource_id": "lecture_01"
}

Example (Virtual Resolution):

json

{
  "output_path": null,
  "modality": "virtual",
  "address": "virtual://research_paper#abstract",
  "node": {
    "id": "abstract",
    "title": "Abstract",
    "type": "section",
    "location": {
      "modality": "virtual",
      "virtual_address": "virtual://research_paper#abstract"
    }
  },
  "resource_id": "research_paper"
}
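
A consumer usually needs to tell the two cases apart and may want to take the `address` apart. The sketch below assumes only what the two examples show: a `scheme://resource_id#fragment` shape, with `t=start-end` as the fragment for video. The helpers `has_extracted_file` and `parse_address` are hypothetical, and fragment formats for other modalities are not covered here.

```python
from urllib.parse import urlsplit

def has_extracted_file(evidence: dict) -> bool:
    """True for physical resolution, False for virtual (output_path is null)."""
    return evidence.get("output_path") is not None

def parse_address(address: str) -> dict:
    """Split a CiteKit address into modality, resource ID, and fragment."""
    parts = urlsplit(address)
    parsed = {
        "modality": parts.scheme,
        "resource_id": parts.netloc,
        "fragment": parts.fragment,
    }
    # Assumption: a "t=" fragment holds "start-end" in seconds, as in the video example.
    if parts.fragment.startswith("t="):
        start, end = parts.fragment[2:].split("-", 1)
        parsed["start"] = float(start)
        parsed["end"] = float(end)
    return parsed

video = parse_address("video://lecture_01#t=145.5-285.0")
virtual = parse_address("virtual://research_paper#abstract")
print(video)
print(virtual)
```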

Language-Specific Implementations

Python (`citekit/models.py`)

python

from datetime import datetime, timezone
from typing import Literal

from pydantic import BaseModel, Field

ResourceType = Literal["document", "video", "audio", "image", "text", "virtual"]

class Location(BaseModel):
    modality: ResourceType
    # Video/Audio
    start: float | None = None    # seconds
    end: float | None = None      # seconds
    # Document
    pages: list[int] | None = None  # list of page numbers
    # Text
    lines: tuple[int, int] | None = None  # (start_line, end_line)
    # Image
    bbox: tuple[float, float, float, float] | None = None  # (x1, y1, x2, y2)
    # Virtual
    virtual_address: str | None = None

class Node(BaseModel):
    id: str
    title: str | None = None
    type: str
    location: Location
    summary: str | None = None
    # Root-level coordinate fields (for structural consistency)
    lines: tuple[int, int] | None = None
    pages: list[int] | None = None
    bbox: tuple[float, float, float, float] | None = None
    start: float | None = None
    end: float | None = None
    children: list["Node"] = Field(default_factory=list)

class ResourceMap(BaseModel):
    resource_id: str
    type: ResourceType  # "document", "video", "audio", "image", "text", "virtual"
    title: str
    source_path: str
    metadata: dict[str, str | int | float | None] | None = None
    nodes: list[Node] = Field(default_factory=list)
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

JavaScript/TypeScript (`javascript/src/models.ts`)

typescript

export interface Location {
    modality: "document" | "video" | "audio" | "image" | "text" | "virtual";
    // Video/Audio
    start?: number;        // seconds
    end?: number;          // seconds
    // Document
    pages?: number[];      // list of page numbers
    // Text
    lines?: [number, number];  // [start_line, end_line]
    // Image
    bbox?: [number, number, number, number];  // [x1, y1, x2, y2]
    // Virtual
    virtual_address?: string;
}

export interface Node {
    id: string;
    title?: string;
    type: string;
    location: Location;
    summary?: string;
    // Root-level coordinate fields (for structural consistency)
    lines?: [number, number];
    pages?: number[];
    bbox?: [number, number, number, number];
    start?: number;
    end?: number;
    children?: Node[];
}

export interface ResourceMap {
    resource_id: string;
    type: "document" | "video" | "audio" | "image" | "text" | "virtual";
    title: string;
    source_path: string;
    nodes: Node[];
    metadata?: Record<string, string | number | null>;
    created_at: string;  // ISO 8601
}

Field Name Consistency

Important: Despite language differences, field names are identical across all implementations:

Consistent: resource_id, source_path, created_at, modality, start, end, pages, lines, bbox, virtual_address
Never: resourceId (camelCase is not used in the data models, even in JavaScript)
Never: sourceHash (metadata keys also use snake_case, e.g., source_hash)

This ensures that JSON serialization is consistent across Python, JavaScript, and the MCP protocol.
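
A custom adapter can guard against stray camelCase keys before writing a map to disk. This is a minimal sketch under the naming rule above; `non_snake_case_keys` is a hypothetical helper, and the regular expression is one reasonable definition of snake_case rather than a rule taken from CiteKit.

```python
import re

# Lowercase letters, digits, and underscores, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def non_snake_case_keys(value, path=""):
    """Recursively collect the paths of dictionary keys that are not snake_case."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            key_path = f"{path}.{key}" if path else key
            if not SNAKE_CASE.match(key):
                found.append(key_path)
            found.extend(non_snake_case_keys(child, key_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(non_snake_case_keys(child, f"{path}[{index}]"))
    return found

suspect = {
    "resourceId": "lecture_01",
    "metadata": {"sourceHash": "abc123..."},
    "nodes": [{"id": "intro", "location": {"modality": "virtual", "virtual_address": "x"}}],
}
print(non_snake_case_keys(suspect))
```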

Import Examples

Python:

python

from citekit.models import ResourceMap, Node, Location, ResolvedEvidence

# Use types in signatures
def process_map(resource_map: ResourceMap) -> None:
    for node in resource_map.nodes:
        # title is optional, so fall back to the node id
        print(f"{node.title or node.id} ({node.type})")

JavaScript:

typescript

import type { ResourceMap, Node, Location, ResolvedEvidence } from 'citekit';

// Use types in signatures
function processMap(resourceMap: ResourceMap): void {
    for (const node of resourceMap.nodes) {
        // title is optional, so fall back to the node id
        console.log(`${node.title ?? node.id} (${node.type})`);
    }
}
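
Both functions above visit only the top-level nodes. Since nodes nest through `children`, a recursive walk is needed to reach every node. The following sketch works on plain dictionaries as loaded from a persisted map; `walk` and `find_node` are hypothetical helpers, not CiteKit functions.

```python
from typing import Iterator, Optional

def walk(nodes: list, depth: int = 0) -> Iterator[tuple]:
    """Yield (depth, node) pairs for every node in the tree, parents before children."""
    for node in nodes:
        yield depth, node
        # children is optional and defaults to an empty list
        yield from walk(node.get("children", []), depth + 1)

def find_node(nodes: list, node_id: str) -> Optional[dict]:
    """Return the first node with the given id, or None if there is no such node."""
    for _, node in walk(nodes):
        if node["id"] == node_id:
            return node
    return None

nodes = [
    {
        "id": "chapter_2.section_1",
        "type": "section",
        "children": [
            {"id": "chapter_2.section_1.subsection_1", "type": "subsection"},
        ],
    }
]

for depth, node in walk(nodes):
    print("  " * depth + node["id"])
```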

Serialization & JSON Schema

All models serialize to JSON with snake_case field names (platform-independent).

Validation (for custom adapters):

json

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "resource_id": { "type": "string" },
    "type": { "enum": ["document", "video", "audio", "image", "text", "virtual"] },
    "title": { "type": "string" },
    "source_path": { "type": "string" },
    "nodes": { "type": "array" },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["resource_id", "type", "nodes"]
}
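
In practice a JSON Schema validator library would normally apply this schema. For illustration only, the sketch below hand-codes the same constraints with the standard library (the `format` check on `created_at` is left out). `check_resource_map` is a hypothetical helper, not a CiteKit function.

```python
# Constraints copied from the schema above.
REQUIRED = ("resource_id", "type", "nodes")
RESOURCE_TYPES = {"document", "video", "audio", "image", "text", "virtual"}
STRING_FIELDS = ("resource_id", "title", "source_path", "created_at")

def check_resource_map(data: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the map passes."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in data]
    if "type" in data and data["type"] not in RESOURCE_TYPES:
        errors.append(f"type must be one of {sorted(RESOURCE_TYPES)}")
    for name in STRING_FIELDS:
        if name in data and not isinstance(data[name], str):
            errors.append(f"{name} must be a string")
    if "nodes" in data and not isinstance(data["nodes"], list):
        errors.append("nodes must be an array")
    return errors

print(check_resource_map({"resource_id": "lecture_01", "type": "video", "nodes": []}))
print(check_resource_map({"resource_id": 7, "type": "movie"}))
```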
