Changelog
All notable changes to the CiteKit project will be documented in this file.
[0.2.1] - 2026-05-14
Added
- Power Features:
  - client.search(query): Hierarchical keyword search across all ingested resources (Python & JS).
  - client.resolve_from_url(url): Helper to map CiteKit addresses back to evidence (Python & JS).
  - client.is_visited(node_id): Extractor state tracker for extraction history (Python & JS).
- Extendability (Resolvers & Adapters):
  - client.register_resolver(modality, resolver): Allows community-driven modality support (e.g., CSV, SQL, Slack).
  - client.register_adapter(name, adapter): Dynamic adapter registration for external data sources.
- Schema Evolution: Promoted lines, pages, start, end, and bbox to root-level fields in the Node model for better accessibility and structural parity.
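As a rough illustration of the resolver extension point above, the sketch below shows one way a modality-to-resolver registry could work. It is not CiteKit's implementation: the ResolverRegistry class, its resolve method, the resolver call signature, and the CSV resolver are hypothetical names made up for this example. Only the register_resolver(modality, resolver) call shape comes from the entry above.

```python
from typing import Callable, Dict

# Hypothetical resolver signature: takes source text and a location dict,
# returns the extracted evidence as text. CiteKit's real interface may differ.
Resolver = Callable[[str, dict], str]


class ResolverRegistry:
    """Toy registry mapping a modality name to a resolver callable."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Resolver] = {}

    def register_resolver(self, modality: str, resolver: Resolver) -> None:
        # A later registration replaces an earlier one for the same modality.
        self._resolvers[modality] = resolver

    def resolve(self, modality: str, source: str, location: dict) -> str:
        if modality not in self._resolvers:
            raise KeyError(f"no resolver registered for modality {modality!r}")
        return self._resolvers[modality](source, location)


def csv_row_resolver(source: str, location: dict) -> str:
    # Illustrative only: treats `source` as CSV text and returns one raw row.
    rows = source.splitlines()
    return rows[location["row"]]


registry = ResolverRegistry()
registry.register_resolver("csv", csv_row_resolver)
result = registry.resolve("csv", "id,name\n1,Ada\n2,Grace", {"row": 2})
```

Keeping the registry a plain dictionary means an unknown modality fails fast with a clear error rather than silently falling back to a default resolver.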
Changed
- Cross-Platform Hardening:
  - Implemented automatic path normalization (os.path.normpath / path.normalize) throughout the core.
  - Standardized source_path as POSIX-style (forward slashes) in JSON maps for seamless migration between Windows and Linux.
- Mapper Precision: Updated GeminiMapper prompts to explicitly extract structural coordinates at the root level, including line numbers for text/code resources.
Fixed
- Source Path Rebasing: Fixed a critical bug where CiteKitClient.resolve() ignored the base_dir passed in the constructor.
- Resolution Bug: Resolved TextResolver file-slicing bug where full files were being copied; now correctly extracts specific line ranges with robust bounds checking.
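To make the Cross-Platform Hardening entry above more concrete, here is a minimal sketch of normalizing a path and storing it with forward slashes. The function name to_posix_source_path is hypothetical, and the sketch uses ntpath.normpath rather than os.path.normpath so that backslash-separated input is handled the same way on every host. CiteKit's actual normalization may differ.

```python
import ntpath


def to_posix_source_path(path: str) -> str:
    """Normalize a path and return it with forward slashes."""
    # ntpath accepts both "\\" and "/" as separators on any host OS,
    # so the result does not depend on where this code runs.
    normalized = ntpath.normpath(path)
    # Caveat: a POSIX filename that contains a literal backslash would
    # also be rewritten here; this sketch assumes that never happens.
    return normalized.replace("\\", "/")


windows_style = to_posix_source_path("docs\\..\\data\\paper.pdf")
posix_style = to_posix_source_path("data/./notes.md")
```

With this approach a map written on Windows and a map written on Linux would store the same source_path string for the same relative file.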
[0.1.8] - 2026-02-16
Added
- JavaScript CLI Expansion: Implemented core CLI commands in the JavaScript package (ingest, resolve, list, structure, check-map, inspect, serve).
- Runnable Examples: Added examples/ folder with 4 real-world implementations:
  - Research App (Node.js) - Agentic research engine for technical papers
  - Study Companion (MCP) - Claude Desktop integration for lecture videos
  - Video Search CLI (Python) - Concept-based video library navigation
  - RAG Fusion (Python) - Hybrid vector database + CiteKit architecture
Changed
- CLI Scope: JavaScript CLI is no longer MCP-only; adapter conversion (adapt) remains available only in the Python CLI.
- Version: Bumped to 0.1.8 across both SDKs.
Fixed
- Documentation URLs: Changed all references from citekit.org to abdushakurob.github.io/citekit.
- Examples: Updated example READMEs and runnable setup instructions.
- Doc Pages: Added "Run the Complete Example" sections to all guide example pages.
[0.1.7] - 2026-02-16
Added
- Text & Code Support: First-class support for .txt, .md, and .py files. Includes sliding window analysis and line-range resolution.
- Map Portability:
  - Adapters: New citekit adapt command to ingest data from GraphRAG, LlamaIndex, or custom sources.
  - Validator: New citekit check-map command to verify map schema compliance.
  - Standardized Schema: Strict JSON schema alignment between Python and TypeScript SDKs.
- Documentation:
  - Added "Map Adapters" guide.
  - Clarified MCP tool error responses and CLI output formatting.
  - Updated client and model references to match actual SDK types and fields.
  - Expanded resolver and client references across Python/JS to reflect real schema shapes.
  - Published full API model and MCP tool docs alignment.
  - Added utilities sections for address parsing and agent-context helpers.
  - Added MCP integration details and CLI usage guides across Python/Node contexts.
  - Added requirements, troubleshooting, and deployment guidance for mapper-based ingestion.
  - Updated architecture/ingestion deep dives to reflect mapper abstraction.
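The line-range resolution listed under Text & Code Support above can be pictured with a small sketch like the following. It assumes 1-indexed, inclusive line numbers and clamps out-of-range values to the file. Both choices are assumptions for illustration, and slice_lines is a hypothetical helper rather than part of either SDK.

```python
from typing import List


def slice_lines(text: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), clamped to the text."""
    if start > end:
        raise ValueError(f"start ({start}) must not exceed end ({end})")
    lines: List[str] = text.splitlines()
    # Clamp to the valid range instead of failing on out-of-range values.
    first = max(start, 1)
    last = min(end, len(lines))
    return "\n".join(lines[first - 1:last])


sample = "alpha\nbeta\ngamma\ndelta"
excerpt = slice_lines(sample, 2, 3)
clamped = slice_lines(sample, 3, 99)
```

Returning only the requested slice, rather than the whole file, is what keeps the resolved evidence small enough to pass along as context.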
JavaScript SDK
- Resolvers: Added audio resolver and expanded modality-specific resolver docs.
- Addressing: Added parseAddress / buildAddress support for text and virtual schemes.
- Client: Added concurrency limiting, source-size metadata injection, and recursive node lookup.
- Mappers: Added audio/image prompts and improved MIME detection.
- MCP: Updated server metadata and tooling behavior to align with MCP docs.
- Exports: Added missing exports for text/audio resolvers and address utilities.
Python SDK
- CLI: Added --mapper and --mapper-config, expanded text file type detection, and improved list output.
- Adapters: Added LlamaIndex adapter and normalized GraphRAG output to virtual.
- Client: Added save_map() convenience method and virtual-modality short-circuiting.
- Addressing: Added virtual URI parsing/building and improved media MIME handling.
- Mappers: Async-safe Gemini calls and improved retry timing.
Changed
- CLI Scope: CLI is Python-only; JavaScript package provides MCP server integration (CLI commands are not yet implemented in JS).
- Client Type Snippets: Quick references now mirror Pydantic and TypeScript interfaces exactly.
- Docs Language: Replaced Gemini-only language with mapper-agnostic phrasing across guides.
- URI Format: Standardized time ranges to start-end across docs and examples.
Fixed
- CLI Output: Replaced emoji prefixes with professional text labels ([INFO], [SUCCESS]).
- Schema Parity: Fixed model field mismatches (start/end, pages list, bbox corners, virtual_address) across docs.
- Resolver Docs: Aligned image bbox coordinates and document pages list with resolver behavior.
- MCP Tools Docs: Error handling now reflects plain-text Error: ... responses.
- Client Docs: Corrected ingest examples, node discovery guidance, and location schemas in Python/JS docs.
- CLI Docs: Updated resolve output formatting and removed unsupported/incorrect options.
- Requirements: Corrected dependency guidance (optional peer deps in Node, PyMuPDF in Python).
[0.1.6] - 2026-02-16
Added
- Strategic Documentation Overhaul: Completely refactored all guides to focus on Modern AI Architectures (Agentic RAG, LongRAG, GraphRAG, and Context Orchestration).
- Context Economics: Added transparency regarding the two-phase lifecycle (Cloud-mapped ingestion vs. Local-first resolution).
- Dedicated API Docs: Added separate technical specifications for Virtual Resolution and MCP Protocol.
- CLI Upgrades: Support for --virtual, --concurrency (-c), and --retries (-r) flags in the Python CLI.
- Virtual Pointer Protocol: Official recommendation for the virtual: URI prefix in databases.
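For readers wondering what a concurrency limit such as the --concurrency flag above typically controls, the sketch below caps the number of mapping calls in flight with a semaphore. The names map_all and map_one are hypothetical, and this is a general pattern under that assumption, not the CLI's actual code.

```python
import asyncio
from typing import Awaitable, Callable, List


async def map_all(
    sources: List[str],
    map_one: Callable[[str], Awaitable[str]],
    concurrency: int = 2,
) -> List[str]:
    """Run map_one over sources with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(source: str) -> str:
        # Each call waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return await map_one(source)

    # gather returns results in the same order as the input sources.
    return list(await asyncio.gather(*(guarded(s) for s in sources)))
```

A caller would run it with something like asyncio.run(map_all(paths, some_async_mapper, concurrency=4)), where some_async_mapper stands in for whatever coroutine performs the mapping.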
[0.1.5] - 2026-02-16
Added
- Standardized Constructors: Python CiteKitClient now supports api_key, model, and max_retries directly, matching the JS SDK.
- Robustness: Implemented exponential backoff and retry logic in GeminiMapper (429 handling).
Fixed
- TypeScript: Fixed the maxRetries "Ghost Property" in the CiteKitClientOptions interface.
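The exponential backoff noted under Robustness above generally follows the pattern sketched here. This is a generic illustration, not the GeminiMapper code: RateLimitError is a hypothetical stand-in for an HTTP 429 response, and call_with_backoff, base_delay, and the jitter range are assumed names and values.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the mapper API."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # Retries exhausted; surface the error to the caller.
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus
            # a little random jitter so parallel clients do not retry in step.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise AssertionError("unreachable when max_retries >= 0")
```

Passing sleep as a parameter is a design convenience: tests can record the delays instead of actually waiting.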
[0.1.4] - 2026-02-15
Changed
- Serverless First Refactor: Optimized all SDKs for Vercel/AWS Lambda. This involved moving sharp, fluent-ffmpeg, and pdf-lib to optional peer dependencies.
- Mapping Logic: Removed local PDF parsing in favor of Google's Gemini File API for zero-binary environments.
Added
- baseDir support: Added the ability to redirect all storage/output to /tmp for read-only filesystems.
[0.1.3] - 2026-02-15
Added
- Initial JavaScript Port: Established the core JS logic to match the Python resolver patterns.
[0.1.2] - 2026-02-15
Added
- Performance: Added hashing, caching, and concurrency support for heavy mapping tasks.
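One common way to combine hashing and caching for expensive mapping work is to key cached results by a digest of the source content, as in the sketch below. The MapCache class, content_hash helper, and in-memory store are assumptions for illustration. The entry above does not say which hash or storage CiteKit uses.

```python
import hashlib
from typing import Callable, Dict


def content_hash(data: bytes) -> str:
    """Return a stable SHA-256 hex digest used as the cache key."""
    return hashlib.sha256(data).hexdigest()


class MapCache:
    """In-memory cache keyed by content hash (a real one might persist to disk)."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def get_or_build(self, data: bytes, build: Callable[[bytes], dict]) -> dict:
        key = content_hash(data)
        if key not in self._store:
            # Only run the expensive mapping step for content not seen before.
            self._store[key] = build(data)
        return self._store[key]
```

Keying on content rather than on file path means a renamed or moved file with identical bytes would still hit the cache.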
[0.1.0] - 2026-02-14
Added
- Initial Release: Core multimodal resolution patterns for Video, Audio, and Images.
- Gemini Integration: Initial support for multimodal mapping via Gemini 1.5.