docscontext

command module
v0.0.0-beta.18 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 20, 2026 License: MIT Imports: 1 Imported by: 0

README

DocsContext

Security Scan OpenSSF Scan OpenSSF Score Release Beta Go Version Frontend Version

A pure Go (CGO_ENABLED=0) GraphRAG tool inspired by Microsoft GraphRAG. Ingests unstructured documents, builds a hierarchical knowledge graph with community detection, and exposes an MCP server + embedded Web UI on a single port.

Features

  • GraphRAG pipeline — 5-phase: load → chunk → embed → graph extraction → community detection
  • Knowledge graph — entity/relationship/claim extraction via LLM (JSON mode)
  • Louvain community detection — pure Go, hierarchical, no external dependencies
  • Three LLM providers — Azure OpenAI, Ollama (local), HuggingFace TGI
  • 12 MCP tools — local search, global search, graph walk, community reports, and more
  • Embedded Web UI — vis-network graph explorer, semantic search, document browser
  • Single binary — zero CGO, cross-compiles to Linux / macOS / Windows

Install

go install github.com/RandomCodeSpace/docscontext@latest

Or build from source:

git clone https://github.com/RandomCodeSpace/docscontext.git
cd DocsContext
CGO_ENABLED=0 go build -o docscontext .

Quick Start

# 1. Create config
mkdir -p ~/.docscontext
cp config.example.yaml ~/.docscontext/config.yaml
# Edit ~/.docscontext/config.yaml — set your LLM provider

# 2. Index documents (Phases 1-2: load, chunk, embed, extract entities)
docscontext index ./your-docs/ --workers 4

# 3. Build knowledge graph (Phases 3-4: community detection + summaries)
docscontext index --finalize

# 4. Check stats
docscontext stats

# 5. Start server
docscontext serve --port 8080

Open http://localhost:8080 for the Web UI.

Configuration

Copy config.example.yaml to ~/.docscontext/config.yaml and edit:

data_dir: ~/.docscontext/data

llm:
  provider: ollama          # azure | ollama | huggingface

  ollama:
    base_url: http://localhost:11434
    chat_model: llama3.2
    embed_model: nomic-embed-text

  azure:
    endpoint: https://myresource.openai.azure.com
    api_key: ${AZURE_OPENAI_API_KEY}
    api_version: "2024-02-01"
    chat_model: gpt-4o
    embed_model: text-embedding-3-small

indexing:
  chunk_size: 512
  chunk_overlap: 50
  workers: 4

server:
  host: 127.0.0.1
  port: 8080

Environment variable overrides use the docscontext_ prefix:

docscontext_LLM_PROVIDER=azure
docscontext_LLM_AZURE_API_KEY=sk-...
docscontext_SERVER_PORT=9090

CLI

# Index a file or directory
docscontext index ./docs/ [--force] [--workers 4] [--verbose]

# Run community detection + LLM summaries
docscontext index --finalize

# Show statistics
docscontext stats
docscontext stats --json

# Start MCP + Web UI server
docscontext serve [--port 8080] [--host 127.0.0.1]

MCP Tools

Connect any MCP client to http://localhost:8080/mcp/sse.

Tool Description
search_documents Vector similarity search over chunks
local_search Vector + graph walk (GraphRAG local)
global_search Community summary aggregation with LLM synthesis
query_entity Entity details + relationships by name
find_relationships Relationship lookup by source / target / predicate
get_graph_neighborhood Subgraph JSON for visualization
get_document_structure LLM-generated structured summary
list_entities Browse entities with type filter
list_documents Browse indexed documents
get_community_report Community summary + member entities
get_chunk Retrieve chunk by ID
stats Full index statistics

REST API

GET  /api/stats
GET  /api/documents
GET  /api/documents/{id}
POST /api/search          {"query":"...","mode":"local|global","top_k":5}
GET  /api/graph/neighborhood?entity=<name>&depth=2
GET  /api/entities
GET  /api/communities
GET  /api/communities/{id}
POST /api/upload

Architecture

Document In
    │
    ▼ Phase 1 — Text Units
  Loader (PDF/DOCX/TXT/MD) → Chunker → Embedder → SQLite

    ▼ Phase 2 — Graph Extraction  [parallel per document]
  LLM → Entities + Relationships + Claims → SQLite

    ▼ Phase 3 — Community Detection  [post-index finalization]
  Louvain algorithm → hierarchical community assignments

    ▼ Phase 4 — Community Summaries  [parallel]
  LLM → CommunityReport → embed summary → SQLite

    ▼ Phase 5 — Structured Doc
  LLM → JSON summary → SQLite

All data lives in a single SQLite file at $DATA_DIR/docscontext.db.

Supported File Types

Format Extensions Notes
PDF .pdf Text extraction via pdfcpu; scanned/image-only PDFs yield no text
Word .docx Open XML format (Office 2007+); legacy .doc not supported
Markdown .md, .markdown Heading # Title used as document title
Plain text .txt, .text UTF-8 encoding expected

Tip: For best graph quality, prefer documents with clear structure (headings, named entities, factual prose). Scanned PDFs or heavily formatted spreadsheets will produce sparse graphs.

Documentation

The Go Gopher

There is no documentation for this package.

Directories

Path Synopsis
internal
api
crawler
Package crawler discovers and fetches pages from documentation websites.
Package crawler discovers and fetches pages from documentation websites.
llm
mcp

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL