docscontext

command module

v0.0.0-beta.30 Latest Latest Go to latest Published: Mar 23, 2026 License: MIT Imports: 1 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/RandomCodeSpace/docscontext

Links

Open Source Insights

README ¶

DocsContext

A pure Go (CGO_ENABLED=0) GraphRAG tool inspired by Microsoft GraphRAG. Ingests unstructured documents, builds a hierarchical knowledge graph with community detection, and exposes an MCP server + embedded Web UI on a single port.

Features

GraphRAG pipeline — 5-phase: load → chunk → embed → graph extraction → community detection
Knowledge graph — entity/relationship/claim extraction via LLM (JSON mode)
Louvain community detection — pure Go, hierarchical, no external dependencies
Two LLM providers — Azure OpenAI, Ollama (local), via langchaingo
12 MCP tools — local search, global search, graph walk, community reports, and more
Embedded Web UI — vis-network graph explorer, semantic search, document browser
Single binary — zero CGO, cross-compiles to Linux / macOS / Windows

Install

go install github.com/RandomCodeSpace/docscontext@latest

Or build from source:

git clone https://github.com/RandomCodeSpace/docscontext.git
cd DocsContext
CGO_ENABLED=0 go build -o docscontext .

Quick Start

# 1. Create config
mkdir -p ~/.docscontext
cp config.example.yaml ~/.docscontext/config.yaml
# Edit ~/.docscontext/config.yaml — set your LLM provider

# 2. Index documents (Phases 1-2: load, chunk, embed, extract entities)
docscontext index ./your-docs/ --workers 4

# 3. Build knowledge graph (Phases 3-4: community detection + summaries)
docscontext index --finalize

# 4. Check stats
docscontext stats

# 5. Start server
docscontext serve --port 8080

Open http://localhost:8080 for the Web UI.

Configuration

Copy config.example.yaml to ~/.docscontext/config.yaml and edit:

data_dir: ~/.docscontext/data

llm:
  provider: ollama          # azure | ollama

  ollama:
    base_url: http://localhost:11434
    chat_model: llama3.2
    embed_model: nomic-embed-text

  azure:
    # Shared defaults — used when chat/embed-specific values are not set.
    endpoint: https://myresource.openai.azure.com
    api_key: ${AZURE_OPENAI_API_KEY}
    api_version: "2024-08-01"

    chat:
      model: gpt-4o
      # endpoint: ...   # optional override (falls back to shared)
      # api_key: ...    # optional override (falls back to shared)
    embed:
      model: text-embedding-3-small
      # endpoint: ...   # optional override for separate deployment
      # api_key: ...    # optional override

indexing:
  chunk_size: 512
  chunk_overlap: 50
  workers: 4
  extract_graph: true
  extract_claims: true
  max_gleanings: 1

community:
  min_community_size: 2
  max_levels: 3

server:
  host: 127.0.0.1
  port: 8080

Environment Variables

All config keys can be set via environment variables with the DOCSCONTEXT_ prefix. Dots become underscores; the prefix is case-insensitive.

# Provider
export DOCSCONTEXT_LLM_PROVIDER=azure

# Azure shared (used by both chat and embed unless overridden)
export DOCSCONTEXT_LLM_AZURE_ENDPOINT=https://myresource.openai.azure.com
export DOCSCONTEXT_LLM_AZURE_API_KEY=sk-...
export DOCSCONTEXT_LLM_AZURE_API_VERSION=2024-08-01

# Azure chat (overrides shared values for chat completions)
export DOCSCONTEXT_LLM_AZURE_CHAT_ENDPOINT=https://chat-deployment.openai.azure.com
export DOCSCONTEXT_LLM_AZURE_CHAT_MODEL=gpt-4o

# Azure embed (overrides shared values for embeddings)
export DOCSCONTEXT_LLM_AZURE_EMBED_ENDPOINT=https://embed-deployment.openai.azure.com
export DOCSCONTEXT_LLM_AZURE_EMBED_MODEL=text-embedding-3-small

# Ollama
export DOCSCONTEXT_LLM_OLLAMA_BASE_URL=http://localhost:11434
export DOCSCONTEXT_LLM_OLLAMA_CHAT_MODEL=llama3.2
export DOCSCONTEXT_LLM_OLLAMA_EMBED_MODEL=nomic-embed-text

# Indexing
export DOCSCONTEXT_INDEXING_WORKERS=4
export DOCSCONTEXT_INDEXING_CHUNK_SIZE=512

# Server
export DOCSCONTEXT_SERVER_PORT=9090

CLI

# Index a file or directory
docscontext index ./docs/ [--force] [--workers 4] [--verbose]

# Run community detection + LLM summaries
docscontext index --finalize

# Show statistics
docscontext stats
docscontext stats --json

# Start MCP + Web UI server
docscontext serve [--port 8080] [--host 127.0.0.1]

MCP Tools

Connect any MCP client to http://localhost:8080/mcp/sse.

Tool	Description
`search_documents`	Vector similarity search over chunks
`local_search`	Vector + graph walk (GraphRAG local)
`global_search`	Community summary aggregation with LLM synthesis
`query_entity`	Entity details + relationships by name
`find_relationships`	Relationship lookup by source / target / predicate
`get_graph_neighborhood`	Subgraph JSON for visualization
`get_document_structure`	LLM-generated structured summary
`list_entities`	Browse entities with type filter
`list_documents`	Browse indexed documents
`get_community_report`	Community summary + member entities
`get_chunk`	Retrieve chunk by ID
`stats`	Full index statistics

REST API

GET  /api/stats
GET  /api/documents
GET  /api/documents/{id}
POST /api/search          {"query":"...","mode":"local|global","top_k":5}
GET  /api/graph/neighborhood?entity=<name>&depth=2
GET  /api/entities
GET  /api/communities
GET  /api/communities/{id}
POST /api/upload

Architecture

Document In
    │
    ▼ Phase 1 — Text Units
  Loader (PDF/DOCX/TXT/MD) → Chunker → Embedder → SQLite

    ▼ Phase 2 — Graph Extraction  [parallel per document]
  LLM → Entities + Relationships + Claims → SQLite

    ▼ Phase 3 — Community Detection  [post-index finalization]
  Louvain algorithm → hierarchical community assignments

    ▼ Phase 4 — Community Summaries  [parallel]
  LLM → CommunityReport → embed summary → SQLite

    ▼ Phase 5 — Structured Doc
  LLM → JSON summary → SQLite

All data lives in a single SQLite file at $DATA_DIR/docscontext.db.

Supported File Types

Format	Extensions	Notes
PDF	`.pdf`	Text extraction via langchaingo (ledongthuc/pdf); scanned/image-only PDFs yield no text
Word	`.docx`	Open XML format (Office 2007+); legacy `.doc` not supported
Markdown	`.md`, `.markdown`	Heading `# Title` used as document title
Plain text	`.txt`, `.text`	UTF-8 encoding expected

Tip: For best graph quality, prefer documents with clear structure (headings, named entities, factual prose). Scanned PDFs or heavily formatted spreadsheets will produce sparse graphs.

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

Directories ¶

Path	Synopsis
cmd
internal
api
chunker
community
config
crawler Package crawler discovers and fetches pages from documentation websites.	Package crawler discovers and fetches pages from documentation websites.
embedder
extractor
llm
loader
mcp
pipeline
search
store
ui

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL