docscontext

command module

v0.0.0-beta.11 (latest) | Published: Mar 17, 2026 | License: MIT | Imports: 1 | Imported by: 0

Repository

github.com/RandomCodeSpace/docscontext

README

DocsContext

A pure Go (CGO_ENABLED=0) GraphRAG tool inspired by Microsoft GraphRAG. Ingests unstructured documents, builds a hierarchical knowledge graph with community detection, and exposes an MCP server + embedded Web UI on a single port.

Features

GraphRAG pipeline — 5 phases: text units (load, chunk, embed) → graph extraction → community detection → community summaries → structured document summary
Knowledge graph — entity/relationship/claim extraction via LLM (JSON mode)
Louvain community detection — pure Go, hierarchical, no external dependencies
Three LLM providers — Azure OpenAI, Ollama (local), HuggingFace TGI
12 MCP tools — local search, global search, graph walk, community reports, and more
Embedded Web UI — vis-network graph explorer, semantic search, document browser
Single binary — zero CGO, cross-compiles to Linux / macOS / Windows

Install

go install github.com/RandomCodeSpace/docscontext@latest

Or build from source:

git clone https://github.com/RandomCodeSpace/docscontext.git
cd docscontext
CGO_ENABLED=0 go build -o docscontext .

Quick Start

# 1. Create config
mkdir -p ~/.docscontext
cp config.example.yaml ~/.docscontext/config.yaml
# Edit ~/.docscontext/config.yaml — set your LLM provider

# 2. Index documents (Phases 1-2: load, chunk, embed, extract entities)
docscontext index ./your-docs/ --workers 4

# 3. Build knowledge graph (Phases 3-4: community detection + summaries)
docscontext index --finalize

# 4. Check stats
docscontext stats

# 5. Start server
docscontext serve --port 8080

Open http://localhost:8080 for the Web UI.

Configuration

Copy config.example.yaml to ~/.docscontext/config.yaml and edit:

data_dir: ~/.docscontext/data

llm:
  provider: ollama          # azure | ollama | huggingface

  ollama:
    base_url: http://localhost:11434
    chat_model: llama3.2
    embed_model: nomic-embed-text

  azure:
    endpoint: https://myresource.openai.azure.com
    api_key: ${AZURE_OPENAI_API_KEY}
    api_version: "2024-02-01"
    chat_model: gpt-4o
    embed_model: text-embedding-3-small

indexing:
  chunk_size: 512
  chunk_overlap: 50
  workers: 4

server:
  host: 127.0.0.1
  port: 8080
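
To make the chunk_size and chunk_overlap settings concrete, here is a minimal sketch of a sliding-window chunker. It assumes the unit is whitespace-separated words; the actual chunker may count tokens or characters instead, and chunkWords is a hypothetical name, not a function from this project.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of at most size words. Each window
// starts with the last overlap words of the previous one.
// Assumption: the unit is whitespace-separated words.
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid settings: the window would never advance
	}
	words := strings.Fields(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the last window already reached the end of the text
		}
	}
	return chunks
}

func main() {
	// Small numbers keep the output readable; the config above uses 512 and 50.
	for i, c := range chunkWords("a b c d e f g h i j", 4, 1) {
		fmt.Printf("chunk %d: %s\n", i, c)
	}
}
```

With the values from the config above (512 and 50), each window would advance by 462 units under this scheme.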

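Environment variables with the docscontext_ prefix override values from the config file. The rest of the name is the config path, upper-cased, with nesting levels joined by underscores: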

docscontext_LLM_PROVIDER=azure
docscontext_LLM_AZURE_API_KEY=sk-...
docscontext_SERVER_PORT=9090
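
The three variables above suggest a simple naming rule. The sketch below derives the variable name from a dotted config path and falls back to the file value when no override is set. It is an illustration of that rule only: envName and resolve are hypothetical helpers, and the project's real config loader may behave differently (for example, around empty values).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envPrefix is taken from the examples above.
const envPrefix = "docscontext_"

// envName derives the override variable for a dotted config path,
// e.g. "llm.azure.api_key" -> "docscontext_LLM_AZURE_API_KEY".
func envName(path string) string {
	return envPrefix + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

// resolve returns the environment override for path when one is set and
// non-empty, otherwise the value read from the config file.
// Assumption: an empty variable is treated as "not set".
func resolve(path, fileValue string, lookup func(string) (string, bool)) string {
	if v, ok := lookup(envName(path)); ok && v != "" {
		return v
	}
	return fileValue
}

func main() {
	fmt.Println(envName("llm.azure.api_key"))
	fmt.Println(resolve("server.port", "8080", os.LookupEnv))
}
```

Going from config path to variable name is unambiguous, whereas the reverse is not, because keys such as api_key contain underscores themselves.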

CLI

# Index a file or directory
docscontext index ./docs/ [--force] [--workers 4] [--verbose]

# Run community detection + LLM summaries
docscontext index --finalize

# Show statistics
docscontext stats
docscontext stats --json

# Start MCP + Web UI server
docscontext serve [--port 8080] [--host 127.0.0.1]

MCP Tools

Connect any MCP client to http://localhost:8080/mcp/sse.

Tool	Description
`search_documents`	Vector similarity search over chunks
`local_search`	Vector + graph walk (GraphRAG local)
`global_search`	Community summary aggregation with LLM synthesis
`query_entity`	Entity details + relationships by name
`find_relationships`	Relationship lookup by source / target / predicate
`get_graph_neighborhood`	Subgraph JSON for visualization
`get_document_structure`	LLM-generated structured summary
`list_entities`	Browse entities with type filter
`list_documents`	Browse indexed documents
`get_community_report`	Community summary + member entities
`get_chunk`	Retrieve chunk by ID
`stats`	Full index statistics
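
MCP messages are JSON-RPC 2.0, and a tool is invoked with the tools/call method. The sketch below only builds such a request body for one of the tools listed above. The argument names query and top_k are assumptions borrowed from the REST search body; the tool's real input schema is not documented here, and newToolCall is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request envelope, as used by MCP.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams is the payload of an MCP "tools/call" request.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// newToolCall encodes a tools/call request for the named tool.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	})
}

func main() {
	// "query" and "top_k" are assumed argument names, not confirmed by the README.
	body, err := newToolCall(1, "local_search", map[string]any{
		"query": "who maintains the indexer?",
		"top_k": 5,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

An MCP session normally begins with an initialize exchange before any tools/call, and with the SSE transport the client typically learns where to POST messages from the stream itself. Both steps are omitted here; an MCP client library usually handles them.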

REST API

GET  /api/stats
GET  /api/documents
GET  /api/documents/{id}
POST /api/search          {"query":"...","mode":"local|global","top_k":5}
GET  /api/graph/neighborhood?entity=<name>&depth=2
GET  /api/entities
GET  /api/communities
GET  /api/communities/{id}
POST /api/upload
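
As an illustration of the search endpoint, the following sketch posts the JSON body shown above using only the standard library. The response schema is not documented in this README, so the body is returned as raw bytes; the base URL assumes the default host and port, and search and encodeSearch are hypothetical helper names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// searchRequest mirrors the JSON body listed for POST /api/search.
type searchRequest struct {
	Query string `json:"query"`
	Mode  string `json:"mode"` // "local" or "global"
	TopK  int    `json:"top_k"`
}

// encodeSearch validates the mode and returns the JSON request body.
func encodeSearch(query, mode string, topK int) ([]byte, error) {
	if mode != "local" && mode != "global" {
		return nil, fmt.Errorf("unknown mode %q", mode)
	}
	return json.Marshal(searchRequest{Query: query, Mode: mode, TopK: topK})
}

// search posts the query and returns the raw response body.
func search(baseURL, query, mode string, topK int) ([]byte, error) {
	body, err := encodeSearch(query, mode, topK)
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Post(baseURL+"/api/search", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("search failed: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	out, err := search("http://localhost:8080", "deployment steps", "local", 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

Global mode involves LLM synthesis and may take noticeably longer than local mode, so the 30-second timeout is a guess that may need raising.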

Architecture

Document In
    │
    ▼ Phase 1 — Text Units
  Loader (PDF/DOCX/TXT/MD) → Chunker → Embedder → SQLite

    ▼ Phase 2 — Graph Extraction  [parallel per document]
  LLM → Entities + Relationships + Claims → SQLite

    ▼ Phase 3 — Community Detection  [post-index finalization]
  Louvain algorithm → hierarchical community assignments

    ▼ Phase 4 — Community Summaries  [parallel]
  LLM → CommunityReport → embed summary → SQLite

    ▼ Phase 5 — Structured Doc
  LLM → JSON summary → SQLite

All data lives in a single SQLite file at $DATA_DIR/docscontext.db.
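
For readers unfamiliar with Phase 3: Louvain searches for a partition of the graph that maximizes modularity, a score comparing the number of edges inside each community with the number expected by chance. The sketch below computes that score for an unweighted, undirected graph. It shows the objective only, not the project's implementation, and it omits the iterative node moves and graph aggregation that make up the full algorithm.

```go
package main

import "fmt"

// edge is an undirected, unweighted link between two node IDs.
type edge struct{ a, b int }

// modularity returns Q = sum over communities c of (L_c/m - (d_c/(2m))^2),
// where m is the edge count, L_c the number of edges inside c, and d_c the
// total degree of the nodes in c.
func modularity(edges []edge, community map[int]int) float64 {
	m := float64(len(edges))
	if m == 0 {
		return 0
	}
	internal := map[int]float64{} // edges with both ends in the community
	degree := map[int]float64{}   // summed degree of the community's nodes
	for _, e := range edges {
		ca, cb := community[e.a], community[e.b]
		degree[ca]++
		degree[cb]++
		if ca == cb {
			internal[ca]++
		}
	}
	q := 0.0
	for c, d := range degree {
		frac := d / (2 * m)
		q += internal[c]/m - frac*frac
	}
	return q
}

func main() {
	// Two triangles joined by a single bridge edge (2-3).
	edges := []edge{{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}}
	split := map[int]int{0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
	merged := map[int]int{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}
	fmt.Printf("split:  %.3f\n", modularity(edges, split))
	fmt.Printf("merged: %.3f\n", modularity(edges, merged))
}
```

Putting every node into one community always scores zero, so any partition with a positive score captures some real clustering.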

Supported File Types

Format	Extensions	Notes
PDF	`.pdf`	Text extraction via pdfcpu; scanned/image-only PDFs yield no text
Word	`.docx`	Open XML format (Office 2007+); legacy `.doc` not supported
Markdown	`.md`, `.markdown`	Heading `# Title` used as document title
Plain text	`.txt`, `.text`	UTF-8 encoding expected

Tip: For best graph quality, prefer documents with clear structure (headings, named entities, factual prose). Scanned PDFs or heavily formatted spreadsheets will produce sparse graphs.
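
The table above amounts to a lookup from file extension to format. A minimal sketch of that lookup follows; it is useful, for example, to pre-filter a directory before indexing. The format labels and the case-insensitive match are assumptions, and detectFormat is a hypothetical helper rather than the project's loader.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// formatByExt lists the extensions from the table above.
// The labels on the right are illustrative only.
var formatByExt = map[string]string{
	".pdf":      "pdf",
	".docx":     "word",
	".md":       "markdown",
	".markdown": "markdown",
	".txt":      "text",
	".text":     "text",
}

// detectFormat reports the format for a path and whether it is supported.
// Assumption: extensions are matched case-insensitively.
func detectFormat(path string) (string, bool) {
	ext := strings.ToLower(filepath.Ext(path))
	format, ok := formatByExt[ext]
	return format, ok
}

func main() {
	for _, p := range []string{"guide.pdf", "notes/README.MD", "legacy.doc"} {
		format, ok := detectFormat(p)
		fmt.Printf("%s: format=%q supported=%v\n", p, format, ok)
	}
}
```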

License

MIT

Source Files

main.go

Directories

Path	Synopsis
cmd
internal
api
chunker
community
config
crawler	Package crawler discovers and fetches pages from documentation websites.
embedder
extractor
llm
loader
mcp
pipeline
search
store
ui
