Documentation
Overview
Package proxy provides HTTP handlers for the API proxy: the Anthropic messages and token counting endpoints, a native messages passthrough, the embeddings endpoint, rate limiting middleware, and token counting utilities built on tiktoken.
Why Token Estimation Exists
This package estimates input tokens because of a fundamental protocol mismatch between Anthropic's streaming API and OpenAI's Chat Completions API:
- Anthropic: The message_start event (FIRST event) must include input_tokens
- OpenAI: Token usage appears in the FINAL chunk of the stream
Since this proxy translates Anthropic-format requests to OpenAI format and sends them to GitHub Copilot, we face a temporal problem: we must emit input_tokens before we know the actual count from the upstream provider.
The solution is to estimate tokens from the request content using tiktoken, then emit that estimate in message_start. The actual token count (when available from the upstream provider) replaces our estimate in the final message_delta event.
This means clients see an estimated count initially, then the real count at the end. For most use cases this is acceptable — the estimate is close enough for UI display, and any billing/quota tracking uses the final accurate count.
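The two-phase usage reporting can be sketched as follows; formatEvent and the literal token numbers are illustrative, while the event and field names follow Anthropic's SSE format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatEvent renders one SSE frame; a real handler would write this
// to the http.ResponseWriter and flush.
func formatEvent(event string, payload any) string {
	b, _ := json.Marshal(payload)
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, b)
}

func main() {
	estimated := 1234 // tiktoken estimate computed from the request content

	// Phase 1: message_start must carry input_tokens before the upstream
	// provider has reported anything, so the estimate goes here.
	fmt.Print(formatEvent("message_start", map[string]any{
		"type": "message_start",
		"message": map[string]any{
			"usage": map[string]int{"input_tokens": estimated},
		},
	}))

	// ... content_block deltas stream through from the upstream provider ...

	// Phase 2: the final message_delta replaces the estimate with the
	// actual count, when the upstream provider supplies one.
	actual := 1198 // illustrative upstream-reported value
	fmt.Print(formatEvent("message_delta", map[string]any{
		"type":  "message_delta",
		"usage": map[string]int{"input_tokens": actual, "output_tokens": 87},
	}))
}
```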
Index
- func CountTokens(text string) int
- func EstimateInputTokens(req *translate.AnthropicRequest) int
- func EstimateTokensFromCountRequest(req *CountTokensRequest) int
- func EstimateTokensFromCountRequestWithBeta(req *CountTokensRequest, anthropicBeta string) int
- type CountTokensRequest
- type CountTokensResponse
- type Handler
- func (h *Handler) HandleCompletions(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleCountTokens(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleEmbeddings(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleMessages(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleModels(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleNativeMessages(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleResponses(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleRoot(w http.ResponseWriter, r *http.Request)
- type RateLimiter
Constants
This section is empty.
Variables
This section is empty.
Functions
func CountTokens
func CountTokens(text string) int
CountTokens counts tokens in a string using tiktoken. It falls back to character-based estimation if the tokenizer fails.
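The fallback path might look like this sketch; the roughly-4-characters-per-token ratio is a common rule of thumb assumed here, not a figure taken from the package:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countTokensFallback approximates a token count when the tiktoken
// encoder is unavailable. The 4-chars-per-token ratio is a rough
// heuristic for English text, not an exact figure.
func countTokensFallback(text string) int {
	n := utf8.RuneCountInString(text)
	tokens := n / 4
	if tokens == 0 && n > 0 {
		tokens = 1 // never report zero for non-empty input
	}
	return tokens
}

func main() {
	fmt.Println(countTokensFallback("Hello, token counting world!")) // 28 runes -> 7
}
```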
func EstimateInputTokens
func EstimateInputTokens(req *translate.AnthropicRequest) int
EstimateInputTokens counts input tokens for an Anthropic request. Uses tiktoken for accurate counting of text content and Anthropic's dimension-based formula for images.
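Anthropic's published guidance puts image cost at roughly (width × height) / 750 tokens; a sketch of that formula (the function name is illustrative):

```go
package main

import "fmt"

// estimateImageTokens applies Anthropic's documented approximation
// tokens ≈ (width * height) / 750 for an image's pixel dimensions.
func estimateImageTokens(width, height int) int {
	return (width*height + 749) / 750 // ceiling division
}

func main() {
	// A 1092x1092 image (~1.19 MP) comes out to roughly 1590 tokens.
	fmt.Println(estimateImageTokens(1092, 1092))
}
```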
func EstimateTokensFromCountRequest
func EstimateTokensFromCountRequest(req *CountTokensRequest) int
EstimateTokensFromCountRequest counts tokens from a CountTokensRequest.
func EstimateTokensFromCountRequestWithBeta
func EstimateTokensFromCountRequestWithBeta(req *CountTokensRequest, anthropicBeta string) int
EstimateTokensFromCountRequestWithBeta counts tokens from a CountTokensRequest, additionally accounting for MCP/Skill tools enabled by the anthropicBeta value.
Types
type CountTokensRequest
type CountTokensRequest struct {
	Model    string                       `json:"model"`
	Messages []translate.AnthropicMessage `json:"messages"`
	System   json.RawMessage              `json:"system,omitempty"`
	Tools    []translate.AnthropicTool    `json:"tools,omitempty"`
}
CountTokensRequest is the request body for token counting.
type CountTokensResponse
type CountTokensResponse struct {
	InputTokens int `json:"input_tokens"`
}
CountTokensResponse is the response for token counting.
type Handler
type Handler struct {
	// contains filtered or unexported fields
}
Handler provides HTTP handlers for the proxy.
func NewHandler
NewHandler creates a new handler.
func (*Handler) HandleCompletions
func (h *Handler) HandleCompletions(w http.ResponseWriter, r *http.Request)
HandleCompletions handles chat completion requests.
func (*Handler) HandleCountTokens
func (h *Handler) HandleCountTokens(w http.ResponseWriter, r *http.Request)
HandleCountTokens handles Anthropic token counting requests. This provides an estimate since we don't have access to the actual tokenizer.
func (*Handler) HandleEmbeddings
func (h *Handler) HandleEmbeddings(w http.ResponseWriter, r *http.Request)
HandleEmbeddings handles embedding requests.
func (*Handler) HandleMessages
func (h *Handler) HandleMessages(w http.ResponseWriter, r *http.Request)
HandleMessages handles Anthropic-compatible messages requests. Routes to native /v1/messages if the model supports it, otherwise translates to OpenAI format.
func (*Handler) HandleModels
func (h *Handler) HandleModels(w http.ResponseWriter, r *http.Request)
HandleModels handles model listing requests.
func (*Handler) HandleNativeMessages
func (h *Handler) HandleNativeMessages(w http.ResponseWriter, r *http.Request)
HandleNativeMessages handles Anthropic messages requests by passing them directly to Copilot's native /v1/messages endpoint without translation. This verifies that Copilot natively supports the Anthropic Messages API.
func (*Handler) HandleResponses
func (h *Handler) HandleResponses(w http.ResponseWriter, r *http.Request)
HandleResponses handles OpenAI Responses API requests. This is a pass-through proxy: the request is forwarded to Copilot's /responses endpoint and the response streamed back, with ID inconsistencies in the stream fixed along the way.
func (*Handler) HandleRoot
func (h *Handler) HandleRoot(w http.ResponseWriter, r *http.Request)
HandleRoot handles the root endpoint.
type RateLimiter
type RateLimiter struct {
	// contains filtered or unexported fields
}
RateLimiter provides token bucket rate limiting. It is safe for concurrent use.
Unlike a simple interval-based limiter, a token bucket properly queues requests when waitOnLimit is true, preventing bursts after waiting.
func NewRateLimiter
func NewRateLimiter(intervalSecs int, waitOnLimit bool, verbose bool) *RateLimiter
NewRateLimiter creates a new rate limiter. intervalSecs is the minimum time between requests (0 disables rate limiting). waitOnLimit determines whether to wait or return 429 when rate limited.
The limiter uses a token bucket algorithm with burst=1, meaning requests are spaced evenly rather than allowing bursts after idle periods.
func (*RateLimiter) Check
func (rl *RateLimiter) Check() error
Check checks the rate limit and either waits or returns an error. Returns nil if the request can proceed.
func (*RateLimiter) CheckWithContext
func (rl *RateLimiter) CheckWithContext(ctx context.Context) error
CheckWithContext checks the rate limit with context support for cancellation.
func (*RateLimiter) Middleware
func (rl *RateLimiter) Middleware(next http.Handler) http.Handler
Middleware wraps an http.Handler with rate limiting.