Documentation
Overview
Package proxy provides HTTP handlers for the API proxy: the Anthropic messages and token counting endpoints, a native messages passthrough, the embeddings endpoint, rate limiting middleware, and token counting utilities built on tiktoken.
Why Token Estimation Exists
This package estimates input tokens because of a fundamental protocol mismatch between Anthropic's streaming API and OpenAI's Chat Completions API:
- Anthropic: The message_start event (FIRST event) must include input_tokens
- OpenAI: Token usage appears in the FINAL chunk of the stream
Since this proxy translates Anthropic-format requests to OpenAI format and sends them to GitHub Copilot, we face a temporal problem: we must emit input_tokens before we know the actual count from the upstream provider.
The solution is to estimate tokens from the request content using tiktoken, then emit that estimate in message_start. The actual token count (when available from the upstream provider) replaces our estimate in the final message_delta event.
This means clients see an estimated count initially, then the real count at the end. For most use cases this is acceptable — the estimate is close enough for UI display, and any billing/quota tracking uses the final accurate count.
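The two-phase usage reporting can be sketched as follows; formatEvent and the literal token numbers are illustrative, while the event and field names follow Anthropic's SSE format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// formatEvent renders one SSE frame; a real handler would write this
// to the http.ResponseWriter and flush.
func formatEvent(event string, payload any) string {
	b, _ := json.Marshal(payload)
	return fmt.Sprintf("event: %s\ndata: %s\n\n", event, b)
}

func main() {
	estimated := 1234 // tiktoken estimate computed from the request content

	// Phase 1: message_start must carry input_tokens before the upstream
	// provider has reported anything, so the estimate goes here.
	fmt.Print(formatEvent("message_start", map[string]any{
		"type": "message_start",
		"message": map[string]any{
			"usage": map[string]int{"input_tokens": estimated},
		},
	}))

	// ... content_block deltas stream through from the upstream provider ...

	// Phase 2: the final message_delta replaces the estimate with the
	// actual count, when the upstream provider supplies one.
	actual := 1198 // illustrative upstream-reported value
	fmt.Print(formatEvent("message_delta", map[string]any{
		"type":  "message_delta",
		"usage": map[string]int{"input_tokens": actual, "output_tokens": 87},
	}))
}
```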
Index
- func CountTokens(text string) int
- func EstimateInputTokens(req *translate.AnthropicRequest) int
- func EstimateTokensFromCountRequest(req *CountTokensRequest) int
- func EstimateTokensFromCountRequestWithBeta(req *CountTokensRequest, anthropicBeta string) int
- type CountTokensRequest
- type CountTokensResponse
- type Handler
- func (h *Handler) HandleCompletions(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleCountTokens(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleEmbeddings(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleMessages(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleModels(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleNativeMessages(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleResponses(w http.ResponseWriter, r *http.Request)
- func (h *Handler) HandleRoot(w http.ResponseWriter, r *http.Request)
- type RateLimiter
Constants
This section is empty.
Variables
This section is empty.
Functions
func CountTokens
func CountTokens(text string) int
CountTokens counts tokens in a string using tiktoken. It falls back to character-based estimation if the tokenizer fails.
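The fallback path might look like this sketch; the roughly-4-characters-per-token ratio is a common rule of thumb assumed here, not a figure taken from the package:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// countTokensFallback approximates a token count when the tiktoken
// encoder is unavailable. The 4-chars-per-token ratio is a rough
// heuristic for English text, not an exact figure.
func countTokensFallback(text string) int {
	n := utf8.RuneCountInString(text)
	tokens := n / 4
	if tokens == 0 && n > 0 {
		tokens = 1 // never report zero for non-empty input
	}
	return tokens
}

func main() {
	fmt.Println(countTokensFallback("Hello, token counting world!")) // 28 runes -> 7
}
```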
func EstimateInputTokens
func EstimateInputTokens(req *translate.AnthropicRequest) int
EstimateInputTokens counts input tokens for an Anthropic request. Uses tiktoken for accurate counting of text content and Anthropic's dimension-based formula for images.
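Anthropic's published guidance puts image cost at roughly (width × height) / 750 tokens; a sketch of that formula (the function name is illustrative):

```go
package main

import "fmt"

// estimateImageTokens applies Anthropic's documented approximation
// tokens ≈ (width * height) / 750 for an image's pixel dimensions.
func estimateImageTokens(width, height int) int {
	return (width*height + 749) / 750 // ceiling division
}

func main() {
	// A 1092x1092 image (~1.19 MP) comes out to roughly 1590 tokens.
	fmt.Println(estimateImageTokens(1092, 1092))
}
```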
func EstimateTokensFromCountRequest
func EstimateTokensFromCountRequest(req *CountTokensRequest) int
EstimateTokensFromCountRequest counts tokens from a CountTokensRequest.
func EstimateTokensFromCountRequestWithBeta
func EstimateTokensFromCountRequestWithBeta(req *CountTokensRequest, anthropicBeta string) int
EstimateTokensFromCountRequestWithBeta counts tokens from a CountTokensRequest, additionally accounting for MCP/Skill tools enabled by the anthropicBeta value.
Types
type CountTokensRequest
type CountTokensRequest struct {
	Model    string                       `json:"model"`
	Messages []translate.AnthropicMessage `json:"messages"`
	System   json.RawMessage              `json:"system,omitempty"`
	Tools    []translate.AnthropicTool    `json:"tools,omitempty"`
}
CountTokensRequest is the request body for token counting.
type CountTokensResponse
type CountTokensResponse struct {
	InputTokens int `json:"input_tokens"`
}
CountTokensResponse is the response for token counting.
type Handler
type Handler struct {
	// contains filtered or unexported fields
}
Handler provides HTTP handlers for the proxy.
func NewHandler
NewHandler creates a new handler.
func (*Handler) HandleCompletions
func (h *Handler) HandleCompletions(w http.ResponseWriter, r *http.Request)
HandleCompletions handles chat completion requests.
func (*Handler) HandleCountTokens
func (h *Handler) HandleCountTokens(w http.ResponseWriter, r *http.Request)
HandleCountTokens handles Anthropic token counting requests. This provides an estimate since we don't have access to the actual tokenizer.
func (*Handler) HandleEmbeddings
func (h *Handler) HandleEmbeddings(w http.ResponseWriter, r *http.Request)
HandleEmbeddings handles embedding requests.
func (*Handler) HandleMessages
func (h *Handler) HandleMessages(w http.ResponseWriter, r *http.Request)
HandleMessages handles Anthropic-compatible messages requests. Routes to native /v1/messages if the model supports it, otherwise translates to OpenAI format.
func (*Handler) HandleModels
func (h *Handler) HandleModels(w http.ResponseWriter, r *http.Request)
HandleModels handles model listing requests.
func (*Handler) HandleNativeMessages
func (h *Handler) HandleNativeMessages(w http.ResponseWriter, r *http.Request)
HandleNativeMessages handles Anthropic messages requests by passing them directly to Copilot's native /v1/messages endpoint without translation. This verifies that Copilot natively supports the Anthropic Messages API.
func (*Handler) HandleResponses
func (h *Handler) HandleResponses(w http.ResponseWriter, r *http.Request)
HandleResponses handles OpenAI Responses API requests. This is a pass-through proxy: the request is forwarded to Copilot's /responses endpoint and the response streamed back, with ID inconsistencies in the stream fixed along the way.
func (*Handler) HandleRoot
func (h *Handler) HandleRoot(w http.ResponseWriter, r *http.Request)
HandleRoot handles the root endpoint.
type RateLimiter
type RateLimiter struct {
	// contains filtered or unexported fields
}
RateLimiter provides token bucket rate limiting. It is safe for concurrent use.
Unlike a simple interval-based limiter, a token bucket properly queues requests when waitOnLimit is true, preventing bursts after waiting.
func NewRateLimiter
func NewRateLimiter(intervalSecs int, waitOnLimit bool, verbose bool) *RateLimiter
NewRateLimiter creates a new rate limiter. intervalSecs is the minimum time between requests (0 disables rate limiting). waitOnLimit determines whether to wait or return 429 when rate limited.
The limiter uses a token bucket algorithm with burst=1, meaning requests are spaced evenly rather than allowing bursts after idle periods.
func (*RateLimiter) Check
func (rl *RateLimiter) Check() error
Check checks the rate limit and either waits or returns an error. Returns nil if the request can proceed.
func (*RateLimiter) CheckWithContext
func (rl *RateLimiter) CheckWithContext(ctx context.Context) error
CheckWithContext checks the rate limit with context support for cancellation.
func (*RateLimiter) Middleware
func (rl *RateLimiter) Middleware(next http.Handler) http.Handler
Middleware wraps an http.Handler with rate limiting.