Coregit
API Reference

Semantic Search

AI-powered code search using natural language queries.

Search code by intent, not just keywords. Semantic search uses AI embeddings to find relevant code even when the exact terms don't appear in the file.

How It Works

  1. Index your repository: code is chunked and embedded into vectors using Voyage AI voyage-code-3 (optimized for 300+ programming languages)
  2. Search with natural language: your query is embedded and matched against code vectors in Pinecone (serverless vector database, cosine similarity over 1024-dimensional vectors)
  3. Rerank: top candidates are reranked by Voyage AI rerank-2.5 (cross-attention model with instruction-following) for precision
  4. Diversify: results are spread across files using MMR (Maximal Marginal Relevance) so you don't get 10 results from the same file

| Component | Provider | Model |
| --- | --- | --- |
| Code embeddings | Voyage AI | voyage-code-3 (1024D, 32K context) |
| Vector storage | Pinecone | Serverless (cosine, AWS us-east-1) |
| Reranking | Voyage AI | rerank-2.5 (32K context) |
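To make the matching in step 2 concrete, here is a minimal sketch of cosine similarity, the metric the vector index is configured with. The function name is illustrative and not part of the Coregit API; in practice the vector database computes this over 1024-dimensional embeddings.

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Illustrative only: the vector database performs this comparison in production.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their length.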

Code text is never stored in the vector database — only vectors and lightweight metadata. Snippets are fetched from Git storage on the fly for reranking and response.

Performance: Search results are cached in Cloudflare KV, keyed by commit SHA and query. Repeated queries are served from cache (X-Cache: HIT). Pushing new commits changes the commit SHA and therefore the cache key, so cached results never go stale and no explicit invalidation is needed.
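The effect of keying the cache by commit SHA can be sketched as follows. The actual key format is not documented here, so the `searchCacheKey` function, the `SearchParams` shape, and the key layout are all assumptions made for illustration.

```typescript
// Hypothetical set of request parameters that influence the result list.
interface SearchParams {
  q: string;
  language?: string;
  path_pattern?: string;
  top_k?: number;
  expand_context?: boolean;
}

// Build a cache key from the resolved commit SHA plus the query parameters.
// Assumption: the real key format is unknown; this only shows why a new
// commit can never be answered from an entry cached for an older commit.
function searchCacheKey(repoSlug: string, commitSha: string, params: SearchParams): string {
  // Fixed field order and explicit defaults so equivalent requests share a key.
  const normalized = JSON.stringify([
    params.q.trim(),
    params.language ?? null,
    params.path_pattern ?? null,
    params.top_k ?? 10,
    params.expand_context ?? false,
  ]);
  return `search:${repoSlug}:${commitSha}:${normalized}`;
}
```

Because a branch name resolves to a different SHA after every push, the same query against `main` produces a different key once the branch moves, and the old entry is simply never read again.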

Index a Repository

Before searching, you must index the repository branch.

POST /v1/repos/:slug/index

Permission: Write access required.

{
  "branch": "main"
}

| Field | Required | Description |
| --- | --- | --- |
| branch | No | Branch to index (default: repo's default branch) |

Response 202

{
  "message": "Reindex queued",
  "repo_slug": "my-app",
  "branch": "main"
}

Indexing is asynchronous. Poll the status endpoint to check progress.
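As a rough sketch, the two requests involved in indexing and polling can be described as plain data before being handed to any HTTP client. The base URL in the usage note is a placeholder, the helper names are hypothetical, authentication headers are omitted because the scheme is not described in this section, and the slug is assumed to be a single path segment.

```typescript
// Minimal description of an HTTP request, independent of any HTTP client.
interface HttpRequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Request that queues indexing. Omitting `branch` sends an empty object,
// leaving the server to pick the repository's default branch.
function buildIndexRequest(baseUrl: string, slug: string, branch?: string): HttpRequestSpec {
  return {
    method: "POST",
    url: `${baseUrl}/v1/repos/${encodeURIComponent(slug)}/index`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(branch ? { branch } : {}),
  };
}

// Request that reads the indexing status for one branch.
function buildIndexStatusRequest(baseUrl: string, slug: string, branch: string): HttpRequestSpec {
  return {
    method: "GET",
    url: `${baseUrl}/v1/repos/${encodeURIComponent(slug)}/index/status?branch=${encodeURIComponent(branch)}`,
    headers: {},
  };
}
```

For example, `buildIndexRequest("https://api.example.com", "my-app", "main")` describes a POST to `/v1/repos/my-app/index` with the body `{"branch":"main"}`.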

Check Index Status

GET /v1/repos/:slug/index/status?branch=main

Permission: Read access required.

Response 200

{
  "indexed": true,
  "status": "ready",
  "branch": "main",
  "last_commit_sha": "8b7d315321...",
  "chunks_count": 42,
  "total_batches": 1,
  "processed_batches": 1,
  "indexed_at": "2026-04-06T16:02:55.687Z",
  "error": null
}

| Field | Description |
| --- | --- |
| indexed | true when status is "ready" |
| status | One of "not_indexed", "pending", "indexing", "ready", "failed" |
| chunks_count | Number of code chunks indexed |
| total_batches | Total indexing batches (for large repos) |
| processed_batches | Completed batches |
| indexed_at | When the index was last updated |
| error | Error message if status is "failed" |
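The status payload lends itself to two small helpers: one to decide whether polling should continue and one to estimate progress from the batch counters. The type and function names below are illustrative, and the nullability of `last_commit_sha` and `indexed_at` is an assumption, since only a "ready" response is shown above.

```typescript
type IndexState = "not_indexed" | "pending" | "indexing" | "ready" | "failed";

// Shape of the status response. Nullable fields are an assumption.
interface IndexStatus {
  indexed: boolean;
  status: IndexState;
  branch: string;
  last_commit_sha: string | null;
  chunks_count: number;
  total_batches: number;
  processed_batches: number;
  indexed_at: string | null;
  error: string | null;
}

// Polling only makes sense while work is queued or running.
function shouldKeepPolling(s: IndexStatus): boolean {
  return s.status === "pending" || s.status === "indexing";
}

// Fraction of batches completed, in the range 0 to 1.
function indexProgress(s: IndexStatus): number {
  if (s.status === "ready") return 1;
  if (s.total_batches <= 0) return 0; // batch count not known yet
  return Math.min(1, s.processed_batches / s.total_batches);
}
```

A client could show `indexProgress` as a percentage for large repositories and stop polling as soon as `shouldKeepPolling` returns false, then inspect `status` and `error`.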

Search

POST /v1/repos/:slug/semantic-search

Permission: Read access required.

{
  "q": "user authentication with password verification",
  "ref": "main",
  "language": "typescript",
  "top_k": 10,
  "expand_context": false
}

Fields

| Field | Required | Description |
| --- | --- | --- |
| q | Yes | Natural language search query (max 1000 chars) |
| ref | No | Branch name or commit SHA (default: repo's default branch) |
| path_pattern | No | Glob filter on file paths (e.g., "src/*.ts") |
| language | No | Filter by programming language (e.g., "typescript", "python") |
| top_k | No | Number of results to return (default: 10, max: 50) |
| expand_context | No | When true, include ~20 lines of surrounding code before/after each snippet |
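The documented limits can be checked on the client before a request is sent. This is a sketch under assumptions: the type and function names are hypothetical, the server may treat out-of-range values differently (for example by clamping), and the non-empty check on `q` and the lower bound of 1 for `top_k` are not stated in the table.

```typescript
interface SemanticSearchRequest {
  q: string;
  ref?: string;
  path_pattern?: string;
  language?: string;
  top_k?: number;
  expand_context?: boolean;
}

// Returns a list of problems; an empty list means the request looks valid.
function validateSearchRequest(req: SemanticSearchRequest): string[] {
  const errors: string[] = [];
  // Assumption: a required query should also be non-blank.
  if (req.q.trim().length === 0) {
    errors.push("q must not be empty");
  }
  if (req.q.length > 1000) {
    errors.push("q must be at most 1000 characters");
  }
  // Assumption: top_k must be a positive integer; the documented maximum is 50.
  if (req.top_k !== undefined &&
      (!Number.isInteger(req.top_k) || req.top_k < 1 || req.top_k > 50)) {
    errors.push("top_k must be an integer between 1 and 50");
  }
  return errors;
}
```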

Response 200

{
  "results": [
    {
      "file_path": "src/services/user-service.ts",
      "score": 0.738,
      "language": "typescript",
      "start_line": 1,
      "end_line": 63,
      "snippet": "import { db } from \"../db\";\nimport { hash, compare } from \"bcrypt\";\n\nexport class UserService {\n  async authenticate(email: string, password: string) {\n    ...\n  }\n}"
    },
    {
      "file_path": "src/auth/middleware.ts",
      "score": 0.562,
      "language": "typescript",
      "start_line": 1,
      "end_line": 43,
      "snippet": "..."
    }
  ],
  "query": "user authentication with password verification",
  "repo_slug": "my-app",
  "ref": "main"
}

Response headers:

| Header | Description |
| --- | --- |
| X-Cache | HIT if results came from cache, MISS if computed fresh |

Results are automatically diversified across files — if multiple chunks match from the same file, later results are penalized to surface matches from different files first.
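The penalty described above can be sketched as a greedy MMR selection. This is not the service's actual implementation: the function name, the trade-off weight `lambda`, and the binary similarity (1 for chunks in the same file, 0 otherwise) are assumptions chosen to keep the example small.

```typescript
interface Candidate {
  file_path: string;
  score: number; // relevance from the reranker, 0 to 1
}

// Greedy Maximal Marginal Relevance: repeatedly pick the candidate with the
// best balance of relevance and novelty relative to what is already selected.
function diversifyByFile<T extends Candidate>(candidates: T[], topK: number, lambda = 0.7): T[] {
  const remaining = [...candidates];
  const selected: T[] = [];
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestValue = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const c = remaining[i];
      // Max similarity to any selected result: 1 if the file is already
      // represented, 0 otherwise (assumed similarity function).
      const maxSim = selected.some((s) => s.file_path === c.file_path) ? 1 : 0;
      const value = lambda * c.score - (1 - lambda) * maxSim;
      if (value > bestValue) {
        bestValue = value;
        bestIndex = i;
      }
    }
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With `lambda = 0.7`, a second chunk from an already selected file loses 0.3 from its weighted score, so a moderately relevant chunk from a new file can overtake it.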

Result fields:

| Field | Description |
| --- | --- |
| results | Array of matching code chunks, sorted by relevance and diversified across files |
| results[].file_path | Path to the file in the repository |
| results[].score | Relevance score from 0 to 1 (higher is better) |
| results[].language | Detected programming language |
| results[].start_line | Start line of the matched chunk |
| results[].end_line | End line of the matched chunk |
| results[].snippet | The actual code content |
| results[].context_before | (only with expand_context: true) ~20 lines of code before the snippet |
| results[].context_after | (only with expand_context: true) ~20 lines of code after the snippet |

Error Responses

| Status | Description |
| --- | --- |
| 404 | Ref not found, or no blobs indexed for this version. |
| 202 | Indexing is in progress. Retry after it completes. |
| 503 | Semantic search is not configured on the server. |
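A client might map these status codes to actions along the following lines. The outcome names and the function are illustrative, not part of the SDK.

```typescript
type SearchOutcome = "ok" | "retry_later" | "not_found" | "unavailable" | "unexpected";

// Translate the HTTP status of a semantic-search call into a client action.
function classifySearchStatus(httpStatus: number): SearchOutcome {
  switch (httpStatus) {
    case 200:
      return "ok"; // results are in the response body
    case 202:
      return "retry_later"; // indexing still running; poll the status endpoint
    case 404:
      return "not_found"; // unknown ref, or nothing indexed for this version
    case 503:
      return "unavailable"; // semantic search is not configured on the server
    default:
      return "unexpected";
  }
}
```

Only `retry_later` is worth retrying automatically; `not_found` usually means the branch needs to be indexed first.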

Search by Commit SHA

You can search code as it existed at any indexed commit:

{
  "q": "database connection pooling",
  "ref": "8b7d3153212bdb9affd9a969cbdb2c17608aab87"
}

This returns results matching the state of the codebase at that specific commit, not the current branch HEAD. Vectors are content-addressed — no re-indexing needed.

Important: searching by commit SHA works for any commit whose file contents (blobs) have been indexed. Once a branch is indexed, commits in its history become searchable because old blob vectors are never deleted. The exception is file content that existed only before the first indexing run and has since changed: those blobs were never embedded, so results for such a commit are partial and cover only the files whose content was seen during some indexing run.
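The partial-coverage caveat can be illustrated with a small model of content addressing. Everything here is hypothetical: a commit is reduced to a map from file path to blob SHA, and the index to a set of blob SHAs that have been embedded at some point.

```typescript
// One commit's tree, reduced to path -> blob SHA (illustrative model).
type TreeSnapshot = Record<string, string>;

// Split a commit's files into those with indexed content and those without.
function searchableCoverage(
  tree: TreeSnapshot,
  indexedBlobs: Set<string>,
): { searchable: string[]; missing: string[] } {
  const searchable: string[] = [];
  const missing: string[] = [];
  for (const path of Object.keys(tree)) {
    // A file is searchable at this commit only if this exact content was embedded.
    if (indexedBlobs.has(tree[path])) {
      searchable.push(path);
    } else {
      missing.push(path);
    }
  }
  return { searchable, missing };
}
```

In this model, an old commit whose `src/db.ts` pointed at a blob that was rewritten before the first indexing run would list that file under `missing`, while unchanged files remain searchable.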

Delete Index

DELETE /v1/repos/:slug/index

Permission: Write access required.

With no body: deletes all vectors for the repo and all DB tracking records.

With branch in body: deletes only the DB tracking record (vectors are shared and remain).

{
  "branch": "feature-x"
}

Response 200

{
  "deleted": true,
  "vectors_deleted": true
}

Auto-Indexing

Set auto_index: true when creating a repository to automatically index new commits:

{
  "slug": "my-app",
  "auto_index": true
}

When enabled, every commit triggers incremental indexing — only changed files are re-embedded. Vectors are content-addressed by blob SHA, so identical content across branches/commits is never duplicated.
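Incremental indexing over content-addressed vectors amounts to a set difference on blob SHAs. The sketch below assumes a simplified model (a tree as a path-to-blob map and a set of already embedded blob SHAs); the function name is hypothetical.

```typescript
// Return the blob SHAs in a commit that still need to be embedded.
// Each blob appears once, even if several paths share the same content.
function blobsToEmbed(tree: Record<string, string>, alreadyEmbedded: Set<string>): string[] {
  const pending = new Set<string>();
  for (const path of Object.keys(tree)) {
    const blobSha = tree[path];
    if (!alreadyEmbedded.has(blobSha)) {
      pending.add(blobSha); // new or changed content
    }
  }
  return Array.from(pending);
}
```

A commit that touches one file therefore produces one new blob to embed, and a branch that copies existing files adds none.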

SDK Examples

const git = createCoregitClient({ apiKey: "cgk_live_..." });

// 1. Trigger indexing
await git.search.triggerIndex("my-app", { branch: "main" });

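// 2. Wait for indexing to complete (poll every 2 s, give up after about 2 minutes)
let status;
let attempts = 0;
do {
  const { data } = await git.search.indexStatus("my-app", "main");
  status = data?.status;
  if (status === "indexing" || status === "pending") {
    if (++attempts > 60) throw new Error("Indexing timed out");
    await new Promise(r => setTimeout(r, 2000));
  }
} while (status === "indexing" || status === "pending");
// Stop here if indexing failed or never started; searching would not succeed.
if (status !== "ready") throw new Error(`Indexing did not complete (status: ${status})`);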

// 3. Search current branch
const { data } = await git.search.semantic("my-app", {
  q: "function that handles user login and session creation",
  ref: "main",
  top_k: 5,
});

for (const result of data.results) {
  console.log(`${result.file_path} (score: ${result.score.toFixed(2)})`);
  console.log(`  Lines ${result.start_line}-${result.end_line}`);
}

// 4. Search at a specific commit (version-aware)
const { data: oldVersion } = await git.search.semantic("my-app", {
  q: "authentication middleware",
  ref: "8b7d3153212bdb9affd9a969cbdb2c17608aab87",
});

// 5. Filter by language and path
const { data: tsResults } = await git.search.semantic("my-app", {
  q: "database connection pooling",
  ref: "feature-branch",
  language: "typescript",
  path_pattern: "src/**/*.ts",
});

// 6. Expand context around matches
const { data: withContext } = await git.search.semantic("my-app", {
  q: "error handling middleware",
  ref: "main",
  top_k: 5,
  expand_context: true,
});

for (const result of withContext.results) {
  if (result.context_before) console.log("--- before ---\n", result.context_before);
  console.log("--- match ---\n", result.snippet);
  if (result.context_after) console.log("--- after ---\n", result.context_after);
}
