Semantic Search
AI-powered code search using natural language queries.
Search code by intent, not just keywords. Semantic search uses AI embeddings to find relevant code even when the exact terms don't appear in the file.
How It Works
- Index your repository: code is chunked and embedded into vectors using Voyage AI voyage-code-3 (optimized for 300+ programming languages).
- Search with natural language: your query is embedded and matched against the code vectors in Pinecone (serverless vector database; 1024-dimensional vectors compared by cosine similarity).
- Rerank: the top candidates are reranked by Voyage AI rerank-2.5 (a cross-attention model with instruction-following) for precision.
- Diversify: results are diversified across files using MMR (Maximal Marginal Relevance), so you don't get 10 results from the same file.
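To make the search step concrete, here is a brute-force sketch of cosine-similarity ranking over plain arrays. It is illustrative only: Pinecone performs this matching server-side, the `cosineSimilarity` and `topK` helpers are hypothetical, and the toy 3-dimensional vectors stand in for 1024-dimensional voyage-code-3 embeddings.

```typescript
// Illustrative only: brute-force cosine-similarity ranking.
// The real service delegates this to Pinecone; these helpers are hypothetical.
interface StoredChunk {
  id: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query vector, best first.
function topK(query: number[], chunks: StoredChunk[], k: number): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const hits = topK([1, 0, 0], [
  { id: "auth.ts#0", vector: [0.9, 0.1, 0] },
  { id: "db.ts#0", vector: [0, 1, 0] },
  { id: "auth.ts#1", vector: [0.7, 0.7, 0] },
], 2);
console.log(hits.map((h) => h.id).join(", ")); // auth.ts#0, auth.ts#1
```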
| Component | Provider | Model |
|---|---|---|
| Code embeddings | Voyage AI | voyage-code-3 (1024D, 32K context) |
| Vector storage | Pinecone | Serverless (cosine, AWS us-east-1) |
| Reranking | Voyage AI | rerank-2.5 (32K context) |
Code text is never stored in the vector database — only vectors and lightweight metadata. Snippets are fetched from Git storage on the fly for reranking and response.
Performance: search results are cached in Cloudflare KV, keyed by commit SHA and query. Repeated queries are returned directly from the cache (X-Cache: HIT). Because the cache key contains the commit SHA, pushing new commits produces a new key, so stale results are never served and no explicit invalidation is needed.
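A minimal sketch of why keying on the commit SHA makes invalidation automatic: the branch name is resolved to a SHA before the key is built, so a push changes the key and the old entry is simply never read again. The key format, the `branchHeads` map, and hashing the query with SHA-256 are assumptions for illustration, not the service's actual scheme; like the description above, it keys only on commit SHA and query.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key scheme: "<repo>:<commitSha>:<sha256(query)>".
// Hashing the query keeps keys short and safe for arbitrary text.
function cacheKey(repoSlug: string, commitSha: string, query: string): string {
  const queryHash = createHash("sha256").update(query).digest("hex");
  return `${repoSlug}:${commitSha}:${queryHash}`;
}

// Hypothetical stand-in for ref resolution: branch name -> current head commit SHA.
const branchHeads = new Map<string, string>([["main", "aaa111"]]);

function keyForBranch(repoSlug: string, branch: string, query: string): string {
  const sha = branchHeads.get(branch);
  if (!sha) throw new Error(`unknown branch: ${branch}`);
  return cacheKey(repoSlug, sha, query);
}

const before = keyForBranch("my-app", "main", "auth middleware");
branchHeads.set("main", "bbb222"); // simulate a push moving the branch head
const after = keyForBranch("my-app", "main", "auth middleware");
console.log(before === after); // false: the old entry is never looked up again
```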
Index a Repository
Before searching, you must index the branch you want to search.
POST /v1/repos/:slug/index
Permission: Write access required.
```json
{
  "branch": "main"
}
```

| Field | Required | Description |
|---|---|---|
| branch | No | Branch to index (default: the repo's default branch) |
Response 202
```json
{
  "message": "Reindex queued",
  "repo_slug": "my-app",
  "branch": "main"
}
```

Indexing is asynchronous: a 202 means the job was queued, not finished. Poll the index status endpoint to track progress.
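The polling loop can be factored into a small reusable helper. This is a sketch, not part of any SDK: `waitForIndex` is a hypothetical name, the status fetcher is injected, and the 2-second interval and 60-attempt limit are arbitrary choices rather than documented values. It treats `failed` as terminal and, as an assumption, treats `not_indexed` as meaning indexing was never triggered.

```typescript
type IndexState = "not_indexed" | "pending" | "indexing" | "ready" | "failed";

interface IndexStatus {
  status: IndexState;
  error: string | null;
}

// Poll until the index leaves the "pending"/"indexing" states.
// `getStatus` is injected so this works with the SDK or with raw HTTP calls.
async function waitForIndex(
  getStatus: () => Promise<IndexStatus>,
  { intervalMs = 2000, maxAttempts = 60 } = {},
): Promise<IndexStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const s = await getStatus();
    if (s.status === "ready") return s;
    if (s.status === "failed") throw new Error(`Indexing failed: ${s.error ?? "unknown error"}`);
    // Assumption: right after a 202 the branch is already "pending".
    if (s.status === "not_indexed") throw new Error("Branch is not indexed; trigger indexing first");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for indexing");
}
```

With the SDK, `getStatus` could wrap the `git.search.indexStatus("my-app", "main")` call used in the SDK examples and return its `data`.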
Check Index Status
GET /v1/repos/:slug/index/status?branch=main
Permission: Read access required.
Response 200
```json
{
  "indexed": true,
  "status": "ready",
  "branch": "main",
  "last_commit_sha": "8b7d315321...",
  "chunks_count": 42,
  "total_batches": 1,
  "processed_batches": 1,
  "indexed_at": "2026-04-06T16:02:55.687Z",
  "error": null
}
```

| Field | Description |
|---|---|
| indexed | true when status is "ready" |
| status | One of "not_indexed", "pending", "indexing", "ready", "failed" |
| branch | Branch this status refers to |
| last_commit_sha | SHA of the most recently indexed commit |
| chunks_count | Number of code chunks indexed |
| total_batches | Total number of indexing batches (large repos are indexed in several batches) |
| processed_batches | Number of batches completed so far |
| indexed_at | When the index was last updated |
| error | Error message when status is "failed", otherwise null |
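The batch counters can be turned into a progress message while polling. A small sketch, assuming the response shape shown above; `describeProgress` is a hypothetical helper, and typing `last_commit_sha` and `indexed_at` as nullable (for branches that were never indexed) is an assumption.

```typescript
interface IndexStatusResponse {
  indexed: boolean;
  status: "not_indexed" | "pending" | "indexing" | "ready" | "failed";
  branch: string;
  last_commit_sha: string | null; // nullability assumed
  chunks_count: number;
  total_batches: number;
  processed_batches: number;
  indexed_at: string | null; // nullability assumed
  error: string | null;
}

// Summarise an index-status response as a one-line progress message.
function describeProgress(s: IndexStatusResponse): string {
  switch (s.status) {
    case "ready":
      return `ready: ${s.chunks_count} chunks, last indexed ${s.indexed_at ?? "unknown"}`;
    case "failed":
      return `failed: ${s.error ?? "no error message"}`;
    case "pending":
    case "indexing": {
      // Guard against total_batches being 0 before batching starts.
      const pct = s.total_batches > 0 ? Math.floor((100 * s.processed_batches) / s.total_batches) : 0;
      return `${s.status}: ${s.processed_batches}/${s.total_batches} batches (${pct}%)`;
    }
    default:
      return "not indexed";
  }
}

const example: IndexStatusResponse = {
  indexed: true,
  status: "ready",
  branch: "main",
  last_commit_sha: "8b7d315321...",
  chunks_count: 42,
  total_batches: 1,
  processed_batches: 1,
  indexed_at: "2026-04-06T16:02:55.687Z",
  error: null,
};
console.log(describeProgress(example)); // ready: 42 chunks, last indexed 2026-04-06T16:02:55.687Z
```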
Semantic Search
POST /v1/repos/:slug/semantic-search
Permission: Read access required.
```json
{
  "q": "user authentication with password verification",
  "ref": "main",
  "language": "typescript",
  "top_k": 10,
  "expand_context": false
}
```

Fields

| Field | Required | Description |
|---|---|---|
| q | Yes | Natural language search query (max 1000 characters) |
| ref | No | Branch name or commit SHA (default: the repo's default branch) |
| path_pattern | No | Glob filter on file paths (e.g., "src/*.ts") |
| language | No | Filter by programming language (e.g., "typescript", "python") |
| top_k | No | Number of results to return (default: 10, max: 50) |
| expand_context | No | When true, include about 20 lines of surrounding code before and after each snippet (default: false) |
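Validating requests on the client against the documented limits avoids a round trip for requests the server would reject. A sketch: the limits come from the table above, while `buildSearchRequest`, the minimum `top_k` of 1, and counting `q` length in JavaScript string units (the server may count characters differently) are assumptions.

```typescript
interface SemanticSearchRequest {
  q: string;
  ref?: string;
  path_pattern?: string;
  language?: string;
  top_k?: number;
  expand_context?: boolean;
}

// Apply the documented limits and default: q is required (max 1000 chars),
// top_k defaults to 10 and may not exceed 50. Minimum of 1 is an assumption.
function buildSearchRequest(input: SemanticSearchRequest): SemanticSearchRequest {
  if (input.q.length === 0) throw new Error("q is required");
  if (input.q.length > 1000) throw new Error("q must be at most 1000 characters");
  const topK = input.top_k ?? 10;
  if (!Number.isInteger(topK) || topK < 1 || topK > 50) {
    throw new Error("top_k must be an integer between 1 and 50");
  }
  return { ...input, top_k: topK };
}

const req = buildSearchRequest({ q: "user authentication with password verification", ref: "main" });
console.log(req.top_k); // 10
```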
Response 200
```json
{
  "results": [
    {
      "file_path": "src/services/user-service.ts",
      "score": 0.738,
      "language": "typescript",
      "start_line": 1,
      "end_line": 63,
      "snippet": "import { db } from \"../db\";\nimport { hash, compare } from \"bcrypt\";\n\nexport class UserService {\n async authenticate(email: string, password: string) {\n ...\n }\n}"
    },
    {
      "file_path": "src/auth/middleware.ts",
      "score": 0.562,
      "language": "typescript",
      "start_line": 1,
      "end_line": 43,
      "snippet": "..."
    }
  ],
  "query": "user authentication with password verification",
  "repo_slug": "my-app",
  "ref": "main"
}
```

Response headers:

| Header | Description |
|---|---|
| X-Cache | HIT if results came from cache, MISS if computed fresh |
Results are automatically diversified across files — if multiple chunks match from the same file, later results are penalized to surface matches from different files first.
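The service's exact diversification parameters are not documented here. The sketch below is standard greedy MMR selection under two assumptions: the similarity between two results is 1 when they come from the same file and 0 otherwise, and λ = 0.7 is chosen arbitrarily. Each step picks the candidate maximising λ·relevance − (1 − λ)·(maximum similarity to anything already picked), which produces the described effect of pushing repeat matches from one file further down.

```typescript
interface Candidate {
  file_path: string;
  score: number; // relevance, e.g. the rerank score
}

// Assumed inter-result similarity: same file = 1, different file = 0.
function sameFileSimilarity(a: Candidate, b: Candidate): number {
  return a.file_path === b.file_path ? 1 : 0;
}

// Greedy Maximal Marginal Relevance selection of up to k candidates.
function mmr(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestValue = -Infinity;
    remaining.forEach((c, i) => {
      const redundancy =
        selected.length === 0 ? 0 : Math.max(...selected.map((s) => sameFileSimilarity(c, s)));
      const value = lambda * c.score - (1 - lambda) * redundancy;
      if (value > bestValue) {
        bestValue = value;
        bestIdx = i;
      }
    });
    selected.push(remaining.splice(bestIdx, 1)[0]);
  }
  return selected;
}

const picked = mmr([
  { file_path: "src/auth.ts", score: 0.9 },
  { file_path: "src/auth.ts", score: 0.85 },
  { file_path: "src/session.ts", score: 0.6 },
], 3);
console.log(picked.map((c) => `${c.file_path}@${c.score}`).join(", "));
// src/auth.ts@0.9, src/session.ts@0.6, src/auth.ts@0.85
```

The second `auth.ts` chunk has higher raw relevance than `session.ts` (0.85 vs 0.6) but loses the second slot because of the same-file penalty.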
Result fields:
| Field | Description |
|---|---|
| results | Array of matching code chunks, sorted by relevance and diversified across files |
| results[].file_path | Path to the file in the repository |
| results[].score | Relevance score from 0 to 1 (higher is better) |
| results[].language | Detected programming language |
| results[].start_line | Start line of the matched chunk |
| results[].end_line | End line of the matched chunk |
| results[].snippet | Code content of the matched chunk |
| results[].context_before | About 20 lines of code before the snippet (only with expand_context: true) |
| results[].context_after | About 20 lines of code after the snippet (only with expand_context: true) |
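When `expand_context` is on, it can help to print a result with real line numbers. A sketch built on the fields above; `renderResult` is hypothetical, and it assumes that `context_before` ends on the line just before `start_line` and `context_after` begins on the line just after `end_line`, which this page does not state explicitly.

```typescript
interface SearchResult {
  file_path: string;
  score: number;
  language: string;
  start_line: number;
  end_line: number;
  snippet: string;
  context_before?: string; // only present with expand_context: true
  context_after?: string; // only present with expand_context: true
}

// Split text into lines, ignoring a single trailing newline.
function toLines(text: string | undefined): string[] {
  if (!text) return [];
  return text.replace(/\n$/, "").split("\n");
}

// Render a result with line numbers, marking matched lines with ">".
function renderResult(r: SearchResult): string {
  const before = toLines(r.context_before);
  const out: string[] = [`${r.file_path} (score ${r.score.toFixed(2)})`];
  let line = r.start_line - before.length; // assumed adjacency to start_line
  for (const text of before) out.push(`  ${line++} | ${text}`);
  line = r.start_line;
  for (const text of toLines(r.snippet)) out.push(`> ${line++} | ${text}`);
  line = r.end_line + 1; // assumed adjacency to end_line
  for (const text of toLines(r.context_after)) out.push(`  ${line++} | ${text}`);
  return out.join("\n");
}
```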
Error Responses
| Status | Description |
|---|---|
| 404 | Ref not found, or no blobs have been indexed for this version |
| 202 | Not an error: indexing is still in progress. Retry after it completes |
| 503 | Semantic search is not configured on the server |
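The status codes above suggest a simple client policy: wait and retry on 202, fail fast on 404 and 503. A sketch using a `fetch`-compatible function; the function name, the `Authorization: Bearer` header, the base URL parameter, and the retry count and delay are all assumptions, since this page does not document authentication or retry timing for raw HTTP calls.

```typescript
// Minimal fetch-compatible signature so a stub can be injected (the global fetch also fits).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST a semantic search, retrying while the index is still building.
async function semanticSearchWithRetry(
  fetchFn: FetchLike,
  baseUrl: string,
  apiKey: string,
  slug: string,
  body: { q: string; ref?: string; top_k?: number },
  { retries = 10, delayMs = 3000 } = {},
): Promise<unknown> {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetchFn(`${baseUrl}/v1/repos/${encodeURIComponent(slug)}/semantic-search`, {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify(body),
    });
    if (res.status === 200) return res.json();
    if (res.status === 202) {
      // Indexing in progress: wait, then try again.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      continue;
    }
    if (res.status === 404) throw new Error("Ref not found or not indexed; index the branch first");
    if (res.status === 503) throw new Error("Semantic search is not configured on this server");
    throw new Error(`Unexpected status ${res.status}`);
  }
  throw new Error("Index still building after all retries");
}
```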
Search by Commit SHA
You can search code as it existed at any indexed commit:
```json
{
  "q": "database connection pooling",
  "ref": "8b7d3153212bdb9affd9a969cbdb2c17608aab87"
}
```

This returns results that match the codebase as it was at that commit, not at the current branch HEAD. Because vectors are content-addressed, no re-indexing is needed.
Important: searching by commit SHA works for any commit whose files have been indexed. Old blob vectors are never deleted, so once a branch is indexed, commits in its history remain searchable as long as their file contents were embedded during some indexing run. A file version that existed only in commits before the first indexing run (for example, a file that has since been completely rewritten) was never embedded. Results for such a commit are therefore partial: they cover only the files whose content was seen during an indexing run.
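Because vectors are keyed by blob SHA, whether a commit is fully searchable comes down to a set-membership check over the blobs in its tree. A sketch of that reasoning with hypothetical inputs: the commit's path-to-blob map and the set of blob SHAs that have vectors are not exposed by the API on this page, and the blob IDs below are placeholders rather than real SHAs.

```typescript
// Hypothetical coverage check: which files at a commit have vectors in the index?
function commitCoverage(
  tree: Map<string, string>, // file path -> blob SHA at that commit
  indexedBlobs: Set<string>, // blob SHAs embedded by any indexing run
): { coverage: number; missing: string[] } {
  const missing: string[] = [];
  tree.forEach((blobSha, path) => {
    if (!indexedBlobs.has(blobSha)) missing.push(path);
  });
  const coverage = tree.size === 0 ? 1 : (tree.size - missing.length) / tree.size;
  return { coverage, missing };
}

// src/auth.ts was rewritten before the first indexing run, so its old blob
// (placeholder "blob-old-auth") was never embedded and shows up as missing.
const oldCommit = new Map<string, string>([
  ["src/auth.ts", "blob-old-auth"],
  ["src/db.ts", "blob-db"],
]);
const indexed = new Set<string>(["blob-new-auth", "blob-db"]);
console.log(commitCoverage(oldCommit, indexed)); // { coverage: 0.5, missing: [ 'src/auth.ts' ] }
```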
Delete Index
DELETE /v1/repos/:slug/index
Permission: Write access required.
With no body, the request deletes all vectors for the repository and all of its database tracking records.
With branch in the body, it deletes only that branch's tracking record; the vectors are shared across branches and remain.
```json
{
  "branch": "feature-x"
}
```

Response 200

```json
{
  "deleted": true,
  "vectors_deleted": true
}
```

Auto-Indexing
Set auto_index: true when creating a repository to automatically index new commits:
```json
{
  "slug": "my-app",
  "auto_index": true
}
```

When enabled, every commit triggers incremental indexing, and only changed files are re-embedded. Vectors are content-addressed by blob SHA, so content that is identical across branches or commits is embedded and stored only once.
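Content addressing is what makes incremental indexing cheap. Git computes a blob's ID as the SHA-1 of the header `blob <byte length>\0` followed by the file bytes, so identical content gets the same ID on every branch and commit. The sketch below computes that ID and selects only blobs that have no vectors yet; `blobsToEmbed` and the `alreadyEmbedded` set are hypothetical stand-ins for the service's internals, and repositories using Git's SHA-256 object format would hash differently.

```typescript
import { createHash } from "node:crypto";

// Git blob ID (SHA-1 object format): SHA-1 over "blob <size>\0" + content bytes.
function gitBlobSha(content: string | Buffer): string {
  const bytes = typeof content === "string" ? Buffer.from(content, "utf8") : content;
  return createHash("sha1")
    .update(`blob ${bytes.length}\0`)
    .update(bytes)
    .digest("hex");
}

// Given the changed files of a new commit (path -> content), return the distinct
// blob SHAs that have never been embedded. Identical contents collapse to one blob.
function blobsToEmbed(changed: Map<string, string>, alreadyEmbedded: Set<string>): string[] {
  const todo = new Set<string>();
  changed.forEach((content) => {
    const sha = gitBlobSha(content);
    if (!alreadyEmbedded.has(sha)) todo.add(sha);
  });
  return Array.from(todo);
}

const embedded = new Set<string>([gitBlobSha("export const x = 1;\n")]);
const changed = new Map<string, string>([
  ["src/a.ts", "export const x = 1;\n"], // content already embedded
  ["src/b.ts", "export const y = 2;\n"], // new content
  ["src/c.ts", "export const y = 2;\n"], // same content as b.ts: same blob
]);
console.log(blobsToEmbed(changed, embedded).length); // 1
```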
SDK Examples
```typescript
// Assumes createCoregitClient has been imported from the Coregit SDK.
const git = createCoregitClient({ apiKey: "cgk_live_..." });

// 1. Trigger indexing
await git.search.triggerIndex("my-app", { branch: "main" });

// 2. Wait for indexing to complete (poll every 2 s, give up after 60 polls)
let status: string | undefined;
let attempts = 0;
do {
  const { data } = await git.search.indexStatus("my-app", "main");
  status = data?.status;
  if (status === "indexing" || status === "pending") {
    if (++attempts > 60) throw new Error("Indexing timed out");
    await new Promise((r) => setTimeout(r, 2000));
  }
} while (status === "indexing" || status === "pending");
// Stop if indexing failed or was never started
if (status !== "ready") throw new Error(`Index not ready (status: ${status})`);

// 3. Search current branch
const { data } = await git.search.semantic("my-app", {
  q: "function that handles user login and session creation",
  ref: "main",
  top_k: 5,
});
for (const result of data?.results ?? []) {
  console.log(`${result.file_path} (score: ${result.score.toFixed(2)})`);
  console.log(`  Lines ${result.start_line}-${result.end_line}`);
}

// 4. Search at a specific commit (version-aware)
const { data: oldVersion } = await git.search.semantic("my-app", {
  q: "authentication middleware",
  ref: "8b7d3153212bdb9affd9a969cbdb2c17608aab87",
});

// 5. Filter by language and path
const { data: tsResults } = await git.search.semantic("my-app", {
  q: "database connection pooling",
  ref: "feature-branch",
  language: "typescript",
  path_pattern: "src/**/*.ts",
});

// 6. Expand context around matches
const { data: withContext } = await git.search.semantic("my-app", {
  q: "error handling middleware",
  ref: "main",
  top_k: 5,
  expand_context: true,
});
for (const result of withContext?.results ?? []) {
  if (result.context_before) console.log("--- before ---\n", result.context_before);
  console.log("--- match ---\n", result.snippet);
  if (result.context_after) console.log("--- after ---\n", result.context_after);
}
```