Semantic Search

Search code by intent, not just keywords. Semantic search uses AI embeddings to find relevant code even when the exact terms don't appear in the file.

How It Works

1. Index your repository: code is chunked and embedded into vectors by an AI code embedding model (optimized for 300+ programming languages).
2. Search with natural language: your query is embedded and matched against the code vectors in a serverless vector database (cosine similarity over 1024-dimensional vectors).
3. Rerank: an AI reranker (a cross-attention model with instruction-following) reorders the top candidates for precision.
4. Diversify: MMR (Maximal Marginal Relevance) spreads results across files, so you don't get 10 results from the same file.
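
The vector-matching step above can be illustrated with a minimal brute-force sketch of cosine-similarity ranking. The `EmbeddedChunk` type and the `cosineSimilarity` and `topKByCosine` functions are hypothetical names used only for illustration; the service's vector database does this at scale, and its internals are not documented here.

```typescript
// Hypothetical shape: one embedded code chunk (real vectors would have 1024 dimensions).
interface EmbeddedChunk {
  filePath: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force ranking: score every chunk against the query vector and keep the best k.
function topKByCosine(query: number[], chunks: EmbeddedChunk[], k: number) {
  return chunks
    .map((chunk) => ({ filePath: chunk.filePath, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Because cosine similarity ignores vector length, a chunk whose vector points in the same direction as the query scores 1 regardless of magnitude.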

Component	Description
Code embeddings	AI code embedding model (1024D, 32K context)
Vector storage	Serverless vector database (cosine similarity)
Reranking	AI reranker (32K context)

Code text is never stored in the vector database — only vectors and lightweight metadata. Snippets are fetched from Git storage on the fly for reranking and response.

Performance: search results are cached at the edge, keyed by commit SHA + query. A repeated query against the same commit is served from the cache (`X-Cache: HIT`) without re-running the search. No explicit invalidation is needed when new commits are pushed: the branch then resolves to a new commit SHA, so the cache key changes and stale results are never served.
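
To show why a key built from the commit SHA stays correct after a push, here is a small in-memory sketch. The `SearchResultCache` class is a hypothetical stand-in for the edge cache, and the exact key format is an assumption (the real key may also cover parameters such as `top_k` or `language`).

```typescript
// Hypothetical in-memory stand-in for the edge cache described above.
class SearchResultCache<T> {
  private entries = new Map<string, T>();

  // Assumed key format: resolved commit SHA plus the query text.
  static key(commitSha: string, query: string): string {
    return JSON.stringify([commitSha, query]);
  }

  // Returns the value together with the X-Cache status a server would report.
  lookup(commitSha: string, query: string, compute: () => T): { value: T; xCache: "HIT" | "MISS" } {
    const key = SearchResultCache.key(commitSha, query);
    const cached = this.entries.get(key);
    if (cached !== undefined) return { value: cached, xCache: "HIT" };
    const value = compute();
    this.entries.set(key, value);
    return { value, xCache: "MISS" };
  }
}
```

The same query against a newer commit SHA produces a different key, so it is computed fresh instead of reusing results from the older commit.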

Index a Repository

Before searching, you must index the repository branch.

POST /v1/repos/:slug/index

Permission: Write access required.

{
  "branch": "main"
}

Field	Required	Description
`branch`	No	Branch to index (default: repo's default branch)

Response `202`

{
  "message": "Reindex queued",
  "repo_slug": "my-app",
  "branch": "main"
}

Indexing is asynchronous. Poll the status endpoint to check progress.

Check Index Status

GET /v1/repos/:slug/index/status?branch=main

Permission: Read access required.

Response `200`

{
  "indexed": true,
  "status": "ready",
  "branch": "main",
  "last_commit_sha": "8b7d315321...",
  "chunks_count": 42,
  "total_batches": 1,
  "processed_batches": 1,
  "indexed_at": "2026-04-06T16:02:55.687Z",
  "error": null
}

Field	Description
`indexed`	`true` when status is `"ready"`
`status`	`"not_indexed"` | `"pending"` | `"indexing"` | `"ready"` | `"failed"`
`chunks_count`	Number of code chunks indexed
`total_batches`	Total indexing batches (for large repos)
`processed_batches`	Completed batches
`indexed_at`	When the index was last updated
`error`	Error message if status is `"failed"`
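
A client might condense a status response into "keep polling or not" plus a rough progress figure. The sketch below is one way to do that; `IndexStatusBody` and `summarizeIndexStatus` are hypothetical names, and treating `processed_batches / total_batches` as progress is an assumption rather than a documented guarantee.

```typescript
type IndexState = "not_indexed" | "pending" | "indexing" | "ready" | "failed";

// Subset of the status response fields used by this sketch.
interface IndexStatusBody {
  indexed: boolean;
  status: IndexState;
  total_batches: number;
  processed_batches: number;
  error: string | null;
}

// Decides whether to poll again and estimates progress in the range [0, 1].
function summarizeIndexStatus(body: IndexStatusBody): { keepPolling: boolean; progress: number } {
  // Only "pending" and "indexing" are transient; the other states are final for polling purposes.
  const keepPolling = body.status === "pending" || body.status === "indexing";
  if (body.status === "ready") return { keepPolling, progress: 1 };
  // Assumption: the batch ratio approximates progress.
  const progress =
    body.total_batches > 0 ? Math.min(1, body.processed_batches / body.total_batches) : 0;
  return { keepPolling, progress };
}
```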

Semantic Search

POST /v1/repos/:slug/semantic-search

Permission: Read access required.

{
  "q": "user authentication with password verification",
  "ref": "main",
  "language": "typescript",
  "top_k": 10,
  "expand_context": false
}

Fields

Field	Required	Description
`q`	Yes	Natural language search query (max 1000 chars)
`ref`	No	Branch name or commit SHA (default: repo's default branch)
`path_pattern`	No	Glob filter on file paths (e.g., `"src/*.ts"`)
`language`	No	Filter by programming language (e.g., `"typescript"`, `"python"`)
`top_k`	No	Number of results to return (default: 10, max: 50)
`expand_context`	No	When `true`, include ~20 lines of surrounding code before/after each snippet
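
The limits in the table can be checked on the client before sending a request. This is a sketch only: `SemanticSearchBody` and `checkSearchBody` are hypothetical names, the lower bound of 1 for `top_k` is an assumption, and the server remains the authority on what it accepts.

```typescript
// Request body fields as listed in the table above.
interface SemanticSearchBody {
  q: string;
  ref?: string;
  path_pattern?: string;
  language?: string;
  top_k?: number;
  expand_context?: boolean;
}

// Returns a list of problems; an empty list means the body passes the documented limits.
function checkSearchBody(body: SemanticSearchBody): string[] {
  const problems: string[] = [];
  if (body.q.trim().length === 0) problems.push("q is required");
  if (body.q.length > 1000) problems.push("q exceeds 1000 characters");
  if (body.top_k !== undefined) {
    // Assumption: top_k must be a whole number of at least 1; the documented maximum is 50.
    if (!Number.isInteger(body.top_k) || body.top_k < 1 || body.top_k > 50) {
      problems.push("top_k must be an integer between 1 and 50");
    }
  }
  return problems;
}
```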

Response `200`

{
  "results": [
    {
      "file_path": "src/services/user-service.ts",
      "score": 0.738,
      "language": "typescript",
      "start_line": 1,
      "end_line": 63,
      "snippet": "import { db } from \"../db\";\nimport { hash, compare } from \"bcrypt\";\n\nexport class UserService {\n  async authenticate(email: string, password: string) {\n    ...\n  }\n}"
    },
    {
      "file_path": "src/auth/middleware.ts",
      "score": 0.562,
      "language": "typescript",
      "start_line": 1,
      "end_line": 43,
      "snippet": "..."
    }
  ],
  "query": "user authentication with password verification",
  "repo_slug": "my-app",
  "ref": "main"
}

Response headers:

Header	Description
`X-Cache`	`HIT` if results came from cache, `MISS` if computed fresh

Results are automatically diversified across files — if multiple chunks match from the same file, later results are penalized to surface matches from different files first.
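
The per-file penalty can be pictured with a simplified greedy sketch. This is not the service's actual MMR formula, which is not documented here; `RankedChunk`, `diversifyByFile`, and the `penalty` value of 0.2 are hypothetical choices for illustration.

```typescript
interface RankedChunk {
  file_path: string;
  score: number;
}

// Greedy selection: at each step pick the candidate with the best adjusted score,
// where every already-selected chunk from the same file costs `penalty` points.
// The returned items keep their original scores; only the order is affected.
function diversifyByFile(candidates: RankedChunk[], topK: number, penalty = 0.2): RankedChunk[] {
  const remaining = candidates.slice();
  const selected: RankedChunk[] = [];
  const pickedPerFile = new Map<string, number>();

  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestAdjusted = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const alreadyPicked = pickedPerFile.get(remaining[i].file_path) ?? 0;
      const adjusted = remaining[i].score - penalty * alreadyPicked;
      if (adjusted > bestAdjusted) {
        bestAdjusted = adjusted;
        bestIndex = i;
      }
    }
    const picked = remaining.splice(bestIndex, 1)[0];
    selected.push(picked);
    pickedPerFile.set(picked.file_path, (pickedPerFile.get(picked.file_path) ?? 0) + 1);
  }
  return selected;
}
```

With this scheme a strong second match from an already-represented file can still appear, but only after a comparably scored match from another file has been surfaced.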

Result fields:

Field	Description
`results`	Array of matching code chunks, sorted by relevance and diversified across files
`results[].file_path`	Path to the file in the repository
`results[].score`	Relevance score from 0 to 1 (higher is better)
`results[].language`	Detected programming language
`results[].start_line`	Start line of the matched chunk
`results[].end_line`	End line of the matched chunk
`results[].snippet`	The actual code content
`results[].context_before`	(only with `expand_context: true`) ~20 lines of code before the snippet
`results[].context_after`	(only with `expand_context: true`) ~20 lines of code after the snippet

Error Responses

Status	Description
`202`	Indexing is in progress. Retry after it completes.
`404`	Ref not found, or no blobs indexed for this version.
`503`	Semantic search is not configured on the server.
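
A client can map these statuses to a next action. The sketch below assumes only the status codes listed on this page; `SearchOutcome` and `interpretSearchStatus` are hypothetical names, and any other status is treated as an unexpected failure.

```typescript
type SearchOutcome =
  | { kind: "ok" }
  | { kind: "retry_later"; reason: string }
  | { kind: "fail"; reason: string };

// Maps an HTTP status from the semantic-search endpoint to a client-side action.
function interpretSearchStatus(httpStatus: number): SearchOutcome {
  switch (httpStatus) {
    case 200:
      return { kind: "ok" };
    case 202:
      // Not a hard failure: the index is still being built, so retry once it is ready.
      return { kind: "retry_later", reason: "Indexing is in progress" };
    case 404:
      return { kind: "fail", reason: "Ref not found, or no blobs indexed for this version" };
    case 503:
      return { kind: "fail", reason: "Semantic search is not configured on the server" };
    default:
      return { kind: "fail", reason: `Unexpected status ${httpStatus}` };
  }
}
```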

Search by Commit SHA

You can search code as it existed at any indexed commit:

{
  "q": "database connection pooling",
  "ref": "8b7d3153212bdb9affd9a969cbdb2c17608aab87"
}

This returns results matching the state of the codebase at that specific commit, not the current branch HEAD. Vectors are content-addressed — no re-indexing needed.

Important: searching by commit SHA works for any commit whose file contents (blobs) have been indexed. Once a branch is indexed, commits in its history are searchable to the extent that their blobs are in the index, because vectors for old blobs are kept rather than deleted as files change. However, if a file version existed only in commits made before the first indexing run and the file has since been completely rewritten, that old version's vectors won't be in the index. Results for such a commit will be partial: they cover only files whose content was seen during some indexing run.
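
The partial-results caveat follows from content addressing, which can be modeled in a few lines. In this sketch a commit is a map from file path to blob SHA, and a file is searchable only if its blob SHA is already in the index; `CommitTree` and `splitByIndexed` are hypothetical names, and the service's real data model may differ.

```typescript
// Hypothetical model: a commit's tree maps each file path to the blob SHA of its content.
type CommitTree = Map<string, string>;

// Splits a commit's files into those whose blob is already embedded (searchable)
// and those whose blob was never seen by an indexing run (missing).
function splitByIndexed(tree: CommitTree, indexedBlobs: Set<string>) {
  const searchable: string[] = [];
  const missing: string[] = [];
  tree.forEach((blobSha, path) => {
    if (indexedBlobs.has(blobSha)) searchable.push(path);
    else missing.push(path);
  });
  // Results for this commit are partial whenever at least one blob is missing.
  return { searchable, missing, partial: missing.length > 0 };
}
```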

Delete Index

DELETE /v1/repos/:slug/index

Permission: Write access required.

With no body: deletes all vectors for the repo and all DB tracking records.

With `branch` in the body: deletes only that branch's DB tracking record (vectors are shared across branches and remain).

{
  "branch": "feature-x"
}

Response `200`

{
  "deleted": true,
  "vectors_deleted": true
}

Auto-Indexing

Set `auto_index: true` when creating a repository to automatically index new commits:

{
  "slug": "my-app",
  "auto_index": true
}

When enabled, every commit triggers incremental indexing — only changed files are re-embedded. Vectors are content-addressed by blob SHA, so identical content across branches/commits is never duplicated.
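
Incremental indexing by blob SHA amounts to a set difference, sketched below under the assumption that a commit can be viewed as a map from file path to blob SHA. `blobsToEmbed` is a hypothetical helper, not part of the API.

```typescript
// Returns the blob SHAs in a new commit that have not been embedded yet.
// Unchanged files, and files whose content matches an already-embedded blob, are skipped.
function blobsToEmbed(newTree: Map<string, string>, alreadyEmbedded: Set<string>): string[] {
  const pending = new Set<string>();
  newTree.forEach((blobSha) => {
    if (!alreadyEmbedded.has(blobSha)) pending.add(blobSha);
  });
  // A Set removes duplicates, so identical content at two paths is embedded only once.
  return Array.from(pending);
}
```

For example, if a commit changes one file and copies it to a second path, only one new blob needs embedding.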

SDK Examples

// Assumes createCoregitClient is already imported from the SDK package
const git = createCoregitClient({ apiKey: "cgk_live_..." });

// 1. Trigger indexing
await git.search.triggerIndex("my-app", { branch: "main" });

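// 2. Wait for indexing to complete (poll every 2 s, give up after about 2 minutes)
let status;
let attempts = 0;
do {
  const { data } = await git.search.indexStatus("my-app", "main");
  status = data?.status;
  if (status === "indexing" || status === "pending") {
    if (++attempts > 60) throw new Error("Indexing timed out");
    await new Promise(r => setTimeout(r, 2000));
  }
} while (status === "indexing" || status === "pending");

// Stop here if the index did not reach "ready" (e.g. "failed" or "not_indexed")
if (status !== "ready") throw new Error(`Indexing did not complete: ${status}`);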

// 3. Search current branch
const { data } = await git.search.semantic("my-app", {
  q: "function that handles user login and session creation",
  ref: "main",
  top_k: 5,
});

for (const result of data?.results ?? []) {
  console.log(`${result.file_path} (score: ${result.score.toFixed(2)})`);
  console.log(`  Lines ${result.start_line}-${result.end_line}`);
}

// 4. Search at a specific commit (version-aware)
const { data: oldVersion } = await git.search.semantic("my-app", {
  q: "authentication middleware",
  ref: "8b7d3153212bdb9affd9a969cbdb2c17608aab87",
});

// 5. Filter by language and path
const { data: tsResults } = await git.search.semantic("my-app", {
  q: "database connection pooling",
  ref: "feature-branch",
  language: "typescript",
  path_pattern: "src/**/*.ts",
});

// 6. Expand context around matches
const { data: withContext } = await git.search.semantic("my-app", {
  q: "error handling middleware",
  ref: "main",
  top_k: 5,
  expand_context: true,
});

for (const result of withContext?.results ?? []) {
  if (result.context_before) console.log("--- before ---\n", result.context_before);
  console.log("--- match ---\n", result.snippet);
  if (result.context_after) console.log("--- after ---\n", result.context_after);
}
