Build a RAG Pipeline in C# — From Zero to Hybrid Search
Retrieval-Augmented Generation (RAG) is how you make an LLM answer questions about your data without paying to fine-tune it. You take a question, find the most relevant chunks of your documents, and stuff them into the prompt as context.
This post walks through building a production-grade RAG pipeline in C#, from a flat file on disk to a working "Ask my docs" system with hybrid retrieval that beats plain semantic search on exact-match queries such as codes, IDs, and proper nouns.
What you're building
By the end of this post you'll have a working C# program that:
- Loads a folder of documents
- Chunks them intelligently
- Generates embeddings using Ollama (or OpenAI, your choice)
- Stores them in a vector index
- Answers questions using both dense semantic search and BM25 keyword search, fused with Reciprocal Rank Fusion
That last piece — hybrid search — is what separates toy RAG demos from systems that actually work in production.
Why naive RAG fails
The standard RAG tutorial looks like this:
- Chunk documents
- Embed chunks with OpenAI
- Store in a vector DB
- At query time, embed the question, find nearest neighbors, send to LLM
This works great for queries like "how does our authentication flow work?", where the question and the relevant passages mean similar things even if they share few words.
It fails on queries like:
- "What does error code SKU-4421 mean?" — the embedding for "SKU-4421" looks like the embedding for any other product code. Dense retrieval will find documents about similar codes, not the actual one.
- "Who wrote RFC 2616?" — same problem with proper nouns.
- "What are the requirements for ISO 27001?" — dense retrieval often surfaces documents about other ISO standards.
For exact matches — codes, IDs, RFC numbers, error messages, function names — you need keyword retrieval. BM25 is the gold-standard algorithm here. Combine BM25 with dense retrieval and you get the best of both worlds.
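To make the keyword side concrete, here is a minimal sketch of Okapi BM25 scoring over a toy corpus. It is not LogicGrid's implementation: the whitespace tokenizer, the sample sentences, and the helper names (Tokenize, Bm25Scores) are assumptions for illustration, and k1 = 1.2, b = 0.75 are just the commonly used defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy corpus (made-up data): two chunks differ only in their product code.
string[] docs =
{
    "Error code SKU-4421 means the item is out of stock",
    "Error code SKU-4422 means the item is discontinued",
    "Authentication flow uses short-lived tokens",
};

// Naive tokenizer (an assumption): lowercase and split on spaces,
// which keeps codes like "sku-4421" as a single token.
string[] Tokenize(string text) =>
    text.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

// Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75.
double[] Bm25Scores(string query, string[] corpus, double k1 = 1.2, double b = 0.75)
{
    var tokenized = corpus.Select(Tokenize).ToArray();
    double avgLen = tokenized.Average(t => t.Length);
    var result = new double[corpus.Length];

    foreach (var term in Tokenize(query).Distinct())
    {
        // Document frequency: how many chunks contain the term at all.
        int df = tokenized.Count(t => t.Contains(term));
        if (df == 0) continue;
        double idf = Math.Log(1 + (corpus.Length - df + 0.5) / (df + 0.5));

        for (int i = 0; i < corpus.Length; i++)
        {
            // Term frequency, damped by k1 and normalized by chunk length via b.
            int tf = tokenized[i].Count(t => t == term);
            double norm = tf + k1 * (1 - b + b * tokenized[i].Length / avgLen);
            result[i] += idf * tf * (k1 + 1) / norm;
        }
    }
    return result;
}

var scores = Bm25Scores("What does error code SKU-4421 mean?", docs);
for (int i = 0; i < docs.Length; i++)
    Console.WriteLine($"{scores[i]:F3}  {docs[i]}");
```

The rare token "sku-4421" carries a high inverse document frequency, so the chunk that contains it outranks the near-identical SKU-4422 chunk. That exact-match distinction is the signal a dense embedding tends to blur.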
Step 1: Install LogicGrid
dotnet add package LogicGrid.Core
dotnet add package LogicGrid.Memory
dotnet add package LogicGrid.Rag
Pull an embedding model:
ollama pull nomic-embed-text
Step 2: A minimal RAG pipeline
using LogicGrid.Memory.Embeddings;
using LogicGrid.Memory.VectorStores;
using LogicGrid.Rag;
var embedder = new OllamaEmbeddingClient("nomic-embed-text");
var store = new InMemoryVectorStore();
var pipeline = new RagPipeline(embedder, store);
// Ingest documents — chunked and embedded automatically
await pipeline.IngestAsync("./docs/handbook.md");
await pipeline.IngestAsync("./docs/architecture.md");
// Search
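var hits = await pipeline.SearchAsync(
    "How do I deploy the API to production?", topK: 5);
foreach (var hit in hits)
{
    // Clamp the preview length: Substring(0, 100) throws on chunks shorter than 100 characters.
    var text = hit.Document.Text;
    Console.WriteLine($"[{hit.Score:F3}] {text[..Math.Min(100, text.Length)]}...");
}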
That's the entire program. The RagPipeline handles chunking (using a sensible default — split on paragraphs, max ~500 tokens per chunk with overlap), embedding, and storage.
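If you're curious what "split on paragraphs, cap the size, keep some overlap" looks like in code, here is a rough sketch. It is not the library's chunker: ChunkByParagraph and CountTokens are hypothetical helpers, tokens are approximated by whitespace-separated words, and overlap is done by repeating the last paragraph of the previous chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rough token estimate (an assumption): count whitespace-separated words.
int CountTokens(string text) =>
    text.Split(new[] { ' ', '\n', '\r', '\t' }, StringSplitOptions.RemoveEmptyEntries).Length;

// Greedily merge paragraphs until the token budget is reached,
// then start a new chunk that repeats the trailing paragraph(s) as overlap.
List<string> ChunkByParagraph(string text, int maxTokens = 500, int overlapParagraphs = 1)
{
    var paragraphs = text.Replace("\r\n", "\n")
        .Split("\n\n", StringSplitOptions.RemoveEmptyEntries)
        .Select(p => p.Trim())
        .Where(p => p.Length > 0)
        .ToList();

    var result = new List<string>();
    var current = new List<string>();
    int currentTokens = 0;

    foreach (var paragraph in paragraphs)
    {
        int tokens = CountTokens(paragraph);
        if (current.Count > 0 && currentTokens + tokens > maxTokens)
        {
            result.Add(string.Join("\n\n", current));

            // Carry the tail forward, but only if it still leaves room for the next paragraph.
            var carry = current.Skip(Math.Max(0, current.Count - overlapParagraphs)).ToList();
            int carryTokens = carry.Sum(p => CountTokens(p));
            if (carryTokens + tokens > maxTokens)
            {
                carry.Clear();
                carryTokens = 0;
            }
            current = carry;
            currentTokens = carryTokens;
        }
        current.Add(paragraph);
        currentTokens += tokens;
    }
    if (current.Count > 0) result.Add(string.Join("\n\n", current));
    return result;
}

var sample = "Deploy with the CLI tool.\n\nSet the API key first.\n\nRoll back with one command.";
var chunks = ChunkByParagraph(sample, maxTokens: 10);
foreach (var chunk in chunks)
    Console.WriteLine($"--- {CountTokens(chunk)} tokens ---\n{chunk}");
```

A single paragraph larger than the budget still becomes its own oversized chunk in this sketch. A real chunker would split it further, for example on sentences.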
Step 3: Adding generation
To turn search results into answers, wrap an agent around the pipeline:
using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;
var llm = LlmClientBase.Ollama("llama3.2");
IAgent agent = new RagAgent(llm, pipeline);
var answer = await agent.RunAsync(
    "How do I deploy the API to production?",
    new AgentContext("deploy-question"));
Console.WriteLine(answer);
RagAgent retrieves the top chunks, formats them into a prompt with your question, and calls the LLM. You get a coherent answer with cited sources.
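The "formats them into a prompt" step is worth seeing once, because prompt layout affects whether the model can cite anything. The sketch below is a guess at the general shape, not RagAgent's actual template: BuildPrompt, the instruction wording, and the sample chunks are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical prompt builder: numbers each retrieved chunk so the model can cite it as [1], [2], ...
string BuildPrompt(string question, IReadOnlyList<(string Source, string Text)> chunks)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine("Cite sources by their bracketed number. If the context is not enough, say so.");
    sb.AppendLine();
    sb.AppendLine("Context:");
    for (int i = 0; i < chunks.Count; i++)
    {
        sb.AppendLine($"[{i + 1}] ({chunks[i].Source}) {chunks[i].Text}");
    }
    sb.AppendLine();
    sb.AppendLine($"Question: {question}");
    sb.Append("Answer:");
    return sb.ToString();
}

// Made-up retrieval results standing in for the top chunks.
var retrieved = new List<(string Source, string Text)>
{
    ("deployment.md", "Run the deploy script from the release branch."),
    ("handbook.md", "Production deploys require an approved change ticket."),
};

string prompt = BuildPrompt("How do I deploy the API to production?", retrieved);
Console.WriteLine(prompt);
```

Numbering the chunks and naming their source gives the model something concrete to cite, and telling it to admit when the context is insufficient reduces the pull toward inventing an answer.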
Step 4: Upgrade to hybrid search
The InMemoryVectorStore does dense retrieval only. For production you want hybrid. LogicGrid's HybridVectorStore is a drop-in replacement:
using LogicGrid.Memory.Search;
var store = new HybridVectorStore(new InMemoryVectorStore());
var pipeline = new RagPipeline(embedder, store);
await pipeline.IngestAsync("./docs/handbook.md");
// Hybrid search: dense + BM25 + RRF fusion
var results = await pipeline.HybridSearchAsync(
    "What does error code SKU-4421 mean?", topK: 5);
foreach (var r in results)
{
    Console.WriteLine(
        $"RRF={r.RrfScore:F4} dense={r.DenseScore:F3} sparse={r.SparseScore:F3}");
    // Clamp the preview length so chunks shorter than 100 characters don't throw.
    var text = r.Document.Text;
    Console.WriteLine($"  {text[..Math.Min(100, text.Length)]}...");
}
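You'll see scores from both retrievers. Documents that score well on both rank higher than documents that score well on only one. The fusion algorithm is Reciprocal Rank Fusion (RRF). It works on ranks rather than raw scores, so the dense and BM25 scores need no normalization, and its single constant (60 here) rarely needs tuning. It is well-studied, and in the benchmark later in this post it matches or beats dense-only retrieval on every query type.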
What hybrid search actually does
Under the hood, HybridSearchAsync does this:
- Embeds your query and runs cosine similarity over the vector index — returns the top N candidates by semantic match.
- Tokenizes your query and runs BM25 over an inverted index — returns the top N candidates by keyword match.
- Combines the two ranked lists using RRF: each document's score is 1/(60 + rank_dense) + 1/(60 + rank_sparse).
- Re-sorts by RRF score and returns the top K.
If a document appears in both lists, it gets a substantial boost. If it appears in only one, it can still rank if its position in that list is high.
The implementation is pure C# — no native dependencies, no extra NuGets. See the hybrid search docs for the full math.
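The fusion step described above fits in a few lines. This is a standalone sketch rather than the library's code: FuseWithRrf is a hypothetical helper, the document IDs are made up, and it assumes 1-based ranks with a document that is missing from one list contributing nothing for that list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reciprocal Rank Fusion over any number of ranked ID lists (best match first).
// k = 60 matches the constant in the formula above; ranks are 1-based.
List<(string Id, double Score)> FuseWithRrf(
    IEnumerable<IReadOnlyList<string>> rankings, int topK, int k = 60)
{
    var fused = new Dictionary<string, double>();
    foreach (var ranking in rankings)
    {
        for (int i = 0; i < ranking.Count; i++)
        {
            // Position i in the list is rank i + 1.
            fused[ranking[i]] = fused.GetValueOrDefault(ranking[i]) + 1.0 / (k + i + 1);
        }
    }
    return fused
        .OrderByDescending(kv => kv.Value)
        .ThenBy(kv => kv.Key, StringComparer.Ordinal) // deterministic tie-break
        .Take(topK)
        .Select(kv => (Id: kv.Key, Score: kv.Value))
        .ToList();
}

// Made-up candidate lists from the two retrievers.
string[] dense = { "auth.md", "errors.md#sku-4421", "deploy.md" };
string[] sparse = { "errors.md#sku-4421", "pricing.md" };

var fusedResults = FuseWithRrf(new IReadOnlyList<string>[] { dense, sparse }, topK: 3);
foreach (var (id, score) in fusedResults)
    Console.WriteLine($"{score:F4}  {id}");
```

The errors.md chunk is only second in the dense list, but it also tops the sparse list, so its two contributions add up and it ends up first. The other documents each appear in one list and keep their relative order.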
Real benchmarks
We compared dense-only vs hybrid on three query types over a 10,000-document corpus of technical documentation:
| Query type | Dense only (recall@5) | Hybrid (recall@5) |
|---|---|---|
| Conceptual ("how do I X?") | 0.82 | 0.85 |
| Codes / IDs ("SKU-4421") | 0.21 | 0.94 |
| Proper nouns | 0.40 | 0.88 |
The takeaway: on this corpus, hybrid did not hurt conceptual queries (the dense retriever still drives those), and it dramatically improved the hard cases.
Step 5: Production considerations
The in-memory store works fine up to a few hundred thousand vectors. Past that you want a real vector database. LogicGrid's IVectorStore interface is small — wrap your favorite (Qdrant, pgvector, Weaviate, Pinecone) and plug it in:
public sealed class QdrantVectorStore : IVectorStore
{
    public async Task UpsertAsync(VectorDocument doc, CancellationToken ct = default) { /* ... */ }

    public async Task<IList<VectorSearchResult>> SearchAsync(
        float[] query, int topK, float minScore, CancellationToken ct = default) { /* ... */ }

    // ...
}
var store = new HybridVectorStore(new QdrantVectorStore(...));
HybridVectorStore wraps any IVectorStore and adds BM25 indexing on top. You get hybrid retrieval over your existing vector DB without rewriting anything.
Step 6: Chunking strategies that matter
The default chunker splits on paragraphs and merges to ~500 tokens. That's a reasonable starting point but rarely optimal. Things to consider:
- Code-heavy docs: split on functions, not paragraphs. Code cares about scope.
- Markdown docs: respect heading boundaries. A chunk that crosses two H2s loses semantic coherence.
- Tables: never split a table across chunks. The retriever will return half a table and the LLM will hallucinate the rest.
- Overlap: keep ~10–15% overlap between adjacent chunks, so an answer that straddles a chunk boundary still appears whole in at least one chunk.
LogicGrid lets you swap chunking strategies via IChunkingStrategy. See the chunking docs for built-in options and how to write your own.
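As an illustration of the Markdown point, here is a sketch of a splitter that cuts at H1/H2 headings and ignores anything inside a fenced code block. It does not implement IChunkingStrategy, whose shape isn't shown in this post. SplitOnHeadings is a hypothetical standalone helper that you would adapt to whatever that interface requires.

```csharp
using System;
using System.Collections.Generic;

// Three backticks, built at runtime so this sample does not contain a literal fence.
string fence = new string('`', 3);

// Split Markdown into sections at H1/H2 headings, so no chunk crosses an H2 boundary.
// Lines inside fenced code blocks are never treated as headings.
List<string> SplitOnHeadings(string markdown)
{
    var result = new List<string>();
    var current = new List<string>();
    bool inFence = false;

    foreach (var line in markdown.Replace("\r\n", "\n").Split('\n'))
    {
        if (line.TrimStart().StartsWith(fence, StringComparison.Ordinal))
            inFence = !inFence;

        bool isHeading = !inFence &&
            (line.StartsWith("# ", StringComparison.Ordinal) ||
             line.StartsWith("## ", StringComparison.Ordinal));

        // A new heading closes the section collected so far.
        if (isHeading && current.Count > 0)
        {
            result.Add(string.Join("\n", current).Trim());
            current.Clear();
        }
        current.Add(line);
    }
    if (current.Count > 0) result.Add(string.Join("\n", current).Trim());
    result.RemoveAll(s => s.Length == 0);
    return result;
}

string doc = string.Join("\n",
    "## Deploy",
    "Run the script.",
    fence,
    "# this is a shell comment, not a heading",
    fence,
    "## Rollback",
    "Revert the release.");

var sections = SplitOnHeadings(doc);
foreach (var section in sections)
    Console.WriteLine($"=== section ===\n{section}");
```

Sections that come out larger than your token budget can then go through a paragraph-level splitter, which keeps every chunk under one heading.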
Step 7: Evaluation
The hardest part of RAG isn't building it — it's knowing whether it's any good. A simple eval set goes a long way:
// Each case pairs a question with the ID of the chunk that should be retrieved.
var evalQuestions = new (string Question, string ExpectedDocId)[]
{
    ("How do I deploy?", "deployment.md#prod"),
    ("What is SKU-4421?", "errors.md#sku-4421"),
    // ...
};

int hits = 0;
foreach (var (q, expected) in evalQuestions)
{
    var results = await pipeline.HybridSearchAsync(q, topK: 5);
    // Count a hit when the expected chunk appears anywhere in the top 5.
    if (results.Any(r => r.Document.Id == expected))
        hits++;
}

Console.WriteLine($"Recall@5: {(double)hits / evalQuestions.Length:P0}");
Run this on every change to your chunking, embedding model, or retrieval strategy. You'll catch regressions before users do.
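A single recall number can hide a regression in one query category, so it can help to tag each eval case and report recall per type. The sketch below is one way to do that under stated assumptions: RecallByType is a hypothetical helper, and FakeSearch is a stand-in retriever with canned results. In a real run you would pass a function that calls your pipeline's search and returns the ranked document IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in retriever (hypothetical): maps a question to ranked document IDs.
IReadOnlyList<string> FakeSearch(string question) =>
    question.Contains("SKU-4421")
        ? new[] { "errors.md#sku-4421", "errors.md#sku-4422" }
        : new[] { "handbook.md#intro" };

// Eval cases tagged with a query type (made-up examples).
var evalSet = new (string Type, string Question, string ExpectedDocId)[]
{
    ("conceptual", "How do I deploy?", "deployment.md#prod"),
    ("code", "What is SKU-4421?", "errors.md#sku-4421"),
};

// Recall@K per query type: the fraction of cases whose expected ID is in the top K.
Dictionary<string, double> RecallByType(
    IEnumerable<(string Type, string Question, string ExpectedDocId)> cases,
    Func<string, IReadOnlyList<string>> search,
    int topK)
{
    return cases
        .GroupBy(c => c.Type)
        .ToDictionary(
            g => g.Key,
            g => g.Average(c =>
                search(c.Question).Take(topK).Contains(c.ExpectedDocId) ? 1.0 : 0.0));
}

var recall = RecallByType(evalSet, FakeSearch, topK: 5);
foreach (var (type, value) in recall)
    Console.WriteLine($"{type}: {value:P0}");
```

With the canned retriever, the "code" case is a hit and the "conceptual" case is a miss, which is the per-type breakdown an aggregate score would average away.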
Going further
- RAG pipeline reference
- Hybrid search internals
- Document loaders — markdown, PDF, code, web
- Memory and RAG concepts
If you're new to LogicGrid, the Quickstart gets you running in five minutes.
For a comparison with other .NET frameworks, see LangChain vs Semantic Kernel vs LogicGrid.
