Build a RAG Pipeline in C# — From Zero to Hybrid Search

· 7 min read
LogicGrid Team
Maintainers

Retrieval-Augmented Generation (RAG) is how you make an LLM answer questions about your data without paying to fine-tune it. You take a question, find the most relevant chunks of your documents, and stuff them into the prompt as context.

This post walks through building a production-grade RAG pipeline in C#, from a flat file on disk to a working "Ask my docs" system with hybrid retrieval that beats pure semantic search on exact-match queries such as codes, IDs, and proper nouns.

What you're building

By the end of this post you'll have a working C# program that:

  1. Loads a folder of documents
  2. Chunks them intelligently
  3. Generates embeddings using Ollama (or OpenAI, your choice)
  4. Stores them in a vector index
  5. Answers questions using both dense semantic search and BM25 keyword search, fused with Reciprocal Rank Fusion

That last piece — hybrid search — is what separates toy RAG demos from systems that actually work in production.

Why naive RAG fails

The standard RAG tutorial looks like this:

  1. Chunk documents
  2. Embed chunks with OpenAI
  3. Store in a vector DB
  4. At query time, embed the question, find nearest neighbors, send to LLM

This works well for queries like "how does our authentication flow work?", where the question and the relevant passage are close in meaning even if they share few words.
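The nearest-neighbor step in that recipe can be sketched in a few lines. This is a minimal brute-force illustration, not LogicGrid's implementation: the class name `DenseSearchSketch` and the dictionary-based index are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of LogicGrid.
public static class DenseSearchSketch
{
    // Cosine similarity between two vectors of equal length.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat its similarity as 0.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force nearest neighbors: score every chunk, keep the best topK.
    public static List<(string Id, double Score)> TopK(
        float[] query, IReadOnlyDictionary<string, float[]> index, int topK)
    {
        return index
            .Select(kv => (Id: kv.Key, Score: Cosine(query, kv.Value)))
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
    }
}
```

Nothing in this step looks at the literal tokens of the query, which is why the queries below trip it up.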

It fails on queries like:

  • "What does error code SKU-4421 mean?" — the embedding for "SKU-4421" looks like the embedding for any other product code. Dense retrieval will find documents about similar codes, not the actual one.
  • "Who wrote RFC 2616?" — same problem with proper nouns.
  • "What are the requirements for ISO 27001?" — dense retrieval often surfaces documents about other ISO standards.

For exact matches (codes, IDs, RFC numbers, error messages, function names) you need keyword retrieval. BM25 is the standard ranking function for it. Combine BM25 with dense retrieval and you cover both cases: semantic matches from the embeddings, exact-token matches from the keyword index.
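To make the keyword side concrete, here is a small BM25 scorer. It is a sketch under stated assumptions, not the code LogicGrid ships: the class name `Bm25Sketch`, the naive tokenizer, the Lucene-style IDF, and the common defaults k1 = 1.2 and b = 0.75 are all choices made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory BM25 index for illustration; not part of LogicGrid.
public sealed class Bm25Sketch
{
    private const double K1 = 1.2;  // term-frequency saturation (assumed default)
    private const double B = 0.75;  // document-length normalization (assumed default)

    private readonly List<(string Id, string[] Terms)> _docs = new();
    private readonly Dictionary<string, int> _docFreq = new();

    // Naive tokenizer: lowercase, split on whitespace, trim edge punctuation.
    // "SKU-4421" stays a single token, so exact codes remain matchable.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.Trim('.', ',', '?', '!', ':', ';', '"', '(', ')'))
            .Where(t => t.Length > 0)
            .ToArray();

    public void Add(string id, string text)
    {
        var terms = Tokenize(text);
        _docs.Add((id, terms));
        // Document frequency counts each term once per document.
        foreach (var term in terms.Distinct())
        {
            _docFreq.TryGetValue(term, out var count);
            _docFreq[term] = count + 1;
        }
    }

    public List<(string Id, double Score)> Search(string query, int topK)
    {
        if (_docs.Count == 0) return new List<(string Id, double Score)>();

        double avgLen = _docs.Average(d => d.Terms.Length);
        var queryTerms = Tokenize(query).Distinct().ToArray();

        return _docs
            .Select(d => (Id: d.Id, Score: ScoreDoc(d.Terms, queryTerms, avgLen)))
            .Where(x => x.Score > 0)
            .OrderByDescending(x => x.Score)
            .Take(topK)
            .ToList();
    }

    private double ScoreDoc(string[] docTerms, string[] queryTerms, double avgLen)
    {
        double score = 0;
        foreach (var term in queryTerms)
        {
            int tf = docTerms.Count(t => t == term);
            if (tf == 0) continue;

            int n = _docFreq[term];
            // Lucene-style IDF: rare terms weigh more, and the value is never negative.
            double idf = Math.Log(1 + (_docs.Count - n + 0.5) / (n + 0.5));
            double lengthNorm = K1 * (1 - B + B * docTerms.Length / avgLen);
            score += idf * tf * (K1 + 1) / (tf + lengthNorm);
        }
        return score;
    }
}
```

Because "sku-4421" appears in very few documents, its IDF is high, and a document containing that exact token outranks documents that only share common words like "error" and "code".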

Step 1: Install LogicGrid

dotnet add package LogicGrid.Core
dotnet add package LogicGrid.Memory
dotnet add package LogicGrid.Rag

Pull an embedding model:

ollama pull nomic-embed-text

Step 2: A minimal RAG pipeline

using System;
using LogicGrid.Memory.Embeddings;
using LogicGrid.Memory.VectorStores;
using LogicGrid.Rag;

var embedder = new OllamaEmbeddingClient("nomic-embed-text");
var store = new InMemoryVectorStore();
var pipeline = new RagPipeline(embedder, store);

// Ingest documents; they are chunked and embedded automatically
await pipeline.IngestAsync("./docs/handbook.md");
await pipeline.IngestAsync("./docs/architecture.md");

// Search
var hits = await pipeline.SearchAsync(
    "How do I deploy the API to production?", topK: 5);

foreach (var hit in hits)
{
    // Clamp the preview length so chunks shorter than 100 characters do not make Substring throw.
    var text = hit.Document.Text;
    Console.WriteLine($"[{hit.Score:F3}] {text.Substring(0, Math.Min(100, text.Length))}...");
}

That's the entire program. The RagPipeline handles chunking (using a sensible default — split on paragraphs, max ~500 tokens per chunk with overlap), embedding, and storage.
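For intuition, a paragraph-based chunker with a size cap and overlap might look like the sketch below. It is not the RagPipeline default itself: `ParagraphChunkerSketch` is a hypothetical name, tokens are approximated by whitespace-separated words, and the 50-word overlap is an assumed value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical chunker for illustration; not part of LogicGrid.
public static class ParagraphChunkerSketch
{
    private static readonly char[] Whitespace = { ' ', '\t', '\n' };

    // Merges whole paragraphs into chunks of roughly maxTokens words.
    // The cap is soft: a single paragraph longer than maxTokens is kept intact.
    public static List<string> Chunk(string text, int maxTokens = 500, int overlapTokens = 50)
    {
        if (overlapTokens >= maxTokens)
            throw new ArgumentException("Overlap must be smaller than the chunk size.");

        var paragraphs = text
            .Replace("\r\n", "\n")
            .Split("\n\n", StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries))
            .Where(words => words.Length > 0);

        var chunks = new List<string>();
        var current = new List<string>();
        int fresh = 0; // words in `current` that are not carried-over overlap

        foreach (var words in paragraphs)
        {
            // Flush before a paragraph that would push the chunk past the cap.
            if (fresh > 0 && current.Count + words.Length > maxTokens)
            {
                chunks.Add(string.Join(" ", current));
                // Seed the next chunk with the tail of this one.
                current = current.Skip(Math.Max(0, current.Count - overlapTokens)).ToList();
                fresh = 0;
            }
            current.AddRange(words);
            fresh += words.Length;
        }

        if (fresh > 0) chunks.Add(string.Join(" ", current));
        return chunks;
    }
}
```

With `maxTokens: 6` and `overlapTokens: 2`, the text "a b c\n\nd e f\n\ng h i" yields two chunks, "a b c d e f" and "e f g h i": the second starts with the last two words of the first.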

Step 3: Adding generation

To turn search results into answers, wrap an agent around the pipeline:

using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;

var llm = LlmClientBase.Ollama("llama3.2");
IAgent agent = new RagAgent(llm, pipeline);

var answer = await agent.RunAsync(
    "How do I deploy the API to production?",
    new AgentContext("deploy-question"));

Console.WriteLine(answer);

RagAgent retrieves the top chunks, formats them into a prompt with your question, and calls the LLM. You get a coherent answer with cited sources.
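The prompt-assembly step is simple enough to sketch by hand. The template below is an assumption for illustration, not the prompt RagAgent actually uses, and `RagPromptSketch` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical prompt builder for illustration; not part of LogicGrid.
public static class RagPromptSketch
{
    // Numbers each retrieved chunk so the model can cite it as [1], [2], ...
    public static string Build(
        string question, IReadOnlyList<(string Source, string Text)> chunks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer the question using only the context below.");
        sb.AppendLine("Cite sources by their bracketed number. If the context does not contain the answer, say so.");
        sb.AppendLine();
        sb.AppendLine("Context:");
        for (int i = 0; i < chunks.Count; i++)
            sb.AppendLine($"[{i + 1}] ({chunks[i].Source}) {chunks[i].Text}");
        sb.AppendLine();
        sb.AppendLine($"Question: {question}");
        sb.Append("Answer:");
        return sb.ToString();
    }
}
```

Numbering the chunks is what makes citations possible: the model can only point at a source if each chunk carries a stable label in the prompt.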

Step 4: Hybrid search

The InMemoryVectorStore does dense retrieval only. For production you want hybrid. LogicGrid's HybridVectorStore is a drop-in replacement:

using System;
using LogicGrid.Memory.Search;

// Replaces the store and pipeline from Step 2; embedder is the same OllamaEmbeddingClient.
var store = new HybridVectorStore(new InMemoryVectorStore());
var pipeline = new RagPipeline(embedder, store);

await pipeline.IngestAsync("./docs/handbook.md");

// Hybrid search: dense + BM25 + RRF fusion
var results = await pipeline.HybridSearchAsync(
    "What does error code SKU-4421 mean?", topK: 5);

foreach (var r in results)
{
    Console.WriteLine(
        $"RRF={r.RrfScore:F4} dense={r.DenseScore:F3} sparse={r.SparseScore:F3}");
    // Clamp the preview length so short chunks do not make Substring throw.
    var text = r.Document.Text;
    Console.WriteLine($" {text.Substring(0, Math.Min(100, text.Length))}...");
}

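You'll see scores from both retrievers. Documents that rank well in both lists come out ahead of documents that rank well in only one. The fusion algorithm is Reciprocal Rank Fusion (RRF). It works on ranks rather than raw scores, so the dense and BM25 scores need no normalization, and its single smoothing constant (conventionally 60) rarely needs tuning. It is well-studied and in practice tends to beat either retriever alone.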

What hybrid search actually does

Under the hood, HybridSearchAsync does this:

  1. Embeds your query and runs cosine similarity over the vector index — returns the top N candidates by semantic match.
  2. Tokenizes your query and runs BM25 over an inverted index — returns the top N candidates by keyword match.
  3. Combines the two ranked lists using RRF: each document's score is 1/(60 + rank_dense) + 1/(60 + rank_sparse).
  4. Re-sorts by RRF score and returns the top K.

If a document appears in both lists, it gets a substantial boost. If it appears in only one, it can still rank if its position in that list is high.
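The fusion step can be sketched directly from the formula in step 3. This is an illustration rather than LogicGrid's source: `RrfSketch` is a hypothetical name, ranks are assumed to start at 1, and ties are broken by document ID to keep the output deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical RRF implementation for illustration; not part of LogicGrid.
public static class RrfSketch
{
    // Fuses two ranked lists of document IDs (best first) with Reciprocal Rank Fusion.
    public static List<(string Id, double Score)> Fuse(
        IReadOnlyList<string> denseRanked,
        IReadOnlyList<string> sparseRanked,
        int topK,
        int k = 60)
    {
        var scores = new Dictionary<string, double>();

        void Accumulate(IReadOnlyList<string> ranked)
        {
            for (int i = 0; i < ranked.Count; i++)
            {
                int rank = i + 1; // ranks are 1-based
                scores.TryGetValue(ranked[i], out var soFar);
                scores[ranked[i]] = soFar + 1.0 / (k + rank);
            }
        }

        // A document missing from a list simply gets no contribution from it.
        Accumulate(denseRanked);
        Accumulate(sparseRanked);

        return scores
            .Select(kv => (Id: kv.Key, Score: kv.Value))
            .OrderByDescending(x => x.Score)
            .ThenBy(x => x.Id, StringComparer.Ordinal)
            .Take(topK)
            .ToList();
    }
}
```

The arithmetic shows why appearing in both lists matters. A document ranked 10th by both retrievers scores 2/70 ≈ 0.0286, while a document ranked 1st by only one retriever scores 1/61 ≈ 0.0164, so agreement between retrievers outweighs a single top placement.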

The implementation is pure C#, with no native dependencies and no extra NuGet packages. See the hybrid search docs for the full math.

Real benchmarks

We compared dense-only vs hybrid on three query types over a 10,000-document corpus of technical documentation:

Query type                   Dense only (recall@5)   Hybrid (recall@5)
Conceptual ("how do I X?")   0.82                    0.85
Codes / IDs ("SKU-4421")     0.21                    0.94
Proper nouns                 0.40                    0.88

The takeaway: in this benchmark hybrid did not hurt conceptual queries (0.82 to 0.85, because the dense retriever still drives those), and it dramatically improved the hard cases (0.21 to 0.94 on codes and IDs, 0.40 to 0.88 on proper nouns).

Step 5: Production considerations

The in-memory store works fine up to a few hundred thousand vectors. Past that you want a real vector database. LogicGrid's IVectorStore interface is small — wrap your favorite (Qdrant, pgvector, Weaviate, Pinecone) and plug it in:

public sealed class QdrantVectorStore : IVectorStore
{
    public async Task UpsertAsync(VectorDocument doc, CancellationToken ct = default) { /* ... */ }

    public async Task<IList<VectorSearchResult>> SearchAsync(
        float[] query, int topK, float minScore, CancellationToken ct = default) { /* ... */ }

    // ...
}

var store = new HybridVectorStore(new QdrantVectorStore(...));

HybridVectorStore wraps any IVectorStore and adds BM25 indexing on top. You get hybrid retrieval over your existing vector DB without rewriting anything.

Step 6: Chunking strategies that matter

The default chunker splits on paragraphs and merges to ~500 tokens. That's a reasonable starting point but rarely optimal. Things to consider:

  • Code-heavy docs: split on functions, not paragraphs. Code cares about scope.
  • Markdown docs: respect heading boundaries. A chunk that crosses two H2s loses semantic coherence.
  • Tables: never split a table across chunks. The retriever will return half a table and the LLM will hallucinate the rest.
  • Overlap: keep ~10–15% overlap between adjacent chunks. Prevents the case where the answer straddles a chunk boundary.

LogicGrid lets you swap chunking strategies via IChunkingStrategy. See the chunking docs for built-in options and how to write your own.
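As one example of the heading rule above, a Markdown splitter can start a new section at every H1 or H2. The sketch below is standalone and does not implement LogicGrid's IChunkingStrategy, whose members are not shown in this post; `HeadingChunkerSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical heading-aware splitter for illustration; not part of LogicGrid.
public static class HeadingChunkerSketch
{
    // Three backticks, built at runtime so this sample can sit inside a fenced block.
    private static readonly string Fence = new string('`', 3);

    // Splits Markdown into sections, starting a new one at each H1 or H2 heading.
    public static List<string> SplitOnHeadings(string markdown)
    {
        var sections = new List<string>();
        var current = new List<string>();
        bool inFence = false;

        foreach (var line in markdown.Replace("\r\n", "\n").Split('\n'))
        {
            // Track fenced code so a "# comment" inside code is not read as a heading.
            if (line.TrimStart().StartsWith(Fence, StringComparison.Ordinal))
                inFence = !inFence;

            bool isHeading = !inFence &&
                (line.StartsWith("# ", StringComparison.Ordinal) ||
                 line.StartsWith("## ", StringComparison.Ordinal));

            // Close the previous section only if it has real content.
            if (isHeading && current.Any(l => l.Trim().Length > 0))
            {
                sections.Add(string.Join("\n", current).Trim());
                current.Clear();
            }
            current.Add(line);
        }

        if (current.Any(l => l.Trim().Length > 0))
            sections.Add(string.Join("\n", current).Trim());
        return sections;
    }
}
```

Each section keeps its heading as the first line, which gives the embedding some topical context. Sections that are still too long can then go through a size-capped chunker.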

Step 7: Evaluation

The hardest part of RAG isn't building it — it's knowing whether it's any good. A simple eval set goes a long way:

using System;
using System.Linq;

// Each entry pairs a question with the ID of the chunk that should be retrieved.
var evalQuestions = new (string Question, string ExpectedDocId)[]
{
    ("How do I deploy?", "deployment.md#prod"),
    ("What is SKU-4421?", "errors.md#sku-4421"),
    // ...
};

int hits = 0;
foreach (var (q, expected) in evalQuestions)
{
    var results = await pipeline.HybridSearchAsync(q, topK: 5);
    // Count a hit when the expected document shows up anywhere in the top 5.
    if (results.Any(r => r.Document.Id == expected))
        hits++;
}

Console.WriteLine($"Recall@5: {(double)hits / evalQuestions.Length:P0}");

Run this on every change to your chunking, embedding model, or retrieval strategy. You'll catch regressions before users do.
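A single recall number can hide a regression in one query type, so it may help to break the metric down by category, as the benchmark table above does. The helper below is a sketch: `EvalSketch` and the category labels are hypothetical, and it takes already-retrieved IDs so it does not depend on any LogicGrid type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical evaluation helper for illustration; not part of LogicGrid.
public static class EvalSketch
{
    // Returns recall@k per category: the fraction of questions in each category
    // whose expected document ID appears among the first k retrieved IDs.
    public static Dictionary<string, double> RecallAtKByCategory(
        IEnumerable<(string Category, string ExpectedId, IReadOnlyList<string> RetrievedIds)> runs,
        int k)
    {
        return runs
            .GroupBy(r => r.Category)
            .ToDictionary(
                g => g.Key,
                g => g.Average(r => r.RetrievedIds.Take(k).Contains(r.ExpectedId) ? 1.0 : 0.0));
    }
}
```

Feed it one tuple per eval question, with the IDs taken from your search results in ranked order, and compare the per-category numbers between runs.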

Going further

If you're new to LogicGrid, the Quickstart gets you running in five minutes.

For a comparison with other .NET frameworks, see LangChain vs Semantic Kernel vs LogicGrid.