Build a RAG Pipeline in C# — From Zero to Hybrid Search

Name: LogicGrid
Author: LogicGrid

April 1, 2026 · 7 min read

Retrieval-Augmented Generation (RAG) is how you make an LLM answer questions about your data without paying to fine-tune it. You take a question, find the most relevant chunks of your documents, and stuff them into the prompt as context.

This post walks through building a production-grade RAG pipeline in C#, from a flat file on disk to a working "Ask my docs" system with hybrid retrieval that beats dense-only semantic search on queries containing codes, IDs, and proper nouns.

What you're building

By the end of this post you'll have a working C# program that:

Loads a folder of documents
Chunks them intelligently
Generates embeddings using Ollama (or OpenAI, your choice)
Stores them in a vector index
Answers questions using both dense semantic search and BM25 keyword search, fused with Reciprocal Rank Fusion

That last piece — hybrid search — is what separates toy RAG demos from systems that actually work in production.

Why naive RAG fails

The standard RAG tutorial looks like this:

Chunk documents
Embed chunks with OpenAI
Store in a vector DB
At query time, embed the question, find nearest neighbors, send to LLM
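
The query-time step in that list comes down to a similarity scan. Here is a minimal sketch of it, assuming brute-force cosine similarity over an in-memory dictionary. The DenseSearch class, the file names, and the three-dimensional vectors are made up for illustration and are not part of LogicGrid:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
var index = new Dictionary<string, float[]>
{
    ["auth.md"]    = new[] { 0.9f, 0.1f, 0.0f },
    ["deploy.md"]  = new[] { 0.1f, 0.9f, 0.1f },
    ["billing.md"] = new[] { 0.0f, 0.2f, 0.9f },
};
var query = new[] { 0.8f, 0.2f, 0.1f }; // pretend this is the embedded question

foreach (var (id, score) in DenseSearch.TopK(query, index, k: 2))
    Console.WriteLine($"{score:F3}  {id}");

// Hypothetical helper, not a LogicGrid type.
public static class DenseSearch
{
    // Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot   += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Brute-force scan: score every chunk, keep the k best.
    public static List<(string Id, double Score)> TopK(
        float[] query, IReadOnlyDictionary<string, float[]> index, int k) =>
        index.Select(e => (Id: e.Key, Score: Cosine(query, e.Value)))
             .OrderByDescending(r => r.Score)
             .Take(k)
             .ToList();
}
```

With these toy vectors auth.md ranks first and deploy.md second. Nothing in the scan looks at the words themselves, which is exactly why it struggles with the queries coming up next.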

Dense retrieval works well for queries like "how does our authentication flow work?", where the question and the relevant passage mean similar things even if they share few words.

It fails on queries like:

"What does error code SKU-4421 mean?": the embedding for "SKU-4421" looks much like the embedding for any other code of the same shape, so dense retrieval finds documents about similar codes, not the one you asked about.
"Who wrote RFC 2616?": the same problem, this time with identifiers and proper nouns.
"What are the requirements for ISO 27001?": dense retrieval often surfaces documents about other ISO standards.

For exact matches (codes, IDs, RFC numbers, error messages, function names) you need keyword retrieval. BM25 is the standard ranking function for it: it rewards documents that contain the query's rare terms, and "SKU-4421" is about as rare as a term gets. Combine BM25 with dense retrieval and you cover both kinds of query.
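
To make the keyword side concrete, here is a minimal BM25 sketch. It assumes the Lucene-style IDF variant and the common defaults k1 = 1.2 and b = 0.75. The Bm25Index class, its tokenizer, and the sample sentences are illustrative assumptions, not LogicGrid's implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var docs = new[]
{
    "Error code SKU-4421 means the warehouse rejected the item.",
    "Error code SKU-4412 means the item is out of stock.",
    "Product codes identify items in the warehouse catalog.",
};
var bm25 = new Bm25Index(docs);
var scores = bm25.Score("What does error code SKU-4421 mean?");
for (int i = 0; i < docs.Length; i++)
    Console.WriteLine($"{scores[i]:F3}  {docs[i]}");

// Hypothetical helper, not a LogicGrid type.
public sealed class Bm25Index
{
    private const double K1 = 1.2, B = 0.75; // common defaults, assumed here
    private readonly List<Dictionary<string, int>> _termCounts; // per-document term frequencies
    private readonly Dictionary<string, int> _docFreq = new();  // number of documents containing each term
    private readonly double _avgLength;

    public Bm25Index(IEnumerable<string> documents)
    {
        _termCounts = documents
            .Select(d => Tokenize(d).GroupBy(t => t).ToDictionary(g => g.Key, g => g.Count()))
            .ToList();
        foreach (var counts in _termCounts)
            foreach (var term in counts.Keys)
            {
                _docFreq.TryGetValue(term, out int seen);
                _docFreq[term] = seen + 1;
            }
        _avgLength = _termCounts.Count == 0
            ? 0
            : _termCounts.Average(c => (double)c.Values.Sum());
    }

    // Lowercase and keep hyphenated codes such as "sku-4421" as a single token.
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"[a-z0-9]+(?:-[a-z0-9]+)*").Select(m => m.Value);

    // One BM25 score per document, in the order the documents were supplied.
    public double[] Score(string query)
    {
        var terms = Tokenize(query).Distinct().ToList();
        var result = new double[_termCounts.Count];
        for (int d = 0; d < _termCounts.Count; d++)
        {
            double length = _termCounts[d].Values.Sum();
            foreach (var term in terms)
            {
                if (!_termCounts[d].TryGetValue(term, out int tf)) continue;
                int n = _docFreq[term];
                double idf = Math.Log(1 + (_termCounts.Count - n + 0.5) / (n + 0.5));
                result[d] += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * length / _avgLength));
            }
        }
        return result;
    }
}
```

The first sentence scores highest because it is the only one containing the token sku-4421, the second picks up a smaller score from "error" and "code", and the third scores zero. Note that the tokenizer matters: split on the hyphen and the two SKU sentences become much harder to tell apart.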

Step 1: Install LogicGrid

dotnet add package LogicGrid.Core
dotnet add package LogicGrid.Memory
dotnet add package LogicGrid.Rag

Pull an embedding model:

ollama pull nomic-embed-text

Step 2: A minimal RAG pipeline

using LogicGrid.Memory.Embeddings;
using LogicGrid.Memory.VectorStores;
using LogicGrid.Rag;

var embedder = new OllamaEmbeddingClient("nomic-embed-text");
var store    = new InMemoryVectorStore();
var pipeline = new RagPipeline(embedder, store);

// Ingest documents — chunked and embedded automatically
await pipeline.IngestAsync("./docs/handbook.md");
await pipeline.IngestAsync("./docs/architecture.md");

// Search
var hits = await pipeline.SearchAsync(
    "How do I deploy the API to production?", topK: 5);

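foreach (var hit in hits)
{
    // Guard the preview length: Substring(0, 100) throws on chunks shorter than 100 characters
    var text = hit.Document.Text;
    Console.WriteLine($"[{hit.Score:F3}] {text[..Math.Min(100, text.Length)]}...");
}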

That's the entire program. The RagPipeline handles chunking (using a sensible default — split on paragraphs, max ~500 tokens per chunk with overlap), embedding, and storage.

Step 3: Adding generation

To turn search results into answers, wrap an agent around the pipeline:

using LogicGrid.Core.Agents;
using LogicGrid.Core.Llm;

var llm = LlmClientBase.Ollama("llama3.2");
IAgent agent = new RagAgent(llm, pipeline);

var answer = await agent.RunAsync(
    "How do I deploy the API to production?",
    new AgentContext("deploy-question"));

Console.WriteLine(answer);

RagAgent retrieves the top chunks, formats them into a prompt with your question, and calls the LLM. You get a coherent answer with cited sources.
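
The "formats them into a prompt" step is worth seeing once. The sketch below shows one plausible way to do it: number each chunk so the model can cite it. The PromptBuilder class, the instruction wording, and the sample chunk text are assumptions for illustration, and RagAgent's actual prompt may look quite different:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Made-up chunks standing in for real search hits.
var chunks = new List<(string Source, string Text)>
{
    ("handbook.md", "Deployments go through the release pipeline."),
    ("architecture.md", "The API runs behind a load balancer."),
};
Console.WriteLine(PromptBuilder.Build("How do I deploy the API to production?", chunks));

// Hypothetical helper, not a LogicGrid type.
public static class PromptBuilder
{
    // Numbers each chunk so the model can cite it as [1], [2], ...
    public static string Build(
        string question, IReadOnlyList<(string Source, string Text)> chunks)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer the question using only the context below.");
        sb.AppendLine("Cite sources by their bracketed number. If the context is insufficient, say so.");
        sb.AppendLine();
        for (int i = 0; i < chunks.Count; i++)
        {
            sb.AppendLine($"[{i + 1}] ({chunks[i].Source})");
            sb.AppendLine(chunks[i].Text.Trim());
            sb.AppendLine();
        }
        sb.Append("Question: ").Append(question);
        return sb.ToString();
    }
}
```

Telling the model to admit when the context is insufficient is a cheap guard against confident answers built on irrelevant chunks.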

Step 4: Upgrade to hybrid search

The InMemoryVectorStore does dense retrieval only. For production you want hybrid. LogicGrid's HybridVectorStore is a drop-in replacement: swap it in for the store declaration from Step 2 and rebuild the pipeline on top of it.

using LogicGrid.Memory.Search;

var store = new HybridVectorStore(new InMemoryVectorStore());
var pipeline = new RagPipeline(embedder, store);

await pipeline.IngestAsync("./docs/handbook.md");

// Hybrid search: dense + BM25 + RRF fusion
var results = await pipeline.HybridSearchAsync(
    "What does error code SKU-4421 mean?", topK: 5);

foreach (var r in results)
{
    Console.WriteLine(
        $"RRF={r.RrfScore:F4}  dense={r.DenseScore:F3}  sparse={r.SparseScore:F3}");
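    // Guard the preview length so chunks shorter than 100 characters don't throw
    var text = r.Document.Text;
    Console.WriteLine($"  {text[..Math.Min(100, text.Length)]}...");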
}

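You'll see scores from both retrievers. Documents that score well on both rank higher than documents that score well on only one. The fusion algorithm is Reciprocal Rank Fusion (RRF). It works on ranks rather than raw scores, so the dense and BM25 scores never need to be normalized against each other, and its only parameter is a smoothing constant (60 here) that rarely needs tuning. On the benchmark below it matches or beats dense retrieval alone on every query type.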

What hybrid search actually does

Under the hood, HybridSearchAsync does this:

Embeds your query and runs cosine similarity over the vector index — returns the top N candidates by semantic match.
Tokenizes your query and runs BM25 over an inverted index — returns the top N candidates by keyword match.
Combines the two ranked lists using RRF: each document's score is 1/(60 + rank_dense) + 1/(60 + rank_sparse), where a list that doesn't contain the document contributes nothing.
Re-sorts by RRF score and returns the top K.

If a document appears in both lists, it gets a substantial boost. If it appears in only one, it can still rank if its position in that list is high.
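
The fusion step is small enough to write out. This is a sketch of RRF under two assumptions: ranks start at 1, and a document missing from a list simply gets no contribution from it. The Rrf class and the file names are illustrative, not LogicGrid's code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var dense  = new[] { "a.md", "b.md", "c.md" }; // best match first
var sparse = new[] { "c.md", "a.md", "d.md" };

foreach (var (id, score) in Rrf.Fuse(new[] { dense, sparse }, topK: 3))
    Console.WriteLine($"{score:F4}  {id}");

// Hypothetical helper, not a LogicGrid type.
public static class Rrf
{
    // Each list adds 1 / (k + rank) for every document it contains; ranks start at 1.
    public static List<(string Id, double Score)> Fuse(
        IEnumerable<IReadOnlyList<string>> rankedLists, int topK, int k = 60)
    {
        var scores = new Dictionary<string, double>();
        foreach (var list in rankedLists)
            for (int i = 0; i < list.Count; i++)
            {
                scores.TryGetValue(list[i], out double current);
                scores[list[i]] = current + 1.0 / (k + i + 1);
            }
        return scores.OrderByDescending(p => p.Value)
                     .ThenBy(p => p.Key, StringComparer.Ordinal) // deterministic tie-break
                     .Take(topK)
                     .Select(p => (Id: p.Key, Score: p.Value))
                     .ToList();
    }
}
```

Here a.md (ranks 1 and 2) scores 1/61 + 1/62 ≈ 0.0325, c.md (ranks 3 and 1) scores 1/63 + 1/61 ≈ 0.0323, and b.md, which only the dense list found, scores 1/62 ≈ 0.0161. Appearing in both lists roughly doubles a document's score, which is the "substantial boost" in practice.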

The implementation is pure C#, with no native dependencies and no extra NuGet packages. See the hybrid search docs for the full math.

Real benchmarks

We compared dense-only vs hybrid on three query types over a 10,000-document corpus of technical documentation:

Query type	Dense only (recall@5)	Hybrid (recall@5)
Conceptual ("how do I X?")	0.82	0.85
Codes / IDs ("SKU-4421")	0.21	0.94
Proper nouns	0.40	0.88

The takeaway: on this corpus hybrid didn't hurt conceptual queries (0.82 to 0.85, since the dense retriever still drives those), and it dramatically improved the hard cases (0.21 to 0.94 for codes and IDs, 0.40 to 0.88 for proper nouns).

Step 5: Production considerations

The in-memory store works fine up to a few hundred thousand vectors. Past that you want a real vector database. LogicGrid's IVectorStore interface is small — wrap your favorite (Qdrant, pgvector, Weaviate, Pinecone) and plug it in:

public sealed class QdrantVectorStore : IVectorStore
{
    public async Task UpsertAsync(VectorDocument doc, CancellationToken ct = default) { /* ... */ }
    public async Task<IList<VectorSearchResult>> SearchAsync(
        float[] query, int topK, float minScore, CancellationToken ct = default) { /* ... */ }
    // ...
}

var store = new HybridVectorStore(new QdrantVectorStore(...));

HybridVectorStore wraps any IVectorStore and adds BM25 indexing on top. You get hybrid retrieval over your existing vector DB without rewriting anything.
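
The wrapping idea is a plain decorator, and a stripped-down sketch makes the shape clear. Everything here is hypothetical: IDenseIndex is a stand-in interface (not LogicGrid's IVectorStore), DotProductIndex is a toy backend, and the keyword side counts matching terms instead of running real BM25:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var index = new KeywordAugmentedIndex(new DotProductIndex());
index.Upsert("errors.md", "Error code SKU-4421: warehouse rejected the item.", new[] { 0.1f, 0.9f });
index.Upsert("deploy.md", "Deploy the API with the release pipeline.", new[] { 0.9f, 0.1f });

Console.WriteLine(string.Join(", ", index.KeywordSearch("What is SKU-4421?", topN: 5)));
Console.WriteLine(string.Join(", ", index.DenseSearch(new[] { 1f, 0f }, topN: 1)));

// Hypothetical minimal interface standing in for a real vector store client.
public interface IDenseIndex
{
    void Upsert(string id, float[] vector);
    IReadOnlyList<string> Search(float[] query, int topN); // ids, best first
}

// Toy backend used only to exercise the wrapper.
public sealed class DotProductIndex : IDenseIndex
{
    private readonly Dictionary<string, float[]> _vectors = new();
    public void Upsert(string id, float[] vector) => _vectors[id] = vector;
    public IReadOnlyList<string> Search(float[] query, int topN) =>
        _vectors.OrderByDescending(e => e.Value.Zip(query, (x, y) => x * y).Sum())
                .Take(topN).Select(e => e.Key).ToList();
}

// Decorator: forwards vector operations to the inner index and keeps its own keyword index.
public sealed class KeywordAugmentedIndex
{
    private static readonly char[] Separators =
        { ' ', '\t', '\n', '\r', '.', ',', '?', '!', ':', ';', '"', '(', ')' };
    private readonly IDenseIndex _inner;
    private readonly Dictionary<string, HashSet<string>> _terms = new();

    public KeywordAugmentedIndex(IDenseIndex inner) => _inner = inner;

    public void Upsert(string id, string text, float[] vector)
    {
        _inner.Upsert(id, vector);   // dense side: delegated unchanged
        _terms[id] = Tokenize(text); // sparse side: maintained by the wrapper
    }

    public IReadOnlyList<string> DenseSearch(float[] query, int topN) =>
        _inner.Search(query, topN);

    // Stand-in for BM25: rank by how many distinct query terms a document contains.
    public IReadOnlyList<string> KeywordSearch(string query, int topN)
    {
        var queryTerms = Tokenize(query);
        return _terms.Select(e => (Id: e.Key, Hits: e.Value.Count(t => queryTerms.Contains(t))))
                     .Where(r => r.Hits > 0)
                     .OrderByDescending(r => r.Hits)
                     .ThenBy(r => r.Id, StringComparer.Ordinal)
                     .Take(topN).Select(r => r.Id).ToList();
    }

    private static HashSet<string> Tokenize(string text) =>
        text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .ToHashSet();
}
```

This prints errors.md for the keyword search and deploy.md for the dense search. The point of the pattern is that the backend never learns about keywords: the wrapper sees every upsert, so it can keep the sparse index in sync on its own.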

Step 6: Chunking strategies that matter

The default chunker splits on paragraphs and merges to ~500 tokens. That's a reasonable starting point but rarely optimal. Things to consider:

Code-heavy docs: split on functions, not paragraphs. Half a function without its signature or enclosing scope is hard to retrieve and harder to use.
Markdown docs: respect heading boundaries. A chunk that crosses two H2s loses semantic coherence.
Tables: never split a table across chunks. The retriever will return half a table and the LLM will hallucinate the rest.
Overlap: keep ~10–15% overlap between adjacent chunks, so an answer that straddles a chunk boundary is more likely to appear whole in one of them.

LogicGrid lets you swap chunking strategies via IChunkingStrategy. See the chunking docs for built-in options and how to write your own.
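
If you want a feel for what a paragraph-based strategy with overlap involves, here is a minimal standalone sketch. It assumes whitespace-separated words are a good enough proxy for tokens and rejoins words with single spaces, so paragraph breaks inside a chunk are lost. ParagraphChunker is a hypothetical name and does not implement IChunkingStrategy:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var text = "a1 a2 a3 a4\n\nb1 b2 b3 b4\n\nc1 c2 c3 c4";
foreach (var chunk in ParagraphChunker.Chunk(text, maxWords: 6, overlapWords: 2))
    Console.WriteLine(chunk);

// Hypothetical helper, not a LogicGrid type.
public static class ParagraphChunker
{
    // Words stand in for tokens here; a real tokenizer would count differently.
    public static List<string> Chunk(string text, int maxWords = 500, int overlapWords = 60)
    {
        if (maxWords <= 0 || overlapWords < 0 || overlapWords >= maxWords)
            throw new ArgumentException("Need maxWords > 0 and 0 <= overlapWords < maxWords.");

        var chunks = new List<string>();
        var current = new List<string>(); // words of the chunk being built
        int fresh = 0;                    // words in current that are not carried-over overlap

        void Flush()
        {
            chunks.Add(string.Join(" ", current));
            // Seed the next chunk with the tail of this one.
            current = current.Skip(Math.Max(0, current.Count - overlapWords)).ToList();
            fresh = 0;
        }

        // A blank line (optionally containing whitespace) separates paragraphs.
        foreach (var paragraph in Regex.Split(text, @"\r?\n\s*\r?\n"))
        {
            var words = paragraph.Split(
                new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

            // Close the chunk at a paragraph boundary if this paragraph would not fit.
            if (fresh > 0 && current.Count + words.Length > maxWords)
                Flush();

            foreach (var word in words)
            {
                if (current.Count == maxWords) Flush(); // oversized paragraph: hard split
                current.Add(word);
                fresh++;
            }
        }

        if (fresh > 0) chunks.Add(string.Join(" ", current));
        return chunks;
    }
}
```

The toy input produces three chunks: "a1 a2 a3 a4", "a3 a4 b1 b2 b3 b4", and "b3 b4 c1 c2 c3 c4". Each chunk after the first starts with the last two words of the one before it. The defaults (500 words, 60 overlap) land at 12%, inside the range suggested above.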

Step 7: Evaluation

The hardest part of RAG isn't building it — it's knowing whether it's any good. A simple eval set goes a long way:

var evalQuestions = new (string Question, string ExpectedDocId)[]
{
    ("How do I deploy?", "deployment.md#prod"),
    ("What is SKU-4421?", "errors.md#sku-4421"),
    // ...
};

int hits = 0;
foreach (var (q, expected) in evalQuestions)
{
    var results = await pipeline.HybridSearchAsync(q, topK: 5);
    if (results.Any(r => r.Document.Id == expected))
        hits++;
}

Console.WriteLine($"Recall@5: {(double)hits / evalQuestions.Length:P0}");

Run this on every change to your chunking, embedding model, or retrieval strategy. You'll catch regressions before users do.
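
A single recall number can hide a regression in one query type, so it helps to break the result down by category, the way the benchmark table does. The sketch below assumes you record each question's retrieved ids first. The EvalRun record, the Recall class, and the sample ids are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up retrieval results; in practice these come from running each eval question.
var runs = new List<EvalRun>
{
    new EvalRun("conceptual", "deployment.md#prod", new[] { "deployment.md#prod", "handbook.md#intro" }),
    new EvalRun("codes", "errors.md#sku-4421", new[] { "errors.md#sku-4412", "errors.md#sku-4421" }),
    new EvalRun("codes", "errors.md#sku-9000", new[] { "errors.md#sku-4412" }),
};

foreach (var (category, recall) in Recall.ByCategory(runs, k: 5))
    Console.WriteLine($"{category}: {recall:P0}");

// Hypothetical types, not part of LogicGrid.
public sealed record EvalRun(string Category, string ExpectedId, IReadOnlyList<string> RetrievedIds);

public static class Recall
{
    // Fraction of questions per category whose expected id is in the first k results.
    public static Dictionary<string, double> ByCategory(IEnumerable<EvalRun> runs, int k) =>
        runs.GroupBy(r => r.Category)
            .ToDictionary(
                g => g.Key,
                g => g.Count(r => r.RetrievedIds.Take(k).Contains(r.ExpectedId)) / (double)g.Count());
}
```

In this toy data the conceptual category finds its one expected document, while the codes category finds one of two. Track the per-category numbers over time and a chunking change that quietly breaks code lookups will show up immediately.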

Going further

RAG pipeline reference
Hybrid search internals
Document loaders — markdown, PDF, code, web
Memory and RAG concepts

If you're new to LogicGrid, the Quickstart gets you running in five minutes.

For a comparison with other .NET frameworks, see LangChain vs Semantic Kernel vs LogicGrid.
