PDF RAG Assistant
A retrieval-augmented generation system that ingests PDF documents, indexes them in a persistent Chroma vector store, and answers questions using hybrid search, reranking, and strict context-only prompting to reduce hallucinations. The stack pairs a FastAPI backend for ingestion and query APIs with a Streamlit chat UI, OpenAI embeddings and chat models, and optional BM25 keyword retrieval fused with semantic search.
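The "strict context-only prompting" described above can be sketched as a small prompt builder. This is a minimal illustration, not the project's actual prompt; the helper name, refusal string, and chunk fields (`file`, `page`, `text`) are assumptions:

```python
# Hypothetical sketch of context-only prompt assembly with a refusal fallback.
REFUSAL = "I don't know based on the provided documents."

SYSTEM_PROMPT = (
    "Answer ONLY from the context below. "
    f"If the context is insufficient, reply exactly: {REFUSAL}"
)

def build_messages(question: str, chunks: list[dict]) -> list[dict]:
    """Assemble chat messages; each chunk carries file/page for citation."""
    if not chunks:
        # No retrieved context: short-circuit with the refusal instead of guessing.
        return [{"role": "assistant", "content": REFUSAL}]
    context = "\n\n".join(
        f"[{c['file']} p.{c['page']}] {c['text']}" for c in chunks
    )
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]
```

Embedding the file and page tag next to each chunk is what lets the model emit source citations in its answer.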
Challenges
- Balancing retrieval breadth with precision across diverse PDFs
- Keeping answers strictly grounded while remaining helpful
- Managing first-load latency for the embedding and reranker models
Solutions
- Hybrid fusion plus reranking to tighten context before generation
- Strict system prompts with explicit "I don't know" fallbacks
- Modular code structure (ingestion, retriever, reranker, LLM) for clarity
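The hybrid-fusion step can be illustrated with Reciprocal Rank Fusion (RRF), a common way to merge semantic and keyword rankings; the doc IDs and the `k` constant below are illustrative assumptions, not the project's actual values:

```python
# Sketch of hybrid fusion via Reciprocal Rank Fusion (RRF).
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked doc-id lists: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]   # vector-search order (illustrative)
keyword  = ["d1", "d4", "d3"]   # BM25 order (illustrative)
fused = rrf_fuse([semantic, keyword])  # d1 ranks first: high in both lists
```

Documents appearing near the top of both lists accumulate the largest scores, which is why fusion tightens the candidate set before the reranker sees it.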
Key Features
- PDF upload, chunking, deduplication, and persistent Chroma indexing
- LLM query expansion and hybrid retrieval with document filters
- Cross-encoder reranking for top-k context selection
- Streaming answers with source citations (file and page)
- Anti-hallucination prompts with explicit insufficient-context handling
- REST API plus interactive Streamlit frontend
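The deduplication step in the features above can be sketched with content-hash chunk IDs; the normalization (strip + lowercase) and the 16-character digest are assumptions for illustration:

```python
import hashlib

def dedupe_chunks(chunks: list[str]) -> list[tuple[str, str]]:
    """Return (chunk_id, text) pairs, skipping exact duplicates.

    Deriving the ID from a content hash also makes re-ingesting the
    same PDF idempotent when the IDs are used as vector-store keys.
    """
    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for text in chunks:
        digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()[:16]
        if digest in seen:
            continue  # duplicate chunk: already indexed
        seen.add(digest)
        unique.append((digest, text))
    return unique
```

Keying the persistent index by content hash means uploading the same document twice adds no new vectors.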
Learnings
- Designed end-to-end RAG pipelines from PDF ingestion to streamed responses
- Combined vector (MMR) and BM25 retrieval for stronger recall
- Applied reranking to improve context quality before LLM generation
- Practiced production-minded API design, configuration, and structured logging
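The MMR selection mentioned above can be sketched over toy vectors; the two-dimensional embeddings and the lambda value below are illustrative assumptions, not the project's actual OpenAI embeddings:

```python
# Sketch of Maximal Marginal Relevance (MMR) over toy cosine similarities.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def mmr(query: list[float], docs: list[list[float]], k: int = 2, lam: float = 0.3) -> list[int]:
    """Greedily pick k doc indices, trading query relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(docs)))
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda i: lam * cosine(query, docs[i])
            - (1 - lam) * max((cosine(docs[i], docs[j]) for j in selected), default=0.0),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low lambda, a near-duplicate of an already-selected chunk is penalized enough that a less similar but novel chunk wins the second slot, which is how MMR broadens recall across diverse PDFs.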
Highlights
Hybrid Retrieval
Cross-Encoder Rerank
Grounded Answers