AI / ML · 2026 · Completed

PDF RAG Assistant

A retrieval-augmented generation system that ingests PDF documents, indexes them in a persistent Chroma vector store, and answers questions using hybrid search, reranking, and strict context-only prompting to reduce hallucinations. The stack pairs a FastAPI backend for ingestion and query APIs with a Streamlit chat UI, OpenAI embeddings and chat models, and optional BM25 keyword retrieval fused with semantic search.

Python · FastAPI · Streamlit · RAG · ChromaDB · OpenAI

Problem
  • Balancing retrieval breadth with precision across diverse PDFs
  • Keeping answers strictly grounded while remaining helpful
  • Managing first-load latency for embedding and reranker models
Solution
  • Used hybrid fusion plus reranking to tighten context before generation
  • Enforced strict system prompts and “I don’t know” fallbacks
  • Structured modular code (ingestion, retriever, reranker, LLM) for clarity

Key Features

PDF upload, chunking, deduplication, and persistent Chroma indexing
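
The deduplication step can be sketched with a content hash, so re-ingesting the same PDF adds no new vectors. A minimal illustration — the function name and chunk dict shape are assumptions, not the project's actual code:

```python
import hashlib

def deduplicate_chunks(chunks):
    """Drop exact-duplicate chunks before indexing, keyed on a
    SHA-256 hash of the chunk text."""
    seen, unique = set(), []
    for chunk in chunks:
        digest = hashlib.sha256(chunk["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique
```

In practice the same hash can double as a stable Chroma document ID, making upserts idempotent.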

LLM query expansion and hybrid retrieval with document filters
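
One common way to fuse BM25 and semantic result lists is reciprocal rank fusion; the sketch below assumes ranked lists of document IDs and is illustrative of the fusion idea, not necessarily the exact rule this project uses:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists (e.g. vector search and BM25) by summing
    1 / (k + rank) per document; documents ranked well by both
    retrievers rise to the top."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no score normalization between the keyword and vector retrievers.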

Cross-encoder reranking for top-k context selection
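
The reranking stage scores each (query, passage) pair and keeps only the best passages as context. In the project this scoring is done by a Hugging Face cross-encoder; in the sketch below a toy word-overlap scorer stands in so the example runs without a model download:

```python
def rerank(query, passages, score_fn, top_k=4):
    """Score every (query, passage) pair and keep the top_k passages
    as the final LLM context window."""
    return sorted(passages, key=lambda p: score_fn(query, p), reverse=True)[:top_k]

def overlap_score(query, passage):
    """Toy stand-in for a cross-encoder: fraction of query words
    appearing in the passage."""
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)
```

Swapping `overlap_score` for a real cross-encoder's predict call keeps the pipeline shape identical.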

Streaming answers with source citations (file and page)

Anti-hallucination prompts with explicit insufficient-context handling
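
A grounded prompt of this kind pairs a strict system instruction with citation-tagged context; the exact wording and message keys below are illustrative assumptions:

```python
SYSTEM_PROMPT = (
    "Answer using ONLY the context below. If the context does not "
    "contain the answer, reply exactly: \"I don't know.\""
)

def build_messages(question, chunks):
    """Assemble chat messages with [file p.N]-tagged context so the
    model can cite sources, and refuse when context is insufficient."""
    context = "\n\n".join(
        f"[{c['source']} p.{c['page']}] {c['text']}" for c in chunks
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
```

The `[file p.N]` tags let the streamed answer quote its sources verbatim, which the UI can then render as citations.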

REST API plus interactive Streamlit frontend

Technologies

Python · FastAPI · Streamlit · ChromaDB · OpenAI API · LangChain-style pipelines · BM25 · Hugging Face (reranker) · Pydantic

Learnings

  • Designed end-to-end RAG pipelines from PDF ingestion to streamed responses

  • Combined vector (MMR) and BM25 retrieval for stronger recall

  • Applied reranking to improve context quality before LLM generation

  • Practiced production-minded API design, config, and structured logging
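
The MMR side of retrieval trades relevance against redundancy when picking vectors. A dependency-free sketch of the greedy selection (the λ value and toy vectors are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, lambda_=0.5, top_k=3):
    """Maximal Marginal Relevance: greedily pick documents that are
    relevant to the query yet dissimilar to those already chosen."""
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < top_k:
        best = max(
            remaining,
            key=lambda i: lambda_ * cosine(query_vec, doc_vecs[i])
            - (1 - lambda_) * max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a low λ the second pick skips a near-duplicate of the first document in favor of a more diverse one.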

Highlights

Hybrid Retrieval

Cross-Encoder Rerank

Grounded Answers
