Session 3: RAG, Retrieval & Generation
AI · LLM · Information Retrieval

Presenters

Masih Moloodian, Yasin Fakhar, Mohammad Amin Dadgar

RAG: Retrieval & Generation

The meeting focused on Retrieval-Augmented Generation (RAG), a technique that addresses the limitation of outdated knowledge in Large Language Models (LLMs) by grounding responses in external data. The RAG pipeline begins with data ingestion, where the emphasis is on maintaining data quality, implementing effective chunking strategies, and ensuring scalability through automation. Attaching metadata to each chunk at this stage improves retrieval precision later, creating a foundation for efficient information retrieval.
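A minimal ingestion sketch (not the presenters' exact pipeline) makes the chunking-plus-metadata idea concrete: documents are split into fixed-size, overlapping chunks, and each chunk carries metadata that can later narrow or filter retrieval. The Chunk class and the size/overlap defaults below are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict

def chunk_document(text: str, source: str, size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Fixed-size character chunking with overlap; each chunk records
    its source document and position so retrieval can be filtered."""
    chunks = []
    step = size - overlap
    for i, start in enumerate(range(0, max(len(text) - overlap, 1), step)):
        piece = text[start:start + size]
        chunks.append(Chunk(
            text=piece,
            metadata={"source": source, "chunk_index": i, "char_start": start},
        ))
    return chunks
```

Fixed-size chunking is only one of the strategies the session alludes to; sentence- or section-aware splitting trades simplicity for better semantic boundaries.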

The retrieval component forms a crucial bridge in the RAG pipeline, operating through several mechanisms: dense vector retrieval (embedding-based), sparse retrieval (keyword-based), and hybrid approaches that combine the two. The process converts the user's query into a vector embedding, searches an indexed knowledge base, and ranks candidates by a similarity metric such as cosine similarity or Euclidean distance. It can be likened to a library search system, where a query is matched against tagged books to surface the relevant resources.
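The sketch below shows dense retrieval with cosine similarity over a small in-memory index. The embed argument is a placeholder for any sentence-embedding model and is not a specific library API.

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, doc_matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of
    document vectors (one row per chunk)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

def retrieve(query: str, chunks: list, chunk_vectors: np.ndarray, embed, k: int = 3):
    """Embed the query, score it against every indexed chunk, and
    return the top-k most similar chunks with their scores."""
    scores = cosine_similarity(embed(query), chunk_vectors)
    top = np.argsort(scores)[::-1][:k]
    return [(chunks[i], float(scores[i])) for i in top]
```

A hybrid approach would additionally compute a keyword-based score (e.g. BM25) and merge the two rankings; that merging step is omitted here for brevity.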

The generation phase combines the retrieved context with the original query to produce responses that are relevant, factually grounded, and contextually appropriate. This can be accomplished through various techniques, including template-based generation, neural generation with LLMs, careful prompt engineering, and domain-specific fine-tuning. The pipeline's effectiveness is evaluated with a focus on faithfulness, using embedding-based similarity, token-overlap comparisons, and LLM-based evaluations that rate how well the generated response aligns with the provided context.
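A hedged sketch ties these last two steps together: the retrieved chunks are packed into a prompt for the LLM, and a simple token-overlap score approximates one of the faithfulness checks described above. The prompt template and the function names (build_prompt, token_overlap) are illustrative assumptions, not a standard API, and the LLM call itself is omitted.

```python
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Template-based prompt assembly: instruct the model to answer
    only from the retrieved context."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

def token_overlap(answer: str, context: str) -> float:
    """Fraction of the answer's tokens that also appear in the
    retrieved context: a crude proxy for faithfulness."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)
```

In practice the overlap score is only a rough signal; the embedding-based similarity and LLM-based evaluations mentioned above give stronger faithfulness judgments at higher cost.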