Retrieval-Augmented Generation

  • Search for information relevant to the query using a similarity measure (e.g. Jaccard similarity)

Lexical Retrieval

  • Jaccard similarity
    • Preprocessing: stop-word removal and stemming
    • Downside: common words and rarer words with more specific meaning are given the same weight
    • Upside: easy to implement and fast to run
  • Term frequency-inverse document frequency (TF-IDF) and BM25 take word importance into account.
  • Lexical retrieval still powers most online search:
    • Algolia
    • Elasticsearch
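
A minimal sketch of the two ideas above, using only the standard library. The stop-word list, the plural-stripping "stemmer", the sample documents, and the smoothed IDF formula are all illustrative assumptions; real systems use a proper stemmer and a tuned scoring function such as BM25.

```python
import math
import re

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "in", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words, crudely strip a plural 's'."""
    tokens = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOP_WORDS:
            continue
        # Naive stand-in for stemming; a real stemmer (e.g. Porter) does far more.
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        tokens.add(word)
    return tokens

def jaccard(a, b):
    """Size of the token intersection divided by the size of the token union."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def idf(term, docs):
    """Smoothed inverse document frequency (one common variant): rarer terms score higher."""
    df = sum(1 for d in docs if term in tokenize(d))
    return math.log((1 + len(docs)) / (1 + df)) + 1.0

docs = [
    "The cat sat on the mat",
    "Cats and dogs",
    "The stock market fell",
]
query = "cat on a mat"

# Every shared token counts equally here, which is the downside noted above.
scores = [jaccard(query, d) for d in docs]
print(scores)  # roughly [0.667, 0.333, 0.0]

# IDF gives the rarer term "stock" more weight than the more common "cat".
print(idf("stock", docs), idf("cat", docs))
```

Jaccard treats "cat" and "mat" identically, whereas an IDF-style weight separates terms by how many documents contain them; TF-IDF and BM25 build on that weighting.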

Neural Retrieval

  • Split documents into snippets (by paragraph, or with sliding windows)
  • Run the snippets through an embedding model and store the resulting vectors in a vector database.
  • Embed the query and perform a nearest-neighbor vector lookup to find relevant snippets.
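
The three steps above can be sketched as follows. This is only a toy under stated assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model (it is not neural and captures no semantics), and `ToyVectorStore` is a hypothetical in-memory list with brute-force cosine search, not the API of any actual vector database.

```python
import hashlib
import math

def sliding_windows(text, size=8, stride=4):
    """Split text into overlapping snippets of up to `size` words, advancing by `stride`."""
    words = text.split()
    snippets = []
    for start in range(0, len(words), stride):
        snippets.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return snippets

def toy_embed(text, dim=64):
    """Stand-in for an embedding model (assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

class ToyVectorStore:
    """Hypothetical in-memory store with brute-force cosine-similarity search."""

    def __init__(self):
        self.items = []  # list of (vector, snippet) pairs

    def add(self, snippet):
        self.items.append((toy_embed(snippet), snippet))

    def search(self, query, k=2):
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), s) for v, s in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

document = ("lexical retrieval matches exact words while neural retrieval "
            "compares embedding vectors stored in a vector database")

store = ToyVectorStore()
for snippet in sliding_windows(document, size=6, stride=3):
    store.add(snippet)

results = store.search("embedding vectors", k=1)
for score, snippet in results:
    print(round(score, 3), snippet)
```

Overlapping windows trade extra storage for a lower chance of cutting a relevant passage in half. In practice the embedding function would be a trained model and the store an approximate nearest-neighbor index.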

Pipeline

  • Obtain dynamic and static context, along with user instructions.
  • Search for additional relevant context (preferably with neural retrieval against a vector database).
  • Assemble the prompt.
  • Feed the prompt to the LLM.
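
A rough sketch of how these four steps might fit together. Everything here is hypothetical: `retrieve` is a placeholder word-overlap ranker standing in for a vector-database lookup, the section labels in `assemble_prompt` are an arbitrary layout, and `call_llm` is a stub to be replaced by a real model client.

```python
def retrieve(query, snippets, k=2):
    """Placeholder retriever (assumption): rank snippets by words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        snippets,
        key=lambda s: len(q & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def assemble_prompt(static_context, dynamic_context, retrieved, instruction):
    """Join the context pieces and the user instruction into one prompt string."""
    parts = [
        "Background:\n" + static_context,
        "Current state:\n" + dynamic_context,
        "Retrieved snippets:\n" + "\n".join(f"- {s}" for s in retrieved),
        "Instruction:\n" + instruction,
    ]
    return "\n\n".join(parts)

def call_llm(prompt):
    """Hypothetical stub; swap in a real model client here."""
    return f"[model response to a {len(prompt)}-character prompt]"

# Step 1: static context, dynamic context, and the user instruction.
static_context = "You are an assistant for a search-engineering team."
dynamic_context = "The user is reading notes on neural retrieval."
instruction = "Explain how vector databases are used in retrieval."

# Step 2: search for further relevant context.
snippets = [
    "BM25 weights rare terms more heavily than common ones.",
    "Vector databases store embeddings for similarity lookup.",
    "Stemming reduces words to a common root.",
]
retrieved = retrieve(instruction, snippets, k=1)

# Steps 3 and 4: assemble the prompt and feed it to the model.
prompt = assemble_prompt(static_context, dynamic_context, retrieved, instruction)
response = call_llm(prompt)
print(prompt)
print(response)
```

Placing the instruction last is one possible ordering, not a rule; the right layout depends on the model and on how much context has to fit.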