Vector Embeddings

Intro

  • Recommendation systems, NLP, computer vision, generative AI, LLMs, etc. are all built on vector embeddings.
  • Items are mapped into an embedding space.
  • Recommendations are made based on the distance between vectors, e.g. cosine distance.
  • Examples of directions an embedding space can capture: male-female, verb tense, country-capital
  • Latent space, a.k.a. embedding space
  • Embedding models (word and image)
    • Word2Vec
    • GloVe
    • BERT
    • GPT
    • VGGNet (image)
    • GoogLeNet (image)
  • Neural networks are trained to produce the embeddings.
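
The cosine-distance idea above can be sketched in a few lines. The 3-dimensional "embeddings" below are made-up toy vectors for illustration; real models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # 1.0 means same direction; values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors (invented for this example, not from any real model).
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.12]
apple = [0.1, 0.2, 0.9]

# "king" should be closer to "queen" than to "apple".
print(cosine_similarity(king, queen))
print(cosine_similarity(king, apple))
```

Cosine distance is then simply `1 - cosine_similarity(a, b)`; a recommender ranks candidate items by this distance to the query vector.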

Models

  • Traditional methods
    • One-hot encoding (sparse, very high-dimensional)
    • Bag-of-Words (BoW), also high-dimensional
    • TF-IDF
    • N-grams
  • Statistical models
    • LSA/LSI, a pioneering approach (SVD on the term-document matrix)
    • pLSA
    • LDA (Latent Dirichlet Allocation), a topic-modelling tool
  • Word embeddings (may need pooling into sentence/document vectors, by plain or TF-IDF-weighted averaging)
    • Word2Vec
    • GloVe
    • FastText (extends Word2Vec, with support for subwords)
  • Extensions of word embeddings
    • Doc2Vec (based on Word2Vec)
  • Transformers
    • MiniLM (inspired by BERT)
    • USE (Universal Sentence Encoder)
    • BERT (or DistilBERT), not ideal for sentence embeddings without further fine-tuning
    • RoBERTa
    • OpenAI embedding models (e.g. text-embedding-3-small, via API)
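
The pooling mentioned above for word embeddings can be sketched as mean pooling: average the word vectors of a sentence to get one fixed-size sentence vector. The 2-dimensional word vectors below are made up for illustration; real ones would come from a trained model such as Word2Vec or GloVe.

```python
# Toy word-vector table (invented values, 2 dimensions for readability).
word_vectors = {
    "the": [0.1, 0.1],
    "cat": [0.9, 0.2],
    "sat": [0.3, 0.8],
}

def sentence_embedding(tokens, vectors):
    # Mean pooling: element-wise average of the known word vectors.
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no known words, no embedding
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

print(sentence_embedding(["the", "cat", "sat"], word_vectors))
```

TF-IDF-weighted averaging replaces the plain mean with a weighted mean, so rare, informative words contribute more to the sentence vector than common ones like "the".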