Autoregression: take the most likely next token, append it to the prompt, and
run the model again to get the next token.
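The loop above can be sketched as follows. The bigram lookup table here is a toy stand-in for the model; a real LLM would score the entire context with a neural network.

```python
# Toy "model": a hardcoded most-likely-next-token table (hypothetical,
# standing in for a neural network's next-token prediction).
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
}

def generate(prompt, max_tokens=3):
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = BIGRAMS.get(tokens[-1])  # most likely next token
        if nxt is None:
            break
        tokens.append(nxt)  # append it and run the "model" again
    return " ".join(tokens)
```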
Sampling tokens from possible outputs
Temperature: 0 gives near-deterministic (greedy) output; 1 samples tokens in
proportion to the model's probability distribution. Model output deteriorates
at high temperature: improbable tokens get sampled, and once gibberish enters
the context the model continues its pattern.
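A minimal sketch of temperature sampling: divide the logits by the temperature, softmax, then sample. The logits here are made up for illustration.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    """Pick a token index from raw logits.

    temperature -> 0 approaches greedy argmax; temperature = 1 samples
    directly from the model's distribution; higher values flatten it,
    boosting improbable tokens.
    """
    if temperature == 0:  # greedy: highest-scoring token wins
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):  # inverse-CDF sampling
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```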
Another way to look at LLMs: not just auto-completers, but highly effective
neural-network-powered classifiers at each token. LLMs also perform better
when used this way (e.g. tool calling in agents).
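The classifier view can be sketched by restricting the next-token choice to a fixed label set and taking the argmax, as in tool selection. The scores and tool names below are hypothetical.

```python
# Use the next-token distribution as a classifier: instead of free-form
# generation, compare the model's scores for a fixed set of candidate
# labels (here, tool names) and pick the best one.
def classify(scores, allowed):
    # restrict the "vocabulary" to the allowed labels only
    return max(allowed, key=lambda label: scores.get(label, float("-inf")))

# Hypothetical logits for the first generated token.
scores = {"search": 2.1, "calculator": 3.7, "reply": 1.4}
tool = classify(scores, ["search", "calculator"])
```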
seq2seq architecture: encoder + decoder + thought vector, recurrent design.
Challenge: the thought vector is fixed and finite.
[@bahdanauNeuralMachineTranslation2016] introduced preserving all of the
encoder's hidden state vectors so the decoder can “soft search” over them.
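The soft search above can be sketched as dot-product attention: score every encoder hidden state against the decoder's current query, softmax the scores into weights, and return the weighted sum as the context vector. Vectors are plain lists for illustration.

```python
import math

# Toy "soft search": instead of one fixed thought vector, keep every
# encoder hidden state and blend them, weighted by how well each state
# matches the decoder's current query.
def attend(query, hidden_states):
    # dot-product score of the query against each encoder state
    scores = [sum(q * h for q, h in zip(query, hs)) for hs in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the states
    # context vector: weighted sum of all encoder hidden states
    return [sum(w * hs[d] for w, hs in zip(weights, hidden_states))
            for d in range(len(query))]
```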
[@vaswaniAttentionAllYou2017] Attention is All You Need introduced the
transformer architecture, removing the recurrent circuitry.
[@radfordImprovingLanguageUnderstanding2018] proposed the generative
pre-trained transformer (GPT) architecture: basically a transformer with the
encoder ripped off (decoder-only). Pre-training on unlabelled text followed
by fine-tuning for specific tasks worked pretty well.
GPT-2 increased the training set and model size, making it a multitask learner.
GPT-3 saw another order-of-magnitude increase in model size and training set.
[@brownLanguageModelsAre2020] - language models are few-shot learners; the
start of prompt engineering.