Prompt Engineering for LLM

  • Prompting steps
    • Context retrieval
    • Snippetizing context
    • Scoring and prioritizing snippets (some may need to be dropped)
    • Prompt assembly
  • Tools like DSPy can be used to optimize prompt construction.
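
A minimal sketch of the scoring/prioritizing/assembly steps, assuming a simple character budget and precomputed relevance scores (a real system would count tokens and retrieve/score with a model):

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    score: float  # estimated value of this piece of context

def assemble_prompt(snippets: list[Snippet], budget: int) -> str:
    """Prioritize snippets by score, dropping those that no longer fit
    the budget, then join the survivors into a prompt."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        if used + len(s.text) <= budget:
            chosen.append(s)
            used += len(s.text)  # lower-value snippets may get dropped
    return "\n\n".join(s.text for s in chosen)
```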

About the Prompt

  • Truth bias: the model tends to take prompt content at face value, so inaccuracies in the prompt carry over into the completion.
  • LLMs are all about completing a document.
  • Putting user content inside the system message gives users a chance to override the system instructions (prompt injection).
  • Criteria for prompt (for completion models)
    • Should be similar to texts that the LLM is trained on
    • Should contain all information needed to complete
    • Should lead to a solution
    • Should have a clear stop
  • Dos and don’ts
    • Prefer dos over don’ts
    • Give a reason for each instruction (thou shalt not kill because…)
  • Few-shot prompting
    • Usually much easier than instruction-based prompting, since LLMs are good at following examples; but it has limits.
    • Does not scale well when the context is large (long examples, or too many of them).
    • Can anchor the model in unexpected ways, especially biasing it toward edge cases (the model may assume they are as common as typical cases).
    • Can suggest spurious patterns, such as an accidental sorting order; you never know what pattern the LLM will extrapolate.
    • Try to make the model “believe” that it has already solved a few similar problems successfully.
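
One way to apply the last point, sketched with made-up Q/A pairs: lay out prior examples as if the model had already solved them, and leave the final answer open:

```python
def few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Present prior Q/A pairs as problems the model has already solved
    successfully, then pose the real question.  Consider shuffling the
    examples so their order does not suggest a spurious pattern."""
    parts = [f"Q: {q}\nA: {a}" for q, a in examples]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```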

Context

  • Latency matters. It’s best to gather as many candidate context items as we can and then whittle them down, so context items should be comparable in terms of their value.
  • Brainstorm with mind map to find potential context items.
  • Two dimensions for context items: proximity to the user, and stability.
  • Irrelevant information should be avoided (the Chekhov’s Gun fallacy): the LLM will reason hard to make sense of every piece of information it is given. Use RAG to retrieve only relevant context.
  • Summarization is needed when context is too long
    • Summarize summaries if content exceeds context window.
    • Recursive summaries: summarize at sections level, then at chapter level, then book level.
    • “Rumor problem”: the model can misunderstand things during summarization, and such errors compound across levels.
    • Summarization is lossy; ask for the summary with the final application task in mind. Task-specific summaries are better but can’t be shared among different use cases.
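
The recursion can be sketched as follows; `llm` is a placeholder callable assumed to shorten its input, and a character `limit` stands in for the context window:

```python
def summarize(text: str, task: str, llm) -> str:
    """Ask for a summary written with the final task in mind,
    since summarization is lossy."""
    return llm(f"Summarize for the task '{task}':\n{text}")

def recursive_summary(chunks: list[str], task: str, llm, limit: int) -> str:
    """Summarize sections, then summaries of summaries (section ->
    chapter -> book level) until the result fits the window."""
    combined = "\n".join(summarize(c, task, llm) for c in chunks)
    if len(combined) <= limit:
        return combined
    return recursive_summary([combined], task, llm, limit)
```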

Assembly of Prompt

  • Constraints
    • In-context learning: the closer the information is to the end of the prompt, the more impact it has on the model.
    • The lost-in-the-middle phenomenon: the model easily recalls the beginning and end of the prompt but struggles with information in the middle.
  • Structure
    • Introduction: guiding the focus of the LLM from the very beginning.
    • Valley of Meh: content in this middle region has reduced impact.
    • Context
    • Refocus: necessary for longer prompts to bring the model’s attention back to the question itself. e.g. “Based on the given information, I am ready to answer the question regarding…”
    • Transition: e.g. “The answer is…” In some models, this is implied by a question mark.
  • Chat vs Completion model
    • Chat models benefit from natural multi-round interactive problem solving.
    • Completion models avoid some unhelpful traits introduced by RLHF, and allow inception, where we dictate the beginning of the answer.
  • Document types
    • Dialogues: freeform text or transcripts; marker-less or structured.
    • Analytic Report: preferably in Markdown format, with an ## Idea monologue section that can be ignored (chain-of-thought prompting), the ## Conclusion section is the actual output, and ## Further Reading can be treated as a marker for end of response.
    • Structured Document: XML, YAML, JSON, etc.
  • Elastic snippets: given a limited context window, create multiple versions of a context snippet and place the biggest version that fits into the final prompt.
  • Relationship between (sub) prompts
    • Position
    • Importance, assessed with scores or tiers.
    • Dependency, e.g. requirements and incompatibilities for snippets.
  • A prompt crafting engine: respects the constraints, uses some algorithm (e.g. an additive/subtractive greedy algorithm) to pick snippets, then reconstructs the prompt according to their positions.
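
A sketch of such an engine, assuming character budgets, numeric positions, and name-based dependencies (all illustrative choices), with elastic snippets included:

```python
from dataclasses import dataclass, field

@dataclass
class Snippet:
    name: str
    text: str
    score: float   # importance (score or tier)
    position: int  # desired slot in the assembled prompt
    requires: set = field(default_factory=set)  # names this snippet depends on

def pick_elastic(versions: list[str], remaining: int):
    """Elastic snippet: return the biggest version that still fits."""
    for v in sorted(versions, key=len, reverse=True):
        if len(v) <= remaining:
            return v
    return None

def craft_prompt(snippets: list[Snippet], budget: int) -> str:
    """Additive greedy: take snippets in descending score while they fit
    and their dependencies are already included, then lay them out by
    position.  (A snippet scored higher than its own dependency gets
    dropped here; a real engine would handle that case.)"""
    chosen, names, used = [], set(), 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        if used + len(s.text) > budget or not s.requires <= names:
            continue
        chosen.append(s)
        names.add(s.name)
        used += len(s.text)
    return "\n".join(s.text for s in sorted(chosen, key=lambda s: s.position))
```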

Completion/Response

  • Preamble
    • Structural boilerplate: can be eliminated through prompting.
    • Reasoning: desirable with chain-of-thought prompting.
    • Fluff: should be avoided, e.g. by prescribing the format up front: “Please reply in the following format: 1. result 1, result 2, …, result n; 2. Disclaimers (if any); 3. Background and explanation (if any).”
  • Postscript: detect the end of the actual answer using stop sequences, then end the stream.
  • Recognizing Start and End
  • Logprob: the averaged logprob of the completion is an indicator of the response’s confidence or quality.
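
Given per-token logprobs (as returned by APIs that expose them), the average, and the equivalent perplexity view, are straightforward:

```python
import math

def avg_logprob(token_logprobs: list[float]) -> float:
    """Mean token logprob; values closer to 0 suggest higher confidence."""
    return sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs: list[float]) -> float:
    """Equivalent view: exp of the negative mean logprob (lower is better)."""
    return math.exp(-avg_logprob(token_logprobs))
```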

Techniques

  • Chain-of-thought (CoT)
  • ReAct
  • Reflexion: run another analysis pass when applying the output of the LLM. The critic can be traditional code or another LLM (LLM-as-judge).
  • Agentic usage, including tool calling and reasoning.
  • Frameworks such as DSPy and TextGrad can be used to improve the prompts given I/O examples.
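
A reflexion loop with a traditional (non-LLM) critic can be as simple as validating the output and feeding the error back; `llm` is a placeholder callable, and JSON validity is the stand-in check:

```python
import json

def reflexion_loop(llm, prompt: str, max_tries: int = 3):
    """Run the model, validate its output with traditional code
    (here: JSON parsing), and retry with the error as feedback.
    The critic could equally be another LLM (LLM-as-judge)."""
    feedback = ""
    for _ in range(max_tries):
        output = llm(prompt + feedback)
        try:
            return json.loads(output)
        except ValueError as err:
            feedback = f"\nThe previous answer was not valid JSON ({err}). Try again."
    raise ValueError(f"no valid output after {max_tries} attempts")
```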

LLM as Classifier

  • When the LLM is used as a classifier, it’s important to make sure the options all start with different tokens. Otherwise the model will favor options sharing a common prefix, as their logprobs add up.
  • Calibrate the model by shifting the logprob by a constant if needed, e.g. so it only answers No when it’s quite certain. The constant can be found by experimentation or by minimizing the cross-entropy loss, as in logistic regression.
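
A sketch of the calibration, using a softmax over the two options' first-token logprobs and a grid search for the shift (grid search stands in for the gradient step of logistic regression):

```python
import math

def choose_label(lp_yes: float, lp_no: float, bias: float = 0.0) -> str:
    """Answer 'no' only when its logprob beats the biased 'yes' logprob;
    a positive bias means 'no' must be quite certain to win."""
    return "yes" if lp_yes + bias >= lp_no else "no"

def cross_entropy(bias: float, examples) -> float:
    """examples: (lp_yes, lp_no, true_label) triples.  The probability
    of 'yes' is a softmax over the two shifted logprobs."""
    loss = 0.0
    for lp_yes, lp_no, label in examples:
        p_yes = 1.0 / (1.0 + math.exp(lp_no - (lp_yes + bias)))
        loss -= math.log(p_yes if label == "yes" else 1.0 - p_yes)
    return loss / len(examples)

def fit_bias(examples, grid=None) -> float:
    """Pick the shift minimizing cross-entropy on labeled examples."""
    grid = grid if grid is not None else [i / 10 for i in range(-50, 51)]
    return min(grid, key=lambda b: cross_entropy(b, examples))
```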