Scoring and prioritizing snippets (some may need to be dropped)
Prompt assembly
Tools like DSPy can be used to optimize prompt
construction.
About the Prompt
truth-bias in the prompt content.
LLMs are all about completing a document.
Putting user content inside the system message will give users a chance to
override the system message.
Criteria for prompt (for completion models)
Should be similar to texts that the LLM is trained on
Should contain all information needed to complete
Should lead to a solution
Should have a clear stop
Dos and don’ts
Prefer dos over don’ts
Give reason for instruction (thou shall not kill because…)
Few-shot prompting
Usually much easier than instruction based prompting, since LLMs are good at
following examples. But this is also limited.
Does not scale well when context is big (long examples or too many examples)
And anchor the model in an unexpected way (biased), especially biasing
towards edge cases (assume them to be as common as typical cases)
Can suggest spurious patterns, such as any sorting order — you never know
what pattern is extrapolated by the LLM.
Try to make the model “believe” that it has solved a few of the problems
successfully before.
Context
Latency matters. It’s best to pick as much context as we could then whittle
them, thus context items should be comparable in terms of their value.
Brainstorm with mind map to find potential context items.
Two dimensions: proximity from user and stability and context
Irrelevant information should be avoided: Chekov’s Gun fallacy, LLM will try
to reason hard to make sense of all info. Use RAG for context.
Summarization is needed when context is too long
Summarize summaries if content exceeds context window.
Recursive summaries: summarize at sections level, then at chapter level,
then book level.
“Rumor problem”: model could misunderstand things in summarization.
Summarization is lossy, ask for summary with the final application task in
mind. Specific summaries are good but can’t be shared among different use
cases.
Assembly of Prompt
Constraints
In-context learning: the closer the information is to the end of the
prompt, the more impact it has on the model.
The lost middle phenomenon: the model can easily recall the beginning and
end of the prompt, but struggles with information in the middle.
Structure
Introduction: guiding the focus of the LLM from the very beginning.
Valley of Meh: the content in this valley are of reduced impact.
Context
Refocus: necessary for longer prompts to bring the model’s attention back
to the question itself. e.g. “Based on the given information, I am ready to
answer the question regarding…”
Transition: e.g. “The answer is…” In some model, this is implied by a
question mark.
Chat vs Completion model
Chat model benefit from natural multi-round interactive problem solving.
Completion model avoids some unhelpful traits from RLHF, and allows
inception, where we dictate the beginning of the answer.
Analytic Report: preferably in Markdown format, with an
## Idea monologue section that can be ignored (chain-of-thought
prompting), the ## Conclusion section is the actual output, and
## Further Reading can be treated as a marker for end of response.
Structured Document: XML, YAML, JSON, etc.
Elastic snippet: given limited context window, create multiple versions of a
context snippet, and place the biggest snippet that fit in into the final
prompt.
Relationship between (sub) prompts
Position
Importance, assessed with scores or tiers.
Dependency, e.g. requirements and incompatibilities for snippets.
A prompt crafting engine: respects the constraints, uses some algorithm (e.g.
additive/subtractive greedy algorithm) to pick snippets, then reconstruct
the prompt according to the position.
Completion/Response
Preamble
Structural boilerplate: can be eliminated through prompting.
Reasoning: desirable with chain-of-thought prompting.
Fluff: should be avoided. e.g. “Please reply in the following format: 1.
result 1, result 2, …, result n; 2. Disclaimers (if any); 3. Background
and explanation (if any).”
Postscript: To detect the end of the actual answer, with the use of stop
sequences and ending the stream.
Recognizing Start and End
Logprob: averaged logprobs is an indicator for confidence level of the
response or quality.
Techniques
Chain-of-thought (CoT)
ReAct
Reflexion, run another analysis when
applying the output of LLM. Can be traditional, can be with LLM
(LLM-as-judge).
Agentic usage, including tool calling and reasoning.
Frameworks such as DSPy and TextGrad
can be used to improve the prompts given I/O examples.
LLM as Classifier
When used as classifier, it’s important to make sure options all start with
different tokens. Otherwise, the model will favor the options sharing common
prefixes, as their logprobs add up.
Calibrate the model by shifting the logprob by a constant, if needed. For
example, only answer No if it’s quite certain. The constant can be found by
experimenting or by minimizing the cross entropy loss, as we do in
logistic-regression.