There is a whole genre of meeting where smart people argue about the wording of a prompt for an hour. It is the AI-era version of arguing about variable names, except the build is on fire in the next tab. That hour is better spent elsewhere: what you show the model matters more than how you phrase the request.
The cheapest accuracy win there is
Going from zero-shot to a handful of good examples, around fifteen, is the cheapest accuracy win available for an AI feature. Closing the gap from roughly ninety percent accuracy to ninety-nine percent is what turns a demo into something you can put in front of a customer, and examples do more of that work than clever phrasing ever will.
Good examples, not many examples
Quality beats quantity. Fifteen well-chosen examples that cover the tricky cases beat a hundred that all look the same. Past a point, more examples stop helping and just cost you tokens and latency. Pick the examples that disagree with each other in instructive ways.
- Cover the edge cases on purpose instead of showing the happy path five times.
- Include a hard negative, so the model learns where the line is.
- Keep the format consistent, because the model copies your format faithfully, including your mistakes.
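The checklist above can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions, not a prescribed implementation: the `Example` type, the support-ticket task, the labels, and the `Ticket:` / `Label:` layout are all made up for the example. The one idea it is meant to show is that every example and the final query go through the same rendering function, so the format cannot drift.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Example:
    text: str
    label: str


# Hypothetical examples for an imagined support-ticket classifier.
EXAMPLES: List[Example] = [
    Example("I was charged twice this month.", "billing"),
    Example("The app crashes when I open settings.", "bug"),
    # Edge case: a near-empty ticket still needs a label.
    Example("???", "other"),
    # Hard negative: mentions charges, but the problem is a broken link.
    Example("The 'charged monthly' link on the pricing page is dead.", "bug"),
]


def render(text: str, label: str = "") -> str:
    """Render one block; examples and the final query share this layout."""
    return f"Ticket: {text}\nLabel: {label}".rstrip()


def build_prompt(instruction: str, examples: List[Example], query: str) -> str:
    blocks = [render(ex.text, ex.label) for ex in examples]
    blocks.append(render(query))  # label left blank for the model to fill in
    return instruction + "\n\n" + "\n\n".join(blocks)


prompt = build_prompt(
    "Classify each support ticket as billing, bug, or other.",
    EXAMPLES,
    "Why did my invoice go up?",
)
```

Printing `prompt` shows four labeled blocks followed by the unlabeled query. Swapping an example in or out is a one-line change to `EXAMPLES`, which keeps experiments cheap.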
The part people skip
Once you are leaning on examples, you need a way to know when a change made things worse. That is evaluation, and it is the difference between "it feels better" and "it is two points better on the cases we care about." Build the eval before you tune the prompt. Your future self, debugging a regression at 11pm, will send a thank-you note.
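A minimal sketch of such an eval, assuming a classification-style feature where each case has one expected label. The names `accuracy`, `compare`, and `CASES` are hypothetical, and the two stub predictors stand in for real model calls made with the old and new prompt.

```python
from typing import Callable, Dict, Sequence, Tuple

Case = Tuple[str, str]  # (input text, expected label)
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, cases: Sequence[Case]) -> float:
    """Fraction of cases where the predicted label matches exactly."""
    if not cases:
        raise ValueError("need at least one eval case")
    correct = sum(1 for text, expected in cases if predict(text) == expected)
    return correct / len(cases)


def compare(
    baseline: Predictor, candidate: Predictor, cases: Sequence[Case]
) -> Dict[str, float]:
    """Score both predictors on the same cases and report the difference."""
    base = accuracy(baseline, cases)
    cand = accuracy(candidate, cases)
    return {
        "baseline": base,
        "candidate": cand,
        "delta_points": (cand - base) * 100,
    }


# Hypothetical eval set; in practice these are the cases you care about.
CASES = [
    ("I was charged twice.", "billing"),
    ("App crashes on launch.", "bug"),
    ("The pricing page link is dead.", "bug"),
    ("???", "other"),
]


# Stubs standing in for model calls with the old and the new prompt.
def old_prompt_model(text: str) -> str:
    return "billing" if "charged" in text else "bug"


def new_prompt_model(text: str) -> str:
    if text.strip("?") == "":
        return "other"
    return old_prompt_model(text)


report = compare(old_prompt_model, new_prompt_model, CASES)
```

With a report like this, "it feels better" becomes a number you can check before and after every prompt change. A real eval set would be larger and would use whatever scoring fits the feature; exact match is only the simplest case.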