Context

Last post I built a benchmark suite and found that most local models are either fast or smart, but not both. The problem with those benchmarks: they were short. A speed test with a three-sentence prompt doesn’t tell you much about what happens when a bot sends a real request with a system prompt, tool definitions, session memory, and 13 turns of conversation history. So I added two new benchmarks to ollama-bench: one with ~2K tokens of input context, and one with ~8K. Then I ran all 14 models through the full suite. ...