Infrastructure

Building a Bridge Between Claude and Discord

I wanted a way to invoke Claude from Discord. Not a chatbot, but an actual Claude Code session with file access, tool use, and multi-turn conversation. React to a message with 👾, get a working agent in a thread. It took about three days to build. Most of that time wasn’t spent on the AI part.

The Problem

I run five OpenClaw bots in Discord. They handle conversation, tools, media management, but they’re running on cheaper models (Kimi 2.5, local Ollama). They do solid work for the cost, but they’re not Claude. ...

What Context Length Actually Costs on CPU

Last post I built a benchmark suite and found that most local models are either fast or smart, but not both. The problem with those benchmarks: they were short. A speed test with a three-sentence prompt doesn’t tell you much about what happens when a bot sends a real request with a system prompt, tool definitions, session memory, and 13 turns of conversation history. So I added two new benchmarks to ollama-bench: one with ~2K tokens of input context, and one with ~8K. Then I ran all 14 models through the full suite. ...

Local Models Are Exciting. My CPU Is Not.

The appeal of running your own language models is real: no API costs, no rate limits, no data leaving your network, and a fallback chain that still works when a cloud provider has an outage. I’ve been chasing that for a while. This week I finally sat down and measured what I actually have. The short version: the potential is there. The hardware isn’t. Yet.

Moving Ollama to the Server

I’d been running Ollama on my desktop. The problem with that is obvious once you think about it: the desktop sleeps, reboots, and isn’t shared. If a bot wants to use a local model at 3am, it’s out of luck. ...