Ollama

The appeal of running your own language models is real: no API costs, no rate limits, no data leaving your network, and a fallback chain that still works when a cloud provider has an outage. I’ve been chasing that for a while. This week I finally sat down and measured what I actually have. The short version: the potential is there. The hardware isn’t. Yet. Moving Ollama to the Server I’d been running Ollama on my desktop. The problem with that is obvious once you think about it — the desktop sleeps, reboots, and isn’t shared. If a bot wants to use a local model at 3am, it’s out of luck. ...

Ollama

What Context Length Actually Costs on CPU

Local Models Are Exciting. My CPU Is Not.