Streaming LLM Responses to Telegram with Reactive Draft Messages
The Problem with Waiting

Large language models are slow. A typical response takes seconds — sometimes tens of seconds — to generate. During that time, the user stares at an empty chat window, wondering if anything is happening at all. Every major LLM chat interface solved this the same way: stream tokens as they arrive. ChatGPT, Claude, Gemini — they all render partial text while the model is still thinking. The experience feels responsive even when the full response takes 15 seconds. ...
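The core idea — render partial text as tokens arrive instead of waiting for the full response — can be sketched without any particular LLM API. The snippet below is a minimal illustration, not production code: `fake_token_stream` is a stand-in for a real streaming endpoint, simulating per-token latency with a generator.

```python
import time


def fake_token_stream(text, delay=0.0):
    """Stand-in for a streaming LLM endpoint: yields one token at a time."""
    for token in text.split():
        time.sleep(delay)  # simulated per-token generation latency
        yield token + " "


def render_streaming(stream):
    """Accumulate tokens and re-render the partial text after each one."""
    partial = ""
    for token in stream:
        partial += token
        # A real chat UI would redraw the message bubble in place;
        # printing the growing text is the console equivalent.
        print(partial.rstrip())
    return partial.rstrip()


final = render_streaming(fake_token_stream("Streaming makes long responses feel fast"))
```

The user sees the first words almost immediately, which is what makes a 15-second generation feel tolerable.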