Streaming LLM Responses to Telegram with Reactive Draft Messages

The Problem with Waiting: Large language models are slow. A typical response takes seconds — sometimes tens of seconds — to generate. During that time, the user stares at an empty chat window, wondering if anything is happening at all. Every major LLM chat interface solved this the same way: stream tokens as they arrive. ChatGPT, Claude, Gemini — they all render partial text while the model is still thinking. The experience feels responsive even when the full response takes 15 seconds. ...

March 29, 2026 · 12 min · Alisher Alimov

Streaming Data with Spring WebFlux and NDJSON

Why Stream at All: The traditional Spring MVC model serializes the entire response into memory before sending it. A controller returns a List<Message>, Jackson converts it to a JSON array, and the framework writes the complete byte buffer to the socket. This works until the dataset is large, the source is slow, or both. Consider a database query that returns 10,000 rows. The blocking approach loads all rows into a List, serializes the entire JSON array, and only then begins writing to the client. Memory usage spikes, time-to-first-byte suffers, and the client waits for the slowest row before seeing anything. ...
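The streaming idea the excerpt describes can be sketched without Spring at all: write each record as its own JSON line (NDJSON) and flush immediately, so the first byte reaches the client before the last row is even produced. A minimal stdlib-only sketch (class and method names are illustrative, not from the post):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class NdjsonDemo {
    // Emit each record on its own line and flush as you go,
    // instead of buffering the whole JSON array before the first byte goes out.
    static void writeNdjson(List<String> jsonRecords, OutputStream out) throws IOException {
        for (String record : jsonRecords) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
            out.flush(); // time-to-first-byte: the client can parse this record immediately
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeNdjson(List.of("{\"id\":1}", "{\"id\":2}"), buf);
        System.out.print(buf.toString(StandardCharsets.UTF_8));
    }
}
```

In the WebFlux version the post covers, the same shape is expressed by returning a `Flux` with the `application/x-ndjson` media type, and the framework handles the per-element writes and backpressure.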

March 9, 2025 · 8 min · Alisher Alimov