Project-Reactor

Streaming LLM Responses to Telegram with Reactive Draft Messages

The Problem with Waiting

Large language models are slow. A typical response takes seconds, sometimes tens of seconds, to generate. During that time, the user stares at an empty chat window, wondering if anything is happening at all. Every major LLM chat interface solved this the same way: stream tokens as they arrive. ChatGPT, Claude, and Gemini all render partial text while the model is still thinking. The experience feels responsive even when the full response takes 15 seconds. ...

Reactive Telegram Client: Polling, Pipelines, and Pulsar

From Design to Implementation

In the previous article, I outlined the design goals and high-level architecture of a reactive Telegram bot framework: why non-blocking I/O matters, how separating polling from processing enables horizontal scaling, and why sealed interfaces make response dispatch type-safe at compile time. That article showed simplified code to explain the concepts. This one goes deeper. We will walk through the actual implementation: how Flux.create and expand drive the polling loop with back-pressure, how atomic variables coordinate concurrent state without locks, how the channel abstraction separates read and write concerns, and how Apache Pulsar’s reactive client integrates natively with Project Reactor. ...
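
As a rough illustration of coordinating state with atomic variables instead of locks, here is a minimal sketch of tracking the polling offset. It uses only the JDK and is not the article's code: the class OffsetTrackerSketch and its methods are hypothetical names, and the only external fact it relies on is that Telegram's getUpdates expects an offset one greater than the highest update_id already received.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not taken from the article: tracks the next
// getUpdates offset without synchronized blocks or explicit locks.
public class OffsetTrackerSketch {

    // Next offset to request; 0 means nothing has been acknowledged yet.
    private final AtomicLong nextOffset = new AtomicLong(0);

    // Records a batch of processed update ids and returns the new offset.
    // accumulateAndGet with Math::max is a lock-free compare-and-set loop,
    // so the offset never moves backwards even if batches finish out of order.
    long acknowledge(List<Long> updateIds) {
        long candidate = updateIds.stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(-1L) + 1; // an empty batch yields 0, which never lowers the offset
        return nextOffset.accumulateAndGet(candidate, Math::max);
    }

    long nextOffset() {
        return nextOffset.get();
    }

    public static void main(String[] args) {
        OffsetTrackerSketch tracker = new OffsetTrackerSketch();
        System.out.println(tracker.acknowledge(List.of(10L, 12L, 11L)));
        System.out.println(tracker.acknowledge(List.of(5L))); // a late, older batch
    }
}
```

In a reactive pipeline the same idea would sit inside the polling loop, with each poll reading nextOffset() before issuing the next request.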

Streaming Data with Spring WebFlux and NDJSON

Why Stream at All

The traditional Spring MVC model serializes the entire response into memory before sending it. A controller returns a List<Message>, Jackson converts it to a JSON array, and the framework writes the complete byte buffer to the socket. This works until the dataset is large, the source is slow, or both. Consider a database query that returns 10,000 rows. The blocking approach loads all rows into a List, serializes the entire JSON array, and only then begins writing to the client. Memory usage spikes, time-to-first-byte suffers, and the client waits for the slowest row before seeing anything. ...
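
To make the contrast concrete, here is a minimal sketch of the two output shapes using only the JDK, with no Spring or Jackson involved. The Message class, its toJson method, and both writer methods are hypothetical stand-ins for illustration; a real WebFlux endpoint would delegate serialization and flushing to the framework.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NdjsonSketch {

    // Hypothetical stand-in for the article's Message type.
    static final class Message {
        final long id;
        final String text;

        Message(long id, String text) {
            this.id = id;
            this.text = text;
        }

        // Minimal escaping for the sketch; real code would use a JSON library.
        // Raw newlines are escaped so each NDJSON record stays on one line.
        String toJson() {
            String escaped = text.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n");
            return "{\"id\":" + id + ",\"text\":\"" + escaped + "\"}";
        }
    }

    // Buffered style: every element is collected before anything is written.
    static void writeJsonArray(Stream<Message> source, Writer out) {
        List<Message> all = source.collect(Collectors.toList());
        String body = all.stream().map(Message::toJson)
                .collect(Collectors.joining(",", "[", "]"));
        try {
            out.write(body);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // NDJSON style: each element becomes its own line and is flushed as soon
    // as it is available, so the first record can leave before the last exists.
    static void writeNdjson(Stream<Message> source, Writer out) {
        source.forEach(m -> {
            try {
                out.write(m.toJson());
                out.write('\n');
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) {
        StringWriter array = new StringWriter();
        writeJsonArray(Stream.of(new Message(1, "hello"), new Message(2, "world")), array);
        System.out.println(array);

        StringWriter lines = new StringWriter();
        writeNdjson(Stream.of(new Message(1, "hello"), new Message(2, "world")), lines);
        System.out.print(lines);
    }
}
```

Because every NDJSON line is a complete JSON document, a client can parse and act on each record as it arrives instead of waiting for the closing bracket of an array.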