## Why Stream at All
The traditional Spring MVC model serializes the entire response into memory before sending it. A controller returns a List<Message>, Jackson converts it to a JSON array, and the framework writes the complete byte buffer to the socket. This works until the dataset is large, the source is slow, or both.
Consider a database query that returns 10,000 rows. The blocking approach loads all rows into a List, serializes the entire JSON array, and only then begins writing to the client. Memory usage spikes, time-to-first-byte suffers, and the client waits for the slowest row before seeing anything.
Streaming inverts this. Each row is serialized and flushed to the client as soon as it is available. Memory stays flat, time-to-first-byte drops to the latency of the first row, and the client can start processing while the server is still querying. This is the core value proposition of Spring WebFlux with NDJSON.
## NDJSON: One JSON Value Per Line
Newline-Delimited JSON (NDJSON) is a simple format: each line is a complete, self-contained JSON value, separated by \n. No wrapping array, no commas between elements, no need to parse the entire response before processing the first item.
A regular JSON response:
```
[{"id":"1","text":"hello","author":"alice"},{"id":"2","text":"world","author":"bob"}]
```
The same data as NDJSON:
```
{"id":"1","text":"hello","author":"alice"}
{"id":"2","text":"world","author":"bob"}
```
The difference matters for streaming. A JSON array requires the closing ] before the client knows parsing is complete. An NDJSON stream can be consumed line-by-line — each line is independently valid. The client processes element 1 while element 2 is still being generated.
Spring WebFlux uses the content type application/x-ndjson for this format.
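Because each line is self-contained, a plain line reader is all a consumer strictly needs to split the stream into documents. As a JDK-only illustration (no Spring involved; the `NdjsonDemo` class and `readNdjson` helper are hypothetical names for this sketch):

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;

public class NdjsonDemo {

    // Splits an NDJSON payload into individual JSON documents.
    // Each line is a complete value, so no JSON parser needs to see
    // the rest of the stream before handing the first item onward.
    static List<String> readNdjson(String payload) {
        return new BufferedReader(new StringReader(payload))
                .lines()
                .filter(line -> !line.isBlank())
                .toList();
    }

    public static void main(String[] args) {
        String payload = """
                {"id":"1","text":"hello","author":"alice"}
                {"id":"2","text":"world","author":"bob"}
                """;
        for (String doc : readNdjson(payload)) {
            System.out.println("complete document: " + doc);
        }
    }
}
```

In a real consumer the same loop body would feed each line to a JSON parser; the point is that parsing can begin after the first newline, not after the last byte.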
## Server Side: Returning Flux
A Spring WebFlux controller that returns Flux<T> automatically streams when the client requests NDJSON. No special configuration required:
```java
@RestController
@RequestMapping("/messages")
public class MessagesController {

    private final R2dbcEntityTemplate r2dbcTemplate;

    public MessagesController(R2dbcEntityTemplate r2dbcTemplate) {
        this.r2dbcTemplate = r2dbcTemplate;
    }

    @GetMapping
    public Flux<Message> getMessages(@RequestParam("author") String author) {
        return r2dbcTemplate.getDatabaseClient()
            .sql("select id, text, author, created_at from message where author = :author")
            .bind("author", author)
            .map(row -> new Message(
                row.get("id", String.class),
                row.get("text", String.class),
                row.get("author", String.class),
                row.get("created_at", ZonedDateTime.class)
            ))
            .all();
    }
}
```
When a client sends Accept: application/x-ndjson, Spring WebFlux serializes each Message as a separate JSON line and flushes it immediately. When the same endpoint receives Accept: application/json, it collects all elements and returns a JSON array. The controller code is identical — content negotiation handles the rest.
The Message model is a simple record:
```java
public record Message(String id, String text, String author, ZonedDateTime createdAt) {
}
```
## Streaming POST: Accepting Flux as Request Body
The same principle works in reverse. A controller can accept a Flux<T> as a request body, processing each element as it arrives over the wire:
```java
@PostMapping
public Mono<Void> addMessages(@RequestBody Flux<Message> messages) {
    return messages
        .flatMap(message -> r2dbcTemplate.insert(Message.class).using(message))
        .then();
}
```
When the client sends an NDJSON body with Content-Type: application/x-ndjson, Spring WebFlux deserializes each line into a Message and emits it into the Flux as it arrives. The server processes and inserts rows while the client is still sending. No buffering the entire request body into memory.
This bidirectional streaming is one area where NDJSON has an advantage over Server-Sent Events. SSE is server-to-client only. NDJSON works in both directions over standard HTTP.
## Client Side: Consuming Flux with WebClient
Spring’s WebClient consumes NDJSON streams with bodyToFlux. Each NDJSON line is deserialized into an object and emitted as a Reactor element:
```java
Flux<Message> messages = webClient.get()
    .uri(uriBuilder -> uriBuilder.path("/messages")
        .queryParam("author", author)
        .build())
    .accept(MediaType.APPLICATION_NDJSON)
    .retrieve()
    .bodyToFlux(Message.class);
```
This returns a Flux<Message> that emits elements as they arrive from the server. Back-pressure propagates automatically: if the consumer is slower than the producer, Reactor’s demand signaling pauses reading from the TCP socket, which in turn pauses the server through TCP flow control.
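WebClient and Reactor do the heavy lifting here, but the consumption model itself can be sketched with nothing but the JDK, which can help demystify what "emits elements as they arrive" means on the wire. In the following self-contained sketch (the `NdjsonStreamDemo` class name is made up, and `com.sun.net.httpserver` stands in for a WebFlux server), the client receives each NDJSON line as soon as its newline arrives:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Stream;

public class NdjsonStreamDemo {

    public static List<String> fetchLines() throws Exception {
        // Minimal server that writes one NDJSON line at a time.
        // sendResponseHeaders(200, 0) selects chunked encoding, so each
        // flush puts a complete line on the wire immediately.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/messages", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "application/x-ndjson");
            exchange.sendResponseHeaders(200, 0);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write("{\"id\":\"1\",\"text\":\"hello\"}\n".getBytes(StandardCharsets.UTF_8));
                out.flush();
                out.write("{\"id\":\"2\",\"text\":\"world\"}\n".getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        });
        server.start();
        try {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:" + server.getAddress().getPort() + "/messages"))
                    .header("Accept", "application/x-ndjson")
                    .build();
            // BodyHandlers.ofLines() yields a lazy Stream<String>: each element
            // becomes available as soon as its terminating newline is read.
            HttpResponse<Stream<String>> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofLines());
            return response.body().toList();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        fetchLines().forEach(System.out::println);
    }
}
```

Unlike the WebClient version, this sketch only gets TCP flow control as back-pressure; Reactor's per-element demand signaling is what bodyToFlux adds on top.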
## Sending Streaming Requests
To send a streaming NDJSON request, pass a Flux to body(...) instead of passing a collection to bodyValue:
```java
Flux<Message> messages = Flux.fromIterable(batch);

Mono<Void> result = webClient.post()
    .uri(uriBuilder -> uriBuilder.path("/messages").build())
    .contentType(MediaType.APPLICATION_NDJSON)
    .body(messages, Message.class)
    .retrieve()
    .bodyToMono(Void.class);
```
The critical distinction: .body(flux, Message.class) streams elements one by one as NDJSON lines. .bodyValue(list) serializes the entire collection as a single JSON array. The server endpoint signature determines the behavior — @RequestBody Flux<Message> deserializes NDJSON line-by-line, while @RequestBody List<Message> expects a JSON array.
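To make the wire-format difference concrete, here is a JDK-only sketch (the `WireFormatDemo` class is hypothetical, and pre-serialized document strings stand in for Jackson output) of what each call style puts on the wire:

```java
import java.util.List;
import java.util.stream.Collectors;

public class WireFormatDemo {

    // Shape of a bodyValue(list)-style request: a single JSON array.
    static String asJsonArray(List<String> docs) {
        return docs.stream().collect(Collectors.joining(",", "[", "]"));
    }

    // Shape of a body(flux, ...)-style NDJSON request: one document per line,
    // each flushable to the socket as soon as it exists.
    static String asNdjson(List<String> docs) {
        return docs.stream().collect(Collectors.joining("\n", "", "\n"));
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "{\"id\":\"1\",\"text\":\"hello\"}",
                "{\"id\":\"2\",\"text\":\"world\"}");
        System.out.println(asJsonArray(docs));
        System.out.print(asNdjson(docs));
    }
}
```

The array form is not writable until the last element is known (the closing bracket depends on it); the NDJSON form has no such dependency, which is exactly why it streams.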
## Blocking vs Reactive: What Changes
The difference between a blocking servlet endpoint and a reactive WebFlux endpoint is not just the return type. The entire I/O model changes.
Blocking (Servlet + JDBC):
```java
@GetMapping
public List<Message> getMessages(@RequestParam("author") String author) {
    return jdbcTemplate.query(
        "select id, text, author, created_at from message where author = ?",
        (rs, rowNum) -> new Message(
            rs.getString("id"),
            rs.getString("text"),
            rs.getString("author"),
            rs.getTimestamp("created_at").toInstant().atZone(ZoneOffset.UTC)
        ),
        author);
}
```
Reactive (WebFlux + R2DBC):
```java
@GetMapping
public Flux<Message> getMessages(@RequestParam("author") String author) {
    return r2dbcTemplate.getDatabaseClient()
        .sql("select id, text, author, created_at from message where author = :author")
        .bind("author", author)
        .map(row -> new Message(
            row.get("id", String.class),
            row.get("text", String.class),
            row.get("author", String.class),
            row.get("created_at", ZonedDateTime.class)
        ))
        .all();
}
```
The code looks similar, but the execution model is fundamentally different:
| Aspect | Servlet + JDBC | WebFlux + R2DBC |
|---|---|---|
| Thread model | One thread per request, blocked during query | Event loop, thread released during I/O |
| Memory | Entire result set in a List | One row at a time in the Flux |
| Serialization | Full JSON array after all rows load | NDJSON, one line per row as it arrives |
| Time-to-first-byte | After last row + full serialization | After first row + single line serialization |
| Back-pressure | None — client gets the full payload | TCP flow control pauses the database query |
The reactive version does not need more threads to handle more connections. A 4-thread Netty event loop can serve thousands of concurrent streaming responses because no thread is ever blocked waiting for I/O.
## Netty Configuration for Streaming
Spring WebFlux runs on Netty by default. For streaming workloads, two configuration choices matter: thread count and event loop type.
```java
@Configuration
public class NettyConfiguration {

    @Bean
    public WebServerFactoryCustomizer<NettyReactiveWebServerFactory> nettyCustomizer(
            @Value("${server.netty.server.threads:#{T(java.lang.Runtime).getRuntime().availableProcessors()*2}}")
            int threads) {
        return factory -> factory.addServerCustomizers(
            server -> server.runOn(
                LoopResources.create("netty-server-%d", threads, false),
                true
            )
        );
    }
}
```
The thread count defaults to availableProcessors * 2. For streaming, this is usually sufficient — the event loop threads are never blocked, so a small number handles thousands of concurrent streams. Increasing threads beyond CPU count rarely helps and can increase context-switching overhead.
Setting the second parameter of runOn to true enables native transport (epoll on Linux, kqueue on macOS) instead of Java NIO. Native transports reduce system call overhead and improve throughput under high concurrency.
## WebClient Configuration for Long Streams
When consuming long-running NDJSON streams — such as LLM token responses that can take minutes — the WebClient needs tuned timeouts and connection pool settings:
```java
WebClient webClient = WebClient.builder()
    .clientConnector(new ReactorClientHttpConnector(
        HttpClient.create(
                ConnectionProvider.builder("streaming")
                    .maxConnections(1000)
                    .pendingAcquireMaxCount(10_000)
                    .pendingAcquireTimeout(Duration.ofSeconds(30))
                    .maxIdleTime(Duration.ofSeconds(5))
                    .build())
            .responseTimeout(Duration.ofSeconds(60))
            .runOn(LoopResources.create("http-client"))
    ))
    .codecs(config -> config.defaultCodecs().maxInMemorySize(512 * 1024))
    .build();
```
Key parameters:
- `responseTimeout` — maximum time to wait for a chunk. Set this high enough for the slowest expected inter-token gap, but low enough to detect hung connections.
- `maxConnections` — total connection pool size. Each streaming response holds a connection for its entire duration, so size this for peak concurrent streams, not peak requests per second.
- `maxIdleTime` — how long to keep idle connections. Short values (5s) prevent stale connections from accumulating.
- `maxInMemorySize` — per-codec buffer limit. For NDJSON streams, individual lines are small, so 512 KB is generous.
## Back-Pressure in Practice
Back-pressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. In a Flux-based NDJSON pipeline, it propagates automatically across the entire chain:
If the application processes elements slowly, bodyToFlux stops reading from the socket. TCP flow control kicks in, pausing the server’s writes. The server’s Flux stops requesting elements from R2DBC, which pauses the database cursor.
No element is ever produced that cannot be consumed. Memory stays bounded regardless of dataset size.
This only works when the entire chain is reactive. One blocking call — a synchronous database query, a block() invocation, a collectList() that gathers every element before continuing — breaks the chain and forces buffering.
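Reactor implements the Reactive Streams protocol, which the JDK also ships as java.util.concurrent.Flow, so the request(n) handshake described above can be made visible with standard library code alone. In this illustrative sketch (the `BackpressureDemo` class is a made-up name, and SubmissionPublisher stands in for a Flux), the subscriber requests exactly one element at a time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureDemo {

    // A subscriber that requests one element at a time. The publisher can
    // never get ahead of the consumer by more than the outstanding demand
    // plus its bounded buffer.
    public static List<String> consumeOneByOne(List<String> source) throws InterruptedException {
        List<String> received = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // initial demand: a single element
                }

                public void onNext(String item) {
                    received.add(item);
                    subscription.request(1); // signal readiness for the next one
                }

                public void onError(Throwable t) {
                    done.countDown();
                }

                public void onComplete() {
                    done.countDown();
                }
            });
            // submit() blocks once the subscriber's bounded buffer fills,
            // which is the blocking analogue of paused TCP reads.
            source.forEach(publisher::submit);
        } // close() lets buffered items drain, then signals onComplete
        done.await();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(consumeOneByOne(List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

Reactor's operators manage this subscription plumbing for you; the sketch only shows the protocol that flows underneath a Flux-based NDJSON pipeline.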
## When to Use NDJSON Streaming
NDJSON streaming is not always the right choice. It adds complexity to the client (line-by-line parsing instead of a single JSON array) and requires the full reactive stack (WebFlux + R2DBC or a reactive data source).
Use streaming when:
- The response is large (thousands of items) or unbounded
- The data source is slow (LLM generation, external API calls, large database queries)
- Time-to-first-byte matters more than total throughput
- You need to propagate back-pressure to the source
- You are building service-to-service pipelines where both sides use WebFlux
Use regular JSON when:
- The response fits comfortably in memory (tens to hundreds of items)
- The client needs the complete dataset before processing (sorting, aggregation)
- Simplicity matters more than latency — JSON arrays are easier to debug and test
The same controller can serve both. Return Flux<T>, and let the Accept header decide the format. Clients that want streaming send Accept: application/x-ndjson. Clients that want the full response send Accept: application/json.