
Possible performance issue in the generation of JSON in Spring Web Reactive [SPR-15095] #19662


Closed
spring-projects-issues opened this issue Jan 4, 2017 · 7 comments
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement

Comments


spring-projects-issues commented Jan 4, 2017

Daniel Fernández opened SPR-15095 and commented

During my tests with the sandbox applications I developed to test the integration between Thymeleaf and Spring 5 Web Reactive, I found some strange results that might be the symptom of a performance issue on the Spring Web Reactive side, specifically when returning large amounts of JSON.

Scenario

A web application returns a large number of entities stored in a JDBC-accessed database, either as JSON or as an HTML <table>. These entities are quite simple objects (5 properties: 4 strings and 1 integer).

Implementation

The thymeleafsandbox-biglist-mvc and thymeleafsandbox-biglist-reactive (both containing a tag named spr15095) implement this scenario:

  • Database is an in-memory SQLite, accessed through JDBC. The executed query returns 8,715 entries, which are repeated 300 times. Total: 2.6 million entries.
  • thymeleafsandbox-biglist-mvc implements this scenario using Spring Boot 1.4.3, Spring 4 MVC and Thymeleaf 3.0.
  • thymeleafsandbox-biglist-reactive implements this scenario using Spring Boot 2.0.0, Spring 5 Reactive and Thymeleaf 3.0.

The MVC application uses Apache Tomcat, the Reactive application uses Netty.

The MVC application returns its data as an Iterator<PlaylistEntry>, whereas the Reactive application returns a Flux<PlaylistEntry>.
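The two return types being compared can be sketched as follows. This is a simplified illustration, not the sandbox's actual code: the mapping path, the repository, and its method names are assumptions made for the sketch.

```java
// Simplified sketch (not the sandbox's actual code) of the two controller
// shapes being compared; playlistRepository and its methods are assumed.

// Spring 4 MVC: the whole result set is exposed as a blocking Iterator.
@RequestMapping("/biglist.json")
@ResponseBody
public Iterator<PlaylistEntry> bigListMvc() {
    return playlistRepository.findAllPlaylistEntries().iterator();
}

// Spring 5 Web Reactive: the same data is exposed as a Flux, so the
// framework subscribes and encodes elements as they are emitted.
@RequestMapping("/biglist.json")
@ResponseBody
public Flux<PlaylistEntry> bigListReactive() {
    return playlistRepository.findAllPlaylistEntriesFlux();
}
```

The key difference is that the reactive variant hands the framework a stream of elements rather than a materialized collection, which is what puts the per-element encoding path (discussed below in the comments) into play.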

Thymeleaf is configured in the Reactive application to use a maximum output chunk size (i.e. size of the returned DataBuffer objects) of 8 KBytes. No explicit configuration of any kind is performed for the output chunk size of the JSON requests.
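For reference, this limit is set on the reactive view resolver. A configuration sketch follows; the setter name is believed to match the thymeleaf-spring reactive integration, but treat it as an assumption:

```java
// Sketch: limiting Thymeleaf's reactive output chunk size to 8 KB, so each
// emitted DataBuffer is at most 8192 bytes. The setter name
// (setResponseMaxChunkSizeBytes) is assumed from the thymeleaf-spring
// reactive integration, not quoted from the sandbox code.
ThymeleafReactiveViewResolver resolver = new ThymeleafReactiveViewResolver();
resolver.setResponseMaxChunkSizeBytes(8192);
```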

Neither application has Spring Boot devtools enabled, or at least should not (it is not included as a dependency).

Both applications can be easily started with mvn -U clean compile spring-boot:run

Observed JSON results

When the JSON data is requested using curl, this is the result obtained for MVC:

$ curl http://localhost:8080/biglist.json > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  343M    0  343M    0     0   224M      0 --:--:--  0:00:01 --:--:--  224M

So 343 Mbytes of JSON in little more than a second, which looks pretty good. But when sending the same request to the Reactive application:

$ curl http://localhost:8080/biglist.json > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  343M    0  343M    0     0  21.5M      0 --:--:--  0:00:15 --:--:-- 21.3M

Same 343 Mbytes of JSON, but the average download rate goes down from 224MB/sec to less than one tenth of this, 21.5MB/sec!

Both JSON outputs have been checked to be exactly the same.

Observed HTML results

These applications allow us to check the same figures for HTML output using Thymeleaf. This should give us an idea of whether the difference observed at the JSON side is entirely to be blamed on the reactiveness of the Netty setup.

In this case Thymeleaf is used to generate HTML output for the same 2.6 Million database entries, using a complete HTML template with a <table> containing the data.

For the MVC application:

$ curl http://localhost:8080/biglist.thymeleaf > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  460M    0  460M    0     0  33.7M      0 --:--:--  0:00:13 --:--:-- 33.7M

Whereas for the Reactive application, using the Thymeleaf data-driven operation mode (Thymeleaf subscribes to the Flux<PlaylistEntry> itself):

$ curl http://localhost:8080/biglist-datadriven.thymeleaf > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  460M    0  460M    0     0  24.4M      0 --:--:--  0:00:18 --:--:-- 24.5M

So note how this time 460 MB of HTML are returned (exactly the same output in both cases), but the difference between MVC and Reactive is much smaller when Thymeleaf generates the HTML: from 33.7 MB/sec to 24.4 MB/sec.

So the MVC-vs-Reactive difference observed on the HTML side is much, much smaller than the one observed for JSON.

Conclusions

If I'm not missing anything important, there might be some performance issues affecting the production of JSON in Spring Web Reactive.

These issues might be specific to Netty, but Tomcat has also been tested and, though the results improve, they do not improve much... so there might be something else at play.


Affects: 5.0 M4

Reference URL: https://github.com/thymeleaf/thymeleafsandbox-biglist-reactive/tree/spr15095

Issue Links:

Referenced from: commits 6b9b023


spring-projects-issues commented Jan 5, 2017

Rossen Stoyanchev commented

This is a follow-up to the work in M4 around #19510 where we ensured that RxNetty and Reactor Netty flush at a minimum at an 8K boundary. Let's use this issue and example application to further refine the default approach to flushing. First find out the reason for the slower rendering of JSON, then experiment with alternatives for flushing proactively, e.g. after each element, by time, or other.

We also have #19547 to consider at the same time for RC1, which takes the opposite approach: only flushing at the end by default, while providing control for flushing more frequently. That may be worthwhile for the common case, with more extensive rendering (say, 2.6 million elements) requiring more explicit flushing control.

Sébastien Deleuze as discussed I'm assigning to you.


spring-projects-issues commented Feb 2, 2017

Sébastien Deleuze commented

Daniel Fernández Thanks again for such a detailed issue, that really helps!

Based on some initial tests with serializing List (it seems we do not support Iterable yet), and on other thoughts I had while implementing Jackson2JsonEncoder — and while there may be some remaining flushing-related optimizations to perform — I think the root cause of this performance issue is that for Flux we currently invoke Jackson serialization once per element, in order to "simulate" non-blocking behavior with Jackson's current blocking API.

I think we should be able to improve performance significantly by keeping this behavior only for real streams (see #19671). My proposal is basically to differentiate between infinite and finite streams. A finite stream should be serialized as an array via a single call to Jackson, using flux.toIterable(), and Jackson should keep being called per element only for streaming use cases. A streaming use case could be SSE, or server-to-server streaming enabled by a hint (to be created) that could be specified via #19670.

I am going to start by implementing the single Jackson invocation for the non-streaming scenario (keeping the current behavior for SSE); we will then add non-SSE streaming support via #19671 and #19670.
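The proposal can be illustrated with a plain-Java sketch (no Spring, Reactor, or Jackson involved; the class and the toy encodeOne method, which stands in for a per-element Jackson call, are invented for illustration). Both call patterns produce identical JSON, but the second makes a single "serializer" invocation for the whole collection:

```java
import java.util.List;
import java.util.stream.Collectors;

public class JsonBatchingSketch {

    // Stand-in for one Jackson invocation on a single element: in the
    // current encoder, something like this runs once per Flux element.
    static String encodeOne(int value) {
        return Integer.toString(value);
    }

    // Per-element encoding, as done today for Flux: one serializer call
    // (and potentially one flush) per element.
    static String encodePerElement(List<Integer> items) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) out.append(',');
            out.append(encodeOne(items.get(i)));
        }
        return out.append(']').toString();
    }

    // Single-invocation encoding, as proposed for finite streams: the
    // whole collection is handed to the serializer in one call.
    static String encodeAsArray(List<Integer> items) {
        return items.stream()
                .map(JsonBatchingSketch::encodeOne)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(0, 1, 2, 3, 4);
        System.out.println(encodePerElement(items)); // [0,1,2,3,4]
        System.out.println(encodeAsArray(items));    // [0,1,2,3,4]
    }
}
```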


Sébastien Deleuze commented

More information about performance on my laptop:

  • With master: 25 MB/sec.
  • With this commit, which invokes flux.collectList() and then performs a single Jackson invocation: 80 MB/sec.

That's better, but not yet optimal, because (I think) we wait until the whole list has been collected before starting to serialize.

I will try to use an Iterable or Iterator instead (currently that does not work; I am not sure yet why, at the Jackson level).


Sébastien Deleuze commented

With toIterable() I reach 110 MB/sec, which does not seem unreasonable given the overhead that comes from managing many small Flux elements; see this WIP branch.

I think that's a good first step, which we will eventually be able to optimize further with flushing-management options.


Sébastien Deleuze commented

I have created a pull request to be reviewed.


Rossen Stoyanchev commented

I am wondering whether toIterable() is appropriate to use. Its documentation says "Transform this Flux into a lazy Iterable blocking on next calls." Note the "blocking" part. So while it may be faster in the tests we are running on a fast network, could it not lead to blocking?


Sébastien Deleuze commented

Fixed with this commit; we now use collectList() plus a single Jackson invocation.
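The idea behind the fix can be sketched roughly as follows. This is a simplification, not the actual framework code; `flux` is assumed to be a finite Flux<PlaylistEntry> and `objectMapper` a Jackson ObjectMapper:

```java
// Sketch of the idea behind the fix: collect the finite Flux into a List
// and hand it to Jackson once, instead of invoking the serializer (and
// flushing) once per element. Error handling is simplified.
Mono<byte[]> json = flux.collectList()
        .map(list -> {
            try {
                return objectMapper.writeValueAsBytes(list);
            } catch (JsonProcessingException ex) {
                throw Exceptions.propagate(ex);
            }
        });
```

Unlike toIterable(), collectList() keeps the pipeline fully non-blocking: serialization only starts once the upstream completes and emits the collected list.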
