Description
Brian Tarbox opened BATCH-1902 and commented
DefaultFieldSet creates a SimpleDateFormat (SDF) object as a member variable, which means it cannot be overridden. SDF eventually calls TimeZone, which calls getDefaultInAppContext, a static, synchronized, and very slow method. This results in extremely slow reads.
To fix the problem in my own code, I basically have to copy the entire DefaultFieldSet class. If the SDF were injected, I could change the slow behavior without copying the whole class.
I spoke with Gary Gregory (Spring Batch In Action) and he liked the idea of this change.
Affects: 2.1.8
Reference URL: http://stackoverflow.com/questions/12984345/java-7-calendar-getinstance-timezone-gettimezone-got-synchronized-and-slow-any
0 votes, 5 watchers
spring-projects-issues commented on Mar 27, 2013
Markus Bernhardt commented
We analyzed a small batch which reads and writes 1 million rows with YourKit and found this initializer costs roughly 6% of the overall execution time of our batch.
Couldn't that at least be replaced with something more performant like:
So only one SimpleDateFormat gets initialized per worker thread.
The change also doesn't change the behaviour or interface of the class.
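A minimal sketch of the per-thread approach described above, assuming a `ThreadLocal` holder on Java 8 or later. The class name `PerThreadDateFormat` and its methods are hypothetical and not part of Spring Batch, and this is not necessarily the change that was proposed:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of Spring Batch: one SimpleDateFormat per thread.
final class PerThreadDateFormat {

    // Pattern mentioned in this thread as the DefaultFieldSet default.
    private static final String DEFAULT_PATTERN = "yyyy-MM-dd";

    // The supplier runs once per thread, on that thread's first call to get().
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(DEFAULT_PATTERN));

    private PerThreadDateFormat() {
    }

    // Returns the calling thread's own formatter; never share it across threads.
    static SimpleDateFormat get() {
        return FORMAT.get();
    }

    // Parses with the calling thread's formatter and reports bad input
    // as an unchecked exception.
    static Date parse(String value) {
        try {
            return get().parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }
}
```

Each worker thread pays the construction cost once instead of once per field set, and no formatter instance is ever touched by two threads.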
BTW the same problem exists for the NumberFormat instance in that class. Its initializer costs another 3% of the overall execution time of our batch. Perhaps you could change that accordingly:
And last but not least, one of the constructors does an unnecessary double initialization, which costs another 3% of the overall execution time in our batch.
What do you think about those changes?
spring-projects-issues commented on Mar 27, 2013
Brian Tarbox commented
I think those are fine changes and would likely address the problem at hand.
If these changes are deemed to be less intrusive than letting the user of the class inject/override the date stuff then I'd be fine with the changes.
spring-projects-issues commented on Mar 27, 2013
Michael Minella commented
A couple thoughts:
InitializingBean#afterPropertiesSet
). We could require this via a constructor or add logic to the DefaultFieldSetFactory to prevent it from being instantiated if an alternative is going to be injected.
spring-projects-issues commented on Mar 27, 2013
Brian Tarbox commented
I don't have the exact numbers handy anymore (I entered the bug a while ago), but I recall the system was spinning at 99% CPU, traceable directly to this.
WRT solutions: in my own local copy of DefaultFieldSet I went for simple and took advantage of the fact that the SimpleDateFormat in question always uses a hardcoded format of "yyyy-MM-dd", and so did the following for the standard case:
and then added a setter for DateFormat in case someone wanted a different format.
This solved the problem for us.
spring-projects-issues commented on Mar 27, 2013
Michael Minella commented
I see. With regard to the solution you used, that wouldn't work in the general case since SimpleDateFormat isn't thread safe. As for the setter, that is already there per my previous comment.
I definitely think we can update the DefaultFieldSet to not create the formatter by default and have the factory inject it if there is not another to be injected.
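A rough sketch of that direction, using hypothetical stand-ins (`InjectedFieldSet` and `InjectingFieldSetFactory`) rather than the real DefaultFieldSet and DefaultFieldSetFactory. It assumes each factory instance is confined to a single thread, since the formatter it hands out is shared by every field set it creates:

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical stand-in for DefaultFieldSet: it never creates a formatter itself.
final class InjectedFieldSet {

    private final String[] tokens;
    private final DateFormat dateFormat;

    InjectedFieldSet(String[] tokens, DateFormat dateFormat) {
        this.tokens = tokens.clone();
        this.dateFormat = dateFormat;
    }

    // Parses the token at the given index with the injected formatter.
    Date readDate(int index) {
        try {
            return dateFormat.parse(tokens[index]);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + tokens[index], e);
        }
    }
}

// Hypothetical stand-in for DefaultFieldSetFactory.
// Assumption: one factory instance is used by one thread only.
final class InjectingFieldSetFactory {

    private DateFormat dateFormat;

    // Lets the caller supply a formatter before any field set is created.
    void setDateFormat(DateFormat dateFormat) {
        this.dateFormat = dateFormat;
    }

    InjectedFieldSet create(String[] tokens) {
        if (dateFormat == null) {
            // Built once, and only if nothing was injected.
            dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        }
        return new InjectedFieldSet(tokens, dateFormat);
    }
}
```

With this shape the default formatter is constructed at most once per factory, and a caller who injects an alternative never pays for the default at all.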
spring-projects-issues commented on Mar 27, 2013
Markus Bernhardt commented
Hi Michael,
I found this ticket because we are currently evaluating whether to switch our batches to Spring Batch. I ported a small batch that reads roughly 1.5 million rows (350 MB) of fixed-length EBCDIC data from the file system, does some not too complicated computations, and writes them back. I have implemented one prototype based on the Spring Batch samples and one in plain old Java.
So far so good. The problem is that the plain old Java solution does the job in under 6 seconds, while the first version of the Spring Batch based solution took 28 minutes. So we fired up YourKit and looked into it. We found lots of database transactions. After a little while we found that we were reading the data in chunks of 1 row, so the application context was stored for every row. Increasing the chunk size to 100,000 rows solved this problem, and the overall execution time dropped to a little over 4 minutes.
Still way too slow. So we looked into it again with YourKit and found that 12% of the overall runtime, that's 30 seconds, is spent initializing these two formatters. Only FormatterLineAggregator.doAggregate performs worse in our code.
Regarding your points:
spring-projects-issues commented on Mar 27, 2013
Brian Tarbox commented
Michael,
You are right of course about SimpleDateFormat not being thread safe. We did a quick and dirty fix that worked for us.
I'll mention that we also had to create our own DefaultFieldSetFactory that was called by the file tokenizer. We had to do this to have a place to do the injection of the DateFormat object (i.e. the SimpleDateFormat).
To Markus's point, we also had to create a new BufferedReaderFactory to give the FlatFileItemReader a large buffer with which to read files. We were reading files over the network, where the round-trip time dominated, and the default size of a buffered reader (1024 bytes I think) was way too small.
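As a hedged illustration of the buffer-size point, the core of such a factory can be as small as passing an explicit size to `java.io.BufferedReader`. The helper class `LargeBufferReaders` and the 1 MiB figure below are assumptions made for this sketch, not values taken from this thread; in Spring Batch the same logic would sit inside a custom BufferedReaderFactory implementation:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Spring Batch.
final class LargeBufferReaders {

    // Illustrative size only (1 MiB); tune it against the actual network latency.
    static final int DEFAULT_BUFFER_SIZE = 1 << 20;

    private LargeBufferReaders() {
    }

    // Wraps the stream in a BufferedReader with an explicit buffer size,
    // so each underlying read fetches more data per round trip.
    static BufferedReader open(InputStream in, Charset charset, int bufferSize) {
        return new BufferedReader(new InputStreamReader(in, charset), bufferSize);
    }

    // Small usage example: counts the lines in the stream and closes it.
    static long countLines(InputStream in, Charset charset, int bufferSize) {
        try (BufferedReader reader = open(in, charset, bufferSize)) {
            return reader.lines().count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A larger buffer trades memory for fewer reads against the underlying stream, which tends to matter most when each read crosses the network.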
I'm hoping that the end resolution to this jira issue will address all of these highly-related issues. Thanks.
spring-projects-issues commented on Mar 9, 2014
Jimmy Praet commented
The way the framework currently allows you to override the default DateFormat and NumberFormat through DefaultFieldSetFactory.setDateFormat() and DefaultFieldSetFactory.setNumberFormat() is also quite risky. Since neither SimpleDateFormat nor NumberFormat is thread safe, you have to take care as a user to define these beans with scope="step".
The javadoc of DefaultFieldSetFactory is also incorrect: it states that the NumberFormat defaults to the default locale, and the DateFormat defaults to yyyy/MM/dd. But in fact the NumberFormat defaults to Locale.US and the DateFormat defaults to yyyy-MM-dd.
Reduce formatter initializations in `DefaultFieldSet`