Performance patch and a few enhancements #568
Closed
bpasteur wants to merge 1 commit into uclouvain:master from
Conversation
…itional enhancements
* added error checking to the thread loops
* allow setting the number of threads to use (the default is the number of processors) in the library code
* added parameters to opj_compress and opj_decompress to allow passing in the number of threads to use
* added parameters to opj_compress to suppress warning pop-ups for unknown tag types (to enable performance testing scripts)
* added additional timing prints to opj_compress and opj_decompress
* added a check for tiled tif files and bailing with an error since they are not yet supported
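The first two items (error checking in the thread loops, and a thread count that defaults to the number of processors) can be sketched roughly as below. This is not the code from the patch: `resolve_num_threads`, `run_checked_loop`, `work_fn`, and `fail_on_index` are hypothetical names made up for illustration, and the shared-flag approach is just one common way to report errors from an OpenMP loop, which cannot be exited with `break`. The sketch also compiles without OpenMP, in which case the pragmas are ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: a request <= 0 means "use the default",
 * which is the number of processors when OpenMP is available. */
static int resolve_num_threads(int requested)
{
    if (requested > 0) {
        return requested;
    }
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1; /* built without OpenMP: run serially */
#endif
}

/* Hypothetical per-item work: returns 0 on success, non-zero on failure. */
typedef int (*work_fn)(int index, void *user_data);

/* Runs work(i, user_data) for i in [0, count).
 * Returns 1 if every call succeeded, 0 if any call failed.
 * A shared flag records the failure; iterations that start after the
 * flag is set are skipped instead of doing more work. */
static int run_checked_loop(int count, int requested_threads,
                            work_fn work, void *user_data)
{
    int failed = 0;
    int i;
    int num_threads = resolve_num_threads(requested_threads);

    assert(work != NULL);
    (void)num_threads; /* unused when the pragmas are ignored */

    #pragma omp parallel for num_threads(num_threads) shared(failed)
    for (i = 0; i < count; ++i) {
        int skip;

        #pragma omp atomic read
        skip = failed;
        if (skip) {
            continue;
        }
        if (work(i, user_data) != 0) {
            #pragma omp atomic write
            failed = 1;
        }
    }
    return !failed;
}

/* Demo work function (illustrative only): fails on the index that
 * user_data points to, succeeds on every other index. */
static int fail_on_index(int index, void *user_data)
{
    const int *bad_index = (const int *)user_data;
    return index == *bad_index ? 1 : 0;
}
```

A caller would pass the value of a command-line thread option (or 0 for the default) as `requested_threads` and treat a return value of 0 as an error to bail out on.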
I didn't write the OpenMP patch. Some time ago I posted an optimisation patch which used some of the ideas from Taubman/Marcellin to speed up the encoder (specifically the T1). My patch did not use OpenMP though. The link to Google Groups above is me discussing my patch, not the OpenMP one.
Author
Sorry Carl (and Aaron), I misread Aaron's comment in his T1 optimize pull request. I have corrected my comment above.
Collaborator
I finally decided to implement multi-threaded decoding my own way in PR #786.
I am using the OpenJPEG libraries to convert between large TIF and JP2 files. A while back I found Aaron Boxer's OpenMP patch (https://code.google.com/p/openjpeg/issues/detail?id=372). It had a memory leak which led to poor performance with large files. After some small changes to correct the memory leak I am seeing a substantial performance boost. I am very interested in getting this patch accepted into the main branch so the changes can be maintained going forward.
The performance gains I am seeing are well worth incorporating this patch into the code base. This patch allows you to take advantage of all the CPUs in the system. With the current trunk I am seeing 20 to 30 percent CPU utilization; with this patch (using the same number of threads as CPU cores) I am seeing 80 to 90 percent CPU utilization. In a system with a fast CPU, a lot of cores, and a lot of memory you could scale up the number of threads to really take advantage of the system resources. Storing large files and creating large mosaics are good use cases for JP2 files, and the performance numbers appear to be better for these larger files. A chart with some performance numbers comparing the main branch with the patch is listed at the bottom of this post.
I included some additional enhancements along with the performance patch:
As for the performance numbers, I am running on a virtual Windows 7 64-bit machine with 4 processors and 6 GB of memory. There are no OpenMP enhancements in the code that loads and converts the BMP, PNG, and TIF files into an in-memory image before it is processed into a .jp2 file, so those times are not included in the timing analysis. Likewise, for decompression the actual writing of the decompressed file is excluded from the timing analysis. As best I can tell, the ways people calculate performance numbers are all over the map. I am reporting the performance numbers here as % faster, calculated as (original time - new time) / new time. This gives a good indication of the performance I am actually seeing. The original times are also included in the list for anyone who prefers an alternative formula.
From what I've seen, the smaller files do not get as much benefit from the threading. My guess is that the times are so short that the overhead of managing the threads eats into the performance gains. There is also a lot of variance in the times for the small files, likely due to normal system usage noise. The BMP files show the smallest gains compared to PNG and TIF files. Generally speaking, when creating tiled JP2 files the performance gains are smaller for the smaller tile sizes. In some cases negative gains are seen with the BMP files. The best performance gains are seen with large TIF files, with compression giving the best results.
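To make the "% faster" formula concrete, here is a minimal sketch in C. The function name `percent_faster` is made up for illustration, and multiplying by 100 to express the ratio as a percentage is an assumption about how the chart values were scaled. With this definition, halving the run time counts as 100% faster, and a slowdown gives a negative value.

```c
#include <assert.h>

/* "% faster" as described in the post:
 *   (original time - new time) / new time
 * scaled by 100 (assumed) so the result reads as a percentage.
 * Both times must use the same unit; new_seconds must be positive. */
static double percent_faster(double original_seconds, double new_seconds)
{
    assert(new_seconds > 0.0);
    return (original_seconds - new_seconds) / new_seconds * 100.0;
}
```

For example, an original time of 10 s and a patched time of 5 s gives (10 - 5) / 5 = 1, i.e. 100% faster, whereas dividing by the original time instead would report the same run as a 50% reduction.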