Description
We've been using opengrok for many years and I've come to accept the 'warning' messages that show up in the logs. That being said, at what point are they "bugs" or things you guys want to know and I/we should report them?
Examples:
2018-11-28 11:40:16.756 WARNING [org.opengrok.indexer.index] - ERROR addFile(): /source/fakefolder Use Data Extract/trunk/docs/etg_mbr.txt.gz
java.lang.IllegalArgumentException: startOffset must be non-negative, and endOffset must be >= startOffset; got startOffset=2147483647,endOffset=-2147483611
2018-11-28 11:36:15.474 WARNING [org.opengrok.indexer.util] - Non-zero exit status -1 from command [/usr/bin/svn, log, --non-interactive, --xml, -v, /source/fakefolder] in directory /n/source/fakefolder
2018-11-28 11:36:15.475 WARNING [org.opengrok.indexer.history] - An error occurred while creating cache for /source/fakefolder(SubversionRepository)
org.opengrok.indexer.history.HistoryException: Failed to get history for: "/source/fakefolder" Exit code: -1
2018-11-28 11:36:15.474 SEVERE [org.opengrok.indexer.util] - Failed to read from process: /usr/bin/svn
java.io.IOException: An error occurred while parsing the xml output
at org.opengrok.indexer.history.SubversionHistoryParser.processStream(SubversionHistoryParser.java:195)
2018-11-28 11:22:59.605 WARNING [org.opengrok.indexer.util] - Non-zero exit status 1 from command [/usr/bin/svn, log, --non-interactive, --xml, -v, -l1, /source/fakefolder@] in directory /source/fakefolder
Activity
vladak commented on Nov 29, 2018
In general, it never hurts to submit an issue for a problem; however, be prepared to do your homework w.r.t. investigation.
The Subversion failure might be a local problem.
The `IllegalArgumentException` could be a bug. What version are you running at the moment? Could you share the contents of the file?
tulinkry commented on Nov 29, 2018
For the svn problem, you can go to the directory `/n/source/fakefolder` and run the command `/usr/bin/svn log --non-interactive --xml -v /source/fakefolder` to see what went wrong.
Shooter3k commented on Dec 3, 2018
Thanks for the suggestion. It would seem we're getting authentication errors, most likely because I'm running the opengrok indexer under an account that does not have access to the SVN repository.
vladak commented on Dec 5, 2018
There is a (clunky) way to pass a username/password to the Subversion process: using the `OPENGROK_SUBVERSION_USERNAME` / `OPENGROK_SUBVERSION_PASSWORD` environment variables.
vladak commented on Dec 5, 2018
Can you try to get more info about the `IllegalArgumentException` problem? (i.e. share the contents of the file that seems to cause this)
Issue title changed from "When should someone report issues?" to "IllegalArgumentException when adding a file"
Shooter3k commented on Dec 7, 2018
(FYI: I accidentally closed and reopened this issue)
The file contains secret information about our company but it's a 3GB text file inside of a 300MB .gz file.
My assumption is the size of the file is causing issues.
Is there anything I could check without sharing the actual file itself?
vladak commented on Dec 12, 2018
Is there a stack trace in the log associated with the `IllegalArgumentException`? The logger used in this case should log one, I believe, as it is called like this from `IndexDatabase#indexParallel()`:
This is because `IllegalArgumentException` extends `RuntimeException`.
The exception likely comes from one of the analyzers - `addFile()` calls `AnalyzerGuru#populateDocument()` that performs:
In your case it could be `GZIPAnalyzer` or the analyzer for the contents therein.
vladak commented on Dec 12, 2018
Also, maybe worth trying to bisect the original file (assuming the exception is caused by the contents and not the compressed image) and see if you could find the spot which causes the problem.
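A minimal sketch of the bisection idea, assuming the uncompressed text can be cut into lines and that a prefix fails whenever it contains the offending spot. `PrefixBisect` and the `fails` predicate are hypothetical names for illustration; in practice the predicate would write the prefix to a scratch file and run the indexer on it, which is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PrefixBisect {
    /**
     * Returns the length of the shortest prefix of {@code lines} for which
     * {@code fails} is true, or -1 if even the whole list passes.
     * Assumes failure is monotone: once a prefix fails, every longer one fails too.
     */
    static <T> int shortestFailingPrefix(List<T> lines, Predicate<List<T>> fails) {
        if (!fails.test(lines)) {
            return -1;
        }
        int lo = 0;             // a prefix of this length is assumed to pass
        int hi = lines.size();  // a prefix of this length is known to fail
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (fails.test(lines.subList(0, mid))) {
                hi = mid;
            } else {
                lo = mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Stand-in data and predicate; a real check would invoke the indexer.
        List<String> lines = Arrays.asList("a", "b", "BAD", "c", "d");
        int n = shortestFailingPrefix(lines, prefix -> prefix.contains("BAD"));
        System.out.println("offending line index: " + (n - 1));
    }
}
```

Each round halves the search range, so even a file with many millions of lines needs only a few dozen indexer runs.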
Shooter3k commented on Dec 12, 2018
Unfortunately, someone should never have checked a 300MB compressed (3GB uncompressed) text file like this into our repo. I have no desire to get opengrok to index the file, but if you guys need me to debug it for future development, I will. I was planning to either ignore the file or delete it.
Here is the stack trace.
vladak commented on Dec 14, 2018
`2147483647` is `2^31-1`, i.e. 2 GiB short by 1 byte, and `abs(-2147483611)` is 2 GiB short by 37 bytes, so this is probably an overflow of a 2 GiB `signed int` value by 37 bytes. This might be caused by a huge token processed by `PlainFullTokenizer`.
If you run OpenGrok before 1.1-rc80, the chances are you are bumping into the issue fixed in cset 3e49081 - normally the Java lexer classes generated from the .lex descriptions should be changed not to accept too long tokens.
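The arithmetic can be reproduced with plain Java `int` addition. This is only an illustration of the wraparound, not OpenGrok or Lucene code: `OffsetOverflowDemo` is a hypothetical name, and the length of 38 is chosen purely so the result matches the logged `endOffset`. How the indexer actually arrived at those offsets is not shown in this thread.

```java
class OffsetOverflowDemo {
    /** Adds a length to an offset in 32-bit arithmetic, which wraps silently on overflow. */
    static int advance(int offset, int length) {
        return offset + length;
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE;      // 2147483647 == 2^31 - 1
        int end = advance(start, 38);       // wraps into the negative range
        long realEnd = (long) start + 38;   // same sum in 64 bits: 2^31 + 37

        System.out.println("int  end = " + end);
        System.out.println("long end = " + realEnd);

        // Math.addExact reports the overflow instead of wrapping.
        try {
            Math.addExact(start, 38);
        } catch (ArithmeticException e) {
            System.out.println("addExact: " + e.getMessage());
        }
    }
}
```

The wrapped value is `-2147483611`, i.e. 37 past the `2^31` boundary, which matches the numbers in the exception message.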
vladak commented on Dec 14, 2018
In the meantime we could limit the maximum size of files to 2 GiB. Maybe time to revisit #534.
vladak commented on Dec 14, 2018
Actually, limiting the input file size cannot work given how the GZip analyzer works - it is based on streams.
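Because the compressed size says little about the uncompressed size (here 300MB versus 3GB), a limit would have to be applied while reading the stream. The following is a rough sketch of that idea using only `java.util.zip`; it is an assumption about one possible approach, not how OpenGrok's `GZIPAnalyzer` behaves, and `BoundedGunzip` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class BoundedGunzip {
    /**
     * Streams through gzip data and returns the uncompressed size in bytes,
     * or -1 as soon as that size exceeds {@code limit}. Nothing is buffered
     * beyond one small read buffer.
     */
    static long uncompressedSizeUpTo(InputStream gzipped, long limit) {
        byte[] buf = new byte[8192];
        long total = 0;  // long, so the counter itself cannot wrap at 2 GiB
        try (GZIPInputStream in = new GZIPInputStream(gzipped)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                if (total > limit) {
                    return -1;
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Helper for the demo: compresses a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] packed = gzip(new byte[1000]);
        System.out.println(uncompressedSizeUpTo(new ByteArrayInputStream(packed), 2000));
        System.out.println(uncompressedSizeUpTo(new ByteArrayInputStream(packed), 999));
    }
}
```

With a limit of `Integer.MAX_VALUE`, a check like this would flag the file before any `int` offset could overflow.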