Description
I'm using GitPython to collect diff information between a commit and its parent.
Generally, the following code works fine when the number of diffs to retrieve is small:
diffs = c.parents[0].diff(c, create_patch=True)
Conversely, when the number of diffs is huge (https://git.eclipse.org/c/papyrus/org.eclipse.papyrus.git/commit/?id=f5f817279baa2008450aa32b18e576c2fcda02bb), the same call produces no output even after at least 24 hours.
Is there another way I could use to retrieve the diff information between two commits?
Below you can find the code to replicate this behaviour:
from git import Repo, GitCmdObjectDB

# Local clone of https://git.eclipse.org/c/papyrus/org.eclipse.papyrus.git/
REPO_PATH = "C:/Users/.../org.eclipse.papyrus"
BRANCH = "2.0.0"

def main():
    repo = Repo(REPO_PATH, odbt=GitCmdObjectDB)
    reference = [r for r in repo.references if r.name == BRANCH][0]
    for c in repo.iter_commits(rev=reference):
        if c.hexsha == 'f5f817279baa2008450aa32b18e576c2fcda02bb':
            # Diff the commit against its first parent, including the patch text
            diffs = c.parents[0].diff(c, create_patch=True)
            print(len(diffs))
            break

if __name__ == "__main__":
    main()
Activity
Byron commented on Aug 21, 2016
Unfortunately, I cannot reproduce the issue despite the fabulous reproduction script. This is what I did:
git clone http://git.eclipse.org/gitroot/papyrus/org.eclipse.papyrus.git
time python reproduce.py
The latter produced this output:
It appears there is something else going on. Maybe you are not using the latest version? Maybe it's something specific to Windows. In any case, we will have to dig deeper to find a solution for this one.
The actual script I ended up using is behind the fold.
For completeness, here is the memory usage when trying to show the diff in the web GUI. It took a long time to load as well.

valeriocos commented on Aug 22, 2016
I've updated GitPython to the latest version (2.0.8), but the problem is still there. As you said, it may depend on Windows-related stuff.
I found a workaround that seems to work fine; the code is below.
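The workaround code is not reproduced in this thread. Purely as an illustration, and not necessarily what was used here, one way to avoid reading a huge diff through an in-process pipe is to let git write it straight to a file. This is a minimal sketch for Python 3, assuming the git executable is on PATH; run_to_file and diff_to_file are hypothetical helper names, not part of GitPython.

```python
import subprocess
import tempfile


def run_to_file(cmd, cwd=None):
    """Run cmd with stdout redirected to a temporary file; return its path."""
    # A regular file, unlike a pipe, has no fixed-size buffer that can fill
    # up and stall the child process while nobody is reading from it.
    with tempfile.NamedTemporaryFile(suffix=".out", delete=False) as out:
        subprocess.check_call(cmd, cwd=cwd, stdout=out)
        return out.name


def diff_to_file(repo_path, parent_sha, commit_sha):
    """Write the diff between two commits of repo_path to a temporary file."""
    # Assumption: "git" resolves to the git command-line client.
    return run_to_file(["git", "diff", parent_sha, commit_sha], cwd=repo_path)
```

The caller is responsible for parsing and deleting the returned file. The patch text can then be processed line by line instead of being held in memory at once.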
Byron commented on Aug 23, 2016
Thanks for the feedback, and for posting the workaround!
Given that the project is not tested on Windows anymore, and is supporting Windows only on a 'best-effort' basis, I believe there is nothing that can be done here to fix this particular case.
Thus I am closing this issue. If you disagree or would like to contribute some sort of fix, please let me know in the comments.
ankostis commented on Oct 11, 2016
I can definitely reproduce this.
The git.diff code has been retrofitted in #519 to use threads when reading the stream, but I have still seen a case where it blocked with particularly big streams. Maybe additionally using queues might solve the problem for good. See http://eyalarubas.com/python-subproc-nonblock.html and http://stackoverflow.com/a/4896288/548792
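The thread-plus-queue pattern described in those two links can be sketched roughly as follows. This is a generic illustration for Python 3, not GitPython's actual implementation; read_output and _pump are hypothetical names, and the timeout value is an arbitrary assumption.

```python
import queue
import subprocess
import threading


def _pump(stream, q):
    """Push every line of stream onto q; a final None marks end of stream."""
    for line in iter(stream.readline, b""):
        q.put(line)
    stream.close()
    q.put(None)


def read_output(cmd, timeout=60.0):
    """Run cmd and return its stdout as bytes, draining the pipe on a thread."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    q = queue.Queue()
    reader = threading.Thread(target=_pump, args=(proc.stdout, q))
    reader.daemon = True  # do not keep the interpreter alive on errors
    reader.start()

    chunks = []
    while True:
        try:
            # Wait at most `timeout` seconds for the next line.
            line = q.get(timeout=timeout)
        except queue.Empty:
            proc.kill()
            raise RuntimeError("no output for %s seconds" % timeout)
        if line is None:
            break
        chunks.append(line)
    proc.wait()
    return b"".join(chunks)
```

Because the reader thread keeps emptying the pipe, the child never stalls on a full pipe buffer, and the timeout on q.get turns a silent hang into an explicit error. For the case in this issue, cmd would be something like ["git", "diff", parent_sha, commit_sha].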