Printing the text in a failed llm_parse_json #629

Merged
merged 1 commit into from
Oct 23, 2024
22 changes: 9 additions & 13 deletions paperqa/core.py
@@ -13,29 +13,25 @@
 def llm_parse_json(text: str) -> dict:
     """Read LLM output and extract JSON data from it."""
     # fetch from markdown ```json if present
-    text = text.strip().split("```json")[-1].split("```")[0]
-    # split anything before the first {
-    text = "{" + text.split("{", 1)[-1]
-    # split anything after the last }
-    text = text.rsplit("}", 1)[0] + "}"
+    ptext = text.strip().split("```json")[-1].split("```")[0]
+    # split anything before the first { after the last }
+    ptext = ("{" + ptext.split("{", 1)[-1]).rsplit("}", 1)[0] + "}"

     # escape new lines within strings
-    def replace_newlines(match: re.Match) -> str:
+    def escape_newlines(match: re.Match) -> str:
         return match.group(0).replace("\n", "\\n")

     # Match anything between double quotes
     # including escaped quotes and other escaped characters.
     # https://regex101.com/r/VFcDmB/1
     pattern = r'"(?:[^"\\]|\\.)*"'
-    text = re.sub(pattern, replace_newlines, text)
+    ptext = re.sub(pattern, escape_newlines, ptext)
     try:
-        return json.loads(text)
+        return json.loads(ptext)
     except json.JSONDecodeError as e:
         raise ValueError(
-            "Failed to parse JSON. Your model may not "
-            "be capable of supporting JSON output. Try "
-            "a different model or with "
-            "`Settings(prompts={'use_json': False})`"
+            f"Failed to parse JSON from text {text!r}. Your model may not be capable of"
+            " supporting JSON output or our parsing technique could use some work. Try"
+            " a different model or specify `Settings(prompts={'use_json': False})`"
         ) from e
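
To see what the change buys, here is a self-contained sketch of the function as it reads after this commit, with two sample inputs. It assumes `json` and `re` are imported at module level (the imports fall outside the hunk shown). The `FENCE` constant and both sample strings are illustrative additions, not part of the PR; `FENCE` only avoids writing literal triple backticks in the example.

```python
import json
import re

# Illustrative helper (not in the PR): a markdown code-fence marker.
FENCE = "`" * 3


def llm_parse_json(text: str) -> dict:
    """Read LLM output and extract JSON data from it."""
    # Work on a copy (`ptext`) so the raw `text` survives for the error message.
    ptext = text.strip().split(FENCE + "json")[-1].split(FENCE)[0]
    # Drop anything before the first { and anything after the last }.
    ptext = ("{" + ptext.split("{", 1)[-1]).rsplit("}", 1)[0] + "}"

    # Escape literal newlines that appear inside double-quoted strings.
    def escape_newlines(match: re.Match) -> str:
        return match.group(0).replace("\n", "\\n")

    # Match anything between double quotes, including escaped characters.
    pattern = r'"(?:[^"\\]|\\.)*"'
    ptext = re.sub(pattern, escape_newlines, ptext)
    try:
        return json.loads(ptext)
    except json.JSONDecodeError as e:
        raise ValueError(
            f"Failed to parse JSON from text {text!r}. Your model may not be capable of"
            " supporting JSON output or our parsing technique could use some work. Try"
            " a different model or specify `Settings(prompts={'use_json': False})`"
        ) from e


if __name__ == "__main__":
    # Hypothetical model output: fenced JSON with a raw newline inside a string.
    good = (
        "Here you go:\n" + FENCE + 'json\n{"summary": "line one\nline two", "score": 8}\n' + FENCE
    )
    print(llm_parse_json(good))

    # Hypothetical model output with no JSON at all: the error now quotes it.
    try:
        llm_parse_json("I could not find any relevant information.")
    except ValueError as err:
        print(err)
```

Because the cleanup steps previously reassigned `text`, the raw model output was gone by the time the exception was raised. Renaming the working copy to `ptext` is what lets `{text!r}` show exactly what the model returned.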

