Skip to content

Add schema validation to PyDict -> Document #88

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 21, 2023

Conversation

GodTamIt
Copy link
Contributor

@GodTamIt GodTamIt commented Jul 8, 2023

This change addresses the issue mentioned here: #47

Document.extend() and Document.from_dict() support an optional schema field that, when provided, validates the provided dictionary against the schema.

This additionally fixes the issue where all numeric values are first parsed as I64 and upon failure, parsed as F64. This can cause problems for fields that are unsigned (U64) but are parsed as I64 instead, or fields that are float (F64) but is parsed as I64. Now, when a schema is provided, values will be parsed according to the schema's field specification.


Ok(value)
}

fn extract_value_single_or_list(any: &PyAny) -> PyResult<Vec<Value>> {
if let Ok(values) = any.downcast::<PyList>() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, but I think this would benefit from using .extract::<Vec<Value>>() or at least .extract::<Vec<&PyAny>>() instead of .downcast::<PyList>() to handle any sequence and not just lists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this needlessly create a temporary Vec? I'm transforming/mapping the elements inside, so maybe it's not worth collecting until the very end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this needlessly create a temporary Vec? I'm transforming/mapping the elements inside, so maybe it's not worth collecting until the very end?

Yes, that would be the cost for supporting arbitrary sequences (like 1d NumPy arrays) instead of just lists. It is a trade-off and since it isn't really part of this PR it is probably best left to do elsewhere if done at all.

(Alternatively, this could use any.iter() to just iterate of the sequence, but .iter() also works for e.g. strings which would need to be checked before doing this.)

@GodTamIt GodTamIt force-pushed the godtamit-value-type-upstream branch 2 times, most recently from 27caa50 to 432ce44 Compare July 10, 2023 21:33
@GodTamIt GodTamIt requested a review from adamreichold July 10, 2023 21:34
@wallies
Copy link
Collaborator

wallies commented Jul 20, 2023

@GodTamIt do you want to rebase again now that the 0.20.1 PR has been merged

@GodTamIt GodTamIt force-pushed the godtamit-value-type-upstream branch from 432ce44 to a3abbee Compare July 21, 2023 12:36
@GodTamIt GodTamIt requested a review from adamreichold July 21, 2023 13:12
@GodTamIt GodTamIt force-pushed the godtamit-value-type-upstream branch from 67337d2 to 5b8e97e Compare July 21, 2023 14:07
@GodTamIt
Copy link
Contributor Author

Thanks! This should be squash-merged

@cjrh cjrh merged commit b377f57 into quickwit-oss:master Jul 21, 2023
@GodTamIt GodTamIt deleted the godtamit-value-type-upstream branch July 21, 2023 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants