Add schema validation to PyDict -> Document #88

GodTamIt · 2023-07-08T23:13:18Z

This change addresses the issue mentioned here: #47

Document.extend() and Document.from_dict() now accept an optional schema argument. When it is provided, the dictionary is validated against that schema.

This additionally fixes the issue where all numeric values are first parsed as I64 and, upon failure, as F64. That can cause problems for fields that are unsigned (U64) or float (F64) but whose values are parsed as I64 instead. Now, when a schema is provided, values are parsed according to the schema's field specification.
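To illustrate the difference, here is a minimal, self-contained sketch. It is not the code from this PR: `FieldType`, `Value`, `parse_untyped` and `parse_with_schema` are hypothetical stand-ins, and strings stand in for the Python values that the real conversion receives.

```rust
/// Hypothetical stand-in for the numeric field types a schema can declare.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FieldType {
    I64,
    U64,
    F64,
}

/// Hypothetical stand-in for the typed value stored in a document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Schema-less behaviour as described above: try I64 first, then fall back to F64.
/// A value meant for a U64 or F64 field therefore ends up as I64 whenever it fits.
fn parse_untyped(raw: &str) -> Option<Value> {
    raw.parse::<i64>()
        .map(Value::I64)
        .ok()
        .or_else(|| raw.parse::<f64>().map(Value::F64).ok())
}

/// Schema-directed behaviour: the declared field type decides how the value is
/// parsed, and a value that does not fit that type is reported as an error.
fn parse_with_schema(raw: &str, field_type: FieldType) -> Result<Value, String> {
    match field_type {
        FieldType::I64 => raw.parse::<i64>().map(Value::I64).map_err(|e| e.to_string()),
        FieldType::U64 => raw.parse::<u64>().map(Value::U64).map_err(|e| e.to_string()),
        FieldType::F64 => raw.parse::<f64>().map(Value::F64).map_err(|e| e.to_string()),
    }
}
```

In this sketch, `parse_untyped("3")` yields an I64 even if the target field is F64 or U64, whereas `parse_with_schema("3", FieldType::F64)` yields an F64, and a negative input for a U64 field is rejected instead of being stored with the wrong type.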

adamreichold · 2023-07-09T10:00:02Z

src/document.rs

+
+    Ok(value)
+}
+
 fn extract_value_single_or_list(any: &PyAny) -> PyResult<Vec<Value>> {
    if let Ok(values) = any.downcast::<PyList>() {


Not related to this PR, but I think this would benefit from using .extract::<Vec<Value>>() or at least .extract::<Vec<&PyAny>>() instead of .downcast::<PyList>() to handle any sequence and not just lists.

GodTamIt · 2023-07-10

Would this needlessly create a temporary Vec? I'm transforming/mapping the elements inside, so maybe it's not worth collecting until the very end?

adamreichold · 2023-07-11

Yes, that would be the cost of supporting arbitrary sequences (like 1d NumPy arrays) instead of just lists. It is a trade-off, and since it isn't really part of this PR, it is probably best left to do elsewhere, if done at all.

(Alternatively, this could use any.iter() to just iterate over the sequence, but .iter() also works for e.g. strings, which would need to be checked for before doing this.)

wallies · 2023-07-20T23:53:40Z

@GodTamIt do you want to rebase again now that the 0.20.1 PR has been merged?

GodTamIt · 2023-07-21T15:08:32Z

Thanks! This should be squash-merged

adamreichold reviewed Jul 9, 2023: src/document.rs (resolved)

adamreichold reviewed Jul 9, 2023: src/document.rs (outdated, resolved)

GodTamIt force-pushed the godtamit-value-type-upstream branch 2 times, most recently from 27caa50 to 432ce44, July 10, 2023 21:33

GodTamIt requested a review from adamreichold July 10, 2023 21:34

adamreichold reviewed Jul 21, 2023: src/document.rs (outdated, resolved)

adamreichold reviewed Jul 21, 2023: src/document.rs (resolved)

adamreichold reviewed Jul 21, 2023: src/schemabuilder.rs (outdated, resolved)

Add schema validation to PyDict -> Document

a3abbee

GodTamIt force-pushed the godtamit-value-type-upstream branch from 432ce44 to a3abbee, July 21, 2023 12:36

Address comments

261f690

GodTamIt requested a review from adamreichold July 21, 2023 13:12

Add documentation about new functionality

5b8e97e

GodTamIt force-pushed the godtamit-value-type-upstream branch from 67337d2 to 5b8e97e, July 21, 2023 14:07

adamreichold approved these changes Jul 21, 2023

cjrh merged commit b377f57 into quickwit-oss:master Jul 21, 2023

GodTamIt mentioned this pull request Jul 21, 2023

Document.from_dict doesn't has type info. #47

Closed

GodTamIt deleted the godtamit-value-type-upstream branch July 21, 2023 22:28


