Description
Specification section
?
What is unclear?
Please help us.
Pydantic v2 started converting Python's Optional[str] type to the {"anyOf":[{"type":"string"}, {"type":"null"}]} JSON Schema instead of an optional string property. This breaks many existing tools that use JSON Schemas, but the maintainer claims that JSON Schema is designed this way. See pydantic/pydantic#7161.
Please help us get clarity on whether this is really what the JSON Schema spec design intends.
I want to ask whether this is indeed the intention of the JSON Schema design. If it is not, then hopefully the maintainers can be persuaded to restore the previous behavior.
Problem background:
JavaScript has null and undefined types.
Python has the None singleton type. It's automatically used in some cases. For example, when a function does not return anything, the actual returned value is None.
Let's look at this simple JSON Schema that has an optional field:
{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {"type": "string"},
    "optionalProp": {"type": "string"}
  },
  "required": ["requiredProp"]
}
Now let's try to represent such a schema using Python:
from typing import Optional
from pydantic import BaseModel

class Something(BaseModel):
    requiredProp: str
    optionalProp: Optional[str]
For this type, Pydantic v2 produces the following JSON Schema:
{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {
      "title": "Requiredprop",
      "type": "string"
    },
    "optionalProp": {
      "title": "Optionalprop",
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ]
    }
  },
  "required": ["requiredProp", "optionalProp"]
}
Notice that "optionalProp" is required and its type declaration is {"anyOf":[{"type":"string"}, {"type":"null"}]}.
And if we slightly change the class to add the default value:
from typing import Optional
from pydantic import BaseModel

class Something(BaseModel):
    requiredProp: str
    optionalProp: Optional[str] = None

some_obj = Something(requiredProp="foo")
The generated schema becomes
{
  "title": "Something",
  "type": "object",
  "properties": {
    "requiredProp": {
      "title": "Requiredprop",
      "type": "string"
    },
    "optionalProp": {
      "title": "Optionalprop",
      "anyOf": [
        {"type": "string"},
        {"type": "null"}
      ]
    }
  },
  "required": ["requiredProp"]
}
The optionalProp type declaration still remains {"anyOf":[{"type":"string"}, {"type":"null"}]}.
So it's not possible to generate a normal optional string property.
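To make the practical difference between the two schema shapes concrete, here is a minimal sketch in plain Python. The `check` function is a hypothetical, hand-rolled checker that understands only the `type`, `anyOf`, `properties` and `required` keywords used in this issue; it is not a real JSON Schema validator, and the `title` keywords are left out for brevity.

```python
import json

# Hypothetical helper: maps the few JSON type names used here to Python types.
_TYPES = {"string": str, "null": type(None), "object": dict}


def check(schema, value):
    """Return True if `value` satisfies the tiny keyword subset in `schema`."""
    if "anyOf" in schema:
        if not any(check(sub, value) for sub in schema["anyOf"]):
            return False
    if "type" in schema and not isinstance(value, _TYPES[schema["type"]]):
        return False
    if isinstance(value, dict):
        # Every name listed in "required" must be present as a key.
        if any(name not in value for name in schema.get("required", [])):
            return False
        # Properties are only checked when they are actually present.
        for name, sub in schema.get("properties", {}).items():
            if name in value and not check(sub, value[name]):
                return False
    return True


plain_optional = {
    "type": "object",
    "properties": {
        "requiredProp": {"type": "string"},
        "optionalProp": {"type": "string"},
    },
    "required": ["requiredProp"],
}

pydantic_v2_style = {
    "type": "object",
    "properties": {
        "requiredProp": {"type": "string"},
        "optionalProp": {"anyOf": [{"type": "string"}, {"type": "null"}]},
    },
    "required": ["requiredProp"],
}

absent = json.loads('{"requiredProp": "foo"}')
explicit_null = json.loads('{"requiredProp": "foo", "optionalProp": null}')

results = {
    "plain/absent": check(plain_optional, absent),
    "plain/null": check(plain_optional, explicit_null),
    "anyOf/absent": check(pydantic_v2_style, absent),
    "anyOf/null": check(pydantic_v2_style, explicit_null),
}
```

Under these assumptions, both schemas accept a document where optionalProp is absent. They differ only for an explicit null, which the plain optional string schema rejects and the anyOf schema accepts.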
Is it the intention of JSON Schema that programming languages that do not have the undefined/null duality of JavaScript cannot adhere to simple JSON schemas with simple optional properties?
Would it be OK to treat Python's None as JavaScript's undefined in cases of optional function/constructor parameters, or are these types considered to be fundamentally different?
Proposal
I propose to clarify that in non-JS languages, optional properties with a default None/NULL/nil value can be treated as JavaScript's undefined and can be described using JSON Schema's optional property mechanism.
Do you think this work might require an [Architectural Decision Record (ADR)]? (significant or noteworthy)
No
Activity
gregsdennis commented on Feb 21, 2025
I think @Julian is probably the best person to comment on Python-specific things.
I do have a question: would you consider this to be valid data?
Specifically, is a null value interpreted by your code the same as the property just being absent?
I'd guess that the Pydantic folks might think it is valid if the property is optional (null and absence are the same), whereas maybe you don't.
As far as JSON Schema is concerned, a property with a null value is distinct from the absence of that property. This is the design intent of JSON Schema.
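The distinction shows up as soon as JSON is parsed. A minimal sketch using Python's standard `json` module (the variable names are illustrative only):

```python
import json

with_null = json.loads('{"requiredProp": "foo", "optionalProp": null}')
absent = json.loads('{"requiredProp": "foo"}')

# The key exists in one document and not in the other.
has_key_with_null = "optionalProp" in with_null
has_key_absent = "optionalProp" in absent

# dict.get() hides the difference: both lookups give None.
same_via_get = with_null.get("optionalProp") is absent.get("optionalProp")

# The parsed documents themselves are not equal.
documents_equal = with_null == absent
```

Code that reads the property with `.get()` cannot tell the two documents apart, while code that checks key membership can. Which of the two behaviours an application relies on is the question asked above.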
Julian commented on Feb 21, 2025
(The JSON Schema spec doesn't cover schema generation from a language's types, so a bit of this discussion will always be groundless. But nevertheless, yes, opinions below.)
What you're asking is mostly about "shortcomings" in the typing annotation system in Python really more than anything else I think. And I put "shortcomings" in quotes here because the case where this matters -- at least when it comes to classes -- is one I would call a bad idea in Python, so I don't personally cry too hard about it not being possible.
Optional[str] in Python, as it seems has been pointed out in the ticket there, is simply shorthand for str | None.
There is no way to express the concept of "might not exist" as part of a normal class. E.g. for your example:
I disagree that even this expresses the JSON / JSON Schema notion of "optionalProp may not be present". That notion is expressed by typing.NotRequired in the case of dicts, and in the case of classes it... does not exist (and above I called it a bad idea; I think it is for any use case other than using class syntax to generate schemas).
Specifically, for classes it really would look like:
but there's no shorthand for that, and clearly it's untenable for multiple such properties -- and again I think for normal Python classes it's ridiculous to design one which sometimes doesn't have an attribute (but this is the direct parallel to dicts not having a key).
In short I'd disagree with that both from a JSON Schema perspective and from a Python developer's perspective, though one not really familiar with Pydantic's norms.
I'm not saying this solves the upstream problem, just that "treat None specially" seems very wrong. My first guess would be that an annotation a la NotRequired for non-TypedDicts is the right shape of solution.
jdesrosiers commented on Feb 21, 2025
As Julian said, the spec doesn't cover how schemas map to a language's type system, but I can share my opinion.
First, a couple of things to keep in mind. Remember that JSON Schema describes JSON, not JavaScript, and undefined is not a feature of JSON. The absence of a value is effectively the same concept, but you can't assign something to be undefined ({ "foo": undefined }) like you can in JavaScript. Also, as Greg pointed out, null isn't the same as undefined. In JSON, null is a value, not an indicator of the absence of a value as it is in most languages. In the same way that boolean is a type with two possible values (true and false), null is a type with one possible value (null). When a JSON Schema says a property is null, it means it must be present with the value null.
IMO, that makes JSON's null a JSON-specific concept that should be avoided unless you're specifically trying to model JSON that has nulls, and you definitely shouldn't equate it to common concepts of null/nil/None/etc. Since there's no concept in Python that translates to JSON's concept of null, I wouldn't expect it to ever generate schemas that use null. I think it makes the most sense to equate Python's None with the absence of a value in JSON.
So, Something(requiredProp="foo", optionalProp=None) should be considered equivalent to { "requiredProp": "foo" }.
I don't think using a None default value should make any difference to the generated schema. The instances created from instance1 = Something(requiredProp="foo", optionalProp=None) and instance2 = Something(requiredProp="foo") are indistinguishable. Both instance1.optionalProp and instance2.optionalProp have None. Therefore the JSON representation should be the same as well.
This approach also has the benefit of resulting in simpler and more idiomatic JSON Schemas.
Again, there's no official correct or incorrect way to do this. This is just my recommendation.
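A minimal sketch of the two instances described above, using a standard-library dataclass as a stand-in for the Pydantic model (an assumption made so the example needs no third-party packages). `dump_without_none` is a hypothetical helper, not part of any library.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Something:
    requiredProp: str
    optionalProp: Optional[str] = None


instance1 = Something(requiredProp="foo", optionalProp=None)
instance2 = Something(requiredProp="foo")

# Passing None explicitly and relying on the default give equal instances.
same = instance1 == instance2

# Python's json module writes None as null by default.
with_null = json.dumps(asdict(instance1))


def dump_without_none(obj):
    """Hypothetical helper: serialize a dataclass, omitting None-valued fields."""
    return json.dumps({k: v for k, v in asdict(obj).items() if v is not None})


without_null = dump_without_none(instance1)
```

The two instances compare equal, so any serialization has to treat them the same way. Whether that shared form writes `"optionalProp": null` or omits the key is a choice made at serialization time, and omitting it takes an explicit filter like the one sketched here.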
Julian commented on Feb 21, 2025
(Responding again just in case you didn't know the below, Jason; if you did and still think your way, it's obviously all fine to disagree.)
Python's None serializes as null, and it's very common to have None wherever you'd like in Python as a real value, so the equivalence is there already / I think that ship has long sailed, which is why I disagreed (strongly) with:
jdesrosiers commented on Feb 21, 2025
Thanks for the correction Julian! It's been a while since I've written Python and I didn't remember that correctly. That means that Python's None is equivalent to JSON's null. In that case, generating schemas from Python using JSON null is logically sound.
However, JSON that uses null to represent absent values is not idiomatic JSON; it makes schemas unnecessarily complex and awkward, and renders some JSON Schema keywords unusable. That's the problem that originally motivated this question. So, I still think it would be best to equate None with not-present even though null isn't technically wrong. I recognize that that could cause some friction in the Python ecosystem that serializes nulls by default. If it's not too hard to get around that, you could make the lives of the users of your JSON and JSON Schemas much easier.
gregsdennis commented on Feb 28, 2025
@Ark-kun does the above answer your questions?