Skip to content

JSON Support #590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 17, 2022
Merged

JSON Support #590

merged 14 commits into from
Jun 17, 2022

Conversation

gingerwizard
Copy link
Collaborator

@gingerwizard gingerwizard commented May 25, 2022

Relies on ClickHouse/ClickHouse#37482

This support comes in a few steps:

Insert time

  • Insert structs as JSON.
  • Insert map[string]interface{} as JSON or as a subfield of a struct.
  • Insert JSON as string. Forces inference to the server.
  • Insert typed maps

Query time (Users left to serialize to string here)

  • Response into struct.
  • Response to map[string]interface{} or as sub-field of struct.
  • Response to typed map

Maps and structs can be interleaved as required.

Changes are significant as we need names with the sub-columns at query and insert time.

cc @CurtizJ

@gingerwizard
Copy link
Collaborator Author

This is now ready for review.

We support all of the above-noted functionality. Some known limitations:

  • We don't support inserting structs and maps which have pointers inside them. Whilst easy to add it complicates the code a little. I'd like to refactor and get more feedback before we support.
  • if inserting structs or maps, types must be consistent within a batch. Outside of batch we obviously can't check and defer to CH.
  • Within a batch, we don't allow mixed formats e.g. strings with maps/structs. Maps and structs are equivalent (i.e. they are marshaled to tuples so can be mixed).
  • []interface{} must be of a consistent type at insert time - we don't downcast types or coerce types like inside CH - mainly as we don't know the context outside of the batch. For example, strings and ints cant be mixed inside a []interface{}. If users need flexible type conversion, insert as a string. In future we may do more coercion client-side within the batch but this flexible typing is more of a wider client issue.
  • Dimensions and types within an []interface{} must be the same e.g. an interface{} slice with a struct and list is not supported. This is the same behaviour as CH where lists of objects with different dimensions are not supported.
  • At query time we do best-effort filling structs passed. Any data for which there is no field is currently ignored.

This feature is experimental and requires 22.6.1 of CH

cc @genzgd

@ernado
Copy link
Collaborator

ernado commented Jun 9, 2022

Why not something more generic, like

type Named struct {
   Name string
   Column Interface
}

Or I'm missing something important?

@gingerwizard
Copy link
Collaborator Author

gingerwizard commented Jun 9, 2022

@ernado, do you mean to wrap each column here in the named struct? seems possible but the JSON column is like any other column - when AppendRow is called its called on the Column itself. We would need either an exception here to access the name or move the AppendRow to the Named struct. This seemed a bigger change but I'm open to the idea.

We need the name of the columns (unlike other formats) as this information needs to be encoded in the data - since JSON is encoded as tuples.

We also need to name of the columns at scan time - this is done in the tuple as this is the format its returned in. Ultimately although this led to a bit of replication of the name property I maybe thought it would have less impact...might be wrong.

Copy link
Contributor

@genzgd genzgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing amount of work. At some point it probably needs another pass to reduce some duplication but LGTM for now

@gingerwizard gingerwizard merged commit 65eec68 into v2 Jun 17, 2022
@ernado
Copy link
Collaborator

ernado commented Jun 17, 2022

Great work!

@gingerwizard gingerwizard deleted the json_v2 branch June 20, 2022 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New type Object('JSON') is not supported
3 participants