
Efficient bulk imports via pq.CopyIn #218


Closed
kyleconroy opened this issue Dec 25, 2019 · 11 comments
Labels: 📚 postgresql, enhancement (New feature or request)

Comments

@kyleconroy
Collaborator

lib/pq supports efficient bulk inserts via COPY. sqlc currently cannot generate code that exposes this functionality.

See this StackOverflow answer for a great overview of the problem space.

@dharmjit

Hi @kyleconroy, any update on this?

@kyleconroy
Collaborator Author

Nope, no update as of yet. It's low on my list, as I'm focusing on MySQL / SQLite support right now.

@fr3fou

fr3fou commented Apr 12, 2021

Hi! Any updates on this?

@aitva
Contributor

aitva commented Apr 29, 2021

I found a way to do bulk insert and update with sqlc and Postgres:

-- name: CreateAuthors :exec
INSERT INTO authors
SELECT unnest(@ids::bigint[]) AS id,
  unnest(@names::text[]) AS name,
  unnest(@bios::text[]) AS bio;

-- name: UpdateAuthors :exec
UPDATE authors AS a
SET name = tmp.name, bio = tmp.bio
FROM (
  SELECT unnest(@ids::bigint[]) AS id,
  unnest(@names::text[]) AS name,
  unnest(@bios::text[]) AS bio
) AS tmp
WHERE a.id = tmp.id;

It uses arrays to serialize multiple rows into a single parameter per column, and unnest to deserialize them back into rows inside Postgres. You can see the generated code in the playground.
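On the Go side, the caller just has to pivot its rows into parallel column slices before invoking the generated method. A sketch, assuming the same `authors(id, name, bio)` table; `Author` and `toColumns` are illustrative names, and the generated params struct fields may differ from the ones shown in the comment:

```go
package main

import "fmt"

// Author is an illustrative row type matching the authors table.
type Author struct {
	ID   int64
	Name string
	Bio  string
}

// toColumns pivots row-oriented data into the parallel arrays that
// the unnest-based query expects: one slice per column, index-aligned.
func toColumns(authors []Author) (ids []int64, names, bios []string) {
	for _, a := range authors {
		ids = append(ids, a.ID)
		names = append(names, a.Name)
		bios = append(bios, a.Bio)
	}
	return ids, names, bios
}

func main() {
	ids, names, bios := toColumns([]Author{
		{1, "Ursula K. Le Guin", "novelist"},
		{2, "Italo Calvino", "fabulist"},
	})
	fmt.Println(len(ids), names[0], bios[1])
	// With the generated code, these slices would then be passed as
	// something like:
	//   q.CreateAuthors(ctx, CreateAuthorsParams{Ids: ids, Names: names, Bios: bios})
}
```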

It felt a bit hacky, so I made a benchmark to ensure that it is working as expected:

BenchmarkCreateAuthor-6              100          12656602 ns/op
BenchmarkCreateAuthors10-6           102          13837750 ns/op
BenchmarkCreateAuthors100-6           74          14967301 ns/op
BenchmarkCreateAuthors1000-6          30          38486604 ns/op
BenchmarkUpdateAuthor-6              100          11739085 ns/op
BenchmarkUpdateAuthors10-6            94          13677888 ns/op
BenchmarkUpdateAuthors100-6           66          15819271 ns/op
BenchmarkUpdateAuthors1000-6          28          56070949 ns/op

BenchmarkCreateAuthor and BenchmarkUpdateAuthor measure the time it takes to insert or update a single author. The BenchmarkCreateAuthorsN and BenchmarkUpdateAuthorsN variants use the above queries to group N authors into a single statement.

Comparing BenchmarkCreateAuthor (100 iterations) with BenchmarkCreateAuthors1000 (30 iterations), the total run time is about the same: creating 100 authors one at a time costs roughly as much as creating 30,000 authors in batches of 1,000 with the above method.

If you want to try it, you can find the code of the benchmark here.

@kyleconroy
Collaborator Author

Now that pgx support has landed, we'll also need to support its copy protocol.

@nd291195

nd291195 commented Nov 4, 2021

@aitva Does this solution also work for nullable fields? Right now, a ::bigint[] parameter generates []int64, but I need []sql.NullInt64 because the column is nullable. Any ideas on how this could work?

@Jille
Contributor

Jille commented May 9, 2022

https://docs.sqlc.dev/en/latest/howto/insert.html#using-copyfrom has landed, so I think we can close this.
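For anyone landing here, the linked docs show the shape: you annotate a plain INSERT with :copyfrom, for example:

```sql
-- name: CreateAuthors :copyfrom
INSERT INTO authors (name, bio) VALUES ($1, $2);
```

sqlc then generates a method that takes a slice of params and returns the number of rows copied, using pgx's copy protocol under the hood.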

@ovadbar
Contributor

ovadbar commented Dec 6, 2022

@aitva We have been using the json_populate_recordset function for bulk inserts, which does allow null values. There are other JSON functions you may want to use, like json_to_record. Unfortunately, the models generated by sqlc do not work well with these functions: instead of NullInt64, you would want int64 with omitempty.
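For reference, a json_populate_recordset-based bulk insert looks roughly like this (a sketch assuming the same `authors` table; the `null::authors` argument tells Postgres which row type to decode into, and absent or null JSON fields simply become SQL NULLs, which is why this handles nullable columns):

```sql
-- name: ImportAuthors :exec
INSERT INTO authors (id, name, bio)
SELECT id, name, bio
FROM json_populate_recordset(null::authors, @data::json);
```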

@giulianopz

giulianopz commented Mar 11, 2024

@Jille @kyleconroy, please consider reopening this issue to support the COPY instruction for bulk inserts with lib/pq, as originally mentioned above.
The documentation says:

The :copyfrom command requires either pgx/v4 or pgx/v5.

@Jille
Contributor

Jille commented Mar 11, 2024

Let's make that a new feature request. Would you mind opening that?

@giulianopz

Let's make that a new feature request. Would you mind opening that?

Sure, I've just opened a new issue for it: #3264.
