Ecto: Upsert

Created on 30 Jun 2016  路  4Comments  路  Source: elixir-ecto/ecto

There was previous discussion on adding the UPSERT feature to Ecto in issue #652 and PR #746 by @gjaldon. I'm not sure if this the best place to revisit the topic, but I figured I'd open this issue to leave my notes and start a discussion. I can move this to the mailing list if it's more appropriate there.

Adapters

UPSERT is handled differently (or not at all) by different databases. The PostgreSQL wiki has a brief overview of how UPSERTs are handled in other systems.

PostgreSQL

The ON CONFLICT syntax was added in 9.5+. It doesn't look like there is an easy way to emulate this behavior pre 9.5, so possibly raise an error if not available.

There are two options when a unique constraint is violated: DO NOTHING or DO UPDATE.

INSERT INTO table_name ( column_name [, ...] ) VALUES ( values [, ...] )

  ON CONFLICT [ conflict_target ] DO NOTHING

-- or

  ON CONFLICT conflict_target DO UPDATE SET ...

Conflict Target

PostgreSQL requires the conflict_target be set when performing an update. The conflict_target is used to determine which unique constraint to check when identifying a potential row to update. In addition to checking for columns that belong to a unique index, it also allows index expressions and predicates.

It seems that the simple case for an upsert would be to specify a list of columns to check, but there will also be times where you may need to specify an index expression or predicate as well.

conflict_target: [:col_a, :col_b]
conflict_target: "(lower(col_a), col_b) WHERE col_c = false"

Conflict Action

Once a candidate row is found on conflict, it can either be ignored or updated. The update statement seems to allow for arbitrary fields to be modified, but it might be safe to assume we can just update the columns with the supplied values. The new values are available in the EXCLUDED table.

ON CONFLICT (col_a) DO UPDATE SET col_b = EXCLUDED.col_b, col_c = EXCLUDED.col_c

I think we'd have to specify the list of columns to update.

conflict_target: [:col_a], columns: [:col_b, :col_c]

There is also an optional [ WHERE condition ] which will only update conflicting rows that pass this expression. It may be safe to assume that the user wants to update every row that is in conflict.

MySQL

MySQL has the ON DUPLICATE KEY UPDATE syntax for UPSERTS.

The primary difference between MySQL and PostgreSQL is that you don't specify the conflict_target for MySQL. It will be inferred automatically, although the docs warn about potential issues when there may be multiple unique constraints that match.

Similarly to PostgreSQL, MySQL can run an arbitrary UPDATE statement on conflict. If we limit to just updating the specified columns, We can access the new values with the VALUES() function.

INSERT INTO table (col_a, col_b, col_c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE col_b = VALUES(col_b), col_c = VALUES(col_c);

SQLite

It doesn't look like SQLite supports ON CONFLICT ... UPDATE, but it does support the REPLACE option. This deletes and replaces the conflicting row. I don't think this is a suitable replacement for UPSERT based on some issues raised in this Stack Overflow answer.

MS-SQL

TODO

Mongo

TODO

Interface

One example of an interface where UPSERT is available is the activerecord-import gem (for Ruby).

When performing an UPSERT, it looks like there are at least three options to specify:

  • :on_conflict - either :update or :ignore.
  • :conflict_target - which columns to check for conflict (only for PostgreSQL). Can specify either a list of atoms for column names or a string to handle arbitrary index expressions.
  • :update_columns - a list of atoms that specifies which columns to update on conflict

I'm relatively new to Elixir and Ecto, so I'm not sure what would be an idiomatic way to organize these settings.

Repo.insert(%User{username: "bloblaw", fullname: "Bob Loblaw"},
  on_conflict: :update,
  conflict_target: [:username],
  update_columns: [:fullname])

Test Cases

To add this feature to Ecto, what functions need to be modified and what scenarios should be tested?

For Repo.insert and Repo.insert!,

  • When no conflict found, on_conflict: :update inserts a new record
  • When no conflict found, on_conflict: :ignore inserts a new record
  • When there is a conflict, on_conflict: :ignore does not insert or update any rows
  • When there is a conflict, on_conflict: :update updates only the columns specified

For Repo.insert_all,

  • When one of two rows to be inserted is in conflict, on_conflict: :ignore only inserts one row
  • When one of two rows to be inserted is in conflict, on_conflict: :update inserts one row and updates the other

Should Repo.insert_or_update be modified?

Should the Repo.update function include an option to insert if doesn't exist? I can't think of a scenario where this would happen, and the implementation would generate an INSERT statement anyways. @ericmj mentions this in a past comment.

Advanced

Most helpful comment

I'd really love to see proper upserts in Ecto.

All 4 comments

This is an excellent summary. Thank you @atsheehan.

I would prefer this to be a separate function from insert - the logic would be quite different, and I don't think a lot of code (on the Repo level) would end up being shared.

I think we could determine a lot of the required options for doing proper upserts from changesets. Every insert changeset will copy all the fields from the schema into the changes to insert everything. With upsert we could do something different - generate an insert from the changest + values in the schema (exactly as insert does it), but generate the action based on the changes from changeset.

In a similar way the conflict targets could be inferred from the unique_constraint values.

I would imagine an interface like:

Repo.upsert(changeset, update: :changes)
Repo.upsert(changeset) # exactly as above
Repo.upsert(changeset, update: [:foo, :bar, :baz])
Repo.upsert(changeset, :ignore)

To be honest, I'm not sure we even need the :ignore option for single inserts - it would end up being ignored anyway. The real power of the ON CONFLICT IGNORE shows up when you insert multiple documents and any of the could fail - but you don't want the whole thing to fail. So I think we should also explore doing upsert in terms of insert_all.

I find the Repo.insert_or_update function quite unfortunate, and I never was a big fan. I can see it's uses, but I don't think it should be provided out-of-the box. It's not going away anytime soon, nor I think we should change it to use upsert-type facilities, because of backwards compatibility.

Thanks for the feedback @michalmuskala.

I'm not sure if the conflict_target will always be able to be inferred, but I think if we use the following rules it should handle most cases:

  • if primary key has been set, use primary key as conflict_target
  • else, use first unique constraint as conflict_target

The edge case would be when there are multiple unique constraints that could conflict, PostgreSQL requires us to choose the right one. MySQL will automatically choose one (and doesn't allow us to specify it), although that may not be the intended behavior.

I reviewed some of the code for how insert is handled, and it seems like there will be a few changes necessary to add an upsert function.

  • Add functions Ecto.Repo.upsert and Ecto.Repo.Schema.upsert.
  • Functions on Ecto.Changeset to infer the conflict target and action.
  • Functions in Ecto.Adapter, Ecto.Adapters.SQL to define the adapter API for generating upsert calls.
  • Functions in Ecto.Adapters.Postgres.Connection and Ecto.Adapters.MySQL.Connection to generate the SQL for an upsert.

Does this seem appropriate? I assume there will be overlap with the way inserts are handled and maybe some of these functions are not necessary. It might be easier to refactor once there is a rough implementation available though.

I don't know if I know enough to fully implement this, but I could get started on a proof of concept if that would be helpful.

I'd really love to see proper upserts in Ecto.

@atsheehan , thank you for your tests, they were awesome :+1: BTW, could you please review my PR?
We have postgres 9.4 now, but we need 9.5 to test upsert properly ...

Was this page helpful?
0 / 5 - 0 ratings