Ecto: Introduce insert_all

Created on 23 Dec 2015 · 11Comments · Source: elixir-ecto/ecto

insert_all will allow multiple values to be inserted at once:

@spec insert_all(source :: Ecto.Schema.t | binary, values :: [map | Keyword.t], opts :: Keyword.t) :: {count, list | nil}

If a binary is given, values will be inserted as is in the table. If a schema was given, values will be properly dumped before insertion. In both cases, missing values in the table/schema won't be filled automatically.

For example, imagine the following schema:

schema "posts" do
  field :a
  field :b
  field :c
end

The following insert:

Repo.insert_all Post, [%{a: 1}, %{b: 2}]

will generate the following SQL

INSERT INTO "posts" (a, b) VALUES (1, DEFAULT) (DEFAULT, 2)

notice how the field c was ignored in both cases.

This feature is required to implement many to many support (#1169).

Prefixes

insert_all should also allow the prefix syntax: {prefix, source}

Return

The function will return a tuple the number of entries added, similar to update_all and delete_all. The second element of the tuple is nil unless the :returning option is given, as in #460.

Advanced

Source

josevalim

Most helpful comment

Yes, but that comes with significant performance tradeoffs. It's also an all or nothing proposition right now.

I understand that insert_all is not meant to work within the ecosystem of schemas, that's fine. But there's no clear path to getting my data out of the schema ecosystem and into a form that is compatible with insert_all. Right now I've got:

  {:ok, current_time} = Ecto.DateTime.cast(:erlang.timestamp |> :calendar.now_to_datetime)
  models = models
      |> Enum.map(&Map.from_struct/1)
      |> Enum.map(&Map.drop(&1, [:__meta__, :id]))
      |> Enum.map(&Map.merge(&1, %{inserted_at: current_time, updated_at: current_time}))

Which does the job but is a very specific solution that requires me to know all sorts of implementation details like :__meta__ being a field in the struct (and to remove :id because my table is configured to autogenerate on insert).

What I'm looking for might even be a simple Ecto.Schema.to_source/1 or something like that, but there should be some guidance on "How do I convert an Ecto.Schema struct to a source for insert_all?"

hamiltop on 21 Feb 2016

👍2

All 11 comments

tslater on 24 Dec 2015

I have added basic support for insert_all, passing only sources, no schemas. If we want to support schemas, we would go into deeper discussions like: should we dump fields? What about autogenerate? And we can side-step this discussion for now until we have more use-cases.

josevalim on 26 Dec 2015

Is there a way to do all those things from the caller?

Given a list of changesets, how do I use insert_all? I would think that something along the lines of:

  |> autogenerate_fields
  |> to_source

Would be extremely useful. I think sidestepping default behavior is good, but providing some "blessed paths" to explicitly including that behavior would be very useful.

This might already exist and I just haven't found it.

hamiltop on 21 Feb 2016

Enum.map(changesets, &Repo.insert/1)

If you have changesets, we have constraints, associations, embeds, prepare_changes hooks and a whole bunch of things that may run. It does not make sense to support them in insert_all exactly because insert may perform multiple operations behind the scenes.

josevalim on 21 Feb 2016

👎1 👍1

Yes, but that comes with significant performance tradeoffs. It's also an all or nothing proposition right now.

  {:ok, current_time} = Ecto.DateTime.cast(:erlang.timestamp |> :calendar.now_to_datetime)
  models = models
      |> Enum.map(&Map.from_struct/1)
      |> Enum.map(&Map.drop(&1, [:__meta__, :id]))
      |> Enum.map(&Map.merge(&1, %{inserted_at: current_time, updated_at: current_time}))

What I'm looking for might even be a simple Ecto.Schema.to_source/1 or something like that, but there should be some guidance on "How do I convert an Ecto.Schema struct to a source for insert_all?"

hamiltop on 21 Feb 2016

👍2

I cannot think of anything shorter than this:

fields = Thing.__schema__(:fields) -- [:id]
extra = %{inserted_at: Ecto.DateTime.utc, updated_at: Ecto.DateTime.utc}
structs = Enum.map structs, & &1 |> Map.take(fields) |> Map.merge(extra)
Repo.insert_all Thing, structs

In one way or the other you will have to ignore some fields and add your own and I don't think we could have a single function that would cleanly automate that.

josevalim on 21 Feb 2016

Yeah. You're probably right. A silver bullet function isn't really doable.

If I wasn't clear, this isn't so much about feasibility, but about avoiding confusion. The first thing I did when trying insert_all was to try inserting a bunch of changesets. When that didn't work, I went to the documentation and saw they weren't supported but there wasn't a clear direction to go. That's the bug here.

Perhaps documenting something like what you have above as an example would solve that. Maybe just a simple source = Map.take(model, Model.__schema__(:fields)) and a note about how depending on the schema you may need to handle the auto generated ID and the timestamps (perhaps an example of adding timestamps? Somehow I missed utc/0.)

It's entirely possible a different direction is going to be better. I just wanted to bring up some friction I encountered.

hamiltop on 21 Feb 2016

I will keep an eye open then. The reason I don't have to have an "official" from changeset to data for insert_all is because it will lead people to use insert_all only for performance without fully understand the drawbacks it brings compared to insert. To me, the case for insert_all is exactly when we are not working with structured data at all.

josevalim on 21 Feb 2016

I'm not sure I understand the drawbacks. Or at least, the ones I see are
purely due to Ecto's architecture. Ignoring associations (that is a can of
worms for sure), wouldn't a theoretical API like:

|> validate_all
|> prepare_all
|> autogenerate_all
|> insert_all

Wouldn't that provide all the functionality of Enum.map(models, &Repo.insert/1)? The only downside I see is the fact that everything
succeeds and fails together (one bad piece of data breaks everything).

validate_all and prepare_all are actually pretty easy to build.
autogenerate_all might be adapter dependent which makes it trickier.

Again, ignoring associations, is there anything I'm missing?

On Sun, Feb 21, 2016, 1:22 PM José Valim [email protected] wrote:

Closed #1167 https://github.com/elixir-lang/ecto/issues/1167.

—
Reply to this email directly or view it on GitHub
https://github.com/elixir-lang/ecto/issues/1167#event-558664360.

hamiltop on 21 Feb 2016

If you take the hardest part (associations and embeds), sure, the rest is easier. :) prepare_all also means you need to wrap your code inside transactions. At least as it is in changesets today.

josevalim on 21 Feb 2016

Ah, prepare is meant to be part of the transaction? OK, that's hard to get
around. I concede. :)

I'm working on some bulk insert performance tuning, so I'll probably have
more feedback.

On Sun, Feb 21, 2016, 1:40 PM José Valim [email protected] wrote:

If you take the hardest part (associations and embeds), sure, the rest is
easier. :) prepare_all also means you need to wrap your code inside
transactions. At least as it is in changesets today.

—
Reply to this email directly or view it on GitHub
https://github.com/elixir-lang/ecto/issues/1167#issuecomment-186923600.

hamiltop on 21 Feb 2016

Was this page helpful?

0 / 5 - 0 ratings