Hi all,
Since quite some time, I am trying to help with plugging CoackroachDB into Ecto. This database implements the postgres protocol so on theory, it's the same as integrating with Postgres.
Of course, the devil is in the detail but with the help of @fishcakez and the CockroachDB team, we have made some pretty good progress and right now it pretty much works.
We have to use my fork of Postgrex for the time being but that should be solved pretty soon and apart from not supporting jsonb and arrays types, I believe pretty much else is working fine.
The big remaining caveat however is that CockroachDB does not support savepoints and after discussing it a bit with them, it does not look like they will be able to provide that feature any time soon.
I am honestly not very clear in which scenario savepoints help beyond what we get by opening and rollbacking a transaction for each test but even if it's a little bit more limited than the current Sandbox, I would like to explore how to provide an alternate version that does not use savepoints.
Would there be an easy way to pass a flag to the current Sandbox to do so or does this warrant a brand new Sandbox implementation ?
Can someone familiar with the inner working of the sandbox provides some pointers as how feasible this would be ?
Thanks in avance !
@tlvenn if you don't support savepoints, then you are unable to test anything that uses transactions inside the sql sandbox. How big of a problem do you think this is?
I am not sure I follow what you are saying @josevalim. I am obviously missing something here but it looks like I gonna need a little bit more help, so please bear with me.
When a test run, the sandbox would begin a transaction (the outer transaction). Multi that are executed within that test will detect that a transaction has been started already and simply execute. If a transaction is requested by the service being tested, given that the sandbox is proxying the commands and an outer transaction already exist, it can noop on it.
At this stage, either all things work, the assertion are ok, the test is passing and therefore the sandbox rollbacks its outer transaction in some kind of teardown phase to leave the database in a clean state or an error is emitted at some point and the transaction is rollbacked just the same and consequently the test fails.
Nested transactions would require savepoints but otherwise I dont understand why we need them.
@tlvenn there issue is that, because the sql sandbox runs inside a transaction, the only way we can emulate transactions inside the sandbox is by using savepoints. So transactions during development does not require savepoints... but any transaction inside the sandbox requires savepoints in order to be emulated properly.
I get that but what is preventing another kind of sandbox/transaction manager/orchestrator, you name it, to start a transaction when a connection is checked out for each test and to ignore the begin command when the service being tested start its transaction ?
In the same manner if a rollback command is issued by the service, the manager can proxy it and rollback its own transaction. Basically the service transaction is delegated to/managed by the manager and the manager makes sure to rollback his transaction before putting the connection back to the pool.
I get that but what is preventing another kind of sandbox/transaction manager/orchestrator, you name it, to start a transaction when a connection is checked out for each test and to ignore the begin command when the service being tested start its transaction ?
In the same manner if a rollback command is issued by the service, the manager can proxy it and rollback its own transaction. Basically the service transaction is delegated to/managed by the manager.
Does CockroachDB support transactions inside transactions?
It does not @OvermindDL1 but I never start a transaction within an existing transaction in what I am explaining above.
It does not @OvermindDL1 but I never start a transaction within an existing transaction in what I am explaining above.
But that is what snapshots emulate for testing. When testing you need to be able to roll the database back to how it was before testing was ever started, even if the tests themselves spawn transactions (which many/most do), and you need some way to roll back to before that point to how it was before the tests started.
I get that but what is preventing another kind of sandbox/transaction manager/orchestrator, you name it, to start a transaction when a connection is checked out for each test and to ignore the begin command when the service being tested start its transaction ?
If you do that, you change the semantics of the code. Imagine you are testing this code:
Repo.insert!(%Post{})
Repo.transaction(fn ->
# ...
Repo.rollback!(:done)
end)
We need to do undo whatever is done inside Repo.transaction but still keep the results of Repo.insert!. For that, you need savepoints or nested transactions.
Yes Jose you are correct but this is an edge case, like nested transactions are. And that is something the sandbox/manager can detect easily and raise upon.
In most cases, mutations will be wrapped by a single transaction in your service and in that case, I dont see why what I am saying cant work. It's obvious that the current sql sandbox is more powerful and can cover a lot more grounds but that is not what i am debating here.
I clearly stated that I had an understanding that if it was feasible it was likely to be more limited than the current solution but I do believe it would provide most of the value for people who want to use CockroachDB with Elixir / Ecto.
Yes Jose you are correct but this is an edge case,
It is not an edge case. Pretty much any test that invokes something that uses a transaction, such as Ecto.Multi or a changeset with associations, will require nested transactions for testing because the test itself is usually preceeded with data insertion and followed by data assertions.
Anyway, if you really want to implement this, you can keep track of this in the database connection process. Every time a transaction starts, you need to keep track of the operations performed. If another transaction is started and you have operations before or after it, then you need to say it is not supported.
Again, I think it will happen more than you think, but you understand the trade-offs so you should definitely give it a try it if you are willing to.
It is not an edge case. Pretty much any test that invokes something that uses a transaction, such as Ecto.Multi or a changeset with associations, will require nested transactions for testing because the test itself is usually preceeded with data insertion and followed by data assertions.
It's an edge case because unless you are trying to recover on the rollback which does not happen so often, I dont see the issue. If you rollback you usually want to test that an error has been emitted and it does not depend on the data insertion and if there is no rollback, there is no issue with data assertions given the transaction is only rollbacked by the manager once the test is finished.
And if you wanted to get fancy, I guess it would not be so hard to keep those commands that represents the data insertion in an ETS table and to replay them in a new transaction once your code issue an explicit rollback.
In any case, thanks for the pointers Jose, i guess I will give it a try soon.
Keep in mind that, because anyone can rescue a rollback, you will really need to track it inside the connection. You probably don't need an ETS table though, the connection process state should suffice.
Shall we close this then? :)
You probably don't need an ETS table though, the connection process state should suffice.
Ya true
Shall we close this then? :)
Sure, thanks for the constructive feedbacks.
Just for reference if someone find this topic and is interested by the follow up, I have published my sandbox at https://github.com/jumpn/cockroachdb_sandbox
It implements a log replay strategy to emulate implicit savepoints created by the original sandbox.
I had to override the existing sandbox because even though on the surface it looks pluggable, it is unfortunately not.
@tlvenn can you please provide a PR that makes the sandbox configurable in Ecto? Overriding the SQL sandbox is going to be an issue if someone wants to use both cockroach and postgres at the same time. Thank you!
Hi @josevalim , technically there is nothing specific about my sandbox implementation that would prevent it for working on Postgresql, I actually tested that it's working on both. Now that being said, my sandbox has a dependency of my postgrex fork which only works with CockroachDB , so that could definitely lead to some issues should someone try to use them both.
I will see if I can make it so my Postgrex fork could actually work with Postgresql just fine but in any case, I totally agree that override Ecto Sandbox is not desirable at all.
I am happy to try to come up with a PR to remove hardcoded dependencies on the Ecto Sandbox implementation, i will need some guidance though on exactly how to do that.
Code that will need to be updated are things like that:
sandbox? = repo.config[:pool] == Ecto.Adapters.SQL.Sandbox
I was thinking that we could introduce a behaviour for the Sandbox but I am not exactly sure how it will help test if a given module implements that behaviour ? What is the pattern in Elixir to solve this case of use case ? It does not look like I could use Behaviour as some kind of marker interface.
Thanks !
The best way is to avoid the sandbox? checks and introduce specific functions. So the whole code:
sandbox? = repo.config[:pool] == Ecto.Adapters.SQL.Sandbox
if sandbox? do
Ecto.Adapters.SQL.Sandbox.checkin(repo)
Ecto.Adapters.SQL.Sandbox.checkout(repo, sandbox: false)
end
migrated =
try do
migrator.(repo, migrations_path(repo), :up, opts)
after
sandbox? && Ecto.Adapters.SQL.Sandbox.checkin(repo)
end
could be rewritten to:
pool = repo.config[:pool]
if function_exported?(pool, :unboxed_run, 1) do
pool.unboxed_run(fn -> migrator.(repo, migrations_path(repo), :up, opts) end)
else
migrator.(repo, migrations_path(repo), :up, opts)
end
This way the whole unboxed_run logic can be specific per sandbox.
Ha neat, I didnt know about Kernel.function_exported?
Oky, got all the pieces I need I think now, thanks @josevalim , I will submit a PR pretty soon.
Most helpful comment
The best way is to avoid the
sandbox?checks and introduce specific functions. So the whole code:could be rewritten to:
This way the whole
unboxed_runlogic can be specific per sandbox.