Timescaledb: UPSERTs not working correctly

Created on 25 Jun 2017  Â·  8Comments  Â·  Source: timescale/timescaledb

My system often inserts records that are duplicated (data with the same timestamp and same value in partitioning column). This leads to duplication in data (waste storage and require data cleaning after queries)

My database setup:

CREATE TABLE conditions (
  time        TIMESTAMPTZ       NOT NULL,
  location    TEXT              NOT NULL,
  temperature DOUBLE PRECISION  NULL,
  humidity    DOUBLE PRECISION  NULL
);
SELECT create_hypertable('conditions', 'time', 'location', 2);

When I insert 2 rows with same values:

INSERT INTO conditions(location, time, temperature, humidity)
  VALUES ('office', '2017-06-25 03:34:16+00', 70.0, 50.0);
INSERT INTO conditions(location, time, temperature, humidity)
  VALUES ('office', '2017-06-25 03:34:16+00', 70.0, 50.0);

=> The conditions table will have 2 records with exactly same data.

Is there any way to make timescaledb silently ignore the second insert? If not, can you please suggest me an efficient way to remove duplicated rows regularly?

Many thanks!

bug limitation

All 8 comments

Have you thought about create a unique constraint on the table?
CREATE UNIQUE INDEX ON conditions (time, location, temperature, humidity);

In addition to the index you can try to use "ON CONFLICT DO NOTHING" which should be supported.

@GoSteven @sisu-frank-kutzey I tried exactly what you guys said before creating this issue but it didn't work. The database still throws errors when I append ON CONFLICT DO NOTHING on INSERT command. This happens on the table with hypertable-enabled only.

Hi @tobernguyen (and thanks others for following up).

This appears to be an issue with the way we intersect in the insert path. As folks point out, one way to support this with postgres would be something like the following:

ALTER TABLE conditions ADD CONSTRAINT row_unique UNIQUE (time, location, temperature, humidity);

INSERT INTO conditions(time, location, temperature, humidity)
  VALUES (...) ON CONFLICT ON CONSTRAINT row_unique DO NOTHING;

However, it appears there is a bug where the constraint is not being propagated down into our subtables that comprise the table. (One of the reason is that those partitioned subtables also maintain their own constraints to all in insert/query planning, so can't all just have identical constraints as their parent.)

Now, we can observe this proper UPSERT behavior works if you add the constraint and perform the insert directly on the internal chunk (e.g., _timescaledb_internal._hyper_1_1_chunk, rather than on the hypertable conditions), but that is obviously not what we want.

Will follow-up when we push a bug fix. Apologies.

We just merged a PR into master with support for UPSERTS, i.e., ON CONFLICT DO UPDATE or ON CONFLICT DO NOTHING. (https://github.com/timescale/timescaledb/pull/137)

This should be part of the 0.3.0 release, which should go out early next week.

@tobernguyen Does this change resolve your issues? If so, we can close this :)

I will check this soon. Thank you guys for your hard work.

On Fri, Aug 18, 2017, 03:09 RobAtticus notifications@github.com wrote:

@tobernguyen https://github.com/tobernguyen Does this change resolve
your issues? If so, we can close this :)

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/timescale/timescaledb/issues/100#issuecomment-323180503,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AJjTJQ3JPXe7cILizcoLPoHcr8O8sjuBks5sZJ4UgaJpZM4OEhWI
.

Was this page helpful?
0 / 5 - 0 ratings