Timescaledb: Exporting data

Created on 27 Jun 2017  路  10Comments  路  Source: timescale/timescaledb

I know I can do a backup with timescale db (here), but my use case is different. I want to export data from a table, with the COPY FROM instruction in sql :

COPY (SELECT * FROM sensor_climate) TO '/data_export/sensor_climate.bak' With CSV quote

and then insert those data in another database:

COPY sensor_climate FROM '/data_export/sensor_climate.bak' DELIMITER ',' CSV;

But it does not work with an hypertable ! There is only a generic error message, and it works perfectly with other tables (wich are non-hypertables).

[Edit] The error message I get:

fuldatabase=# copy sensor_climate from '/shared_volume/sensor_climate.bak' csv quote;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Also, the server tells me:

Server process (PID xx) was terminated by signal 11

If you see a better way to do what I want, please tell me. However, if timescaledb is built for IOT, it is a really important feature being able to export data easily.

For example I have some sensors in a location A with a timescaledb A, and other sensors in a different location B with timescaledb B. I need to export data periodically on a timescaledb C. Actually, it seems that I can't do that.

bug

Most helpful comment

Hello,
I've just realized that the backup and restore procedure for single tables is different then the procedure for backing up and restoring the whole database. Thank you for submitting this issue. We will work on updating our documentation and streamlining the process. For backup/restore of single tables the following procedure should be used:

Backup

Backup the table schemas:

pg_dump -s -d old_db -N _timescaledb_internal | grep -v _timescaledb_internal > schema.sql

The above command backs up all tables. To filter tables you can add the --table option to pg_dump. (I believe the above is the biggest difference between what you ran.)

Backup the table data:

psql -d old_db -c "\COPY (SELECT * FROM foo) TO foo_data.csv DELIMITER ',' CSV"

Restore

Restore the schema:

psql -d new_db < schema.sql

Recreate the hypertables:

psql -d new_db -c "SELECT create_hypertable('foo', <options...>)"

Restore the data:

psql -d new_db -c "\COPY foo FROM foo_data.csv CSV"

(Your copy command above with the custom delimiters should also work)

All 10 comments

Is there more information about the error in the postgres server log? That does indeed seem to be a bug. We made our sample datasets on our website using the same process you are trying, so it seems to have worked in our case but maybe something has changed since then.

Those are the logs of our architecture, using docker-compose (and the timescaledb docker image). We use psycopg2 with python in the service consumer to make a copy_from (which executes a the sql statement COPY ... FROM etc).

consumer                | AFTER Connection postrges
consumer                | Traceback (most recent call last):
consumer                |   File "consume_data.py", line 45, in <module>
consumer                |     channel.start_consuming()
consumer                |   File "/usr/local/lib/python3.5/site-packages/pika/adapters/blocking_connection.py", line 1681, in start_consuming
consumer                |     self.connection.process_data_events(time_limit=None)
consumer                |   File "/usr/local/lib/python3.5/site-packages/pika/adapters/blocking_connection.py", line 656, in process_data_events
consumer                |     self._dispatch_channel_events()
consumer                |   File "/usr/local/lib/python3.5/site-packages/pika/adapters/blocking_connection.py", line 469, in _dispatch_channel_events
consumer                |     impl_channel._get_cookie()._dispatch_events()
consumer                |   File "/usr/local/lib/python3.5/site-packages/pika/adapters/blocking_connection.py", line 1310, in _dispatch_events
consumer                |     evt.body)
consumer                |   File "consume_data.py", line 32, in callback
consumer                |     cur.copy_from(my_file, filename.decode("utf-8"), sep=u'\x02')
consumer                | psycopg2.DatabaseError: server closed the connection unexpectedly
consumer                |       This probably means the server terminated abnormally
consumer                |       before or while processing the request.
consumer                |
postgres                | LOG:  server process (PID 69) was terminated by signal 11
postgres                | DETAIL:  Failed process was running: COPY sensor_climate FROM stdin WITH DELIMITER AS '?' NULL AS '\N'
postgres                | LOG:  terminating any other active server processes
postgres                | WARNING:  terminating connection because of crash of another server process
postgres                | DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
postgres                | HINT:  In a moment you should be able to reconnect to the database and repeat your command.
postgres                | WARNING:  terminating connection because of crash of another server process
postgres                | DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
postgres                | HINT:  In a moment you should be able to reconnect to the database and repeat your command.
postgres                | LOG:  all server processes terminated; reinitializing
postgres                | LOG:  database system was interrupted; last known up at 2017-06-28 07:49:56 UTC

To be sure the problem was not from psycopg2, we directly executed the following sql statement copy from, in psql.

database=# COPY sensor_climate FROM '/shared_volume/sensor_climate.bak' CSV quote e'\x01' delimiter e'\x02';
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
````

**NB**: We used the delimeter trick (weird e'\x01') because on other tables (which are not hypertables) we have JSON fields, so we need to use different separators (see [here](http://adpgtech.blogspot.fr/2014/09/importing-json-data.html)). **the issue does come from here**. We tried manually with a .csv file on sensor_climate table, the issue is the same.

In short, we used the same procedure as [here](http://docs.timescale.com/tutorials/tutorial-hello-nyc). We just did not used : 

``` sql
CREATE DATABASE nyc_data;
\c nyc_data
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

and directly imported the schema. It seems to work because we have access to timescaledb specific comand like "last" without errors.

Would it be possible for you to email us your dataset so we can debug (happy to sign an NDA if needed). Alternatively, this type of error usually causes a core dump to be created (If you let us know your OS I can tell you where.), if you send us the core dump that would help a lot. Also, is this on release 0.1.0?

I am waiting for the agreement of my boss to send you our dataset.
My OS is debian 8, but I am using the docker version of timescaleDB. Today I tried with the last version (0.1.0), and it seems another error occured.

postgres                | ERROR:  No hypertable for relid 16642
postgres                | CONTEXT:  COPY sensor_climate, line 1: "12017-06-30 08:12:16.893723ExternalTemperature_EXToutedoorEnvironmentclimate掳C21.7115climate..."
postgres                | STATEMENT:  COPY sensor_climate FROM stdin WITH DELIMITER AS '' NULL AS '\N'

In fact, the pg_dump command for exporting the schema created a quite different file than with the 0.0.12 version.

To make sure the table generated from the import of the schema was an hypertable I used "select last(time, time) from sensor_climate group by id;". It worked, so I am assuming we are indeed working on an hypertable.

Hello,
I've just realized that the backup and restore procedure for single tables is different then the procedure for backing up and restoring the whole database. Thank you for submitting this issue. We will work on updating our documentation and streamlining the process. For backup/restore of single tables the following procedure should be used:

Backup

Backup the table schemas:

pg_dump -s -d old_db -N _timescaledb_internal | grep -v _timescaledb_internal > schema.sql

The above command backs up all tables. To filter tables you can add the --table option to pg_dump. (I believe the above is the biggest difference between what you ran.)

Backup the table data:

psql -d old_db -c "\COPY (SELECT * FROM foo) TO foo_data.csv DELIMITER ',' CSV"

Restore

Restore the schema:

psql -d new_db < schema.sql

Recreate the hypertables:

psql -d new_db -c "SELECT create_hypertable('foo', <options...>)"

Restore the data:

psql -d new_db -c "\COPY foo FROM foo_data.csv CSV"

(Your copy command above with the custom delimiters should also work)

Thanks a lot ! I will try it tomorrow.

Just a little thing, the command for dumping only the schema is not :

pg_dump -d old_db -N _timescaledb_internal | grep -v _timescaledb_internal > schema.sql

but

pg_dump -s -d old_db -N _timescaledb_internal | grep -v _timescaledb_internal > schema.sql

right ? Because the first command will also dump data :)

Yeah you are absolutely right. Thanks!

Ok, It works perfectly, I am closing. Thank you !

This information is now reflected in the docs here:

http://docs.timescale.com/latest/using-timescaledb/backup

Was this page helpful?
0 / 5 - 0 ratings