Is there an out-of-the-box solution for importing all the CSV files in an S3 directory?
There is no out-of-the-box solution right now (the URL engine does not support auth, and the S3 engine is not released yet).
You should use local files.
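A possible interim workaround along those lines (a sketch of mine, not from this thread; the table, file, and column names are hypothetical) is the file() table function, which reads files placed under the server's user_files_path:
insert into mytable
select * from file('data.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')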
Will this be supported in the future?
Not sure; it probably works in test releases, like this:
insert into mytable
select * from s3('http://s3auth:s3auth@host/bucket/test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
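For context, a minimal target table that this example assumes could look like the following (my assumption; the thread does not show the table definition):
create table mytable (column1 UInt32, column2 UInt32, column3 UInt32) engine = MergeTree order by column1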
Is the S3 import/export function released?
Yes.
@alexey-milovidov thanks for the confirmation. Can you please point me towards documentation or an example of inserting all CSV files from an S3 directory?
I forgot about that. Sorry for the incomplete info.
This task is assigned to @stavrolia
@alexey-milovidov Thank you!
If you have a working example of importing a file from S3, could you please share it?
What version of ClickHouse has this feature?
You can import a file with select * from s3('URL', 'format', 'columns'), where URL is the address of your file, format is its format, and columns is the list of columns you need.
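For example, a hedged sketch with placeholder values (the host, bucket, file, and column names are assumptions of mine):
select * from s3('http://host/bucket/test.csv', 'CSV', 'column1 UInt32, column2 String')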
Thanks, @stavrolia!
Since I am inserting into ClickHouse, should I do insert into tablename select * from s3('URL', 'format', 'columns')?
Where should I give the secret key and access key?
Not sure; it probably works in test releases, like this:
insert into mytable select * from s3('http://s3auth:s3auth@host/bucket/test.csv', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
That is not correct. One shall use this syntax:
select * from s3('http://host/bucket/test.csv', 'access_key_id', 'secret_access_key', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')
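Putting that together for the insert case (a sketch; the table name, host, bucket, and credentials are placeholders):
insert into tablename
select * from s3('http://host/bucket/test.csv', 'access_key_id', 'secret_access_key', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')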
Thanks for the steps. I was able to submit the request using the above format; however, it stops with the following error after the request is submitted:
DB::Exception: Unable to connect to endpoint: While executing S3
Has anyone seen this error message?
Yes, I am getting the same error.
It means, literally, that the ClickHouse server cannot establish a connection to S3.
To get more context on this error, you may (expert level) run the same query through clickhouse-local:
clickhouse-local --query "select * from s3('http://host/bucket/test.csv', 'access_key_id', 'secret_access_key', 'CSV', 'column1 UInt32, column2 UInt32, column3 UInt32')"
and run it inside strace -f -e trace=network to check the exact contents of the packets.
Hello @alexey-milovidov,
Thanks for the information. I have got the insert command working for a single file. Is there a way to insert or select all the files in an S3 directory using a wildcard?
Thanks
@manojfim
Yes,
select * from s3('http://host/bucket/*.csv/...
This is the exact command I was able to use to select from a single file:
select *
from s3('http://s3.amazonaws.com/bucketname/test/file.csv',
'accesskey','secretkey','CSV', 'column1 String');
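Combining this with the wildcard answer above (a sketch of mine; the table name, bucket, and credentials are placeholders), an import of every CSV under the prefix might look like:
insert into tablename
select * from s3('http://s3.amazonaws.com/bucketname/test/*.csv', 'accesskey', 'secretkey', 'CSV', 'column1 String')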