Clickhouse: How can I using Json-related-format to import multi-level nested Json data?

Created on 19 Aug 2020  Â·  10Comments  Â·  Source: ClickHouse/ClickHouse

Each line of my json data looks like:

{
    "id": 1, 
    "source": "china",
    "sentences":[
          { content:"I loved apples",
            words: [ {"content": "I", "stem": "i", weight: 5}, 
                     {"content": "loved", "stem": "love", weight: 10}, 
                     {"content": "apples", "stem": "apple", weight: 1}]}, 
         { content:"My parents have many apples",
            words: [ {"content": "My", "stem": "my", weight: 6}, 
                     {"content": "parentes", "stem": "parent", weight: 5}, 
                     ......
                     {"content": "apples", "stem": "apple", weight: 1}]}
    ]
}

"sentences" is an array, and "words" is an array too.

How can I load this json data to table with JsonEachRow?
Because I want to each domain in my json schema like id, source, sentences.content, sentences.words.content, sentences.words.stem, sentences.words.weight is stored separatly. So it will use the benefits of column storage.

comp-formats question question-answered

Most helpful comment

with
    '{
        "id": 1, 
        "source": "china",
        "sentences": [
              { "content": "I loved apples",
                "words": [ {"content": "I", "stem": "i", "weight": 5}, 
                         {"content": "loved", "stem": "love", "weight": 10}, 
                         {"content": "apples", "stem": "apple", "weight": 1}]}, 
             { "content": "My parents have many apples",
                "words": [ {"content": "My", "stem": "my", "weight": 6}, 
                         {"content": "parentes", "stem": "parent", "weight": 5},
                         {"content": "apples", "stem": "apple", "weight": 1}]}
        ]
    }' as root,
    JSONExtractArrayRaw(root, 'sentences') as sentences,
    arrayMap(s -> JSONExtractArrayRaw(s, 'words'), sentences) as words
select
    arrayMap(s -> JSONExtractString(s, 'content'), sentences) as sentences_content,
    arrayMap(s -> arrayMap(w -> JSONExtractString(w, 'content'), s), words) as words_content
create table data (
    root String,
    sentences Array(String) alias JSONExtractArrayRaw(root, 'sentences'),
    sentences_content Array(String) materialized arrayMap(s -> JSONExtractString(s, 'content'), sentences),
    ...
)
engine = ...

FYI.

All 10 comments

For complex JSON structures, it is good to import them as strings and use JSON functions (JSONExtract-).

@hczhcz

For complex JSON structures, it is good to import them as strings and use JSON functions (JSONExtract-).

You mean that take the entire json object as a string?
But in the way, it is stored as a single column and we cannot take advantage of column storage in clickhouse?

@hexiaoting
Simply materialize what you extract from json.

Import them as JSONAsString, parse them with JSON functions (can be in MV)

@hczhcz @filimonov

clickhouse-client -q "create table json_as_string(field String) Engine = Memory"
cat xxx| clickhouse-client -q "insert into json_as_string format JSONAsString"

now all my json data is stored in json_as_string table as a string column.

But How can I extract the values of "sentences.content" and "sentences.words.content" into another table json_data?

create table json_data {
    id Int32;
    source String;
    sentences_content Array(String);
    sentences_words_content Array(Array(String));
    sentences_words_stem Array(Array(String));
    sentences_words_weight Array(Array(Int32));
}

select JSONExtractRaw(field, 'sentences')) from json_as_string; returns a String type not Array. Do I need to transform the String to Array type???
And I tried using "select JSONExtract(field, 'sentences', Array(String))", but it returns null.

with
    '{
        "id": 1, 
        "source": "china",
        "sentences": [
              { "content": "I loved apples",
                "words": [ {"content": "I", "stem": "i", "weight": 5}, 
                         {"content": "loved", "stem": "love", "weight": 10}, 
                         {"content": "apples", "stem": "apple", "weight": 1}]}, 
             { "content": "My parents have many apples",
                "words": [ {"content": "My", "stem": "my", "weight": 6}, 
                         {"content": "parentes", "stem": "parent", "weight": 5},
                         {"content": "apples", "stem": "apple", "weight": 1}]}
        ]
    }' as root,
    JSONExtractArrayRaw(root, 'sentences') as sentences,
    arrayMap(s -> JSONExtractArrayRaw(s, 'words'), sentences) as words
select
    arrayMap(s -> JSONExtractString(s, 'content'), sentences) as sentences_content,
    arrayMap(s -> arrayMap(w -> JSONExtractString(w, 'content'), s), words) as words_content
create table data (
    root String,
    sentences Array(String) alias JSONExtractArrayRaw(root, 'sentences'),
    sentences_content Array(String) materialized arrayMap(s -> JSONExtractString(s, 'content'), sentences),
    ...
)
engine = ...

FYI.

@hczhcz Thanks a lot . That's what I want. ^^

Bookmarking this for later review.

@hexiaoting How did you make it work?

Can you post your DDL's here?

@ramazanpolat

create table json1(
    root String,
    id String
        materialized JSONExtractString(root, '_id'),
    source String
        materialized JSONExtractString(root, 'source') ,
    sentences Array(String)
        alias JSONExtractArrayRaw(root, 'sentences'),
    sentences_content Array(String)
        materialized arrayMap(s -> JSONExtractString(s, 'content'), sentences),
    words Array(Array(String))
        materialized arrayMap(s -> JSONExtractArrayRaw(s, 'words'), sentences),
    words_content Array(Array(String))
        materialized arrayMap(s -> (arrayMap(k->JSONExtractString(k, 'content'), s)), words),
    words_stem Array(Array(String))
        materialized arrayMap(s -> (arrayMap(k->JSONExtractString(k, 'stem'), s)), words)
    ) engine = MergeTree order by publish_time;

cat $file.json | clickhouse-client -q "insert into json1(root) format JSONAsString "
Was this page helpful?
0 / 5 - 0 ratings

Related issues

goranc picture goranc  Â·  3Comments

jimmykuo picture jimmykuo  Â·  3Comments

igor-sh8 picture igor-sh8  Â·  3Comments

innerr picture innerr  Â·  3Comments

bseng picture bseng  Â·  3Comments