Each line of my json data looks like:
{
"id": 1,
"source": "china",
"sentences":[
{ content:"I loved apples",
words: [ {"content": "I", "stem": "i", weight: 5},
{"content": "loved", "stem": "love", weight: 10},
{"content": "apples", "stem": "apple", weight: 1}]},
{ content:"My parents have many apples",
words: [ {"content": "My", "stem": "my", weight: 6},
{"content": "parentes", "stem": "parent", weight: 5},
......
{"content": "apples", "stem": "apple", weight: 1}]}
]
}
"sentences" is an array, and "words" is an array too.
How can I load this json data to table with JsonEachRow?
Because I want to each domain in my json schema like id, source, sentences.content, sentences.words.content, sentences.words.stem, sentences.words.weight is stored separatly. So it will use the benefits of column storage.
For complex JSON structures, it is good to import them as strings and use JSON functions (JSONExtract-).
@hczhcz
For complex JSON structures, it is good to import them as strings and use JSON functions (JSONExtract-).
You mean that take the entire json object as a string?
But in the way, it is stored as a single column and we cannot take advantage of column storage in clickhouse?
@hexiaoting
Simply materialize what you extract from json.
Import them as JSONAsString, parse them with JSON functions (can be in MV)
@hczhcz @filimonov
clickhouse-client -q "create table json_as_string(field String) Engine = Memory"
cat xxx| clickhouse-client -q "insert into json_as_string format JSONAsString"
now all my json data is stored in json_as_string table as a string column.
But How can I extract the values of "sentences.content" and "sentences.words.content" into another table json_data?
create table json_data {
id Int32;
source String;
sentences_content Array(String);
sentences_words_content Array(Array(String));
sentences_words_stem Array(Array(String));
sentences_words_weight Array(Array(Int32));
}
select JSONExtractRaw(field, 'sentences')) from json_as_string; returns a String type not Array. Do I need to transform the String to Array type???
And I tried using "select JSONExtract(field, 'sentences', Array(String))", but it returns null.
with
'{
"id": 1,
"source": "china",
"sentences": [
{ "content": "I loved apples",
"words": [ {"content": "I", "stem": "i", "weight": 5},
{"content": "loved", "stem": "love", "weight": 10},
{"content": "apples", "stem": "apple", "weight": 1}]},
{ "content": "My parents have many apples",
"words": [ {"content": "My", "stem": "my", "weight": 6},
{"content": "parentes", "stem": "parent", "weight": 5},
{"content": "apples", "stem": "apple", "weight": 1}]}
]
}' as root,
JSONExtractArrayRaw(root, 'sentences') as sentences,
arrayMap(s -> JSONExtractArrayRaw(s, 'words'), sentences) as words
select
arrayMap(s -> JSONExtractString(s, 'content'), sentences) as sentences_content,
arrayMap(s -> arrayMap(w -> JSONExtractString(w, 'content'), s), words) as words_content
create table data (
root String,
sentences Array(String) alias JSONExtractArrayRaw(root, 'sentences'),
sentences_content Array(String) materialized arrayMap(s -> JSONExtractString(s, 'content'), sentences),
...
)
engine = ...
FYI.
@hczhcz Thanks a lot . That's what I want. ^^
Bookmarking this for later review.
@hexiaoting How did you make it work?
Can you post your DDL's here?
@ramazanpolat
create table json1(
root String,
id String
materialized JSONExtractString(root, '_id'),
source String
materialized JSONExtractString(root, 'source') ,
sentences Array(String)
alias JSONExtractArrayRaw(root, 'sentences'),
sentences_content Array(String)
materialized arrayMap(s -> JSONExtractString(s, 'content'), sentences),
words Array(Array(String))
materialized arrayMap(s -> JSONExtractArrayRaw(s, 'words'), sentences),
words_content Array(Array(String))
materialized arrayMap(s -> (arrayMap(k->JSONExtractString(k, 'content'), s)), words),
words_stem Array(Array(String))
materialized arrayMap(s -> (arrayMap(k->JSONExtractString(k, 'stem'), s)), words)
) engine = MergeTree order by publish_time;
cat $file.json | clickhouse-client -q "insert into json1(root) format JSONAsString "
Most helpful comment
FYI.