Elasticsearch version (bin/elasticsearch --version): 7.6.2
Plugins installed: []
JVM version (java -version): ESS (GCP)
OS version (uname -a if on a Unix-like system): ESS (GCP)
Description of the problem including expected versus actual behavior:
I'm trying to use the Data visualizer to upload a file and I get this error:
File could not be read
[illegal_argument_exception] Could not find a timestamp in the sample provided
Steps to reproduce:
Please try uploading the attached file.
This error is weird because this file does have a timestamp, and also because I can upload other files that have no timestamp at all and they work fine.
Thank you in advance for your answer.
Pinging @elastic/ml-core (:ml)
I have the same issue and I am very interested in a fix!
This error is weird because this file does have a timestamp
Your timestamp format is yyyyMMdd, which is not one of the ones that is detected out-of-the-box. You need to override the timestamp using the timestamp_format=yyyyMMdd URL argument on the find_file_structure endpoint - see https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-find-file-structure.html. Then the structure will be detected.
Run this and it works fine (replace the username, password and ES server hostname as appropriate):
curl -u elastic:password -s -H "Content-Type: application/json" -XPOST "http://localhost:9200/_ml/find_file_structure?pretty&timestamp_format=yyyyMMdd" -T no_timestamp_issue.txt
However, it sounds like you're accessing that endpoint via Kibana, and you cannot do that until elastic/kibana#38868 is implemented.
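As a side note, yyyyMMdd is a Java-style date pattern; the equivalent Python strptime format is %Y%m%d, which you can sanity-check against a sample value (the value 20200417 below is just an illustration, not taken from the attached file):

```python
from datetime import datetime

# The Java date pattern "yyyyMMdd" corresponds to "%Y%m%d" in Python's
# strptime. The sample value is illustrative only.
sample = "20200417"
parsed = datetime.strptime(sample, "%Y%m%d")
print(parsed.date())  # 2020-04-17
```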
also because I can upload other files that have no timestamp at all and they work fine.
These would be NDJSON or delimited files. Currently we require a timestamp in semi-structured text files because the lines are grouped into messages by assuming that the first line of every message has the timestamp.
We have an enhancement request open, #55219, to allow the user to say there is no timestamp, in which case semi-structured text files would be assumed to have one line per message. However, to take advantage of that through the UI would also require that elastic/kibana#38868 was implemented.
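The grouping rule described above can be sketched as follows. This is an illustrative re-implementation in Python, not the actual Elasticsearch code, and the timestamp regex is an assumption for the example: a line that starts with a timestamp begins a new message, and any other line is treated as a continuation of the current message.

```python
import re

# Assumed pattern for "line starts with a timestamp" (illustration only).
TIMESTAMP_AT_START = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

def group_messages(lines):
    """Group raw log lines into messages: a timestamped line starts a new
    message; continuation lines (e.g. stack traces) join the previous one."""
    messages = []
    for line in lines:
        if TIMESTAMP_AT_START.match(line) or not messages:
            messages.append(line)
        else:
            messages[-1] += "\n" + line
    return messages

log = [
    "2020-04-17 10:00:00 ERROR something failed",
    "  at com.example.Foo.bar(Foo.java:42)",
    "2020-04-17 10:00:01 INFO recovered",
]
print(group_messages(log))  # two messages: the stack-trace line joins the first
```

This is also why a file with no recognizable timestamp fails for semi-structured text: without the timestamp anchor there is no way to tell where one message ends and the next begins.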
Basically you should upvote elastic/kibana#38868.
Thanks for your detailed answer.
I just upvoted the Kibana issue you mentioned.
Thank you @droberts195
However, I'm able to upload other files with no timestamp, and it works just fine; no timestamp is identified.
See this example below.
See this example below.
That file has a delimited format (CSV). Currently we require a timestamp in semi-structured text files because the lines are grouped into messages by assuming that the first line of every message has the timestamp. But we do not require a timestamp in NDJSON or delimited files.
I see...
good catch @droberts195
Sometimes (like in my first example), spaces and semi-colons can be replaced by commas to convert the file to CSV! cc @fbaligand
Waiting for the enhancement ;)
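The conversion suggested above can be done safely with Python's csv module rather than a blind string replace, so that quoted fields containing ";" or "," survive. This is a minimal sketch, not an official tool:

```python
import csv
import io

def semicolons_to_commas(text):
    """Re-delimit ";"-separated text as ","-separated CSV. Using the csv
    module (not str.replace) keeps quoted fields intact and quotes any
    field that itself contains a comma."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        writer.writerow(row)
    return out.getvalue()

print(semicolons_to_commas("a;b;c\nd;e,f;g\n"))
# the field "e,f" comes out quoted, so the column count is preserved
```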
OK, now I understand the problem better.
ML Data Visualizer does not recognize a CSV file when the delimiter is ";" (and not ",").
For example, I get the "[illegal_argument_exception] Could not find a timestamp in the sample provided" error when I upload this file:
a;b;c
d;e;f
g;h;i
a;b;c
d;e;f
g;h;i
If I retry with "," delimiter, it works fine:
a,b,c
d,e,f
g,h,i
a,b,c
d,e,f
g,h,i
That is unfortunate because Microsoft Excel uses the ";" delimiter when it generates a CSV export.
So, if I understand correctly, this will be fixed when https://github.com/elastic/kibana/issues/38868 is implemented?
ML Data Visualizer does not recognize a CSV file when the delimiter is ";" (and not ",").
That's not quite true: it can sometimes recognize semi-colon-separated files, but only with at least 4 fields per row: see https://github.com/elastic/elasticsearch/blob/fd554d95e462232ee9799c82ba8faea11ac481a9/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/filestructurefinder/FileStructureFinderManager.java#L283
So a workaround could be to add an extra column to your file so it has 4 fields per row.
The reason it requires more semi-colons than commas to detect a delimited format is that we found semi-colons appear more often in files that a human would classify as semi-structured text log files than commas do (at least in the sample files we looked at when developing the feature).
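The heuristic described above might be sketched like this. This is an illustrative re-implementation, not the linked Java code; the thresholds for "," (2) and ";" (4) come from this thread, while the tab and pipe entries are assumptions for the example:

```python
import csv
import io

# Minimum fields per row before a candidate delimiter is accepted.
# "," = 2 and ";" = 4 per the discussion above; "\t" and "|" are assumed.
MIN_FIELDS = {",": 2, ";": 4, "\t": 2, "|": 5}

def detect_delimiter(sample):
    """Return the first candidate delimiter for which every sampled row
    reaches the minimum field count, or None to fall back to
    semi-structured text analysis. (The real detector does much more,
    e.g. checking that field counts are consistent across rows.)"""
    for delim, min_fields in MIN_FIELDS.items():
        rows = list(csv.reader(io.StringIO(sample), delimiter=delim))
        if rows and all(len(r) >= min_fields for r in rows):
            return delim
    return None

print(detect_delimiter("a;b;c\nd;e;f\n"))      # None: only 3 ";" fields per row
print(detect_delimiter("a;b;c;1\nd;e;f;1\n"))  # ";" : 4 fields per row
```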
Of course, if you override the delimiter then any number of fields per row is enough: see https://github.com/elastic/elasticsearch/blob/fd554d95e462232ee9799c82ba8faea11ac481a9/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/filestructurefinder/FileStructureFinderManager.java#L482-L485
And you are correct that in order to specify an exact delimiter in the UI when the initial analysis failed you need elastic/kibana#38868.
/cc @elastic/ml-ui
Thanks for your answer and your tip!
I just tested with 4 columns, and it works fine!
Thank you @droberts195 !
From what I know, the US version of Excel reads and writes CSV files with a "," delimiter, whereas in France it uses ";".
In Excel (in France), "Save As" with the type "CSV UTF-8 (delimited by commas)" actually outputs a CSV delimited by semi-colons.
When reading a CSV, the French version of Excel also expects a ";" delimiter.
So I guess if Elasticsearch wants to be French-friendly, supporting the semi-colon would be great ;)
Yes! Elasticsearch is friendly to French people ;)
supporting the semi-colon would be great
It's a bit misleading to say it's "not supported" today: the semi-colon delimiter is just not auto-detected with fewer than 4 fields.
If you explicitly say the separator is a semi-colon, it works with any number of fields.
curl -u elastic:password -s -H "Content-Type: application/json" -XPOST "localhost:9200/_ml/find_file_structure?pretty&explain&format=delimited&delimiter=%3B" -T- <<EOF
a;b;c
d;e;f
g;h;i
a;b;c
d;e;f
g;h;i
EOF
This file works fine without any hint:
a;b;c;1
d;e;f;1
g;h;i;1
a;b;c;1
d;e;f;1
g;h;i;1
The difference compared to commas is that you need a minimum of 4 fields for auto-detection. For commas it's 2.
I am actually wondering if commas should be 3.
Consider this file:
1,1
2,2
3,3
4,4
5,5
Is it:
1. a comma-delimited file with two numeric columns, or
2. a single column of decimal numbers written with a decimal comma (so 1,1 means 1.1, as in a French locale)?
Correct answer: don't know.
At the moment the file structure finder decides on 1. Maybe it shouldn't make a decision at all. Or maybe it needs an idea of the user's locale, but that's not trivial: we couldn't use the Elasticsearch server's locale, which in Cloud, for example, presumably doesn't vary across regions. It would have to be passed in as a URL argument to find_file_structure.
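The ambiguity above can be made concrete: the same bytes parse cleanly either way.

```python
import csv
import io

# The same bytes are either a two-column comma-delimited file, or (in a
# French locale) one column of decimal numbers using "," as the decimal
# separator.
sample = "1,1\n2,2\n3,3\n"

as_csv = list(csv.reader(io.StringIO(sample)))
as_french_decimals = [float(line.replace(",", ".")) for line in sample.splitlines()]

print(as_csv)              # [['1', '1'], ['2', '2'], ['3', '3']]
print(as_french_decimals)  # [1.1, 2.2, 3.3]
```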
In the short term we are back to the user having to give a hint. All roads lead to elastic/kibana#38868.
Thanks for this detailed answer and especially for the hint to specify explicitly the delimiter.
As you say, all roads lead to the Kibana issue you mentioned.
Hope it will be implemented soon :)