I'm using PRESTO 0.198 and it connects to underlying Mongo database.
I found that some attributes/columns of Mongo collection are not reflected in Presto
Main collection record
db.vessel_port_visits.findOne({fleetId : "59ca77facaee7c2ccbd281a5"})
{
"_id" : ObjectId("5b16b5279e7964763270f818"),
"number" : "9456135",
"exitTimeMillis" : 1326276054000,
"entryTimeMillis" : 1325980799000,
"groupIds" : [
"59ca77facaee7c2ccbd281a5"
]
}
The resultant presto schema is
>> desc mongodb.databaseName.collection
Column | Type | Extra | Comment
--------------------------+-----------+-------+---------
number | varchar | |
exittimemillis | double | |
entrytimemillis | double | |
(3 rows)
I cannot see the column 'groupIds' which is present in Mongo collection.
1 more thing to note is that groupIds is mostly empty array ie [] in most of the mongo records.
This is causing the queries which include 'groupIds' attribute to fail.
Is there a way to refresh the schema in PRESTO
I am seeing an similar issue. Presto isn't reading all of the fields in my collection. In my case, it's not reading an embedded object or a field that only has null's. Is there a way to include these fields so presto can query against them?
Hi, did you solve the problem ? I am having similar problems with reading entities from json data type.
Any help?
Hi, from our POV after some investigation:
Hope it will be helpful.
Hi,After reading carefully,i have some questions as below:
At 2019-07-05 16:55:55, "pomykalamichal" notifications@github.com wrote:
Hi, from our POV after some investigation:
_id of mongo documents has to be ObjectId data type (if it is string presto will not be able to read the data )
_schema of mongo DB is created based on first document inserted into collection. It means that if after some time the document content will change it is required to add those variables into _schema manually. _schemat will not be updated automatically,
sometimes _schema is not created automatically or sometimes new collections are not added automatically also. It means that changes should be added manually by editing of existing mongo _schema.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
Trying to answer of your question:
In etc/catalog/mongodb.properties you can provide information which collection contains db schema definition i.e. mongodb.schema-collection parameter. But it is not required. Required is that _schema collection should be first on the list of all collections, if it is not provided in properties.
https://prestosql.io/docs/current/connector/mongodb.html

From our POV we have not need to restart presto server after mongodb _schema collection content changes.
Is not support mongodb-connector for alter table?
Here is the information what kind of data types are supported : https://prestodb.github.io/docs/current/connector/mongodb.html#table-definition-label
If there are any other way to dynamic,like conf.d plugin?
I am not able to answer this question.
I am seeing an similar issue. Presto isn't reading all of the fields in my collection. In my case, it's not reading an embedded object or a field that only has null's. Is there a way to include these fields so presto can query against them?
Are you able to resolve the issue for embedded object or a field that only has null's. ??
Hi, one additional hint. We define almost each field as a varchar. Then we start to use presto json parse function to read the data. https://prestodb.io/docs/current/functions/json.html
json_parse(string) → json
json_extract_scalar(json, json_path) → varchar
json_extract(json, json_path) → json
Exemplary query:
json_extract_scalar(json_parse(testdb.steps), '$["step-xyz"].details.stepId') = '1033'
Most helpful comment
I am seeing an similar issue. Presto isn't reading all of the fields in my collection. In my case, it's not reading an embedded object or a field that only has null's. Is there a way to include these fields so presto can query against them?