I just discovered that there's a single collection on my MongoDB that I can't query with prestodb (server and CLI on version 0.157).
All other collections can be queried normaly, but this one collection can't be queried directly or joined with anything else. It is not the biggest nor the most complex of the collections, but apparently presto is having trouble with types.
java.lang.IllegalArgumentException: base is empty
at com.facebook.presto.spi.type.TypeSignature.checkArgument(TypeSignature.java:370)
at com.facebook.presto.spi.type.TypeSignature.<init>(TypeSignature.java:48)
at com.facebook.presto.spi.type.TypeSignature.parseTypeSignature(TypeSignature.java:101)
at com.facebook.presto.spi.type.TypeSignature.parseRowTypeSignature(TypeSignature.java:199)
at com.facebook.presto.spi.type.TypeSignature.parseTypeSignature(TypeSignature.java:104)
at com.facebook.presto.spi.type.TypeSignature.parseTypeSignature(TypeSignature.java:94)
at com.facebook.presto.mongodb.MongoSession.buildColumnHandle(MongoSession.java:199)
at com.facebook.presto.mongodb.MongoSession.loadTableSchema(MongoSession.java:185)
at com.facebook.presto.mongodb.MongoSession.access$000(MongoSession.java:76)
at com.facebook.presto.mongodb.MongoSession$1.load(MongoSession.java:128)
at com.facebook.presto.mongodb.MongoSession$1.load(MongoSession.java:123)
(...) very long stack trace
I've searched a few issues related to collection naming, but this one follows the same conventions (it actually has no underscores, some of the others do and it works).
What are the most likely causes for this? Maybe an inconsistent schema? (I have clues that a certain field in MongoDB has date and sometimes integer, but that would require a lot of digging to confirm)
If you haven't specified the table schema, could you print the result of db.getCollection(tableName).find().first() ?
I needed to redact some stuff for safety reasons, which kind of defeats the purpose?
db.getCollection("tableName").findOne()
{
"_id" : ObjectId("56cdcb6b59fc36000954b092"),
"name" : "test",
"email" : "[email protected]",
"password" : "SHA256-HEX-STRING",
"birth_date" : ISODate("1987-04-18T00:00:00Z"),
"gender" : "FEMALE",
"interests" : [ ],
"facebook_id" : "",
"created_at" : ISODate("2016-02-24T15:25:31.601Z"),
"first_login" : false,
"platform_version" : 1,
"artwork" : {
"url" : "",
"ratings" : {
},
"views" : 0,
"posting_date" : ISODate("2016-08-09T11:43:17.858Z")
},
"followers" : [
{
"follower_id" : ObjectId("57a9d2c2130bef01a1a05852"),
"datetime" : ISODate("2016-10-24T10:46:48.655Z")
}
]
}
I have confirmed my suspicions that one of those fields (artwork.posting_date) is sometimes integer and sometimes ISODate. Fixing it is going to take some time, but when I do I'll try again.
I think there's bug at guessing field types of empty document at artwork.ratings. It should be ignored as there's no clue to guess type. You can find guessed types by db.getCollection("_schema").findOne({"table" : "tableName"}).
A workaround is changing the artwork's type manually. It probably is row(url varchar, rating row(), views bigint, posting_date timestamp). You might change it into row(url varchar, views bigint, posting_date timestamp) or row(url varchar, views bigint, ratings map(varchar,varchar), posting_date timestamp)
Yes, there are some mappings like row(). So in general, it is best to keep documents without field: {}?
This is going to be hard to fix, updating the schema is probably easier. Will look into it.
Fixed it by manually adjusting the _schema collection. The guy who made this document made this field a simple dictionary on Python, so it has no way to define a consistent schema. I just defined it as varchar.
Thanks for the directions. :-)
Most helpful comment
Fixed it by manually adjusting the _schema collection. The guy who made this document made this field a simple dictionary on Python, so it has no way to define a consistent schema. I just defined it as varchar.
Thanks for the directions. :-)