I have an App Engine Standard Maven project, where a procedure need to create a Table on BigQuery.
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-bigquery</artifactId>
<version>0.32.0-beta</version>
</dependency>
I'm using the sample code provided here https://github.com/GoogleCloudPlatform/google-cloud-java/tree/master/google-cloud-bigquery#creating-a-table
TableId tableId = TableId.of(datasetId, "my_table_id");
// Table field definition
Field stringField = Field.of("StringField", LegacySQLTypeName.STRING);
// Table schema definition
Schema schema = Schema.of(stringField);
// Create a table
StandardTableDefinition tableDefinition = StandardTableDefinition.of(schema);
Table createdTable = bigquery.create(TableInfo.of(tableId, tableDefinition));
The difference with my need, compared to this example, is that I have a json file which contains the schema. Here is an example:
[
{
"mode": "REQUIRED",
"name": "identifier",
"type": "STRING"
},
{
"mode": "REQUIRED",
"name": "code",
"type": "STRING"
},
{
"mode": "REQUIRED",
"name": "description",
"type": "STRING"
}
]
I'm unable to find an existing method which load the table schema from a json file, instead of creating it manually from Schema/FieldList/Field classes.
Something like
Schema schema = Schema.parseJson(jsonSchema);
Is there way to load the json file or do I need to build a custom parser?
While I'm waiting for a reply, I wrote a custom deserializer based on Gson library. It is working, but if there is an already built-in method, I'll be more than happy to use it
public static void main(String[] args) {
// TODO Load schema from file
String jsonSchema = "[{\"mode\":\"REQUIRED\",\"name\":\"identifier\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"code\",\"type\":\"STRING\"},{\"mode\":\"REQUIRED\",\"name\":\"description\",\"type\":\"STRING\"}]";
// Json schema uses "fields"
// com.google.cloud.bigquery.Field uses "subFields"
// FIXME Unable to use @SerializedName policy
jsonSchema = jsonSchema.replace("\"fields\"", "\"subFields\"");
// Deserialize schema with custom Gson
Field[] fields = getGson().fromJson(jsonSchema, Field[].class);
Schema schema = Schema.of(fields);
System.out.println(schema.toString());
}
public static Gson getGson() {
JsonDeserializer<LegacySQLTypeName> typeDeserializer = (jsonElement, type, deserializationContext) -> {
return LegacySQLTypeName.valueOf(jsonElement.getAsString());
};
JsonDeserializer<FieldList> subFieldsDeserializer = (jsonElement, type, deserializationContext) -> {
Field[] fields = deserializationContext.deserialize(jsonElement.getAsJsonArray(), Field[].class);
return FieldList.of(fields);
};
return new GsonBuilder()
.registerTypeAdapter(LegacySQLTypeName.class, typeDeserializer)
.registerTypeAdapter(FieldList.class, subFieldsDeserializer)
.create();
}
Hi @NicolaSpreafico,
Unfortunately there is currently no a built-in way to import a json schema in bigquery java client (as you discovered, there are already pending feature requests for it).
Thanks a lot for the custom deserializer sample, I looks like a good workaround for now.
I'll have to close this one as a duplicate of https://github.com/GoogleCloudPlatform/google-cloud-java/issues/2553 (keeping only #2553 open for tracking).
This is a relatively old feature request, which hasn't been addressed yet, because of lack of resources (and multiple other items with higher priority we need to address first).
Thank you, I subscribed to the still open issue
here's a scala implementation using the bq dto object (so it works with nested fields too). im sure you can adapt it to something similar in java:
json -> schema
import com.google.api.client.json.jackson2.JacksonFactory
import com.google.api.services.bigquery.model.{TableFieldSchema, TableSchema}
import com.google.cloud.bigquery.Schema
object BqJsonToBqSchema {
private val parser = new JacksonFactory()
private def dtoTableSchemaToBqSchema(dtoSchema: TableSchema): Schema = {
val fromPbMethod =
classOf[Schema]
.getDeclaredMethods
.toIterable
.find(method => method.getName == "fromPb")
.get
fromPbMethod.setAccessible(true)
fromPbMethod.invoke(null, dtoSchema).asInstanceOf[Schema]
}
def convert(jsonSchemaString: String): Schema = {
val fieldsListDto =
parser
.createJsonParser(jsonSchemaString)
.parseArray(classOf[java.util.ArrayList[TableFieldSchema]], classOf[TableFieldSchema])
.asInstanceOf[java.util.ArrayList[TableFieldSchema]]
val schemaDto = new TableSchema()
schemaDto.setFields(fieldsListDto)
dtoTableSchemaToBqSchema(schemaDto)
}
}
schema -> json
import com.google.cloud.bigquery.{Schema => BqSchema}
import com.google.api.services.bigquery.model.{TableSchema => BqTableSchema}
import org.codehaus.jackson.map.ObjectMapper
object BqSchemaToBqJson {
private def bqSchemaToJsonString(schema: BqSchema): String = {
val toPbMethod =
classOf[BqSchema]
.getDeclaredMethods
.toIterable
.find(method => method.getName == "toPb")
.get
toPbMethod.setAccessible(true)
val tableSchema = toPbMethod.invoke(schema).asInstanceOf[BqTableSchema]
val objMapper = new ObjectMapper()
objMapper.writeValueAsString(tableSchema.getFields)
}
def convert(bqSchema: BqSchema): String = {
bqSchemaToJsonString(bqSchema)
}
}
Here's a Java 8 implementation of @marengaz's json -> schema code:
public class BqJsonToBqSchema {
private static final Method SCHEMA_FROM_PB = Arrays.stream(Schema.class.getDeclaredMethods())
.filter(method -> "fromPb".equals(method.getName())).findFirst().get();
static {
SCHEMA_FROM_PB.setAccessible(true);
}
public static Schema readBigQuerySchema(byte[] data) throws IOException {
List<TableFieldSchema> fieldsList = (List<TableFieldSchema>) JSON_FACTORY
.createJsonParser(new String(data, Charsets.UTF_8))
.parseArray(ArrayList.class, TableFieldSchema.class);
TableSchema tableSchema = new TableSchema().setFields(fieldsList);
try {
return (Schema) SCHEMA_FROM_PB.invoke(null, tableSchema);
} catch (IllegalAccessException | InvocationTargetException e) {
throw new RuntimeException(e);
}
}
}
Most helpful comment
here's a scala implementation using the bq dto object (so it works with nested fields too). im sure you can adapt it to something similar in java:
json -> schema
schema -> json