As the title states, when using the sqladmin v1beta4 sql.instances.export functionality, upon export, records containing NULL values have their NULLs represented as "N by default -- which plays hell on every CSV parser & BigQuery.
This was done using the googleapis v18.0.0 module.
Example Output:
# I have replaced all not-null values with categorical representations
# The, "N, values are the NULLs
"2017-03-27 18:37:54","N,"city","country","N,"N,lat,long,int,"message","ja-jp","Message","IE","Windows","11","URL","URL","URL","URL","HASH"
Here is my code:
'use strict';
(function() {
var google = require('googleapis');
var sqladmin = google.sqladmin('v1beta4');
module.exports = function exportCloudSqlToGCS(config) {
google.auth.getApplicationDefault(function(err, authClient) {
if (err) {
console.log('Authentication failed because of ', err);
return;
}
if (authClient.createScopedRequired && authClient.createScopedRequired()) {
var scopes = [
// Needed for ability to import / export.
'https://www.googleapis.com/auth/cloud-platform',
// Needed to login to the MySQL database successfully.
'https://www.googleapis.com/auth/sqlservice.admin'
];
authClient = authClient.createScoped(scopes);
}
var request = {
// Project ID of the project that contains the instance to be exported.
project: config.gcloudProject, // 'my-project-name'
// Cloud SQL instance ID. This does not include the project ID.
instance: config.gcloudInstance, // 'my-cloud-sql-db-name'
resource: {
// Contains details about the export operation.
exportContext: {
// This is always sql#exportContext.
kind: "sql#exportContext",
// The file type for the specified uri (e.g. SQL or CSV)
fileType: config.exportFileType, // CSV
/**
* The path to the file in GCS where the export will be stored.
* The URI is in the form gs://bucketName/fileName.
* If the file already exists, the operation fails.
* If fileType is SQL and the filename ends with .gz, the contents are compressed.
*/
uri: config.gcsBucketFilePath,
/**
* Databases from which the export is made.
* If fileType is SQL and no database is specified, all databases are exported.
* If fileType is CSV, you can optionally specify at most one database to export.
* If csvExportOptions.selectQuery also specifies the database, this field will be ignored.
*/
databases: config.databases, // ['myDatabase']
// Options for exporting data as SQL statements.
// sqlExportOptions: {
// /**
// * Tables to export, or that were exported, from the specified database.
// * If you specify tables, specify one and only one database.
// */
// tables: config.tables,
// // Export only schemas?
// schemaOnly: config.schemaOnly
// },
csvExportOptions: {
// The select query used to extract the data.
selectQuery: config.selectQuery // "SELECT * from myTable"
}
}
},
// Auth client
auth: authClient
};
// Kick off export with requested arguments.
sqladmin.instances.export(request, function(err, result) {
if (err) {
console.log(err);
} else {
console.log(result);
}
});
});
};
}).call(this);
I believe the proper NULL representation should be "\N" or at least an empty string, or simply, no value at all: value,,value,value,,value,value,value
This was likely a simple typo -- but it is a pain. I've worked around it by wrapping every field in a simple IFNULL( fieldName, '') as fieldName -- but that's rather tedious to maintain.
I'd like to add, if you guys are accepting feature/enhancement requests, adding to the csvExportOptions object the ability to set fieldDelimiters, rowDelimiters, encoding, and other various options would be a huge win.
Thanks.
Hi @bllevy,
Thanks for reporting this. That does sound annoying. We will look into it.
@lukesneeringer I would appreciate any updates on this. Thank you for your time.
What's more annoying is that SQL instance can't even correctly import CSV that it exported: all null values are converted to 0 for integers and "N for strings. I'm using web console for import and export.
Export generates the following query:
SELECT * FROM `test`.`test`
INTO OUTFILE '...'
CHARACTER SET 'utf8'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '"'
Probably we just need to change escape symbol from " to \
Any status update or timescales for this bug fix?
Allowing the "Escaped By" value to be configurable would be the best solution but in the short term could Poma's suggestion of changing the escape character to \ be implemented? It would significant improve the compatible of exported files with other systems like BigQuery.
Reported as a bug here as well https://issuetracker.google.com/issues/64579566
Has anyone managed to get a response from Google as to when they will fix the import/export via csv from CloudSQL?
You could think they have deliberately crippled the functionality!
hey, any updates of this bug, there are like more than 20 new builds of SQL admin API after posting this bug ands still not fixed, thats very frustrating!
any one found a work around ?
Alright folks, so here's the deal with this one. Sadly, I can reproduce the issue with the REST API directly:
https://cloud.google.com/sql/docs/mysql/admin-api/v1beta4/instances/export
That means there's really nothing I can do about it in this library :( Any fix for the issue will need to come from here:
https://issuetracker.google.com/issues/64579566
When those issues are resolved, it will "just work" in the client library. I apologize, and wish there were something we could do about it here. I'd suggest jumping on that issue, watching any updates, and leaving any feedback there.
In the meantime, I will go nudge the PM who owns Cloud SQL to fix it. We shall see :)
@JustinBeckwith thanks for the update! I've since written an ETL pipeline from Cloud SQL to GBQ -- but this method could still have some merits for my team in the future. I'll keep my eyes open to it. :)
Most helpful comment
Any status update or timescales for this bug fix?
Allowing the "Escaped By" value to be configurable would be the best solution but in the short term could Poma's suggestion of changing the escape character to \ be implemented? It would significant improve the compatible of exported files with other systems like BigQuery.