Aws-cdk: Default data location for AWS Glue tables

Created on 10 Jun 2020 · 3 Comments · Source: aws/aws-cdk

The Question

I started rolling out a Glue table with the following CDK construct:

const trailTable = new Table(this, 'TrailTable', {
    bucket: <ref to bucket>,
    database: <ref to db>,
    tableName: 'table',
    columns: [{
        name: 'user',
        type: Schema.STRING,
    }],
    dataFormat: DataFormat.JSON,
});

To my surprise, the Glue table pointed to my bucket using this URL: s3://<bucket>/data.

According to the CDK documentation, /data is indeed the default location where the Glue table data is expected (this is the s3Prefix property). However, the documentation gives little explanation of why this is the default. Was it chosen to follow certain guidelines, or is it an arbitrary path?

I would have expected the default to be empty: no nested folder in the bucket, just the root. Having to set a blank s3Prefix explicitly to get this behavior feels out of place:

const trailTable = new Table(this, 'TrailTable', {
    ...,
    s3Prefix: '',
});
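
To make the behavior concrete, here is a minimal sketch of how a table location could be derived from a bucket name and an optional prefix. The helper resolveTableLocation and the constant ASSUMED_DEFAULT_PREFIX are hypothetical names for illustration, not the CDK implementation, and the default value is simply taken from the URL I observed above; the exact string used inside the CDK may differ.

```typescript
// Hypothetical helper, NOT the CDK implementation: it only models the
// behavior described in this question.
// Assumption: the default prefix is taken from the observed s3://<bucket>/data URL.
const ASSUMED_DEFAULT_PREFIX = 'data';

function resolveTableLocation(bucketName: string, s3Prefix?: string): string {
  // Only `undefined` falls back to the default, so an explicit '' is respected.
  const prefix = s3Prefix === undefined ? ASSUMED_DEFAULT_PREFIX : s3Prefix;
  return `s3://${bucketName}/${prefix}`;
}

console.log(resolveTableLocation('my-bucket'));            // s3://my-bucket/data
console.log(resolveTableLocation('my-bucket', ''));        // s3://my-bucket/
console.log(resolveTableLocation('my-bucket', 'trails/')); // s3://my-bucket/trails/
```

The check against undefined matters in this sketch: a fallback written as s3Prefix || ASSUMED_DEFAULT_PREFIX would treat the empty string as missing and silently put the table back under /data.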

Proposed solution

Remove the default that points to /data for the s3Prefix and use an empty string instead.

OR

Provide documentation that explains why /data was chosen as the default.

@aws-cdk/aws-glue effort/small feature-request in-progress

All 3 comments

@sam-goodwin - As the original author of this, mind if I pick your brain?

Marking this as a feature request to use an empty string as the default data location, which seems like a more reasonable default.

There was no intelligent reasoning. I agree it should be changed since it's too opinionated.
