We currently have the azurerm_storage_data_lake_gen2_filesystem resource for initialising ADLS Gen2 filesystems, but lack the ability to manage paths and ACLs with the provider.
An example of where this would be useful is provisioning ADLS Gen2 accounts and Azure Databricks via the azurerm provider and then configuring Databricks mounts using the databricks provider from databrickslabs. For this scenario we need to be able to create the paths in the ADLS account and set the appropriate ACLs to allow access.
Some initial thoughts on what the new resources might look like are below.
#
# Existing resource types for context
#
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestorageacc"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true
}

resource "azurerm_storage_data_lake_gen2_filesystem" "example" {
  name               = "example"
  storage_account_id = azurerm_storage_account.example.id
}
#
# New resource types
#
# create a path item (folder)
# future - allow create files by uploading contents?
resource "azurerm_storage_data_lake_gen2_path" "example" {
  storage_account_id = azurerm_storage_account.example.id
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.example.name
  path               = "my-path"
}
# set ACLs on a path
resource "azurerm_storage_data_lake_gen2_path_acl" "example" {
  storage_account_id = azurerm_storage_account.example.id
  filesystem_name    = "example"
  path               = azurerm_storage_data_lake_gen2_path.example.path

  ace {
    scope       = "default"
    type        = "user"
    id          = "[email protected]"
    permissions = "rwx"
  }
  ace {
    type        = "user"
    id          = "stuart@contoso.com"
    permissions = "rwx"
  }
  ace {
    scope       = "default"
    type        = "group"
    id          = "[email protected]"
    permissions = "r--"
  }
  ace {
    type        = "group"
    id          = "[email protected]"
    permissions = "r--"
  }
  ace {
    type        = "other"
    permissions = "---"
  }
  ace {
    type        = "mask"
    permissions = "rwx"
  }
  # resulting ACL string: "default:user:[email protected]:rwx,user:[email protected]:rwx,default:group:[email protected]:r--,group:[email protected]:r--,other::---,mask::rwx"
}
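To make the mapping from ace blocks to the ACL string concrete, here is a minimal sketch in plain Terraform expressions. It is not how the provider builds the string; the local names, the placeholder IDs and the use of "access" as the name of the non-default scope are all assumptions for illustration.

```hcl
locals {
  # Hypothetical entries mirroring a few of the ace blocks above.
  # "access" is an assumed label for entries without scope = "default".
  aces = [
    { scope = "default", type = "user", id = "user-object-id", permissions = "rwx" },
    { scope = "access", type = "user", id = "user-object-id", permissions = "rwx" },
    { scope = "access", type = "other", id = "", permissions = "---" },
    { scope = "access", type = "mask", id = "", permissions = "rwx" },
  ]

  # Each entry becomes "[default:]type:id:permissions"; entries are comma-separated.
  acl_string = join(",", [
    for a in local.aces :
    "${a.scope == "default" ? "default:" : ""}${a.type}:${a.id}:${a.permissions}"
  ])
}

output "acl_string" {
  value = local.acl_string
}
```

With these inputs the output would be "default:user:user-object-id:rwx,user:user-object-id:rwx,other::---,mask::rwx". Note that the other and mask entries keep an empty ID field, hence the double colon.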
UPDATE: After discussing with @tombuildsstuff the plan is to combine the path and ACL resources into a single resource:
resource "azurerm_storage_data_lake_gen2_path" "example" {
  storage_account_id = azurerm_storage_account.example.id
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.example.name
  path               = "my-path"

  ace {
    scope       = "default"
    type        = "user"
    id          = "[email protected]"
    permissions = "rwx"
  }
  ace {
    type        = "user"
    id          = "stuart@contoso.com"
    permissions = "rwx"
  }
  ace {
    scope       = "default"
    type        = "group"
    id          = "[email protected]"
    permissions = "r--"
  }
  ace {
    type        = "group"
    id          = "[email protected]"
    permissions = "r--"
  }
  ace {
    type        = "other"
    permissions = "---"
  }
  ace {
    type        = "mask"
    permissions = "rwx"
  }
  # resulting ACL string: "default:user:[email protected]:rwx,user:[email protected]:rwx,default:group:[email protected]:r--,group:[email protected]:r--,other::---,mask::rwx"
}
The plan is to make the ace list optional and computed, so that you can specify just a path to be created and have it inherit its permissions from the parent folder.
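As a rough sketch of what that could look like, assuming the proposed resource shape above (the resource label "inherited", the child path and the output name are hypothetical), a path declared without any ace blocks would pick up whatever the parent folder provides, and the computed ace list could then be read back:

```hcl
# Sketch only: relies on the proposed resource, with ace being optional and computed.
resource "azurerm_storage_data_lake_gen2_path" "inherited" {
  storage_account_id = azurerm_storage_account.example.id
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.example.name
  path               = "my-path/child"
  # No ace blocks: permissions are expected to come from the parent folder.
}

# Because ace is computed, the inherited entries would be visible after apply.
output "inherited_aces" {
  value = azurerm_storage_data_lake_gen2_path.inherited.ace
}
```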
Quick note to say that I've started work on this...
This is turning out to be a bigger task than I first thought (and a couple of other things have taken time away from working on it). My plan is to split it into smaller pieces of work that are each potentially mergeable, as I'm keen not to end up with a load of WIP that goes stale if I haven't finished before my team's next engagement starts.
My plan is:
Once 1 is done, it opens up a scenario where you can create a Data Lake and set up directories, with permissions granted via RBAC at the file system level so that Databricks and other compute can access them. That feels like a good starting scenario to enable. Adding in 2 would allow you to set permissions with ACLs at a more granular level (folders within the file system), which feels like a fairly compelling scenario.
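For the RBAC part of that first scenario, a minimal sketch using the existing azurerm_role_assignment resource might look like the following. The variable name, the choice of the "Storage Blob Data Contributor" role and the hand-built container scope are assumptions for illustration rather than something settled in this thread.

```hcl
variable "compute_principal_id" {
  type        = string
  description = "Object ID of the identity the compute uses (hypothetical input)."
}

# Grant data access at the file system (container) level rather than per folder.
resource "azurerm_role_assignment" "filesystem_access" {
  # Assumed scope: the Resource Manager ID of the container backing the filesystem.
  scope                = "${azurerm_storage_account.example.id}/blobServices/default/containers/${azurerm_storage_data_lake_gen2_filesystem.example.name}"
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = var.compute_principal_id
}
```

Scoping the assignment to the storage account ID instead would widen access to every filesystem in the account.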
@stuartleeks thanks for this!
ACLs on directories (including default ACLs) should cover 99.99% of the use cases I see in our environments, all of which set the ACL plus default ACL on the top-level directory (not /, but one level below it).
Thanks @alex-goncharov - good feedback to get!
I've got an initial PR on the storage SDK that is used, and a corresponding WIP PR on the provider that delivers part 1 (directory creation). If they are both OK and can be merged then I might get time to do part 2 (the ACL support for directories).
I've pushed an update to the PR that adds part 2, i.e. support for ACLs :-)
I see that this PR in go-autorest is merged and looks like it has been released. Does that mean that we can proceed with updating Giovanni to use the latest version of go-autorest and then update the Giovanni version in azurerm?
Fixed via #7521
This has been released in version 2.37.0 of the provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. As an example:
provider "azurerm" {
version = "~> 2.37.0"
}
# ... other configuration ...
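On Terraform 0.13 and later, the same constraint is usually declared in a required_providers block instead of inside the provider block. A minimal sketch, assuming the provider is installed from the hashicorp/azurerm registry source:

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.37.0"
    }
  }
}

provider "azurerm" {
  # The 2.x provider expects a features block, even if it is empty.
  features {}
}
```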