Trying Out a Databricks Workspace Deployment with Terraform
I had previously only translated the documentation without actually running through it, so this time I followed the steps in that article to do a deployment myself. The Git-related steps are skipped.
The steps are also described here.
Note that everything in this article was done on a Mac.
This walks through a deployment to AWS, using a configuration with a customer-managed VPC and without PrivateLink.
Preparation
Installing Terraform
Run the following commands in a terminal.
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
When I ran the commands above, I hit the following error:
==> Installing terraform from hashicorp/tap
Error: Your Command Line Tools are too outdated.
Update them from Software Update in System Preferences.
If that doesn't show you any updates, run:
sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install
Alternatively, manually download them from:
https://developer.apple.com/download/all/.
You should download the Command Line Tools for Xcode 13.4.
If you see this error, update the Command Line Tools by running:
sudo rm -rf /Library/Developer/CommandLineTools
sudo xcode-select --install
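Once the Command Line Tools update finishes, re-running the install should go through; a quick version check confirms Terraform is on the PATH:
brew install hashicorp/tap/terraform
terraform -version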
Installing and configuring the AWS CLI
I installed it with the GUI installer.
Following that setup guide, obtain an AWS access key and supply it when running aws configure.
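For reference, aws configure walks through four interactive prompts. A sketch of the exchange, with placeholder key values and the region matching the default in vars.tf below:
$ aws configure
AWS Access Key ID [None]: <your-access-key-id>
AWS Secret Access Key [None]: <your-secret-access-key>
Default region name [None]: ap-northeast-1
Default output format [None]: json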
Configuring Terraform
Create a working directory and move into it.
mkdir normal_workspace
cd normal_workspace
In the steps that follow, create each of the files below in this directory.
vars.tf
This file defines the variables. Update the target AWS region (region) and the VPC CIDR (cidr_block) as needed; an example of overriding them via tutorial.tfvars instead follows the listing.
variable "databricks_account_username" {}
variable "databricks_account_password" {}
variable "databricks_account_id" {}

variable "tags" {
  default = {}
}

variable "cidr_block" {
  default = "10.4.0.0/16"
}

variable "region" {
  default = "ap-northeast-1"
}

// See https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/string
resource "random_string" "naming" {
  special = false
  upper   = false
  length  = 6
}

locals {
  prefix = "demo-${random_string.naming.result}"
}
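If you would rather not edit the defaults in place, the same variables can be overridden in the tutorial.tfvars file created later. The values here are hypothetical examples, not requirements:
// Hypothetical overrides, added to tutorial.tfvars
region     = "us-west-2"
cidr_block = "10.8.0.0/16"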
init.tf
Initialize Terraform with the required Databricks and AWS providers.
terraform {
  required_providers {
    databricks = {
      source  = "databricks/databricks"
      version = "1.0.0"
    }
    aws = {
      source  = "hashicorp/aws"
      version = "3.49.0"
    }
  }
}

provider "aws" {
  region = var.region
}

// Initialize provider in "MWS" mode to provision the new workspace.
// alias = "mws" instructs Databricks to connect to https://accounts.cloud.databricks.com, to create
// a Databricks workspace that uses the E2 version of the Databricks on AWS platform.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs#authentication
provider "databricks" {
  alias    = "mws"
  host     = "https://accounts.cloud.databricks.com"
  username = var.databricks_account_username
  password = var.databricks_account_password
}
cross-account-role.tf
This creates the required IAM cross-account role and associated policies in your AWS account.
Note that the "time_sleep.wait_for_cross_account_role" resource below exists to give the newly created IAM role time to propagate before it is used; a note on the provider behind it follows the listing.
// Create the required AWS STS assume role policy in your AWS account.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_assume_role_policy
data "databricks_aws_assume_role_policy" "this" {
  external_id = var.databricks_account_id
}

// Create the required IAM role in your AWS account.
// See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role
resource "aws_iam_role" "cross_account_role" {
  name               = "${local.prefix}-crossaccount"
  assume_role_policy = data.databricks_aws_assume_role_policy.this.json
  tags               = var.tags
}

// Create the required AWS cross-account policy in your AWS account.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_crossaccount_policy
data "databricks_aws_crossaccount_policy" "this" {}

// Create the required IAM role inline policy in your AWS account.
// See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy
resource "aws_iam_role_policy" "this" {
  name   = "${local.prefix}-policy"
  role   = aws_iam_role.cross_account_role.id
  policy = data.databricks_aws_crossaccount_policy.this.json
}

resource "time_sleep" "wait_for_cross_account_role" {
  depends_on      = [aws_iam_role_policy.this, aws_iam_role.cross_account_role]
  create_duration = "20s"
}

// Properly configure the cross-account role for the creation of new workspaces within your AWS account.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_credentials
resource "databricks_mws_credentials" "this" {
  provider         = databricks.mws
  account_id       = var.databricks_account_id
  role_arn         = aws_iam_role.cross_account_role.arn
  credentials_name = "${local.prefix}-creds"
  depends_on       = [time_sleep.wait_for_cross_account_role]
}
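As an aside, time_sleep comes from the hashicorp/time provider, which terraform init installs automatically because it lives in the hashicorp namespace. If you want to pin it explicitly, a sketch of the extra required_providers entry (Terraform merges this with the block in init.tf):
terraform {
  required_providers {
    // Pinning is optional; auto-installation works without this block.
    time = {
      source = "hashicorp/time"
    }
  }
}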
vpc.tf
This instructs Terraform to create the VPC that Databricks requires in your AWS account. The subnet arithmetic done by cidrsubnet is explained after the listing.
// Allow access to the list of AWS Availability Zones within the AWS Region that is configured in vars.tf and init.tf.
// See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones
data "aws_availability_zones" "available" {}

// Create the required VPC resources in your AWS account.
// See https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.2.0"

  name = local.prefix
  cidr = var.cidr_block
  azs  = data.aws_availability_zones.available.names
  tags = var.tags

  enable_dns_hostnames = true
  enable_nat_gateway   = true
  single_nat_gateway   = true
  create_igw           = true

  public_subnets = [cidrsubnet(var.cidr_block, 3, 0)]
  private_subnets = [cidrsubnet(var.cidr_block, 3, 1),
                     cidrsubnet(var.cidr_block, 3, 2)]

  manage_default_security_group = true
  default_security_group_name   = "${local.prefix}-sg"

  default_security_group_egress = [{
    cidr_blocks = "0.0.0.0/0"
  }]

  default_security_group_ingress = [{
    description = "Allow all internal TCP and UDP"
    self        = true
  }]
}

// Create the required VPC endpoints within your AWS account.
// See https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest/submodules/vpc-endpoints
module "vpc_endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "3.2.0"

  vpc_id             = module.vpc.vpc_id
  security_group_ids = [module.vpc.default_security_group_id]

  endpoints = {
    s3 = {
      service      = "s3"
      service_type = "Gateway"
      route_table_ids = flatten([
        module.vpc.private_route_table_ids,
        module.vpc.public_route_table_ids])
      tags = {
        Name = "${local.prefix}-s3-vpc-endpoint"
      }
    },
    sts = {
      service             = "sts"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
      tags = {
        Name = "${local.prefix}-sts-vpc-endpoint"
      }
    },
    kinesis-streams = {
      service             = "kinesis-streams"
      private_dns_enabled = true
      subnet_ids          = module.vpc.private_subnets
      tags = {
        Name = "${local.prefix}-kinesis-vpc-endpoint"
      }
    }
  }

  tags = var.tags
}

// Properly configure the VPC and subnets for Databricks within your AWS account.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_networks
resource "databricks_mws_networks" "this" {
  provider           = databricks.mws
  account_id         = var.databricks_account_id
  network_name       = "${local.prefix}-network"
  security_group_ids = [module.vpc.default_security_group_id]
  subnet_ids         = module.vpc.private_subnets
  vpc_id             = module.vpc.vpc_id
}
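The cidrsubnet calls above carve the /16 into /19 subnets by adding 3 bits to the prefix, so the public subnet takes the first /19 and the two private subnets take the next two. You can check the arithmetic with terraform console (after terraform init):
$ terraform console
> cidrsubnet("10.4.0.0/16", 3, 0)
"10.4.0.0/19"
> cidrsubnet("10.4.0.0/16", 3, 1)
"10.4.32.0/19"
> cidrsubnet("10.4.0.0/16", 3, 2)
"10.4.64.0/19"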
root-bucket.tf
This creates the S3 root bucket that Databricks requires in your AWS account.
// Create the S3 root bucket.
// See https://registry.terraform.io/modules/terraform-aws-modules/s3-bucket/aws/latest
resource "aws_s3_bucket" "root_storage_bucket" {
  bucket = "${local.prefix}-rootbucket"
  acl    = "private"
  versioning {
    enabled = false
  }
  force_destroy = true
  tags = merge(var.tags, {
    Name = "${local.prefix}-rootbucket"
  })
}

// Ignore public access control lists (ACLs) on the S3 root bucket and on any objects that this bucket contains.
// See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_public_access_block
resource "aws_s3_bucket_public_access_block" "root_storage_bucket" {
  bucket             = aws_s3_bucket.root_storage_bucket.id
  ignore_public_acls = true
  depends_on         = [aws_s3_bucket.root_storage_bucket]
}

// Configure a simple access policy for the S3 root bucket within your AWS account, so that Databricks can access data in it.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/aws_bucket_policy
data "databricks_aws_bucket_policy" "this" {
  bucket = aws_s3_bucket.root_storage_bucket.bucket
}

// Attach the access policy to the S3 root bucket within your AWS account.
// See https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_policy
resource "aws_s3_bucket_policy" "root_bucket_policy" {
  bucket     = aws_s3_bucket.root_storage_bucket.id
  policy     = data.databricks_aws_bucket_policy.this.json
  depends_on = [aws_s3_bucket_public_access_block.root_storage_bucket]
}

// Configure the S3 root bucket within your AWS account for new Databricks workspaces.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_storage_configurations
resource "databricks_mws_storage_configurations" "this" {
  provider                   = databricks.mws
  account_id                 = var.databricks_account_id
  bucket_name                = aws_s3_bucket.root_storage_bucket.bucket
  storage_configuration_name = "${local.prefix}-storage"
}
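One caveat: the inline acl and versioning arguments on aws_s3_bucket are valid with the AWS provider version pinned in init.tf (3.49.0), but in v4+ of the provider they were split into standalone resources. If you ever bump the provider, the versioning part would become something like this sketch:
// Only needed with hashicorp/aws v4 or later; with 3.49.0 the inline block above works as-is.
resource "aws_s3_bucket_versioning" "root_storage_bucket" {
  bucket = aws_s3_bucket.root_storage_bucket.id
  versioning_configuration {
    status = "Suspended"
  }
}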
workspace.tf
This instructs Terraform to create a workspace in your Databricks account.
// Set up the Databricks workspace to use the E2 version of the Databricks on AWS platform.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/mws_workspaces
resource "databricks_mws_workspaces" "this" {
  provider        = databricks.mws
  account_id      = var.databricks_account_id
  aws_region      = var.region
  workspace_name  = local.prefix
  deployment_name = local.prefix

  credentials_id           = databricks_mws_credentials.this.credentials_id
  storage_configuration_id = databricks_mws_storage_configurations.this.storage_configuration_id
  network_id               = databricks_mws_networks.this.network_id
}

// Capture the Databricks workspace's URL.
output "databricks_host" {
  value = databricks_mws_workspaces.this.workspace_url
}

// Initialize the Databricks provider in "normal" (workspace) mode.
// See https://registry.terraform.io/providers/databricks/databricks/latest/docs#authentication
provider "databricks" {
  // In workspace mode, you don't have to give providers aliases. Doing it here, however,
  // makes it easier to reference, for example when creating a Databricks personal access token
  // later in this file.
  alias = "created_workspace"
  host  = databricks_mws_workspaces.this.workspace_url
}

// Create a Databricks personal access token, to provision entities within the workspace.
resource "databricks_token" "pat" {
  provider         = databricks.created_workspace
  comment          = "Terraform Provisioning"
  lifetime_seconds = 86400
}

// Export the Databricks personal access token's value, for integration tests to run on.
output "databricks_token" {
  value     = databricks_token.pat.token_value
  sensitive = true
}
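After the apply completes, the outputs can be read back with terraform output; the -raw flag is needed to print the sensitive token value:
terraform output databricks_host
terraform output -raw databricks_token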
tutorial.tfvars
Specify the Databricks account ID and the account owner's username and password referenced by the files above. Hardcoding these in the .tf files is not recommended, which is why they are split out into a separate file. If you use Git, add *.tfvars to your .gitignore so that files with this extension are excluded (a sample follows the listing).
databricks_account_username = "<your-Databricks-account-username>"
databricks_account_password = "<your-Databricks-account-password>"
databricks_account_id = "<your-Databricks-account-ID>"
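A minimal .gitignore for this directory might look like the following; the local state files also end up holding secrets, so they are worth excluding too:
*.tfvars
.terraform/
terraform.tfstate
terraform.tfstate.backup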
Creating the Databricks and AWS resources with Terraform
Running the following commands creates the resources defined above and deploys the workspace.
terraform init
terraform apply -var-file="tutorial.tfvars"
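If you want to review what will be created before touching anything, a plan run with the same variable file shows the pending changes:
terraform plan -var-file="tutorial.tfvars"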
A cluster was already up and running too. How convenient.
Cleaning up
Destroy all of the resources with the following command. You will again be asked for the Databricks account ID and the account owner's username and password (or pass the variable file, as shown after the command).
terraform destroy
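Since destroy evaluates the same variables, passing the same file avoids the interactive prompts:
terraform destroy -var-file="tutorial.tfvars"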
Since it's easy to forget to clean up resources, this is a big help.
Next, I'll try out the other deployment patterns.
Databricks Free Trial