Checking what happens when updates conflict while tfstate is managed in a state bucket (S3 bucket)
Introduction
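For team development, some people consider managing Terraform with a Git repository plus a state bucket to be a reasonable choice. To check that, I tested what happens when several people carelessly run apply against the same tfstate file.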
Setup
Assume a Terraform repository whose initial layout is as follows.
.
├── 00_main.tf
└── 01_resource1.tf
# 00_main.tf
terraform {
  backend "s3" {
    bucket = "terraform-test"
    key    = "test/terraform.tfstate"
    region = "ap-northeast-1"
  }
}

# 01_resource1.tf
resource "aws_s3_bucket" "terraform_test_1" {
  bucket = "terraform-test-1"
  acl    = "private"
}
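Members A and B both check out the repository in the state above. Then:
- Member A applies the following resource
resource "aws_s3_bucket" "terraform_test_2" {
  bucket = "terraform-test-2"
  acl    = "private"
}
- After that, member B applies the following resource without updating their copy of the repository
resource "aws_s3_bucket" "terraform_test_3" {
  bucket = "terraform-test-3"
  acl    = "private"
}
What happens in this case?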
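Results
I had hoped Terraform would keep track of when tfstate was updated and handle this gracefully, but that is not what happens.
Member A's terraform plan, as expected, only adds the terraform_test_2 bucket.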
Terraform will perform the following actions:

  # aws_s3_bucket.terraform_test_2 will be created
  + resource "aws_s3_bucket" "terraform_test_2" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "terraform-test-2"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Member B's terraform plan, however, shows the following.
Terraform will perform the following actions:

  # aws_s3_bucket.terraform_test_2 will be destroyed
  - resource "aws_s3_bucket" "terraform_test_2" {
      - acl                         = "private" -> null
      - arn                         = "arn:aws:s3:::terraform-test-2" -> null
      - bucket                      = "terraform-test-2" -> null
      - bucket_domain_name          = "terraform-test-2.s3.amazonaws.com" -> null
      - bucket_regional_domain_name = "terraform-test-2.s3.ap-northeast-1.amazonaws.com" -> null
      - force_destroy               = false -> null
      - hosted_zone_id              = "Z2M4EHUR26P7ZW" -> null
      - id                          = "terraform-test-2" -> null
      - region                      = "ap-northeast-1" -> null
      - request_payer               = "BucketOwner" -> null

      - versioning {
          - enabled    = false -> null
          - mfa_delete = false -> null
        }
    }

  # aws_s3_bucket.terraform_test_3 will be created
  + resource "aws_s3_bucket" "terraform_test_3" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "terraform-test-3"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.
Well, this is dangerous. Unless you read the terraform plan output carefully, you can all too easily delete a resource that someone else created.
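Setting prevent_destroy = true on the resource does not help here either. The setting does not make the resource undeletable. It only tells Terraform to refuse to destroy a resource whose .tf definition explicitly carries prevent_destroy = true. If the .tf file defining the resource does not exist at all, as in member B's working copy, the resource appears to become a destroy target without mercy.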
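For reference, this is roughly what the setting looks like on member A's resource. It is a minimal sketch: the file name is hypothetical, and the rest of the block follows the snippet member A applied.

```hcl
# 02_resource2.tf (hypothetical file name, present only in member A's working copy)
resource "aws_s3_bucket" "terraform_test_2" {
  bucket = "terraform-test-2"
  acl    = "private"

  lifecycle {
    # Terraform rejects a plan that would destroy this resource,
    # but only while this block is present in the configuration being planned.
    prevent_destroy = true
  }
}
```

Because the guard lives in the configuration rather than in tfstate, a working copy that lacks this file has nothing that tells Terraform to refuse the destroy.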
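Of course, the tragedy is avoided if member A runs git push properly and member B runs git pull in time, but in the end relying on human discipline is not acceptable. Would the Enterprise edition solve this problem?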
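To make the git pull case concrete, here is a sketch of the resource blocks member B's working copy would need to contain before running plan. The file names are hypothetical. The resource blocks are the ones from the scenario above.

```hcl
# Member B's working copy after git pull (file names are hypothetical)

# 02_resource2.tf, added by member A and pulled by member B
resource "aws_s3_bucket" "terraform_test_2" {
  bucket = "terraform-test-2"
  acl    = "private"
}

# 03_resource3.tf, added by member B
resource "aws_s3_bucket" "terraform_test_3" {
  bucket = "terraform-test-3"
  acl    = "private"
}
```

With both blocks present, terraform_test_2 exists in the configuration as well as in tfstate, so the plan should only propose creating terraform_test_3.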
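Conclusion
So, is running terraform apply by hand simply a bad idea in the first place?
When several people manage a repository, we should define a proper branching strategy: verified IaC changes are submitted as requests to merge into the main branch, the appropriate members review and approve them, and only then does the IaC deployment to the production environment run. We need a process that keeps mistakes out.
In fact, Terraform does publish system designs and best practices for AWS.
So do not set up a half-baked environment in which people run terraform apply by hand.
At most, allow that only in verification and development environments.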