What happens when tfstate updates conflict in an S3 state bucket: let's check

Introduction

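For team development, some people consider managing Terraform with a Git repository plus an S3 state bucket to be a good choice. To be sure, I tested what happens when several people carelessly apply changes that compete over the same tfstate file.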

Setup

Suppose there is a Terraform repository whose initial layout is as follows.

.
├── 00_main.tf
└── 01_resource1.tf
# 00_main.tf
terraform {
  backend "s3" {
    bucket = "terraform-test"
    key    = "test/terraform.tfstate"
    region = "ap-northeast-1"
  }
}

# 01_resource1.tf
resource "aws_s3_bucket" "terraform_test_1" {
  bucket = "terraform-test-1"
  acl    = "private"
}

In this repository, suppose members A and B have both checked out the state above, and then:

    1. Member A applies the following resource:
resource "aws_s3_bucket" "terraform_test_2" {
  bucket = "terraform-test-2"
  acl    = "private"
}
    2. Afterwards, member B applies the following resource without updating their copy of the repository:
resource "aws_s3_bucket" "terraform_test_3" {
  bucket = "terraform-test-3"
  acl    = "private"
}

What happens in this case?

Results

You might expect Terraform to handle the timing of tfstate updates gracefully, but in reality it does not.
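Terraform does offer one safeguard for concurrent updates: the S3 backend can lock the state while an operation runs, but only if it is configured to. A minimal sketch, extending the backend block from 00_main.tf and assuming a DynamoDB table with the hypothetical name terraform-test-lock (the table needs a String partition key named LockID). Newer Terraform releases (1.10 and later) can also lock with an object in the bucket itself via use_lockfile = true, so check the S3 backend documentation for your version.

```hcl
terraform {
  backend "s3" {
    bucket = "terraform-test"
    key    = "test/terraform.tfstate"
    region = "ap-northeast-1"

    # Hypothetical lock table (assumption, not part of the original setup).
    # Terraform writes a lock item here for the duration of plan/apply, so a
    # second operation on the same state waits or fails instead of overlapping.
    dynamodb_table = "terraform-test-lock"
  }
}
```

Locking only keeps two operations from touching the same state at the same moment. It would not change the result below: member B's apply starts after member A's has finished, B's plan is computed against the up-to-date state, and the real problem is that B's configuration is stale.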

In member A's terraform plan, naturally, only the addition of the terraform_test_2 bucket appears.

Terraform will perform the following actions:

  # aws_s3_bucket.terraform_test_2 will be created
  + resource "aws_s3_bucket" "terraform_test_2" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "terraform-test-2"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

However, member B's terraform plan showed the following.

Terraform will perform the following actions:

  # aws_s3_bucket.terraform_test_2 will be destroyed
  - resource "aws_s3_bucket" "terraform_test_2" {
      - acl                         = "private" -> null
      - arn                         = "arn:aws:s3:::terraform-test-2" -> null
      - bucket                      = "terraform-test-2" -> null
      - bucket_domain_name          = "terraform-test-2.s3.amazonaws.com" -> null
      - bucket_regional_domain_name = "terraform-test-2.s3.ap-northeast-1.amazonaws.com" -> null
      - force_destroy               = false -> null
      - hosted_zone_id              = "Z2M4EHUR26P7ZW" -> null
      - id                          = "terraform-test-2" -> null
      - region                      = "ap-northeast-1" -> null
      - request_payer               = "BucketOwner" -> null

      - versioning {
          - enabled    = false -> null
          - mfa_delete = false -> null
        }
    }

  # aws_s3_bucket.terraform_test_3 will be created
  + resource "aws_s3_bucket" "terraform_test_3" {
      + acceleration_status         = (known after apply)
      + acl                         = "private"
      + arn                         = (known after apply)
      + bucket                      = "terraform-test-3"
      + bucket_domain_name          = (known after apply)
      + bucket_regional_domain_name = (known after apply)
      + force_destroy               = false
      + hosted_zone_id              = (known after apply)
      + id                          = (known after apply)
      + region                      = (known after apply)
      + request_payer               = (known after apply)
      + website_domain              = (known after apply)
      + website_endpoint            = (known after apply)

      + versioning {
          + enabled    = (known after apply)
          + mfa_delete = (known after apply)
        }
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Well, this is dangerous. Unless you check the terraform plan output carefully, you can easily end up deleting resources that someone else created...

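Moreover, setting prevent_destroy = true on a resource does not make it undeletable. It only makes Terraform reject any plan that would destroy that resource, and only while the resource block carrying prevent_destroy = true is still present in the configuration. If the block does not exist in any .tf file at all, as in member B's working copy, the resource becomes a destroy target without mercy.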
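For reference, prevent_destroy lives in the lifecycle block of the resource it protects. The sketch below adds it to member A's terraform_test_2 bucket, keeping the acl argument exactly as in the article's original resource. Because the setting is part of the resource block itself, even if member A had added it, member B's stale checkout would not contain it, and B's plan would still destroy the bucket.

```hcl
resource "aws_s3_bucket" "terraform_test_2" {
  bucket = "terraform-test-2"
  acl    = "private"

  lifecycle {
    # While this block is in the configuration, Terraform stops with an error
    # on any plan that would destroy this bucket. Removing the whole resource
    # block removes this protection along with it.
    prevent_destroy = true
  }
}
```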

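Of course, if member A properly runs git push and member B runs git pull at the right time, this tragedy can be avoided, but in the end that still depends on people doing the right thing, which is not ideal... Would Terraform Enterprise solve this?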

Conclusion

So perhaps running terraform apply by hand is a bad idea in the first place?

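When several people maintain a repository, we should adopt an effective branching strategy: IaC changes are verified, reviewed and approved by the appropriate members, merged into the main branch, and only that merge kicks off deployment to the production environment. We need a process that ensures mistakes do not get through.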

In practice, working with Terraform brings your system design and best practices on AWS out into the open.

So don't build a half-baked environment where people run terraform apply by hand.
Keep that to verification and development environments at most.
