Creating BigQuery datasets and tables with Terraform

3 years ago

雅, 悟

Introduction

Managing resources with Terraform is convenient, isn't it? This time, let's try creating a BigQuery dataset and table with Terraform.

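Prerequisites

We set up the Terraform environment with docker-compose and a Makefile.

We also assume you have already created a service account that can operate BigQuery on Google Cloud Platform (GCP) and issued its credentials.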

Trying it out

Layout

The directory layout is shown below.
The .tf files live under ./terraform.

.
├── .env
├── Makefile
├── docker-compose.yaml
└── terraform
    ├── bigquery.tf
    ├── credentials
    │   └── credentials.json
    └── main.tf

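Preparing the environment

Place the credentials at ./terraform/credentials/credentials.json and store the path as an environment variable in the .env file. The path is relative to the container's working directory, /terraform.

GOOGLE_APPLICATION_CREDENTIALS="credentials/credentials.json"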

docker-compose.yaml uses the official Terraform image and loads the environment variables from .env.

version: '3'

services:
    terraform:
        image: hashicorp/terraform:1.0.0
        container_name: terraform
        volumes:
            - ./terraform:/terraform
        env_file: .env
        working_dir: /terraform

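Here is the Makefile. It lets us invoke the basic Terraform operations.

.PHONY: init plan apply destroy
ARG="default"

# Recipe lines must be indented with a tab, not spaces.
# The image's entrypoint is terraform, so only the subcommand is passed.
init:
	@docker-compose run --rm terraform init

plan:
	@docker-compose run --rm terraform plan

apply:
	@docker-compose run --rm terraform apply

destroy:
	@docker-compose run --rm terraform destroy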

Terraform

Next we write the .tf files.
First is main.tf,
which holds the configuration needed to create GCP resources.

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0.0"
    }
  }
}

provider "google" {
  project     = "<your project id>"
  region      = "<your region>"
}

Replace <your project id> and <your region> with your own values.
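If you would rather not hard-code these values, one option is to pass them in as input variables. The following is a minimal sketch, not part of the original main.tf: the variable names project_id and region are hypothetical, and the provider block shown here would replace the one above rather than sit alongside it.

```hcl
# Hypothetical variable names chosen for illustration.
variable "project_id" {
  type        = string
  description = "GCP project that will own the BigQuery resources"
}

variable "region" {
  type        = string
  description = "Default region for the google provider"
}

# Replaces the hard-coded provider block in main.tf.
provider "google" {
  project = var.project_id
  region  = var.region
}
```

Terraform reads variables from environment variables named TF_VAR_<name>, so with this setup the values could be added to the same .env file as TF_VAR_project_id and TF_VAR_region.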

Next is bigquery.tf.

resource "google_bigquery_dataset" "dataset" {
  dataset_id                  = "example_dataset"
  friendly_name               = "test"
  description                 = "This is a test description"
  location                    = "<your location>"
}

resource "google_bigquery_table" "users" {
  dataset_id          = google_bigquery_dataset.dataset.dataset_id
  table_id            = "users"
  deletion_protection = false
  clustering          = ["user_id"]

  time_partitioning {
    field                    = "dateday"
    type                     = "DAY"
    require_partition_filter = true
  }

  schema = <<EOF
[
  {
    "name": "user_id",
    "type": "INT64",
    "mode": "REQUIRED",
    "description": "user id"
  },
  {
    "name": "name",
    "type": "STRING",
    "mode": "NULLABLE",
    "description": "user name"
  },
  {
    "name": "dateday",
    "type": "DATE",
    "mode": "REQUIRED",
    "description": "created date"
  }
]
EOF

}

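Replace <your location> with the appropriate value.

The dataset and table are outlined below.

Dataset

Here we create a dataset named example_dataset, following the official Terraform example.

Table

Following the schema definition, we create a table named users with three columns: user_id, name, and dateday.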
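As an alternative to the heredoc, the same schema can be written as an HCL list and converted with Terraform's built-in jsonencode function, which lets terraform validate catch syntax mistakes in the schema. This is only a sketch: the local name users_schema is hypothetical and does not appear in the original bigquery.tf.

```hcl
# Hypothetical local value holding the same three columns as the heredoc schema.
locals {
  users_schema = jsonencode([
    {
      name        = "user_id"
      type        = "INT64"
      mode        = "REQUIRED"
      description = "user id"
    },
    {
      name        = "name"
      type        = "STRING"
      mode        = "NULLABLE"
      description = "user name"
    },
    {
      name        = "dateday"
      type        = "DATE"
      mode        = "REQUIRED"
      description = "created date"
    },
  ])
}

# In google_bigquery_table.users, the heredoc would then become:
#   schema = local.users_schema
```

Either form should end up as the same JSON array in the plan, so the choice is mostly a matter of taste.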

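We also partition the table by day on the dateday column, which holds the creation date, and cluster it on the user_id column. In the time_partitioning block we set require_partition_filter = true so that queries without a partition filter are rejected.

Partitioning and clustering were examined in the article below, which may be a useful reference!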

init, plan, apply

Everything is now ready for creating the resources, so let's actually apply the configuration.

First, we initialize.

$ make init

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/google v4.0.0

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Next, we check the diff with plan.

$ make plan

Creating terraform_terraform_run ... done
google_bigquery_dataset.dataset: Refreshing state... [id=projects/<your project id>/datasets/example_dataset]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # google_bigquery_dataset.dataset will be created
  + resource "google_bigquery_dataset" "dataset" {
      + creation_time               = (known after apply)
      + dataset_id                  = "example_dataset"
      + delete_contents_on_destroy  = false
      + description                 = "This is a test description"
      + etag                        = (known after apply)
      + friendly_name               = "test"
      + id                          = (known after apply)
      + last_modified_time          = (known after apply)
      + location                    = "<your location>"
      + project                     = (known after apply)
      + self_link                   = (known after apply)

      + access {
          + domain         = (known after apply)
          + group_by_email = (known after apply)
          + role           = (known after apply)
          + special_group  = (known after apply)
          + user_by_email  = (known after apply)

          + view {
              + dataset_id = (known after apply)
              + project_id = (known after apply)
              + table_id   = (known after apply)
            }
        }
    }

  # google_bigquery_table.users will be created
  + resource "google_bigquery_table" "users" {
      + clustering          = [
          + "user_id",
        ]
      + creation_time       = (known after apply)
      + dataset_id          = "example_dataset"
      + deletion_protection = false
      + etag                = (known after apply)
      + expiration_time     = (known after apply)
      + id                  = (known after apply)
      + last_modified_time  = (known after apply)
      + location            = (known after apply)
      + num_bytes           = (known after apply)
      + num_long_term_bytes = (known after apply)
      + num_rows            = (known after apply)
      + project             = (known after apply)
      + schema              = jsonencode(
            [
              + {
                  + description = "user id"
                  + mode        = "REQUIRED"
                  + name        = "user_id"
                  + type        = "INT64"
                },
              + {
                  + description = "user name"
                  + mode        = "NULLABLE"
                  + name        = "name"
                  + type        = "STRING"
                },
              + {
                  + description = "created date"
                  + mode        = "REQUIRED"
                  + name        = "dateday"
                  + type        = "DATE"
                },
            ]
        )
      + self_link           = (known after apply)
      + table_id            = "users"
      + type                = (known after apply)

      + time_partitioning {
          + expiration_ms            = (known after apply)
          + field                    = "dateday"
          + require_partition_filter = true
          + type                     = "DAY"
        }
    }

Plan: 2 to add, 0 to change, 0 to destroy.

We can see that two resources, the dataset and the table, will be created.

Now let's apply.

$ make apply

.
.
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:
.
.
.
Plan: 2 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

google_bigquery_dataset.dataset: Creating...
google_bigquery_dataset.dataset: Creation complete after 2s [id=projects/<your project id>/datasets/example_dataset]
google_bigquery_table.users: Creating...
google_bigquery_table.users: Creation complete after 0s [id=projects/<your project id>/datasets/example_dataset/tables/users]

The apply appears to have succeeded. You can now confirm in the console that the resources were created.

With Terraform, we were able to create the BigQuery resources.
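Besides checking the console, the created resources could also be surfaced through Terraform outputs. The sketch below is an assumption rather than part of the original setup: the output names are hypothetical, and the blocks could go in any .tf file under ./terraform.

```hcl
# Hypothetical output names; the referenced resources are the ones defined in bigquery.tf.
output "dataset_id" {
  description = "Full ID of the created dataset"
  value       = google_bigquery_dataset.dataset.id
}

output "users_table_id" {
  description = "Full ID of the created users table"
  value       = google_bigquery_table.users.id
}
```

Terraform prints outputs at the end of apply, so the IDs seen in the log above would also appear in a final Outputs section.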

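Conclusion

Creating BigQuery resources with Terraform turned out to be very easy! This time we only created a dataset and a table, but we would like to move on to permission management next!

By the way, is there a good way to manage credentials other than creating and deploying keys?
I would like to stop creating and deploying keys...
I will look into this separately and write another article if I can!

That's all for this time!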