Deploying a Custom Ansible Operator on Red Hat OpenShift on IBM Cloud (Installed via OLM)
Goal
In the previous article, we installed our own Operator into the cluster by hand. This time, we will install it with the Operator Lifecycle Manager (OLM) instead. With OLM, the Operator can be managed from the OpenShift console.
Steps
This assumes you are starting from the state reached at the end of the previous article.
Creating the Bundle
The Operator packaging information managed by OLM is called a Bundle. Generate the Bundle definition; some of the information must be entered interactively.
$ make bundle
operator-sdk generate kustomize manifests -q
Display name for the operator (required):
> Hello Operator
Description for the operator (required):
> An example of Ansible Operator
Provider's name for the operator (required):
> teruq
Any relevant URL for the provider name (optional):
> https://qiita.com/teruq
Comma-separated list of keywords for your operator (required):
> ansible,hello
Comma-separated list of maintainers and their emails (e.g. 'name1:email1, name2:email2') (required):
> teruq
cd config/manager && /.../hello-ansible-operator/bin/kustomize edit set image controller=jp.icr.io/teruq/hello-ansible-operator:0.0.1
/.../hello-ansible-operator/bin/kustomize build config/manifests | operator-sdk generate bundle -q --overwrite --version 0.0.1
INFO[0002] Creating bundle.Dockerfile
INFO[0002] Creating bundle/metadata/annotations.yaml
INFO[0002] Bundle metadata generated suceessfully
operator-sdk bundle validate ./bundle
INFO[0000] All validation tests have completed successfully
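For reference, the generated bundle/metadata/annotations.yaml typically looks something like the following. This is a sketch: the exact keys and channel names depend on the operator-sdk version, and the package name is taken from this project.

```yaml
# bundle/metadata/annotations.yaml (illustrative; generated by `make bundle`)
annotations:
  # Bundle format and layout
  operators.operatorframework.io.bundle.mediatype.v1: registry+v1
  operators.operatorframework.io.bundle.manifests.v1: manifests/
  operators.operatorframework.io.bundle.metadata.v1: metadata/
  # Package name and channels (channel name is an assumption here)
  operators.operatorframework.io.bundle.package.v1: hello-ansible-operator
  operators.operatorframework.io.bundle.channels.v1: alpha
  operators.operatorframework.io.bundle.channel.default.v1: alpha
```

The same values are baked into bundle.Dockerfile as image labels, which is how OLM identifies the bundle once the image is pushed.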
Building the Bundle Image
The Bundle itself is also a container image. Build it.
$ make bundle-build
docker build -f bundle.Dockerfile -t jp.icr.io/teruq/hello-ansible-operator-bundle:v0.0.1 .
...
Pushing the Bundle Image
If some time has passed since the steps in the previous article, your authentication may have expired, so log in again.
$ export IBMCLOUD_API_KEY=********
$ ibmcloud login
$ ibmcloud cr login
Push the image to ICR.
$ make bundle-push
make docker-push IMG=jp.icr.io/teruq/hello-ansible-operator-bundle:v0.0.1
make[1]: Entering directory '/.../hello-ansible-operator'
docker push jp.icr.io/teruq/hello-ansible-operator-bundle:v0.0.1
The push refers to repository [jp.icr.io/teruq/hello-ansible-operator-bundle]
598401fa8392: Layer already exists
14f153f590a2: Layer already exists
9d6970a7da8f: Layer already exists
v0.0.1: digest: sha256:aa57c38c38d897e1275e4b95dbc62c3b53d37399bdbb7529e9ff15ab4c098494 size: 939
make[1]: Leaving directory '/.../hello-ansible-operator'
Confirm that the images are registered in ICR.
$ ibmcloud cr images | grep hello-ansible
jp.icr.io/teruq/hello-ansible-operator 0.0.1 031595ea6f53 teruq 2 hours ago 156 MB 55 issues
jp.icr.io/teruq/hello-ansible-operator-bundle v0.0.1 aa57c38c38d8 teruq 1 hour ago 3.4 kB Unsupported OS
Copying and Linking the ImagePullSecret
To pull the various images from ICR, copy the ImagePullSecret as usual.
Creating the Namespace
Create a namespace named qiita-operators in advance to install the Operator into.
$ oc create ns qiita-operators
Copying the ImagePullSecret
Copy it from the default namespace.
$ oc get secret all-icr-io -n default -o yaml | grep -v namespace: | oc create -n qiita-operators -f-
Linking the ImagePullSecret to the ServiceAccount
Link all-icr-io to the default ServiceAccount used by the Bundle.
$ oc secrets link default all-icr-io --for pull -n qiita-operators
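To confirm the link took effect, you can inspect the ServiceAccount; after the command above, its imagePullSecrets list should include all-icr-io. Abbreviated output:

```yaml
# oc get sa default -n qiita-operators -o yaml (abbreviated)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: qiita-operators
imagePullSecrets:
- name: all-icr-io
```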
Installing with OLM
Checking OLM
OLM is installed by default on OpenShift 4.x and later, and ROKS is no exception. You can check its status as follows. Note that the entries referring to the olm namespace report "not found"; this is expected on OpenShift, where OLM runs in the openshift-operator-lifecycle-manager namespace instead.
$ operator-sdk olm status --olm-namespace openshift-operator-lifecycle-manager
I0930 15:08:43.304528 15260 request.go:668] Waited for 1.0293062s due to client-side throttling, not priority and fairness, request: GET:https://c100-e.jp-tok.containers.cloud.ibm.com:32154/apis/samples.operator.openshift.io/v1?timeout=32s
INFO[0004] Fetching CRDs for version "0.16.1"
INFO[0004] Using locally stored resource manifests
INFO[0004] Successfully got OLM status for version "0.16.1"
NAME NAMESPACE KIND STATUS
operators.operators.coreos.com CustomResourceDefinition Installed
operatorgroups.operators.coreos.com CustomResourceDefinition Installed
installplans.operators.coreos.com CustomResourceDefinition Installed
clusterserviceversions.operators.coreos.com CustomResourceDefinition Installed
subscriptions.operators.coreos.com CustomResourceDefinition Installed
system:controller:operator-lifecycle-manager ClusterRole Installed
aggregate-olm-edit ClusterRole Installed
aggregate-olm-view ClusterRole Installed
catalogsources.operators.coreos.com CustomResourceDefinition Installed
olm Namespace namespaces "olm" not found
olm-operator-binding-olm ClusterRoleBinding clusterrolebindings.rbac.authorization.k8s.io "olm-operator-binding-olm" not found
olm-operator olm Deployment deployments.apps "olm-operator" not found
catalog-operator olm Deployment deployments.apps "catalog-operator" not found
olm-operator-serviceaccount olm ServiceAccount serviceaccounts "olm-operator-serviceaccount" not found
operators Namespace namespaces "operators" not found
global-operators operators OperatorGroup operatorgroups.operators.coreos.com "global-operators" not found
olm-operators olm OperatorGroup operatorgroups.operators.coreos.com "olm-operators" not found
packageserver olm ClusterServiceVersion clusterserviceversions.operators.coreos.com "packageserver" not found
operatorhubio-catalog olm CatalogSource catalogsources.operators.coreos.com "operatorhubio-catalog" not found
Running the Bundle
Run the Bundle with the following command. Note that you must specify here, once again, the ImagePullSecret that was just linked to the ServiceAccount.
$ operator-sdk run bundle jp.icr.io/teruq/hello-ansible-operator-bundle:v0.0.1 --pull-secret-name all-icr-io -n qiita-operators
INFO[0009] Successfully created registry pod: jp-icr-io-teruq-hello-ansible-operator-bundle-v0-0-1
INFO[0010] Created CatalogSource: hello-ansible-operator-catalog
INFO[0010] OperatorGroup "operator-sdk-og" created
INFO[0010] Created Subscription: hello-ansible-operator-v0-0-1-sub
INFO[0012] Approved InstallPlan install-lcqlh for the Subscription: hello-ansible-operator-v0-0-1-sub
INFO[0012] Waiting for ClusterServiceVersion "qiita-operators/hello-ansible-operator.v0.0.1" to reach 'Succeeded' phase
INFO[0012] Waiting for ClusterServiceVersion "qiita-operators/hello-ansible-operator.v0.0.1" to appear
INFO[0018] Found ClusterServiceVersion "qiita-operators/hello-ansible-operator.v0.0.1" phase: Pending
INFO[0020] Found ClusterServiceVersion "qiita-operators/hello-ansible-operator.v0.0.1" phase: Installing
INFO[0107] Found ClusterServiceVersion "qiita-operators/hello-ansible-operator.v0.0.1" phase: Succeeded
INFO[0108] OLM has successfully installed "hello-ansible-operator.v0.0.1"
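As the log shows, `operator-sdk run bundle` is roughly equivalent to creating resources like the following by hand. The names are taken from the log above; the field values are illustrative (in particular, the channel is an assumption, and in practice the CatalogSource points at an index served by the registry pod rather than at the bundle image directly).

```yaml
# Sketch of what `operator-sdk run bundle` creates (names from the log above)
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: hello-ansible-operator-catalog
  namespace: qiita-operators
spec:
  sourceType: grpc
  # In reality this is the index served by the registry pod, not the bundle image itself
  image: jp.icr.io/teruq/hello-ansible-operator-bundle:v0.0.1
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: operator-sdk-og
  namespace: qiita-operators
spec:
  targetNamespaces:
  - qiita-operators
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hello-ansible-operator-v0-0-1-sub
  namespace: qiita-operators
spec:
  channel: alpha  # assumption; must match the bundle's channel annotation
  name: hello-ansible-operator
  source: hello-ansible-operator-catalog
  sourceNamespace: qiita-operators
```

The Subscription triggers an InstallPlan, which OLM approves and resolves into the ClusterServiceVersion seen in the log.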
Here is the state of the Pods.
$ oc get pods -n qiita-operators
NAME READY STATUS RESTARTS AGE
9cbeaa081bdcc6ecb2ba12064d0e68820efc754b3a48e9acca07d3987bcgsgl 0/1 Completed 0 28s
hello-ansible-operator-controller-manager-7cb6dcf49d-99rnt 2/2 Running 0 14s
jp-icr-io-teruq-hello-ansible-operator-bundle-v0-0-1 1/1 Running 0 40s
A supplementary note on ImagePullSecrets: jp-icr-io-teruq~ uses the secret linked to the default ServiceAccount, 9cbeaa081bd~ uses the one specified on the CLI, and hello-ansible-operator~ uses the one specified in its Deployment. From this inconsistency one might infer that the SDK developers did not fully consider the case where the images live in a private registry.
Checking the OpenShift Console
Select OperatorHub in the OpenShift console. Hello Operator is listed.
Looking at Installed Operators, Hello Operator shows as installed.
Deploying the Application
From Installed Operators, change the project at the top of the screen to the project where the application will be deployed, then select Hello Operator.
Select "Create instance".
Last time we made "replicas" a required property in the custom resource definition, and it can now be entered in the form. Select "Create".
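For reference, creating the same instance from the CLI would look roughly like this. This is a sketch: the group follows the CRD name `hellos.example.teruq.example.com` visible in the cleanup log, but the version, resource name, and replica count are assumptions.

```yaml
# Illustrative Hello custom resource (apiVersion/name are assumptions)
apiVersion: example.teruq.example.com/v1alpha1
kind: Hello
metadata:
  name: sample
  namespace: qiita
spec:
  replicas: 2  # the required property defined in the previous article
```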
The status becomes Running.
Confirm that the Pods are running.
$ oc get pods -n qiita | grep hello
sample-hello-c54fb8b58-7hmbn 1/1 Running 0 28s
sample-hello-c54fb8b58-sdjlk 1/1 Running 0 28s
See the previous article to verify that the application works.
Cleanup
You can clean up with the SDK.
$ operator-sdk cleanup hello-ansible-operator
INFO[0002] subscription "hello-ansible-operator-v0-0-1-sub" deleted
INFO[0002] customresourcedefinition "hellos.example.teruq.example.com" deleted
INFO[0002] clusterserviceversion "hello-ansible-operator.v0.0.1" deleted
INFO[0002] catalogsource "hello-ansible-operator-catalog" deleted
INFO[0002] operatorgroup "operator-sdk-og" deleted
INFO[0002] Operator "hello-ansible-operator" uninstalled
For a complete cleanup, the remaining resources we created must be deleted manually, one by one.
$ oc secrets unlink default all-icr-io
$ oc delete secret all-icr-io
$ oc delete ns qiita-operators
Impressions
Using a private registry certainly adds extra steps. As things stand, the Operator cannot be cataloged and installed easily, so a workaround is needed. I would like to investigate whether a better approach exists.
References
- https://operatorhub.io/getting-started
- https://sdk.operatorframework.io/docs/building-operators/ansible/tutorial/