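Can OpenStack Keystone be made faster?

Keystone is slow.

Whenever you use any OpenStack service, Keystone authentication runs behind the scenes. Could speeding up Keystone make for a more comfortable OpenStack environment? I tried a number of things, but did not reach a definitive conclusion.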

Base environment

This is the Vagrantfile used to run Ubuntu 20.04 on libvirt. A simple provisioning script handles the preparation.

Vagrant.configure("2") do |config|
    config.vm.define "master" do |host|
      host.vm.box = "generic/ubuntu2004"
      host.vm.provision :shell, inline: $script
      host.vm.provider "libvirt" do |vb|
        vb.memory = 8192
        vb.cpus = 8
      end
    end
end

$script = <<END
apt update
apt upgrade -y
apt install memcached keystone python3-openstackclient -y
END

Software versions

vagrant@ubuntu2004:~$ dpkg -l | grep -P "(mysql)|(memcache)|(keystone)"
ii  keystone                             2:17.0.0-0ubuntu0.20.04.1         all          OpenStack identity service - Daemons
ii  mysql-server-8.0                     8.0.22-0ubuntu0.20.04.2           amd64        MySQL database server binaries and system database setup
ii  memcached                            1.5.22-2ubuntu0.1                 amd64        High-performance in-memory object caching system

First, install Keystone by following the documentation.
https://docs.openstack.org/keystone/ussuri/install/keystone-install-ubuntu.html
*Memcached is not used for now.

Where does the time go?

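Adding the --timing option measures the time taken by each API call.
Here the two requests to tokens alone take about 0.3 s each, roughly 0.6 s of the 0.73 s total.
(Why are tokens requested twice?)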

root@ubuntu2004:~# openstack project list --timing
+----------------------------------+-------+
| ID                               | Name  |
+----------------------------------+-------+
| 973e028a13184bf585915d1dbcb8bd69 | admin |
+----------------------------------+-------+

+-------------------------------------------+--------------------+
| URL                                       |            Seconds |
+-------------------------------------------+--------------------+
| GET http://localhost:5000/v3              |           0.003235 |
| POST http://localhost:5000/v3/auth/tokens |           0.333022 |
| POST http://localhost:5000/v3/auth/tokens |           0.313014 |
| GET http://localhost:5000/v3/projects     |           0.079824 |
| Total                                     | 0.7290949999999999 |
+-------------------------------------------+--------------------+
root@ubuntu2004:~#
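To make the share of the token requests explicit, the figures in the --timing table above can simply be added up. This is a small sketch: the numbers are copied from the output above, and the variable names are only illustrative.

```python
# Timings in seconds, copied from the `openstack project list --timing` output above.
timings = [
    ("GET /v3", 0.003235),
    ("POST /v3/auth/tokens", 0.333022),
    ("POST /v3/auth/tokens", 0.313014),
    ("GET /v3/projects", 0.079824),
]

total = sum(seconds for _, seconds in timings)
token_time = sum(seconds for url, seconds in timings if url.endswith("/auth/tokens"))
token_share = token_time / total

print(f"total:       {total:.6f} s")
print(f"token calls: {token_time:.6f} s ({token_share:.1%} of total)")
```

In this run the two token requests account for close to 90% of the elapsed time, which is why the rest of the measurements focus on /tokens.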

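I used JMeter to send 100 requests to /tokens over about 10 seconds.
The machine running JMeter is not very powerful, so treat this as a reference value only, but the average response time was 0.263 s.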

Starting the test @ Sat Oct 31 12:26:14 JST 2020 (1604114774995)
Waiting for possible shutdown message on port 4445
summary =    100 in    12s =    8.6/s Avg:   263 Min:   249 Max:   313 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 12:26:26 JST 2020 (1604114786675)
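For readers without JMeter, the same kind of measurement can be sketched in Python. The request body follows the Keystone v3 password authentication format. The user name, password, and project are placeholders, and `measure` is a hypothetical helper that runs requests one after another, so it does not reproduce the ten JMeter threads used above.

```python
import json
import statistics
import time


def build_token_request(user, password, project, domain="Default"):
    """Return the JSON body for POST /v3/auth/tokens (password auth, project scope)."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }


def measure(send, count=100):
    """Call send() `count` times and return latency statistics in milliseconds."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        send()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "count": len(samples),
        "avg": statistics.mean(samples),
        "min": min(samples),
        "max": max(samples),
    }


# Placeholder credentials; substitute the values of your own environment.
body = json.dumps(build_token_request("admin", "ADMIN_PASS", "admin"))

# The stub below only sleeps. Replace it with a real HTTP POST of `body` to
# http://localhost:5000/v3/auth/tokens to measure an actual Keystone.
stats = measure(lambda: time.sleep(0.001), count=5)
print(stats)
```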

Trying Memcached

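I added the settings that make Keystone use Memcached.
I was not sure how much Memcached would help with token issuance, but there seems to be some improvement (0.231 s on average).
In keystone.conf, the cache backend is set as follows: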
backend = dogpile.cache.memcached
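For context, the backend line belongs to the [cache] section of keystone.conf. The sketch below renders a plausible minimal section with Python's configparser. Only the backend value comes from this article. The `enabled` and `memcache_servers` options and the localhost:11211 address are assumptions based on the usual oslo.cache options and the default memcached port, so check them against your own installation.

```python
import configparser
import io

cache_conf = configparser.ConfigParser()
cache_conf["cache"] = {
    "enabled": "true",                      # assumption: caching must be switched on
    "backend": "dogpile.cache.memcached",   # value used in this article
    "memcache_servers": "localhost:11211",  # assumption: local memcached, default port
}

buffer = io.StringIO()
cache_conf.write(buffer)
print(buffer.getvalue())
```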

Starting the test @ Sat Oct 31 12:31:27 JST 2020 (1604115087396)
Waiting for possible shutdown message on port 4445
summary +     19 in     3s =    7.2/s Avg:   232 Min:   222 Max:   284 Err:     0 (0.00%) Active: 2 Started: 3 Finished: 1
summary +     81 in     9s =    9.3/s Avg:   231 Min:   218 Max:   238 Err:     0 (0.00%) Active: 0 Started: 10 Finished: 10
summary =    100 in  11.3s =    8.8/s Avg:   231 Min:   218 Max:   284 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 12:31:38 JST 2020 (1604115098736)

I also tested with the following backend setting:
backend = oslo_cache.memcache_pool
The result looks almost identical to dogpile.cache.memcached.

Starting the test @ Sat Oct 31 12:36:02 JST 2020 (1604115362429)
Waiting for possible shutdown message on port 4445
summary +      1 in   0.5s =    2.2/s Avg:   280 Min:   280 Max:   280 Err:     0 (0.00%) Active: 1 Started: 1 Finished: 0
summary +     99 in    11s =    9.1/s Avg:   230 Min:   219 Max:   239 Err:     0 (0.00%) Active: 0 Started: 10 Finished: 10
summary =    100 in  11.3s =    8.9/s Avg:   230 Min:   219 Max:   280 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 12:36:13 JST 2020 (1604115373754)
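To compare the three runs so far, the total lines of the JMeter summaries can be parsed and set against the run without a cache. The summary lines are copied from the outputs above. The regular expression and the helper name are illustrative only.

```python
import re

# Matches the final "summary =" line of a JMeter non-GUI run.
SUMMARY_RE = re.compile(
    r"summary =\s*(?P<count>\d+) in\s*(?P<elapsed>[\d.]+)s =\s*(?P<rate>[\d.]+)/s"
    r"\s+Avg:\s*(?P<avg>\d+)\s+Min:\s*(?P<min>\d+)\s+Max:\s*(?P<max>\d+)"
)

runs = {
    "no cache": "summary =    100 in    12s =    8.6/s Avg:   263 Min:   249 Max:   313 Err:     0 (0.00%)",
    "dogpile.cache.memcached": "summary =    100 in  11.3s =    8.8/s Avg:   231 Min:   218 Max:   284 Err:     0 (0.00%)",
    "oslo_cache.memcache_pool": "summary =    100 in  11.3s =    8.9/s Avg:   230 Min:   219 Max:   280 Err:     0 (0.00%)",
}


def parse_avg_ms(line):
    """Extract the average response time (ms) from a JMeter total summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a JMeter total summary line: {line!r}")
    return int(match.group("avg"))


averages = {name: parse_avg_ms(line) for name, line in runs.items()}
baseline = averages["no cache"]
for name, avg in averages.items():
    change = (avg - baseline) / baseline
    print(f"{name:26s} avg {avg} ms ({change:+.1%} vs. no cache)")
```

Both memcached backends come out roughly 12% faster than the run without a cache, and the difference between the two backends is within about 1 ms.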

What is the difference between dogpile.cache.memcached and oslo_cache.memcache_pool?

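The comments in the configuration file say the following.

In short, memcache_pool is recommended for everything except small environments.
The name suggests that connections to memcached are pooled.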

Cache backend module. For eventlet-based or environments with hundreds of threaded servers,
Memcache with pooling (oslo_cache.memcache_pool) is recommended.
For environments with less than 100 threaded servers,
Memcached (dogpile.cache.memcached) or Redis (dogpile.cache.redis) is recommended. 
Test environments with a single instance of the server can use the dogpile.cache.memory backend.

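Incidentally, the pool_maxsize setting controls the size of the connection pool, so changing it to suit the environment may improve performance. (In the test environment of this article, however, only keystone is running, so the goal is not to gain performance by tuning the MySQL or Memcached connection pool size but to make a single request faster.)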
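As an illustration of what pooling with a maximum size means, here is a minimal generic sketch. It is not the oslo.cache implementation. `ConnectionPool` and `fake_connect` are hypothetical names, and the `maxsize` argument merely plays the role that the pool size setting plays in the configuration.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand out at most `maxsize` connections and reuse them across requests."""

    def __init__(self, connect, maxsize):
        self._connect = connect
        # LIFO order so the most recently returned connection is reused first.
        self._slots = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            self._slots.put(None)  # None marks a slot whose connection is not open yet

    @contextmanager
    def acquire(self, timeout=None):
        conn = self._slots.get(timeout=timeout)  # blocks while all slots are in use
        if conn is None:
            conn = self._connect()  # open lazily on first use of this slot
        try:
            yield conn
        finally:
            self._slots.put(conn)  # give the connection back for reuse


opened = []


def fake_connect():
    """Stand-in for opening a real memcached connection."""
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, maxsize=2)
for _ in range(5):
    with pool.acquire() as conn:
        pass  # a real client would send a get/set here

print("connections opened for 5 sequential requests:", len(opened))
```

With sequential requests a single connection is reused. A larger pool only matters once several requests are in flight at the same time, which is why it is not the focus of this single-request experiment.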

MySQL access

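It is not clear which part of token issuance takes the time, but MySQL is easy to tune, so I started checking there.
Watching the traffic with tcpdump and Wireshark showed that MySQL is accessed when a token is issued, so I looked at the queries. They are all SELECT statements, so improving read performance should speed things up. Several of them also sort with ORDER BY, so sort-related tuning might help as well.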

SELECT user.enabled AS user_enabled, user.id AS user_id, user.domain_id AS user_domain_id, user.extra AS user_extra, user.default_project_id AS user_default_project_id, user.created_at AS user_created_at, user.last_active_at AS user_last_active_at, password_1.created_at AS password_1_created_at, password_1.expires_at AS password_1_expires_at, password_1.id AS password_1_id, password_1.local_user_id AS password_1_local_user_id, password_1.password_hash AS password_1_password_hash, password_1.created_at_int AS password_1_created_at_int, password_1.expires_at_int AS password_1_expires_at_int, password_1.self_service AS password_1_self_service, local_user_1.id AS local_user_1_id, local_user_1.user_id AS local_user_1_user_id, local_user_1.domain_id AS local_user_1_domain_id, local_user_1.name AS local_user_1_name, local_user_1.failed_auth_count AS local_user_1_failed_auth_count, local_user_1.failed_auth_at AS local_user_1_failed_auth_at, federated_user_1.id AS federated_user_1_id, federated_user_1.user_id AS federated_user_1_user_id, federated_user_1.idp_id AS federated_user_1_idp_id, federated_user_1.protocol_id AS federated_user_1_protocol_id, federated_user_1.unique_id AS federated_user_1_unique_id, federated_user_1.display_name AS federated_user_1_display_name, nonlocal_user_1.domain_id AS nonlocal_user_1_domain_id, nonlocal_user_1.name AS nonlocal_user_1_name, nonlocal_user_1.user_id AS nonlocal_user_1_user_id 
FROM user LEFT OUTER JOIN local_user AS local_user_1 ON user.id = local_user_1.user_id AND user.domain_id = local_user_1.domain_id LEFT OUTER JOIN password AS password_1 ON local_user_1.id = password_1.local_user_id LEFT OUTER JOIN federated_user AS federated_user_1 ON user.id = federated_user_1.user_id LEFT OUTER JOIN nonlocal_user AS nonlocal_user_1 ON user.domain_id = nonlocal_user_1.domain_id AND user.id = nonlocal_user_1.user_id 
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419' ORDER BY password_1.created_at_int

SELECT user_option.user_id AS user_option_user_id, user_option.option_id AS user_option_option_id, user_option.option_value AS user_option_option_value, anon_1.user_id AS anon_1_user_id 
FROM (SELECT user.id AS user_id 
FROM user 
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419') AS anon_1 INNER JOIN user_option ON anon_1.user_id = user_option.user_id ORDER BY anon_1.user_id

SELECT user.enabled AS user_enabled, user.id AS user_id, user.domain_id AS user_domain_id, user.extra AS user_extra, user.default_project_id AS user_default_project_id, user.created_at AS user_created_at, user.last_active_at AS user_last_active_at, password_1.created_at AS password_1_created_at, password_1.expires_at AS password_1_expires_at, password_1.id AS password_1_id, password_1.local_user_id AS password_1_local_user_id, password_1.password_hash AS password_1_password_hash, password_1.created_at_int AS password_1_created_at_int, password_1.expires_at_int AS password_1_expires_at_int, password_1.self_service AS password_1_self_service, local_user_1.id AS local_user_1_id, local_user_1.user_id AS local_user_1_user_id, local_user_1.domain_id AS local_user_1_domain_id, local_user_1.name AS local_user_1_name, local_user_1.failed_auth_count AS local_user_1_failed_auth_count, local_user_1.failed_auth_at AS local_user_1_failed_auth_at, federated_user_1.id AS federated_user_1_id, federated_user_1.user_id AS federated_user_1_user_id, federated_user_1.idp_id AS federated_user_1_idp_id, federated_user_1.protocol_id AS federated_user_1_protocol_id, federated_user_1.unique_id AS federated_user_1_unique_id, federated_user_1.display_name AS federated_user_1_display_name, nonlocal_user_1.domain_id AS nonlocal_user_1_domain_id, nonlocal_user_1.name AS nonlocal_user_1_name, nonlocal_user_1.user_id AS nonlocal_user_1_user_id 
FROM user LEFT OUTER JOIN local_user AS local_user_1 ON user.id = local_user_1.user_id AND user.domain_id = local_user_1.domain_id LEFT OUTER JOIN password AS password_1 ON local_user_1.id = password_1.local_user_id LEFT OUTER JOIN federated_user AS federated_user_1 ON user.id = federated_user_1.user_id LEFT OUTER JOIN nonlocal_user AS nonlocal_user_1 ON user.domain_id = nonlocal_user_1.domain_id AND user.id = nonlocal_user_1.user_id 
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419' ORDER BY password_1.created_at_int

SELECT user_option.user_id AS user_option_user_id, user_option.option_id AS user_option_option_id, user_option.option_value AS user_option_option_value, anon_1.user_id AS anon_1_user_id 
FROM (SELECT user.id AS user_id 
FROM user 
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419') AS anon_1 INNER JOIN user_option ON anon_1.user_id = user_option.user_id ORDER BY anon_1.user_id

SELECT revocation_event.id AS revocation_event_id, revocation_event.domain_id AS revocation_event_domain_id, revocation_event.project_id AS revocation_event_project_id, revocation_event.user_id AS revocation_event_user_id, revocation_event.role_id AS revocation_event_role_id, revocation_event.trust_id AS revocation_event_trust_id, revocation_event.consumer_id AS revocation_event_consumer_id, revocation_event.access_token_id AS revocation_event_access_token_id, revocation_event.issued_before AS revocation_event_issued_before, revocation_event.expires_at AS revocation_event_expires_at, revocation_event.revoked_at AS revocation_event_revoked_at, revocation_event.audit_id AS revocation_event_audit_id, revocation_event.audit_chain_id AS revocation_event_audit_chain_id 
FROM revocation_event 
WHERE revocation_event.issued_before >= '2020-10-31 09:13:17' AND (revocation_event.user_id IS NULL OR revocation_event.user_id = 'e901f4f3d5544817bf89c1946e4ed419') AND (revocation_event.project_id IS NULL OR revocation_event.project_id = '973e028a13184bf585915d1dbcb8bd69') AND (revocation_event.audit_id IS NULL OR revocation_event.audit_id = '8QLRph58TVuNZ-DwS_m1Qw')

SELECT project.id AS project_id, project.name AS project_name, project.domain_id AS project_domain_id, project.description AS project_description, project.enabled AS project_enabled, project.extra AS project_extra, project.parent_id AS project_parent_id, project.is_domain AS project_is_domain 
FROM project 
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false

SELECT project_tag.project_id AS project_tag_project_id, project_tag.name AS project_tag_name, anon_1.project_id AS anon_1_project_id 
FROM (SELECT project.id AS project_id 
FROM project 
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false) AS anon_1 INNER JOIN project_tag ON project_tag.project_id = anon_1.project_id ORDER BY anon_1.project_id

SELECT project_option.project_id AS project_option_project_id, project_option.option_id AS project_option_option_id, project_option.option_value AS project_option_option_value, anon_1.project_id AS anon_1_project_id 
FROM (SELECT project.id AS project_id 
FROM project 
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false) AS anon_1 INNER JOIN project_option ON anon_1.project_id = project_option.project_id ORDER BY anon_1.project_id
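The observation that every statement is a SELECT, and that most of them sort, can be checked mechanically. The list below holds hand-abbreviated versions of the eight statements captured above, with column lists and literal values replaced by "...". The `classify` helper is only an illustration.

```python
import re
from collections import Counter

# Abbreviated forms of the captured statements, in the order they appear above.
captured = [
    "SELECT ... FROM user LEFT OUTER JOIN local_user AS local_user_1 ... WHERE user.id = '...' ORDER BY password_1.created_at_int",
    "SELECT ... FROM (SELECT user.id AS user_id FROM user WHERE user.id = '...') AS anon_1 INNER JOIN user_option ... ORDER BY anon_1.user_id",
    "SELECT ... FROM user LEFT OUTER JOIN local_user AS local_user_1 ... WHERE user.id = '...' ORDER BY password_1.created_at_int",
    "SELECT ... FROM (SELECT user.id AS user_id FROM user WHERE user.id = '...') AS anon_1 INNER JOIN user_option ... ORDER BY anon_1.user_id",
    "SELECT ... FROM revocation_event WHERE revocation_event.issued_before >= '...' AND ...",
    "SELECT ... FROM project WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false",
    "SELECT ... FROM (SELECT project.id AS project_id FROM project WHERE ...) AS anon_1 INNER JOIN project_tag ... ORDER BY anon_1.project_id",
    "SELECT ... FROM (SELECT project.id AS project_id FROM project WHERE ...) AS anon_1 INNER JOIN project_option ... ORDER BY anon_1.project_id",
]


def classify(statement):
    """Return (SQL verb, whether the statement contains ORDER BY)."""
    verb = statement.lstrip().split(None, 1)[0].upper()
    has_order_by = re.search(r"\bORDER\s+BY\b", statement, re.IGNORECASE) is not None
    return verb, has_order_by


summary = Counter(classify(statement) for statement in captured)
for (verb, has_order_by), count in summary.items():
    print(f"{verb:6s} order_by={has_order_by!s:5s} x{count}")
```

Note that the first two statements appear twice in the capture, which is consistent with the two token requests seen in the --timing output.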

MySQL performance

As of MySQL 8.0 the query cache is no longer supported, and MySQL fails to start if query cache settings are present in the configuration.
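Before upgrading an older configuration, it can help to scan it for leftover query cache options. The following sketch does that on a sample my.cnf text. The sample content and the function name are made up for illustration, and the check simply looks for option names that start with query_cache_.

```python
import re

REMOVED_PREFIX = "query_cache_"


def find_query_cache_options(cnf_text):
    """Return (line_number, option_name) for every active query_cache_* setting."""
    hits = []
    for number, line in enumerate(cnf_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines, comments, and section headers such as [mysqld].
        if not stripped or stripped.startswith(("#", ";", "[")):
            continue
        # Option files accept dashes and underscores interchangeably.
        name = re.split(r"[=\s]", stripped, maxsplit=1)[0].replace("-", "_")
        if name.startswith(REMOVED_PREFIX):
            hits.append((number, name))
    return hits


# Hypothetical configuration carried over from an older MySQL version.
sample = """\
[mysqld]
# query_cache_limit = 1M
query_cache_type = 1
query-cache-size = 16M
max_connections = 151
"""

for number, name in find_query_cache_options(sample):
    print(f"line {number}: remove {name} before starting MySQL 8.0")
```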

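I hear that ProxySQL, which sits between the client and MySQL and caches results, is being tried as an alternative, but it means one more piece of software to manage and learn, which is not ideal. For now I went with the simpler approach offered by mysqltuner.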

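Measuring performance with mysqltuner

There is plenty of information online about improving MySQL performance, but the learning cost looked high and, frankly, it is a hassle, so I used mysqltuner for a quick diagnosis. It is in the Ubuntu repositories and installs easily with apt.

Running it flagged a number of issues. The items below are the ones that look like room for improvement.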

-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
    Control warning line(s) into /var/log/mysql/error.log file
    Control error line(s) into /var/log/mysql/error.log file
    MySQL was started within the last 24 hours - recommendations may be inaccurate
    Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
    Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU
Variables to adjust:
    innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
root@ubuntu2004:~#
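The 16M figure in the recommendation follows from simple arithmetic: the combined size of all redo log files should equal 25% of the buffer pool. The sketch below reproduces it under the assumption that the server runs with the MySQL 8.0 defaults of a 128 MB buffer pool and two log files. These values were not read from the test VM.

```python
MIB = 1024 * 1024


def recommended_log_file_size(buffer_pool_bytes, files_in_group, ratio=0.25):
    """Size of each redo log file so that all files together equal ratio * buffer pool."""
    return int(buffer_pool_bytes * ratio / files_in_group)


# Assumed values (MySQL 8.0 defaults), not measured on the test VM.
buffer_pool = 128 * MIB  # innodb_buffer_pool_size
files_in_group = 2       # innodb_log_files_in_group

size = recommended_log_file_size(buffer_pool, files_in_group)
print(f"innodb_log_file_size = {size // MIB}M")
```

If the buffer pool is enlarged later, the same ratio gives the matching log file size.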

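First, the most straightforward item: the name resolution setting (skip-name-resolve). I did not expect it to change anything, and indeed the result did not change.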

Starting the test @ Sat Oct 31 18:55:19 JST 2020 (1604138119110)
Waiting for possible shutdown message on port 4445
summary +     98 in    11s =    9.0/s Avg:   233 Min:   223 Max:   286 Err:     0 (0.00%) Active: 1 Started: 10 Finished: 9
summary +      2 in   0.5s =    4.4/s Avg:   224 Min:   222 Max:   226 Err:     0 (0.00%) Active: 0 Started: 10 Finished: 10
summary =    100 in  11.3s =    8.8/s Avg:   232 Min:   222 Max:   286 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 18:55:30 JST 2020 (1604138130464)
... end of run

Next I set innodb_log_file_size to 16 MB. Well, this change had no effect either.

Starting the test @ Sat Oct 31 19:19:25 JST 2020 (1604139565449)
Waiting for possible shutdown message on port 4445
summary +     39 in     5s =    8.4/s Avg:   235 Min:   227 Max:   279 Err:     0 (0.00%) Active: 2 Started: 5 Finished: 3
summary +     61 in     7s =    9.1/s Avg:   234 Min:   227 Max:   243 Err:     0 (0.00%) Active: 0 Started: 10 Finished: 10
summary =    100 in  11.4s =    8.8/s Avg:   235 Min:   227 Max:   279 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 19:19:36 JST 2020 (1604139576840)

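ProxySQL

Finally, I also gave ProxySQL a try.

I set it up by following an existing guide. As an aside, it may be better to delete the settings outright with rm. I was stuck for a while because the keystone user was not recognized.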

Then I changed the database port in keystone.conf to 6033 (so that access goes through ProxySQL) and measured again. The results are below.
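Switching Keystone to ProxySQL only changes the port in the database connection URL. The sketch below rewrites the port with the standard library. The URL uses the placeholder form from the Keystone install guide (KEYSTONE_DBPASS, controller), not the values of this test VM, and `with_port` is a hypothetical helper that does not handle IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(connection_url, port):
    """Return the connection URL with its port replaced (or added)."""
    parts = urlsplit(connection_url)
    userinfo, _, hostport = parts.netloc.rpartition("@")
    host = hostport.rsplit(":", 1)[0] if ":" in hostport else hostport
    netloc = f"{userinfo}@{host}:{port}" if userinfo else f"{host}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Placeholder URL in the style of the install guide.
direct = "mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone"
via_proxysql = with_port(direct, 6033)
print(via_proxysql)
```

The resulting string is what would go into the connection option of the [database] section.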

Starting the test @ Sat Oct 31 20:33:14 JST 2020 (1604143994240)
Waiting for possible shutdown message on port 4445
summary =    100 in  11.4s =    8.8/s Avg:   235 Min:   224 Max:   286 Err:     0 (0.00%)
Tidying up ...    @ Sat Oct 31 20:33:25 JST 2020 (1604144005631)

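Since this only adds one more hop, it does not get faster, but it is not noticeably slower either. From a practical standpoint, in an environment with a MySQL server cluster, this might be a better choice than HAProxy?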

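Summary

Of the things that are easy to try, memcached gave the clearest and easiest-to-understand effect. Since memcached is widely used in many places, tuning MySQL and memcached as a next step could pay off noticeably. In addition, if the slowness comes from requests queuing up rather than from a single slow request, the following may also show a clear effect:
* increasing the number of keystone processes
* increasing the connection pool size
* using a proxy server such as ProxySQL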

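Speeding up a single operation requires adding debug logging to the program to find out in more detail which parts take the longest, and that is likely to be hard work.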
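One lightweight way to start is a timing context manager that logs how long each stage takes. This is a generic sketch, not Keystone code. The stage names are hypothetical and the sleeps stand in for real work.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
LOG = logging.getLogger("timing")

durations = {}  # stage label -> list of elapsed times in milliseconds


@contextmanager
def timed(label):
    """Log and record the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        durations.setdefault(label, []).append(elapsed_ms)
        LOG.debug("%s took %.1f ms", label, elapsed_ms)


# Hypothetical stages of a request; replace the sleeps with the code under test.
with timed("load user"):
    time.sleep(0.001)
with timed("verify password"):
    time.sleep(0.02)

slowest = max(durations, key=lambda label: sum(durations[label]))
print("slowest stage:", slowest)
```

Wrapping progressively smaller sections in `timed` narrows down where the time goes without needing a full profiler.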
