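Can OpenStack Keystone be made faster?
Keystone is slow.
Keystone's authentication runs behind the scenes whenever any OpenStack service is used. If Keystone's operations can be sped up, can we get a more comfortable OpenStack environment? I tried a number of different approaches, but did not reach a definitive conclusion.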
Base environment
To run Ubuntu 20.04 on libvirt, a Vagrantfile is used, with a simple inline shell script for provisioning.
# Provisioning script: install memcached, keystone and the OpenStack client
$script = <<END
apt update
apt upgrade -y
apt install memcached keystone python3-openstackclient -y
END

Vagrant.configure("2") do |config|
  config.vm.define "master" do |host|
    host.vm.box = "generic/ubuntu2004"
    host.vm.provision :shell, inline: $script
    host.vm.provider "libvirt" do |vb|
      vb.memory = 8192  # MiB
      vb.cpus = 8
    end
  end
end
Software versions
vagrant@ubuntu2004:~$ dpkg -l | grep -P "(mysql)|(memcache)|(keystone)"
ii keystone 2:17.0.0-0ubuntu0.20.04.1 all OpenStack identity service - Daemons
ii mysql-server-8.0 8.0.22-0ubuntu0.20.04.2 amd64 MySQL database server binaries and system database setup
ii memcached 1.5.22-2ubuntu0.1 amd64 High-performance in-memory object caching system
First, install everything by following the official documentation:
https://docs.openstack.org/keystone/ussuri/install/keystone-install-ubuntu.html
* memcached is not used yet at this stage.
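Where is the time spent?
Adding the --timing option shows the time taken by each API call.
Here, the two requests to /v3/auth/tokens take about 0.3 s each, roughly 0.6 s of the 0.73 s total.
(Why are tokens requested twice?)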
root@ubuntu2004:~# openstack project list --timing
+----------------------------------+-------+
| ID | Name |
+----------------------------------+-------+
| 973e028a13184bf585915d1dbcb8bd69 | admin |
+----------------------------------+-------+
+-------------------------------------------+--------------------+
| URL | Seconds |
+-------------------------------------------+--------------------+
| GET http://localhost:5000/v3 | 0.003235 |
| POST http://localhost:5000/v3/auth/tokens | 0.333022 |
| POST http://localhost:5000/v3/auth/tokens | 0.313014 |
| GET http://localhost:5000/v3/projects | 0.079824 |
| Total | 0.7290949999999999 |
+-------------------------------------------+--------------------+
root@ubuntu2004:~#
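To reproduce a single measurement outside the CLI, one token request can be timed directly. This is a sketch using only the standard library and assuming the admin user and project in the Default domain, as in the install guide; `password_auth_body` and `time_token_request` are hypothetical helper names. It builds a Keystone v3 password-authentication body and times one POST to /v3/auth/tokens.

```python
import json
import time
import urllib.request


def password_auth_body(user, password, project, domain="Default"):
    """Build a Keystone v3 password-auth request body scoped to a project."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": user,
                        "domain": {"name": domain},
                        "password": password,
                    }
                },
            },
            "scope": {"project": {"name": project, "domain": {"name": domain}}},
        }
    }


def time_token_request(base_url, body):
    """POST /v3/auth/tokens once; return (elapsed seconds, X-Subject-Token value)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()  # include the response body transfer in the measurement
        token = resp.headers.get("X-Subject-Token")
    return time.perf_counter() - start, token
```

For example, `time_token_request("http://localhost:5000", password_auth_body("admin", "ADMIN_PASS", "admin"))` returns the elapsed seconds and the issued token; ADMIN_PASS is a placeholder for your own password.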
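Using JMeter, 100 requests were sent to /tokens over about 10 seconds.
The machine running JMeter has limited performance, so this is only a reference value, but the average processing time was 0.263 s.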
Starting the test @ Sat Oct 31 12:26:14 JST 2020 (1604114774995)
Waiting for possible shutdown message on port 4445
summary = 100 in 12s = 8.6/s Avg: 263 Min: 249 Max: 313 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 12:26:26 JST 2020 (1604114786675)
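For a quick check without JMeter, a similar measurement can be approximated in Python. This is a rough sketch, not a JMeter replacement: it calls an arbitrary `request_fn` (hypothetical, for example a wrapper around a token request) `total` times from a fixed pool of threads (10 by default, matching "Started: 10" in the logs), without any ramp-up, and reports count, throughput and average/min/max latency in the spirit of JMeter's summary line.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request_fn, total=100, threads=10):
    """Call request_fn `total` times from `threads` workers; return a JMeter-like summary."""

    def timed_call(_):
        start = time.perf_counter()
        request_fn()
        return (time.perf_counter() - start) * 1000.0  # latency in milliseconds

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        latencies = list(pool.map(timed_call, range(total)))
    wall = time.perf_counter() - wall_start

    return {
        "count": total,
        "throughput_per_s": total / wall,
        "avg_ms": sum(latencies) / len(latencies),
        "min_ms": min(latencies),
        "max_ms": max(latencies),
    }
```

Because there is no ramp-up or think time, throughput figures from this sketch will not match JMeter's exactly; the latency averages are the more comparable numbers.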
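Trying Memcached
Add settings to Keystone so that it uses Memcache.
It is unclear how much Memcached helps token issuance itself, but there seems to be some improvement (average 0.231 s, versus 0.263 s without it).
In keystone.conf, the following cache backend was set: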
backend = dogpile.cache.memcached
Starting the test @ Sat Oct 31 12:31:27 JST 2020 (1604115087396)
Waiting for possible shutdown message on port 4445
summary + 19 in 3s = 7.2/s Avg: 232 Min: 222 Max: 284 Err: 0 (0.00%) Active: 2 Started: 3 Finished: 1
summary + 81 in 9s = 9.3/s Avg: 231 Min: 218 Max: 238 Err: 0 (0.00%) Active: 0 Started: 10 Finished: 10
summary = 100 in 11.3s = 8.8/s Avg: 231 Min: 218 Max: 284 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 12:31:38 JST 2020 (1604115098736)
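Why the gain is modest can be illustrated with a read-through cache: a cache only saves work when the same lookup repeats before the entry expires, and anything not cached still runs on every request. The class below is a minimal in-process sketch of the get-or-create pattern (the same idea dogpile.cache is built around); it is not dogpile.cache or Keystone code, and the class name and TTL are illustrative assumptions.

```python
import time


class ReadThroughCache:
    """Minimal in-process read-through cache with per-entry expiry (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}  # key -> (expires_at, value)

    def get_or_create(self, key, creator):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: skip the expensive creator (e.g. a DB query)
        value = creator()  # miss or expired: do the real work once
        self._store[key] = (now + self.ttl, value)
        return value
```

Only the second and later lookups of the same key within the TTL are served from the cache, which is one plausible reason a first-time operation sees little benefit.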
I also tested with the following backend setting:
backend = oslo_cache.memcache_pool
The results were almost the same as with dogpile.cache.memcached.
Starting the test @ Sat Oct 31 12:36:02 JST 2020 (1604115362429)
Waiting for possible shutdown message on port 4445
summary + 1 in 0.5s = 2.2/s Avg: 280 Min: 280 Max: 280 Err: 0 (0.00%) Active: 1 Started: 1 Finished: 0
summary + 99 in 11s = 9.1/s Avg: 230 Min: 219 Max: 239 Err: 0 (0.00%) Active: 0 Started: 10 Finished: 10
summary = 100 in 11.3s = 8.9/s Avg: 230 Min: 219 Max: 280 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 12:36:13 JST 2020 (1604115373754)
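The two configurations compared above differ only in the `backend` line. As a sketch, the [cache] section can be rendered with Python's configparser; `enabled`, `backend` and `memcache_servers` are oslo.cache options, while the helper name and the localhost:11211 address (the local memcached installed by the provisioning script, on its default port) are assumptions.

```python
import configparser
import io

# The two backends compared in this article.
VALID_BACKENDS = {"dogpile.cache.memcached", "oslo_cache.memcache_pool"}


def cache_section(backend, servers=("localhost:11211",)):
    """Render a keystone.conf [cache] section for one of the tested backends."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unexpected backend: {backend}")
    conf = configparser.ConfigParser()
    conf["cache"] = {
        "enabled": "true",
        "backend": backend,
        "memcache_servers": ",".join(servers),
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()
```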
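What is the difference between dogpile.cache.memcached and oslo_cache.memcache_pool?
The documentation in the configuration file (the comment on the backend option) says the following.
In short, memcache_pool is recommended for everything except small environments.
It pools the connections to memcached.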
Cache backend module. For eventlet-based or environments with hundreds of threaded servers,
Memcache with pooling (oslo_cache.memcache_pool) is recommended.
For environments with less than 100 threaded servers,
Memcached (dogpile.cache.memcached) or Redis (dogpile.cache.redis) is recommended.
Test environments with a single instance of the server can use the dogpile.cache.memory backend.
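Incidentally, the pool_maxsize setting controls the size of the connection pool, so tuning it for a given environment may improve performance. (In this experiment, however, only keystone is running, so the goal is not to gain throughput by tuning MySQL or Memcached connection pool sizes but to speed up individual requests.)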
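To make the idea of a pool size limit concrete, here is a generic thread-based connection pool that never opens more than `maxsize` connections and makes callers wait (up to a timeout) for a free one. It only illustrates what a setting like pool_maxsize bounds; it is not oslo.cache's or oslo.db's implementation, and `factory` stands in for whatever opens a real connection.

```python
import queue
import threading


class ConnectionPool:
    """Fixed-size pool: at most `maxsize` connections exist; callers wait for a free one."""

    def __init__(self, factory, maxsize=10):
        self._factory = factory
        self._idle = queue.LifoQueue()  # most recently used connection is reused first
        self._slots = threading.BoundedSemaphore(maxsize)
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self, timeout=None):
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free connection in pool")
        try:
            return self._idle.get_nowait()  # reuse an idle connection if there is one
        except queue.Empty:
            pass
        try:
            conn = self._factory()  # otherwise open a new one
        except BaseException:
            self._slots.release()  # do not leak the slot if opening fails
            raise
        with self._lock:
            self.created += 1
        return conn

    def release(self, conn):
        self._idle.put(conn)
        self._slots.release()
```

A larger `maxsize` lets more requests proceed in parallel, but it does not make any single request faster, which matches the distinction drawn above.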
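Accessing MySQL
It is not clear which part of token issuance takes the time, but MySQL is easy to tune, so I started checking there.
Observing the traffic with tcpdump + Wireshark showed that MySQL is accessed during token issuance, so I looked at the queries. They are all SELECT statements, so faster reads should help. There are also ORDER BY clauses, so sort performance might be worth optimizing as well.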
SELECT user.enabled AS user_enabled, user.id AS user_id, user.domain_id AS user_domain_id, user.extra AS user_extra, user.default_project_id AS user_default_project_id, user.created_at AS user_created_at, user.last_active_at AS user_last_active_at, password_1.created_at AS password_1_created_at, password_1.expires_at AS password_1_expires_at, password_1.id AS password_1_id, password_1.local_user_id AS password_1_local_user_id, password_1.password_hash AS password_1_password_hash, password_1.created_at_int AS password_1_created_at_int, password_1.expires_at_int AS password_1_expires_at_int, password_1.self_service AS password_1_self_service, local_user_1.id AS local_user_1_id, local_user_1.user_id AS local_user_1_user_id, local_user_1.domain_id AS local_user_1_domain_id, local_user_1.name AS local_user_1_name, local_user_1.failed_auth_count AS local_user_1_failed_auth_count, local_user_1.failed_auth_at AS local_user_1_failed_auth_at, federated_user_1.id AS federated_user_1_id, federated_user_1.user_id AS federated_user_1_user_id, federated_user_1.idp_id AS federated_user_1_idp_id, federated_user_1.protocol_id AS federated_user_1_protocol_id, federated_user_1.unique_id AS federated_user_1_unique_id, federated_user_1.display_name AS federated_user_1_display_name, nonlocal_user_1.domain_id AS nonlocal_user_1_domain_id, nonlocal_user_1.name AS nonlocal_user_1_name, nonlocal_user_1.user_id AS nonlocal_user_1_user_id
FROM user LEFT OUTER JOIN local_user AS local_user_1 ON user.id = local_user_1.user_id AND user.domain_id = local_user_1.domain_id LEFT OUTER JOIN password AS password_1 ON local_user_1.id = password_1.local_user_id LEFT OUTER JOIN federated_user AS federated_user_1 ON user.id = federated_user_1.user_id LEFT OUTER JOIN nonlocal_user AS nonlocal_user_1 ON user.domain_id = nonlocal_user_1.domain_id AND user.id = nonlocal_user_1.user_id
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419' ORDER BY password_1.created_at_int
SELECT user_option.user_id AS user_option_user_id, user_option.option_id AS user_option_option_id, user_option.option_value AS user_option_option_value, anon_1.user_id AS anon_1_user_id
FROM (SELECT user.id AS user_id
FROM user
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419') AS anon_1 INNER JOIN user_option ON anon_1.user_id = user_option.user_id ORDER BY anon_1.user_id
SELECT user.enabled AS user_enabled, user.id AS user_id, user.domain_id AS user_domain_id, user.extra AS user_extra, user.default_project_id AS user_default_project_id, user.created_at AS user_created_at, user.last_active_at AS user_last_active_at, password_1.created_at AS password_1_created_at, password_1.expires_at AS password_1_expires_at, password_1.id AS password_1_id, password_1.local_user_id AS password_1_local_user_id, password_1.password_hash AS password_1_password_hash, password_1.created_at_int AS password_1_created_at_int, password_1.expires_at_int AS password_1_expires_at_int, password_1.self_service AS password_1_self_service, local_user_1.id AS local_user_1_id, local_user_1.user_id AS local_user_1_user_id, local_user_1.domain_id AS local_user_1_domain_id, local_user_1.name AS local_user_1_name, local_user_1.failed_auth_count AS local_user_1_failed_auth_count, local_user_1.failed_auth_at AS local_user_1_failed_auth_at, federated_user_1.id AS federated_user_1_id, federated_user_1.user_id AS federated_user_1_user_id, federated_user_1.idp_id AS federated_user_1_idp_id, federated_user_1.protocol_id AS federated_user_1_protocol_id, federated_user_1.unique_id AS federated_user_1_unique_id, federated_user_1.display_name AS federated_user_1_display_name, nonlocal_user_1.domain_id AS nonlocal_user_1_domain_id, nonlocal_user_1.name AS nonlocal_user_1_name, nonlocal_user_1.user_id AS nonlocal_user_1_user_id
FROM user LEFT OUTER JOIN local_user AS local_user_1 ON user.id = local_user_1.user_id AND user.domain_id = local_user_1.domain_id LEFT OUTER JOIN password AS password_1 ON local_user_1.id = password_1.local_user_id LEFT OUTER JOIN federated_user AS federated_user_1 ON user.id = federated_user_1.user_id LEFT OUTER JOIN nonlocal_user AS nonlocal_user_1 ON user.domain_id = nonlocal_user_1.domain_id AND user.id = nonlocal_user_1.user_id
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419' ORDER BY password_1.created_at_int
SELECT user_option.user_id AS user_option_user_id, user_option.option_id AS user_option_option_id, user_option.option_value AS user_option_option_value, anon_1.user_id AS anon_1_user_id
FROM (SELECT user.id AS user_id
FROM user
WHERE user.id = 'e901f4f3d5544817bf89c1946e4ed419') AS anon_1 INNER JOIN user_option ON anon_1.user_id = user_option.user_id ORDER BY anon_1.user_id
SELECT revocation_event.id AS revocation_event_id, revocation_event.domain_id AS revocation_event_domain_id, revocation_event.project_id AS revocation_event_project_id, revocation_event.user_id AS revocation_event_user_id, revocation_event.role_id AS revocation_event_role_id, revocation_event.trust_id AS revocation_event_trust_id, revocation_event.consumer_id AS revocation_event_consumer_id, revocation_event.access_token_id AS revocation_event_access_token_id, revocation_event.issued_before AS revocation_event_issued_before, revocation_event.expires_at AS revocation_event_expires_at, revocation_event.revoked_at AS revocation_event_revoked_at, revocation_event.audit_id AS revocation_event_audit_id, revocation_event.audit_chain_id AS revocation_event_audit_chain_id
FROM revocation_event
WHERE revocation_event.issued_before >= '2020-10-31 09:13:17' AND (revocation_event.user_id IS NULL OR revocation_event.user_id = 'e901f4f3d5544817bf89c1946e4ed419') AND (revocation_event.project_id IS NULL OR revocation_event.project_id = '973e028a13184bf585915d1dbcb8bd69') AND (revocation_event.audit_id IS NULL OR revocation_event.audit_id = '8QLRph58TVuNZ-DwS_m1Qw')
SELECT project.id AS project_id, project.name AS project_name, project.domain_id AS project_domain_id, project.description AS project_description, project.enabled AS project_enabled, project.extra AS project_extra, project.parent_id AS project_parent_id, project.is_domain AS project_is_domain
FROM project
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false
SELECT project_tag.project_id AS project_tag_project_id, project_tag.name AS project_tag_name, anon_1.project_id AS anon_1_project_id
FROM (SELECT project.id AS project_id
FROM project
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false) AS anon_1 INNER JOIN project_tag ON project_tag.project_id = anon_1.project_id ORDER BY anon_1.project_id
SELECT project_option.project_id AS project_option_project_id, project_option.option_id AS project_option_option_id, project_option.option_value AS project_option_option_value, anon_1.project_id AS anon_1_project_id
FROM (SELECT project.id AS project_id
FROM project
WHERE project.id != '<<keystone.domain.root>>' AND project.is_domain = false) AS anon_1 INNER JOIN project_option ON anon_1.project_id = project_option.project_id ORDER BY anon_1.project_id
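Whether an ORDER BY costs a separate sort can be checked from the query plan; on MySQL that is EXPLAIN, where a sort shows up as "Using filesort" in the Extra column. The self-contained sketch below uses SQLite (bundled with Python) instead, on a simplified, hypothetical stand-in for the password table, to show the general effect: an index covering both the WHERE column and the ORDER BY column removes the separate sort step. Whether Keystone's real schema already has such an index was not checked here.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' strings of SQLite's EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for Keystone's password table.
conn.execute(
    "CREATE TABLE password ("
    "id INTEGER PRIMARY KEY, local_user_id INTEGER, created_at_int INTEGER)"
)
query = "SELECT id FROM password WHERE local_user_id = ? ORDER BY created_at_int"

before = plan(conn, query, (1,))  # expect a table scan plus a temporary sort
conn.execute(
    "CREATE INDEX ix_password_user_created "
    "ON password (local_user_id, created_at_int)"
)
after = plan(conn, query, (1,))  # expect an index search with rows already in order
```

Printing `before` and `after` shows the "USE TEMP B-TREE FOR ORDER BY" step disappearing once the composite index exists.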
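MySQL performance
Starting with MySQL 8.0, the query cache is no longer supported; if query cache settings are present in the configuration, MySQL fails to start.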
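An option file can be checked for the removed query cache variables before starting MySQL 8.0. A minimal sketch, assuming my.cnf-style text as input; the variable list covers the query cache settings removed in 8.0, and options with the `loose-` prefix are skipped because MySQL only warns about unknown options carrying that prefix. The function name is hypothetical.

```python
import re

# Query cache system variables removed in MySQL 8.0.
REMOVED_IN_80 = {
    "query_cache_type",
    "query_cache_size",
    "query_cache_limit",
    "query_cache_min_res_unit",
    "query_cache_wlock_invalidate",
}


def find_removed_options(cnf_text):
    """Return (line_number, option) pairs that would stop MySQL 8.0 from starting."""
    hits = []
    for lineno, raw in enumerate(cnf_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blanks, comments, [section] headers and !include directives.
        if not line or line.startswith(("#", ";", "[", "!")):
            continue
        name = re.split(r"[=\s]", line, maxsplit=1)[0]
        # Option files accept dashes and underscores interchangeably.
        name = name.lower().replace("-", "_")
        if name.startswith("loose_"):
            continue  # 'loose-' options only produce a warning
        if name in REMOVED_IN_80:
            hits.append((lineno, name))
    return hits
```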
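I have heard of using ProxySQL as an alternative, placing it between the client and MySQL to cache queries, but that adds another piece of software to manage and something new to learn, so it may not be ideal. For now, I tried the simple approach offered by mysqltuner.
Measuring with mysqltuner
There is plenty of information online about improving MySQL performance, but the learning cost can be high and it is a bit of a hassle, so I used mysqltuner for a simple diagnosis. It is in Ubuntu's repositories and easy to install with apt (apt install mysqltuner).
Running it reported a number of issues; below are the points that may leave room for improvement.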
-------- Recommendations ---------------------------------------------------------------------------
General recommendations:
Control warning line(s) into /var/log/mysql/error.log file
Control error line(s) into /var/log/mysql/error.log file
MySQL was started within the last 24 hours - recommendations may be inaccurate
Configure your accounts with ip or subnets only, then update your configuration with skip-name-resolve=1
Before changing innodb_log_file_size and/or innodb_log_files_in_group read this: https://bit.ly/2TcGgtU
Variables to adjust:
innodb_log_file_size should be (=16M) if possible, so InnoDB total log files size equals to 25% of buffer pool size.
root@ubuntu2004:~#
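The 16M figure follows from the 25% rule in the message, assuming the MySQL 8.0 defaults of a 128 MiB innodb_buffer_pool_size and innodb_log_files_in_group = 2 (the mysqltuner output does not print these values): 128 MiB × 0.25 = 32 MiB in total, split over two files. The helper name below is hypothetical. As far as I know the 8.0 default innodb_log_file_size is 48M, so this recommendation would actually shrink the redo log.

```python
MiB = 1024 * 1024


def recommended_log_file_size(buffer_pool_bytes, files_in_group=2, ratio=0.25):
    """Per-file redo log size so that all files together equal `ratio` of the buffer pool."""
    return int(buffer_pool_bytes * ratio) // files_in_group
```

With the assumed defaults, `recommended_log_file_size(128 * MiB)` gives 16 MiB, matching mysqltuner's suggestion.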
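First, the simplest and easiest to understand: disabling name resolution (skip-name-resolve=1). I did not expect a change, and indeed the result did not change.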
Starting the test @ Sat Oct 31 18:55:19 JST 2020 (1604138119110)
Waiting for possible shutdown message on port 4445
summary + 98 in 11s = 9.0/s Avg: 233 Min: 223 Max: 286 Err: 0 (0.00%) Active: 1 Started: 10 Finished: 9
summary + 2 in 0.5s = 4.4/s Avg: 224 Min: 222 Max: 226 Err: 0 (0.00%) Active: 0 Started: 10 Finished: 10
summary = 100 in 11.3s = 8.8/s Avg: 232 Min: 222 Max: 286 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 18:55:30 JST 2020 (1604138130464)
... end of run
Next I tried setting innodb_log_file_size to 16MB. This change had no effect either.
Starting the test @ Sat Oct 31 19:19:25 JST 2020 (1604139565449)
Waiting for possible shutdown message on port 4445
summary + 39 in 5s = 8.4/s Avg: 235 Min: 227 Max: 279 Err: 0 (0.00%) Active: 2 Started: 5 Finished: 3
summary + 61 in 7s = 9.1/s Avg: 234 Min: 227 Max: 243 Err: 0 (0.00%) Active: 0 Started: 10 Finished: 10
summary = 100 in 11.4s = 8.8/s Avg: 235 Min: 227 Max: 279 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 19:19:36 JST 2020 (1604139576840)
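ProxySQL
Finally, I also tried ProxySQL.
The setup followed an existing guide. Incidentally, removing the existing configuration with rm and starting over seems to work better; I was stuck for a while because the keystone user was not recognized.
Then I changed the port in keystone.conf's database connection to 6033 (so that access goes through ProxySQL) and measured again. The results were as follows.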
Starting the test @ Sat Oct 31 20:33:14 JST 2020 (1604143994240)
Waiting for possible shutdown message on port 4445
summary = 100 in 11.4s = 8.8/s Avg: 235 Min: 224 Max: 286 Err: 0 (0.00%)
Tidying up ... @ Sat Oct 31 20:33:25 JST 2020 (1604144005631)
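Routing Keystone through ProxySQL only means changing the port in the [database] connection URL. As a sketch (the helper name is hypothetical), the port of the SQLAlchemy URL can be swapped with urllib.parse; the example URL follows the form used in the install guide, with KEYSTONE_DBPASS as a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def via_proxysql(connection_url, port=6033):
    """Return the same SQLAlchemy URL with its port replaced (6033: ProxySQL's MySQL listener)."""
    parts = urlsplit(connection_url)
    userinfo = parts.netloc.rpartition("@")[0]  # "user:password" part, may be empty
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: put the brackets back
        host = f"[{host}]"
    netloc = f"{userinfo}@{host}:{port}" if userinfo else f"{host}:{port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, `mysql+pymysql://keystone:KEYSTONE_DBPASS@controller/keystone` becomes `mysql+pymysql://keystone:KEYSTONE_DBPASS@controller:6033/keystone`; query parameters such as `?charset=utf8` are kept.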
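Since this only adds an extra hop, it does not get faster, but it is not noticeably slower either (average 235 ms). In practice, for an environment with a clustered MySQL setup, this might be a better option than HAProxy.
Summary
Among the things that were easy to try, memcached gave a visible and easy-to-understand improvement. Since memcached is used in many places, tuning both MySQL and memcached could be a worthwhile next step. In addition, if the slowness comes from handling many requests at once rather than from each individual request, the following measures could help:
* Increase the number of keystone processes
* Increase the connection pool size
* Use a proxy server such as ProxySQL
Any of these may also show a clear effect.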
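A back-of-the-envelope way to see why more processes help throughput but not per-request latency: if each worker handles one request at a time and each request takes a fixed service time, throughput is at most workers / service time. This sketch assumes the service time stays constant as workers are added, which stops holding once CPU, MySQL or memcached becomes the bottleneck; the function name is hypothetical.

```python
def max_throughput(workers, service_time_s):
    """Upper bound on requests/s when each worker serves one request at a time."""
    if workers <= 0 or service_time_s <= 0:
        raise ValueError("workers and service_time_s must be positive")
    return workers / service_time_s
```

With the measured average of about 0.231 s, one worker tops out near 4.3 requests/s, while each request still takes 0.231 s regardless of how many workers there are.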
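To speed up individual operations, debug logging would have to be added to the code to pin down the slow parts in more detail, which could be difficult.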
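One lightweight way to add such timing logs is a decorator that records how long each call takes. This is a sketch using the standard logging module (Keystone itself logs through oslo.log); `log_elapsed` is a hypothetical name, and which Keystone functions to wrap is left open.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def log_elapsed(func):
    """Log how long each call to `func` takes, at DEBUG level."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if func raises, so slow failures are visible too.
            LOG.debug("%s took %.1f ms", func.__qualname__,
                      (time.perf_counter() - start) * 1000.0)

    return wrapper
```

Wrapping a few candidate functions and enabling DEBUG logging would show which steps dominate the roughly 0.23 s per token request.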