From Elasticsearch 7.x to elasticsearch-head

2 years ago

科, 雅

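What is Elasticsearch?

① The core component (the heart) of the Elastic Stack
② A mainstream search engine
③ Open source
④ Responsible for storing and analyzing data
⑤ Very fast
⑥ Operated through a RESTful interface
⑦ Built on Lucene (after installing Elasticsearch you can confirm this by looking for the Lucene jars in its lib directory; on an RPM install this is typically /usr/share/elasticsearch/lib rather than /etc/elasticsearch/lib).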

Where is Elasticsearch used?

Here is one example use case.

Analyzing map (geospatial) data

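Comparing search engines (Lucene, Solr, Elasticsearch)

Lucene

An open-source search library, widely regarded as one of the best-performing search libraries available. As far as I know, nothing currently outperforms it.

Apache Solr

An open-source search platform built on Apache Lucene that exposes a web service API. Releases are not very frequent. It supports JSON, XML, and CSV. It has been around for a long time, has many users, and has a relatively low learning cost.

Elasticsearch

An open-source search platform built on Apache Lucene that exposes a RESTful API. Compared with Solr, it is easier to scale and easier to cluster. It supports JSON only. New versions come out roughly every month, but few engineers have the knowledge needed to maintain it, and solid troubleshooting skills are required.

For more detail, see a comparison of Solr and Elasticsearch.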

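Why use Elasticsearch?

Lucene is only a library. To use it, you have to integrate it into a Java program, and without solid knowledge of search it is quite hard to use.

Elasticsearch was created to solve these problems.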

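Download and installation

Download the package for your environment from the official site.
There are already many articles on downloading and installing, so I will skip that here.
My environment is as follows: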

[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 8.3.2011

Operation check ①

[root@localhost ~]# curl http://localhost:9200
{
"name" : "localhost.localdomain",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "5DyhRe1RQiSml_rErADF_Q",
"version" : {
"number" : "7.12.1",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "3186837139b9c6b6d23c3200870651f10d3343b7",
"build_date" : "2021-04-20T20:56:39.040728659Z",
"build_snapshot" : false,
"lucene_version" : "8.8.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}

Operation check ②

Aside

The following settings are optional; apply them only if you need them.

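Elasticsearch heap size

A larger heap gives Elasticsearch more memory for caching, which generally seems to make it faster. I changed my heap size from 4 GB to 512 MB.

# vi /etc/elasticsearch/jvm.options
-Xms***  → adjust
-Xmx***  → adjust
# systemctl restart elasticsearch  → apply the setting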
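As a hedged alternative to editing jvm.options by hand: from Elasticsearch 7.7 onward, custom JVM flags can also live in a separate file under a jvm.options.d directory. The sketch below writes such a file. The variable names, the file name heap.options, and the 512m value are assumptions for illustration; by default it writes to a local scratch directory so it is safe to try, and on an RPM install you would point it at /etc/elasticsearch/jvm.options.d instead.

```shell
# Assumption: HEAP and JVM_OPTS_DIR are illustrative names, not Elasticsearch settings.
HEAP="${HEAP:-512m}"
# Scratch directory by default; on a real RPM install use /etc/elasticsearch/jvm.options.d
JVM_OPTS_DIR="${JVM_OPTS_DIR:-./jvm.options.d}"

mkdir -p "$JVM_OPTS_DIR"

# Use the same value for minimum and maximum heap so the heap is not resized at runtime.
# Files in jvm.options.d need the .options suffix to be picked up.
printf '%s\n' "-Xms${HEAP}" "-Xmx${HEAP}" > "$JVM_OPTS_DIR/heap.options"

cat "$JVM_OPTS_DIR/heap.options"
```

After restarting the service, something like curl 'http://localhost:9200/_cat/nodes?v&h=name,heap.max' should report the new maximum heap.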

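Virtual memory

Elasticsearch memory-maps its index files, so the kernel limit on memory map areas may need to be raised.

# vi /etc/sysctl.conf
vm.max_map_count=262144  → append the required value
# sysctl -p  → apply the setting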
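Before editing sysctl.conf it can help to see whether the current value is already high enough. The following is a minimal sketch under the assumption that the machine is Linux and exposes the value through /proc; map_count_ok is a hypothetical helper name, and 262144 simply mirrors the value used above.

```shell
# Assumption: REQUIRED mirrors the value used in this article.
REQUIRED=262144

# Hypothetical helper: succeeds when the given value meets the required minimum.
map_count_ok() {
  [ "$1" -ge "$REQUIRED" ]
}

# Read the live kernel value (Linux only); fall back to 0 if it cannot be read.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)

if map_count_ok "$current"; then
  echo "vm.max_map_count=$current is sufficient"
else
  echo "vm.max_map_count=$current is below $REQUIRED"
fi
```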

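Installing elasticsearch-head

Elasticsearch itself does not ship with a GUI. The elasticsearch-head plugin fills that gap and lets you inspect the cluster in a browser.

elasticsearch-head source code

It is available on GitHub in the mobz/elasticsearch-head repository.

Installation

There appear to be four ways to install it; this time I use npm.
It can also be installed as a Chrome extension, which I recommend as well.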

# git clone https://github.com/mobz/elasticsearch-head.git
# cd elasticsearch-head
# npm install
# npm run start

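Operation check

Edit the configuration file

Because elasticsearch-head runs in the browser from a different origin, Elasticsearch must be told to allow cross-origin requests.

# vi /etc/elasticsearch/elasticsearch.yml
http.cors.enabled: true  → add
http.cors.allow-origin: "*"  → add
# systemctl restart elasticsearch  → restart to apply the setting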
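One way to confirm the CORS setting took effect is to send a request with an Origin header and look for the Access-Control-Allow-Origin response header. This is only a sketch: cors_header is a hypothetical helper, and http://localhost:9100 is assumed to be where npm run start serves elasticsearch-head. The wildcard "*" is convenient for a test machine; on anything shared, restricting http.cors.allow-origin to that one origin is likely the safer choice.

```shell
# Assumptions: Elasticsearch on localhost:9200, elasticsearch-head served on localhost:9100.
ES_URL="${ES_URL:-http://localhost:9200}"
HEAD_ORIGIN="${HEAD_ORIGIN:-http://localhost:9100}"

# Hypothetical helper: print the Access-Control-Allow-Origin header returned
# for a request carrying the given Origin. Prints nothing if CORS is not enabled.
cors_header() {
  curl -s -i -H "Origin: $1" "$ES_URL/" | grep -i '^access-control-allow-origin'
}
```

Running cors_header "$HEAD_ORIGIN" against a node with the settings above should print one header line; no output suggests the node was not restarted or the origin is not allowed.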

That seems to be it.

Checking again

Open http://xxx.xxx.xxx:9200/_cluster/health?pretty=true

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
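The same check can be scripted instead of read by eye. The sketch below pulls the status field out of the health response with sed and uses the wait_for_status and timeout parameters of the cluster health API so the call blocks until the cluster is at least yellow. The helper names and the 30s timeout are assumptions for illustration.

```shell
# Assumption: Elasticsearch listens on localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the value of "status" from a cluster health JSON document read on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Wait up to 30 seconds for the cluster to reach at least yellow, then print its status.
wait_for_cluster() {
  curl -s "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=30s" | health_status
}
```

Calling wait_for_cluster on the single-node setup above would be expected to print green, since no indices with replicas exist yet.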

The status is green.
That completes the setup.

If you run into trouble, refer to other installation guides.

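Basic concepts of Elasticsearch

I would like to move on to the main topic, but first: the official site introduces a set of basic concepts and recommends understanding them before going further. Unfortunately, I read them and still did not understand, so here is my own brief explanation of each keyword.

Near real-time

Just what the name says: newly indexed documents become searchable after a short delay (about one second by default).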

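Node

A node is a single Elasticsearch server.

Cluster

A cluster is a group of one or more nodes (servers). Searching through Elasticsearch naturally generates traffic, and spreading that traffic usually takes several Elasticsearch servers. Such a group of nodes is called a cluster.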

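Index
Where data is stored. As the name suggests, it is an index. You can have multiple indices, each holding the documents to be searched.

Type
Similar to a table in an RDB. Older versions allowed one or more types per index, with different kinds of documents stored under each. Since 6.0 an index can hold only a single type, and in 7.x types are deprecated in favor of the typeless _doc endpoint.

Document
Similar to a record (row) in an RDB. It is the actual data entity and the smallest unit of search.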

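Shard
A piece of a divided index. There are two kinds: primary shards and replica shards. Just as a database is distributed to spread data out and to keep serving during a failure, an index is split into shards across nodes. A primary shard holds the original data and handles updates first. A replica shard is a copy of a primary; it takes over if the primary is lost and can also serve search requests.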
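To make the primary/replica split concrete, here is a small sketch that computes how many shard copies an index ends up with and builds the matching body for the create-index API (number_of_shards and number_of_replicas are the real setting names; the helper functions and the index name used in the note are assumptions).

```shell
# Assumption: Elasticsearch listens on localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Total shard copies = primaries * (1 + replicas per primary).
total_shards() {
  echo $(( $1 * (1 + $2) ))
}

# Build the JSON settings body for the create-index API.
index_settings() {
  printf '{"settings":{"number_of_shards":%s,"number_of_replicas":%s}}' "$1" "$2"
}

# Create an index: name, primaries, replicas. Needs a running node.
create_index() {
  curl -s -X PUT "$ES_URL/$1?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(index_settings "$2" "$3")"
}

total_shards 3 1   # prints 6
```

For example, create_index blog_demo 3 1 would request three primaries with one replica each. On a single-node cluster the replicas cannot be placed on the same node as their primaries, so the health status would be expected to drop from green to yellow.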

Elasticsearch compared with a relational database

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices   -> Types  -> Documents -> Fields
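The analogy can be tried out with three REST calls: storing a document (roughly an INSERT of one row), fetching it by id, and searching a field. This is a sketch under the assumption of a 7.x node on localhost:9200; the function names are hypothetical, and it uses the typeless _doc endpoint since types are deprecated in 7.x.

```shell
# Assumption: Elasticsearch 7.x listens on localhost:9200.
ES_URL="${ES_URL:-http://localhost:9200}"

# Store one document: index name, document id, JSON body.
put_doc() {
  curl -s -X PUT "$ES_URL/$1/_doc/$2?pretty" \
    -H 'Content-Type: application/json' \
    -d "$3"
}

# Fetch a document by id.
get_doc() {
  curl -s "$ES_URL/$1/_doc/$2?pretty"
}

# Search one field with the URI search "q" parameter: index, field, term.
search_field() {
  curl -s -G "$ES_URL/$1/_search" \
    --data-urlencode "q=$2:$3" \
    --data-urlencode "pretty=true"
}
```

A session might look like put_doc blog_demo 1 '{"title":"hello"}', then get_doc blog_demo 1, then search_field blog_demo title hello. Here blog_demo plays the role of the index and title is a field of the document.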