Installing Elasticsearch and the Sudachi plugin on Oracle Cloud Infrastructure

3 years ago

宇, 华


This is a log of installing Elasticsearch and Kibana, with Sudachi as the morphological analysis engine.
The reference used is below.

Install Elasticsearch with RPM

Because this is a trial environment, availability and security are not considered.

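Environment used

Following Oracle Cloud Infrastructure's "Creating an Instance - Try Oracle Cloud Infrastructure (Part 3)", create a virtual cloud network (VCN) and a compute instance (VM).

Operating system: select the latest Oracle Linux 7.x release.

After creating the instance, check its public and private IP addresses.
(Compute >> Instances >> Instance Details)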

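Installation

Install Java. Download the JRE rpm file and run the following (the rpm command expects the file to be in /usr/java).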

# mkdir /usr/java
# cd /usr/java/
# rpm -ivh jre-8u191-linux-x64.rpm

Create the file /etc/yum.repos.d/elasticsearch.repo with the following content.

[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
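
The repository file above can also be written from a script instead of an editor. The following is a minimal sketch, not part of the original procedure: the helper name write_es_repo and its optional path argument are assumptions introduced for illustration, while the file content is exactly the listing above.

```shell
#!/usr/bin/env bash
# Sketch: write the Elasticsearch 6.x yum repository definition to a file.
# write_es_repo is a hypothetical helper name, not a yum or Elasticsearch command.
write_es_repo() {
  # Default to the path used in this article; pass another path for a dry run.
  local repo_file="${1:-/etc/yum.repos.d/elasticsearch.repo}"
  cat > "$repo_file" <<'EOF'
[elasticsearch-6.x]
name=Elasticsearch repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
}
```

Run as root, `write_es_repo` with no argument writes to /etc/yum.repos.d/elasticsearch.repo. Passing a path such as /tmp/elasticsearch.repo allows the output to be inspected first.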

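Install Elasticsearch 6.2.0.
The Sudachi plugin did not support the latest version of Elasticsearch, and installing it against that version failed. The Elasticsearch version therefore has to be specified explicitly when installing with yum.

# yum --showduplicates search elasticsearch (* check which versions can be installed)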
# yum install elasticsearch-6.2.0-1.noarch

Install the Sudachi plugin

# yum install maven
# yum install git
# git clone https://github.com/WorksApplications/elasticsearch-sudachi.git
# cd elasticsearch-sudachi/
# mvn package
# /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///root/elasticsearch-sudachi/target/releases/analysis-sudachi-elasticsearch6.2-1.1.0-SNAPSHOT.zip
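
The zip file name in the last command carries version numbers, so it presumably differs when another revision of the plugin is built. As a hedged sketch, the built archive could be located by pattern instead of by a hard-coded name. The helper find_plugin_zip is hypothetical, and the pattern analysis-sudachi-*.zip is an assumption based only on the file name shown above.

```shell
#!/usr/bin/env bash
# Sketch: print a file:// URL for the first analysis-sudachi zip in a directory.
# find_plugin_zip is a hypothetical helper; the glob is assumed from the name above.
find_plugin_zip() {
  local releases_dir="$1"
  local zip
  # ls prints nothing on stdout when the pattern matches no file.
  zip="$(ls "$releases_dir"/analysis-sudachi-*.zip 2>/dev/null | head -n 1)"
  [ -n "$zip" ] || return 1
  # An absolute directory yields the file:///... form used by elasticsearch-plugin.
  printf 'file://%s\n' "$zip"
}
```

With the checkout location used in this article, the result could be passed on as `/usr/share/elasticsearch/bin/elasticsearch-plugin install "$(find_plugin_zip /root/elasticsearch-sudachi/target/releases)"`.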

Install Kibana (the version must match Elasticsearch 6.2.0).

# yum --showduplicates search kibana 
# yum install kibana-6.2.0-1.x86_64

Startup

After confirming that PID 1 is systemd, as shown below, start the services with systemctl.

$ ps -p 1
  PID TTY          TIME CMD
    1 ?        00:00:02 systemd

Start Elasticsearch

# /bin/systemctl daemon-reload
# /bin/systemctl enable elasticsearch.service
Created symlink from /etc/systemd/system/multi-user.target.wants/elasticsearch.service to /usr/lib/systemd/system/elasticsearch.service.
# systemctl start elasticsearch.service

Start Kibana

# /bin/systemctl daemon-reload
# /bin/systemctl enable kibana.service
Created symlink from /etc/systemd/system/multi-user.target.wants/kibana.service to /etc/systemd/system/kibana.service.
# systemctl start kibana.service

Verification

Confirm that Elasticsearch has started

# curl -X GET "localhost:9200/"
{
  "name" : "RTk9yAi",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "PtV0gqoQQsGjMa-fBZ1Uhw",
  "version" : {
    "number" : "6.2.0",
    "build_hash" : "37cdac1",
    "build_date" : "2018-02-01T17:31:12.527918Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
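
Elasticsearch can take some time after `systemctl start` before it answers on port 9200, so the check above may fail with a refused connection if it is run immediately. A minimal sketch of a retry loop follows. The helper name wait_for_http is hypothetical, and the defaults of 30 attempts and a 2-second delay are arbitrary assumptions.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL with curl until it responds successfully or attempts run out.
# wait_for_http is a hypothetical helper; attempt count and delay are assumptions.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors as well as on refused connections.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_http "http://localhost:9200/" && curl -X GET "localhost:9200/"` would run the check only once the node responds.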

Confirm that the Sudachi plugin is loaded

# curl -X GET 'http://localhost:9200/_nodes/plugins?pretty'
{
  "_nodes" : {
    "total" : 1,
..(omitted)..
      "plugins" : [
        {
          "name" : "analysis-sudachi",
          "version" : "1.1.0-SNAPSHOT",
          "description" : "The Japanese (Sudachi) Analysis plugin integrates Lucene Sudachi analysis module into elasticsearch.",
          "classname" : "com.worksap.nlp.elasticsearch.sudachi.plugin.AnalysisSudachiPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false,
          "requires_keystore" : false
        }
      ],

Download and deploy the Sudachi dictionary file

# wget https://oss.sonatype.org/content/repositories/snapshots/com/worksap/nlp/sudachi/0.1.1-SNAPSHOT/sudachi-0.1.1-20181108.091011-45-dictionary-core.zip
# mkdir /etc/elasticsearch/sudachi_tokenizer/
# unzip sudachi-0.1.1-20181108.091011-45-dictionary-core.zip
# mv system_core.dic /etc/elasticsearch/sudachi_tokenizer/

Next, prepare an index named sudachi_test. First, create a file named sudachi.json with the following content.

{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "sudachi_tokenizer": {
            "type": "sudachi_tokenizer",
            "mode": "search",
            "discard_punctuation": true
          }
        },
        "analyzer": {
          "sudachi_analyzer": {
            "filter": [
            ],
            "tokenizer": "sudachi_tokenizer",
            "type": "custom"
          }
        }
      }
    }
  }
}

Create the index named sudachi_test.

$ curl -X PUT -H "Content-Type: application/json" http://localhost:9200/sudachi_test/ -d @sudachi.json
{"acknowledged":true,"shards_acknowledged":true,"index":"sudachi_test"}

Now look at how Sudachi behaves.

Feed it the string 「すももももももももものうち」.

With Sudachi

$ curl -X POST -H "Content-Type: application/json" 'localhost:9200/sudachi_test/_analyze?pretty' -d '{"analyzer": "sudachi_analyzer", "text": "すもももももももものうち"}'

Result

Perfect.

{
  "tokens" : [
    {
      "token" : "すもも",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "も",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "もも",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "も",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "もも",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "の",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "うち",
      "start_offset" : 10,
      "end_offset" : 12,
      "type" : "word",
      "position" : 6
    }
  ]
}

Without Sudachi (using the standard analyzer)

$ curl -X POST -H "Content-Type: application/json" 'localhost:9200/sudachi_test/_analyze?pretty' -d '{"analyzer": "standard", "text": "すもももももももものうち"}'

Result

Every character is split into a separate token.

{
  "tokens" : [
    {
      "token" : "す",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<HIRAGANA>",
      "position" : 0
    },
    {
      "token" : "も",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<HIRAGANA>",
      "position" : 1
    },
    {
      "token" : "も",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<HIRAGANA>",
      "position" : 2
    },
    {
      "token" : "も",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<HIRAGANA>",
      "position" : 3
    },
    {
      "token" : "も",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<HIRAGANA>",
      "position" : 4
    },
    {
      "token" : "も",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<HIRAGANA>",
      "position" : 5
    },
    {
      "token" : "も",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<HIRAGANA>",
      "position" : 6
    },
    {
      "token" : "も",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<HIRAGANA>",
      "position" : 7
    },
    {
      "token" : "も",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<HIRAGANA>",
      "position" : 8
    },
    {
      "token" : "も",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<HIRAGANA>",
      "position" : 9
    },
    {
      "token" : "の",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<HIRAGANA>",
      "position" : 10
    },
    {
      "token" : "う",
      "start_offset" : 11,
      "end_offset" : 12,
      "type" : "<HIRAGANA>",
      "position" : 11
    },
    {
      "token" : "ち",
      "start_offset" : 12,
      "end_offset" : 13,
      "type" : "<HIRAGANA>",
      "position" : 12
    }
  ]
}
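
The two `_analyze` responses are easier to compare when only the token strings are printed on one line. The following is a rough sketch under stated assumptions: extract_tokens is a hypothetical helper, it relies on the `"token" : "..."` layout produced by `?pretty`, and it does not handle tokens containing escaped quotes. A JSON-aware tool would be more robust.

```shell
#!/usr/bin/env bash
# Sketch: print the "token" values of a pretty-printed _analyze response on one line.
# extract_tokens is a hypothetical helper; it assumes the `"token" : "..."` layout.
extract_tokens() {
  # -a treats the input as text even when the locale is not UTF-8.
  grep -a -o '"token" : "[^"]*"' \
    | sed -e 's/^"token" : "//' -e 's/"$//' \
    | paste -sd ' ' -
}
```

Usage: append `| extract_tokens` to either curl command above. For the Sudachi response shown earlier this would print `すもも も もも も もも の うち`, whereas the standard analyzer response yields one character per token.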

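Change the settings to allow remote connections.

In the default configuration, Elasticsearch (port 9200) and Kibana (port 5601) are both bound to localhost and can only be reached from localhost.

Edit the Elasticsearch and Kibana configuration files, then restart both services.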

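Edit the configuration files as follows (optionally copy the original files first so that the original versions are kept).

/etc/elasticsearch/elasticsearch.yml

network.host: 0.0.0.0

/etc/kibana/kibana.yml

server.host: "0.0.0.0"
elasticsearch.url: "http://<PUBLIC_IP_ADDRESS>:9200"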
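
The backup-then-edit step can be scripted. This is a minimal sketch rather than the procedure actually used here: set_config_line is a hypothetical helper, the `.orig` backup suffix is an arbitrary choice, and the paths in the usage note assume the default RPM locations /etc/elasticsearch/elasticsearch.yml and /etc/kibana/kibana.yml.

```shell
#!/usr/bin/env bash
# Sketch: set "key: value" in a flat YAML config file, keeping a one-time backup.
# set_config_line is a hypothetical helper; the .orig suffix is an assumption.
set_config_line() {
  local file="$1" key="$2" value="$3"
  local tmp
  # Keep the first original only; later calls do not overwrite the backup.
  [ -e "$file.orig" ] || cp -p "$file" "$file.orig"
  tmp="$(mktemp)"
  # Drop uncommented lines that start with "key:" (literal match), keep the rest.
  awk -v k="${key}:" 'index($0, k) != 1' "$file" > "$tmp"
  printf '%s: %s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, as root: `set_config_line /etc/elasticsearch/elasticsearch.yml network.host 0.0.0.0` and `set_config_line /etc/kibana/kibana.yml server.host '"0.0.0.0"'`. Commented-out defaults in the files are left untouched.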

# systemctl restart elasticsearch.service
# systemctl restart kibana.service

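Configuration changes specific to this cloud environment

Firewall settings on the compute instance

By default, everything except SSH is blocked, so the required ports have to be opened explicitly.

# firewall-cmd --list-ports (* check the current state: nothing is listed)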

# firewall-cmd --permanent --add-port=5601/tcp
success
# firewall-cmd --permanent --add-port=9200/tcp
success
# firewall-cmd --reload
success
# firewall-cmd --list-ports
5601/tcp 9200/tcp