Trying the Elasticsearch tutorial on Amazon Elasticsearch Service

2 years ago

清, 扬

I will launch Elasticsearch on Amazon Elasticsearch Service and try some simple operations with it.

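Many excellent sites already explain the underlying concepts, so this post simply works through the tutorial from the following site, using Amazon Elasticsearch Service.

code46.hatenablog.com

To simplify data registration and other operations not covered there, I also referred to the following.

Elasticsearch command-line interface (ES CLI) | Developers.IO

Inserting data in the Elasticsearch tutorial - Qiita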

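Launch the Elasticsearch service from the AWS console.
Following Amazon's getting-started guide, I launched it with version 5.1.

To operate Elasticsearch from a separately launched EC2 instance, I set an access policy based on an IAM user. I also added an IP address restriction so that I could check operation in Kibana.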

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::....."
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-2:....."
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-2:.....",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "xxx.xxx.xxx.xxx"
        }
      }
    }
  ]
}
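As a rough illustration of how the two statements fit together, the same policy can be generated from three inputs. This is a minimal sketch and not part of the original setup: `build_access_policy` and its parameter names are hypothetical, and the elided ARNs and masked IP address are kept exactly as they appear above.

```python
import json


def build_access_policy(principal_arn, domain_arn, source_ip):
    """Return an access policy with an IAM-user statement and an IP statement.

    Hypothetical helper: all three arguments are supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Statement 1: allow the IAM user (used from the EC2 instance).
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "es:*",
                "Resource": domain_arn,
            },
            {
                # Statement 2: allow anyone coming from one IP (used for Kibana).
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                "Resource": domain_arn,
                "Condition": {"IpAddress": {"aws:SourceIp": source_ip}},
            },
        ],
    }


if __name__ == "__main__":
    # The values below are the elided placeholders from the policy above.
    policy = build_access_policy(
        "arn:aws:iam::.....",
        "arn:aws:es:us-west-2:.....",
        "xxx.xxx.xxx.xxx",
    )
    print(json.dumps(policy, indent=2))
```

Keeping the IAM statement and the IP statement separate means either one can be tightened later without touching the other.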

To simplify operations against Elasticsearch, I will install elasticsearch-fabric.

pip install elasticsearch-fabric

Create a fabfile.py so that the elasticsearch-fabric commands can be run.

from fabric.api import env
from elasticsearch import Elasticsearch
from elasticsearch import RequestsHttpConnection
from requests_aws4auth import AWS4Auth
from esfabric import tasks as es
from boto3 import Session
from pprint import pprint


# Read the credentials and region that were set with `aws configure`.
session = Session()
credentials = session.get_credentials()
access_key = credentials.access_key
secret_key = credentials.secret_key
region = session.region_name

# Sign requests with AWS Signature Version 4 for the 'es' service.
awsauth = AWS4Auth(access_key, secret_key, region, 'es')

# Elasticsearch clients made available to the elasticsearch-fabric tasks.
env.elasticsearch_clients = {
    "default": Elasticsearch(**{
        "host": "<your Elasticsearch Service endpoint>",
        "port": 443,
        "send_get_body_as": "POST",
        "http_auth": awsauth,
        "use_ssl": True,
        "verify_certs": True,
        "connection_class": RequestsHttpConnection
    })
}

Set the AWS CLI access_key, secret_key, and region with the aws configure command. (Embedding them directly in the source above also works.)
Verify that you can connect through elasticsearch-fabric.

fab es.info

Confirm that information about the Elasticsearch service is returned.
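The same connectivity check can also be made from plain Python. The following is a minimal sketch, assuming `client` is an `elasticsearch.Elasticsearch` instance built with the same arguments as in fabfile.py; `cluster_version` is a hypothetical helper name, not part of elasticsearch-fabric.

```python
def cluster_version(client):
    """Return the version string reported by the cluster.

    `client` is assumed to expose an `info()` method that returns a dict,
    as `elasticsearch.Elasticsearch` does.
    """
    info = client.info()
    return info["version"]["number"]


# Usage (requires the client from fabfile.py and network access):
#   from fabric.api import env
#   print(cluster_version(env.elasticsearch_clients["default"]))
```

If the call raises an authorization error, the access policy or the configured credentials are the first things to check.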

With the preparation done, we can finally start the tutorial.
The first step is to register the mapping.

cat mapping.json | fab es.indices.create:index=ldgourmet

If mapping.json turns out to contain a mistake, delete the index as follows.

fab es.indices.delete:index=ldgourmet
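The tutorial's actual mapping.json is not reproduced in this post. Purely as an illustration of the file's shape, the sketch below generates a stand-in that declares only one field. It assumes the Elasticsearch 5.x mapping layout (a type name under `mappings`) and that the `location` field produced later in this post should be a `geo_point`; the real tutorial mapping defines more fields and analyzers, and `build_minimal_mapping` is a hypothetical name.

```python
import json


def build_minimal_mapping(doc_type="restaurant"):
    """Return a stand-in index body with a single geo_point field.

    Assumption: this is NOT the tutorial's mapping, only a minimal example
    of the structure that an Elasticsearch 5.x index-creation body takes.
    """
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "location": {"type": "geo_point"},
                }
            }
        }
    }


if __name__ == "__main__":
    # Redirect this output to a file to get something shaped like mapping.json.
    print(json.dumps(build_minimal_mapping(), indent=2))
```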

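Next comes data registration. To load the data with the Elasticsearch bulk API,
create the following script, based on the reference site, to convert the CSV data to JSON. (The only change is to uncomment the part of the reference site's script that outputs the index line.)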

#!/usr/bin/env ruby

require 'csv'
require 'json'
require 'securerandom'

line = STDIN.gets.chomp
csv = CSV.new(line)
header = csv.to_a[0]

INDEX = "ldgourmet"
TYPE  = "restaurant"

def string_to_float(string)
  string =~ /([0-9]+)\.([0-9]+)\.(.+)/
  ($1.to_f + ($2.to_f / 60) +  ($3.to_f / 60**2)).to_s
end


CSV(STDIN).each_with_index do |row, i|
  index = { "index" =>
    { "_index" => INDEX, "_type" => TYPE, "_id" => SecureRandom.uuid }
  }
  puts JSON.dump(index)
  hash = Hash[header.zip row]

  hash["location"] = {
    :lat => string_to_float(hash["north_latitude"]),
    :lon => string_to_float(hash["east_longitude"]),
  }

  puts JSON.dump(hash)
end
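The `string_to_float` helper in the script above treats a coordinate as degrees.minutes.seconds and converts it to decimal degrees: degrees + minutes / 60 + seconds / 3600. A minimal Python sketch of the same arithmetic follows. The name `dms_to_decimal` is hypothetical, and unlike the Ruby version it returns a float rather than a string and raises an error on input that does not match.

```python
import re

# degrees "." minutes "." seconds (the seconds part may contain a fraction)
_DMS_PATTERN = re.compile(r"([0-9]+)\.([0-9]+)\.(.+)")


def dms_to_decimal(text):
    """Convert 'D.M.S' (for example '35.40.30') to decimal degrees."""
    match = _DMS_PATTERN.search(text)
    if match is None:
        raise ValueError("not in degrees.minutes.seconds form: %r" % text)
    degrees, minutes, seconds = (float(part) for part in match.groups())
    return degrees + minutes / 60 + seconds / 60 ** 2


if __name__ == "__main__":
    # 35 degrees, 40 minutes, 30 seconds is approximately 35.675 degrees.
    print(dms_to_decimal("35.40.30"))
```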

Convert the data.

cat restaurants.csv | ruby csv2json.rb > restaurants.simple.json
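The resulting file alternates between two kinds of lines: an action line naming the index, type, and a random ID, followed by the document itself. As a minimal sketch of that layout, the Python generator below produces the same pairs. `to_bulk_lines` is a hypothetical name, and the `name` field in the usage example is an invented sample rather than a column taken from the dataset.

```python
import json
import uuid


def to_bulk_lines(documents, index="ldgourmet", doc_type="restaurant"):
    """Yield bulk-format lines: one action line, then one source line, per document."""
    for document in documents:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                "_id": str(uuid.uuid4()),  # random ID, like SecureRandom.uuid
            }
        }
        yield json.dumps(action)
        # ensure_ascii=False keeps Japanese text readable instead of \u escapes.
        yield json.dumps(document, ensure_ascii=False)


if __name__ == "__main__":
    sample = [{"name": "sample restaurant"}]  # invented sample document
    for line in to_bulk_lines(sample):
        print(line)
```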

Register the data.

cat restaurants.simple.json | fab es.helpers.bulk:index=ldgourmet,doc_type=restaurant

Confirm that the data has been registered.

fab es.count:index=ldgourmet

{"count": 428475, "_shards": {"successful": 5, "failed": 0, "total": 5}}

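Once you have confirmed that the data is registered, you can run queries from the Kibana console.

Send the following request in the console and confirm that results are returned.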

GET ldgourmet/restaurant/_search
{
  "query" : {
    "simple_query_string" : {
      "query": "\"世田谷\"",
      "fields": ["_all"],
      "default_operator": "and"
    }
  }
}
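The request body wraps the search term in escaped double quotes so that `simple_query_string` treats it as a phrase. As a minimal sketch, the body can also be built in Python and sent with the client from fabfile.py. `build_phrase_query` is a hypothetical helper, and the commented `search` call assumes an elasticsearch-py 5.x client, where `doc_type` is still accepted.

```python
import json


def build_phrase_query(phrase, fields=("_all",)):
    """Return a simple_query_string body that searches for `phrase` as a phrase.

    Note: the phrase is wrapped in double quotes as-is; a phrase that itself
    contains double quotes is not escaped by this sketch.
    """
    return {
        "query": {
            "simple_query_string": {
                "query": '"{}"'.format(phrase),
                "fields": list(fields),
                "default_operator": "and",
            }
        }
    }


if __name__ == "__main__":
    body = build_phrase_query("世田谷")
    print(json.dumps(body, ensure_ascii=False, indent=2))
    # With a connected client (assumption: elasticsearch-py 5.x):
    #   client.search(index="ldgourmet", doc_type="restaurant", body=body)
```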