Summary of the steps for adding Elasticsearch to the search feature of a Rails application

2 years ago

文, 翔

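Introduction

This is a memo of what I did when I replaced the search feature of a web service written in Rails, moving it from MySQL's InnoDB FTS to Elasticsearch.

It covers the whole flow broadly but shallowly: installing Elasticsearch with Chef, testing the server with serverspec, checking that Elasticsearch works on its own, integrating it into the Rails application, and testing with RSpec. Each step is very basic, but I think there is value in having them collected as one end-to-end flow, so I have summarized them here.

By the way, the web service in question is this one:

commit-m: a service for searching usage examples of GitHub commit messages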
http://commit-m.minamijoyo.com/

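Given the scale of the system, it does not need anything on the level of Elasticsearch at all. Conversely, for someone who just wants to try introducing Elasticsearch, a simple setup like this, with only the bare minimum in place, is easy to follow and works well.

The full source code used in this explanation is in the following branches.

[Rails application part](https://github.com/minamijoyo/commit-m/tree/change-fts-to-es)

The Chef configuration and related files are here: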
https://github.com/minamijoyo/commit-infra/tree/change-fts-to-es

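The versions used are Elasticsearch 1.5.0 and Rails 4.2.0.

Installing Elasticsearch

As long as Java is installed, installing Elasticsearch only requires downloading the zip file distributed on the official site and extracting it. Depending on the platform, it is also distributed as a package such as an rpm. Plenty of information on those steps turns up in a search, so I will not repeat it here.

Below I describe how to install Elasticsearch with Chef, a topic on which little information is available.

First, add elasticsearch to the Berksfile.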

source "https://supermarket.chef.io"

(omitted)
cookbook 'elasticsearch', '~>1.0.0'

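Note that the way the elasticsearch cookbook is used changed between the 0.3 series and the 1.0 series, so be careful when copying and pasting examples from the web. Basically, it has moved to a form that provides most of its functionality as LWRPs (Lightweight Resources and Providers).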
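Because the two series differ, the `~> 1.0.0` constraint in the Berksfile matters. The following is a minimal sketch of what that pessimistic constraint admits, using RubyGems' `Gem::Requirement`. It assumes Berkshelf interprets `~>` the same way RubyGems does; the helper name `cookbook_allowed?` and the version numbers are only illustrative.

```ruby
require 'rubygems'

# The same pessimistic constraint string used in the Berksfile.
COOKBOOK_CONSTRAINT = Gem::Requirement.new('~> 1.0.0')

# Hypothetical helper: does the given version string satisfy the constraint?
def cookbook_allowed?(version)
  COOKBOOK_CONSTRAINT.satisfied_by?(Gem::Version.new(version))
end

# The version numbers below are illustrative, not a list of actual releases.
%w[0.3.13 1.0.0 1.0.3 1.1.0].each do |version|
  puts "#{version}: #{cookbook_allowed?(version) ? 'allowed' : 'rejected'}"
end
```

In other words, `~> 1.0.0` accepts 1.0.x patch releases but neither the 0.3 series nor a later 1.1 release.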

Fetch the cookbooks with Berkshelf.

$ bundle exec berks vendor cookbooks

Create an es role that gives a server the Elasticsearch function, and have it load both the Elasticsearch installation and the custom configuration.

{
  "name": "es",
  "chef_type": "role",
  "json_class": "Chef::Role",
  "default_attributes": {
    "java": {
      "install_flavor": "openjdk",
      "jdk_version": "8"
    }
  },
  "override_attributes": {},
  "run_list": [
    "recipe[java]",
    "recipe[elasticsearch]",
    "recipe[commitm-elasticsearch]"
  ]
}

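The key point is that the role depends on Java and explicitly pins the JDK to OpenJDK 8. With JDK 7, on the CentOS 6.5 development environment I was using, an SSL-certificate-related error occurred and I got stuck, unable to install the head plugin.

The elasticsearch cookbook performs a standard installation by default, so to install additional plugins and the like, create a separate site-cookbook for the custom configuration.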

$ bundle exec knife cookbook create -o site-cookbooks commitm-elasticsearch

Add the dependency to metadata.rb.

name             'commitm-elasticsearch'
(omitted)
depends          'elasticsearch', '~> 1.0.0'

Add the plugin installation and the service start settings. Here we install head, a web UI console for Elasticsearch. Incidentally, if you need to handle Japanese text, search for "kuromoji".

elasticsearch_plugin 'mobz/elasticsearch-head'

service 'elasticsearch' do
  action :start
end

Add the es role you created to the node's run list.

{
  "environment": "production",
  "run_list": [
    "role[base]",
    "role[ap]",
    "role[db]",
    "role[es]"
  ]
}

Once everything is ready, cook with knife solo.
(Let's set aside the currently popular chef-zero for now.)

$ bundle exec knife solo cook commitm-ap

Testing with serverspec

Add a test for the es role to serverspec.

require 'spec_helper'

describe "elasticsearch spec" do
  # package
  describe package('java-1.8.0-openjdk') do
    it { should be_installed }
  end

  # command
  describe command('which elasticsearch') do
    let(:disable_sudo) { true }
    its(:exit_status) { should eq 0 }
  end

  # service
  describe service('elasticsearch') do
    it { should be_enabled }
    it { should be_running }
  end

  # port
  describe port("9200") do
    it { should be_listening }
  end

  # plugin
  describe command('curl http://127.0.0.1:9200/_plugin/head/ -o /dev/null -w "%{http_code}\n" -s') do
    its(:stdout) { should match /^200$/ }
  end
end

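This checks that OpenJDK 1.8 is installed, that the elasticsearch command exists, that the elasticsearch service is enabled at boot and running, that port 9200 is listening, and that the head plugin responds normally.

I deliberately manage serverspec roles separately from Chef roles, so the es role has to be added on the serverspec side as well. I wrote about how I manage serverspec target IPs and roles in an earlier blog post, so please refer to that too.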
http://d.hatena.ne.jp/minamijoyo/20150301/p1

[
  (omitted)
  {
    "name": "commitm-ap",
    "host_name": "<%= ENV['TARGET_IP'] %>",
    "user": "ec2-user",
    "port": 22,
    "keys":  "<%= ENV['TARGET_SSH_KEYPATH'] %>",
    "roles":["base", "ap", "db", "es"]
  }
]

Once that is ready, run serverspec as well.

$ bundle exec rake serverspec:commitm-ap

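Basic usage of Elasticsearch

Now that Elasticsearch is set up, let's briefly verify that it works on its own before integrating it into the Rails application. Elasticsearch itself can be operated through HTTP requests with curl, so getting a rough feel for it this way first makes the later discussion of using it from Rails easier to follow.

By default Elasticsearch listens on port 9200. Let's access it with curl. A request to the root path returns a response containing information such as the Elasticsearch version.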

$ curl http://localhost:9200/
{
  "status" : 200,
  "name" : "commitm-dev",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.0",
    "build_hash" : "544816042d40151d3ce4ba4f95399d7860dc2e92",
    "build_timestamp" : "2015-03-23T14:30:58Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

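Next, create an Elasticsearch index.
An Elasticsearch index is roughly analogous to a database in a relational database (RDB).
Since the API is RESTful, an index named commitm can be created with a PUT request.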

$ curl -XPUT http://localhost:9200/commitm/

{"acknowledged":true}

Next, define a mapping named commit. It is roughly analogous to a table's type definition in an RDB.

$ curl -XPUT http://localhost:9200/commitm/commit/_mapping -d '{
  "commit": {
    "properties": {
      "id": { "type": "integer", "index": "not_analyzed" },
      "repo_full_name": { "type": "string" },
      "sha": { "type": "string", "index": "not_analyzed" },
      "message": { "type": "string" }
    }
  }
}'

{"acknowledged":true}

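The integer and string values under type are type definitions and should need no explanation. not_analyzed means that the field is not analyzed; specify it for fields that should match exactly at search time rather than partially. Since the data handled here is simple English text separated by spaces, I omit any explanation of tokenizers and analyzers.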
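To make the difference between an analyzed field and a not_analyzed field concrete, here is a toy sketch in plain Ruby. It is not how Elasticsearch is implemented: `toy_analyze` is a hypothetical stand-in that only lowercases the text and splits it into word tokens, and both matching helpers are likewise made up for illustration.

```ruby
# Toy stand-in for an analyzer: lowercase the text and split it into word tokens.
# This only approximates what a real analyzer does.
def toy_analyze(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Analyzed field: the keyword is analyzed too, and any shared token counts as a hit.
def analyzed_match?(field_value, keyword)
  (toy_analyze(field_value) & toy_analyze(keyword)).any?
end

# not_analyzed field: the whole value is one term, so only an exact match hits.
def not_analyzed_match?(field_value, keyword)
  field_value == keyword
end

message = "Merge pull request #15762 from twbs/twitter-handle"
sha     = "9e1e73f9dcfdf20305dcb6a83e77e67efe1948c5"

puts analyzed_match?(message, "merge")     # a single word matches the analyzed message
puts not_analyzed_match?(sha, "9e1e73f9")  # a prefix does not match a not_analyzed sha
puts not_analyzed_match?(sha, sha)         # the full value does
```

This is why sha is declared not_analyzed while message is left analyzed: a commit hash is looked up as a whole, whereas a message is searched by the words it contains.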

Let's register one record. Actual data is inserted with the PUT method, like this.

$ curl -XPUT http://localhost:9200/commitm/commit/1 -d '{
  "id": 1,
  "repo_full_name": "twbs/bootstrap",
  "sha": "9e1e73f9dcfdf20305dcb6a83e77e67efe1948c5",
  "message": "Merge pull request #15762 from twbs/twitter-handle"
}'

{"_index":"commitm","_type":"commit","_id":"1","_version":1,"created":true}

To search, send a query with a GET request. Here I search for messages containing the keyword "merge".
If the output is long, add the pretty=true parameter to format it.

$ curl -XGET 'http://localhost:9200/commitm/commit/_search?pretty=true' -d '{
  "query": {
    "match": {
      "message": "merge"
    }
  }
}'

{
  "took" : 13,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.095891505,
    "hits" : [ {
      "_index" : "commitm",
      "_type" : "commit",
      "_id" : "1",
      "_score" : 0.095891505,
      "_source":{
  "id": 1,
  "repo_full_name": "twbs/bootstrap",
  "sha": "9e1e73f9dcfdf20305dcb6a83e77e67efe1948c5",
  "message": "Merge pull request #15762 from twbs/twitter-handle"
}
    } ]
  }
}
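The structure of this response is worth knowing before moving on to Rails: the total number of matches is in hits.total, and each matching document sits under hits.hits[]._source. As a minimal sketch, the following plain Ruby parses a trimmed copy of the response above with the standard json library; `summarize_hits` is a hypothetical helper name, not part of any gem.

```ruby
require 'json'

# A trimmed copy of the search response shown above.
RESPONSE_JSON = <<~JSON
  {
    "took": 13,
    "hits": {
      "total": 1,
      "max_score": 0.095891505,
      "hits": [
        {
          "_index": "commitm",
          "_type": "commit",
          "_id": "1",
          "_score": 0.095891505,
          "_source": {
            "id": 1,
            "repo_full_name": "twbs/bootstrap",
            "sha": "9e1e73f9dcfdf20305dcb6a83e77e67efe1948c5",
            "message": "Merge pull request #15762 from twbs/twitter-handle"
          }
        }
      ]
    }
  }
JSON

# Hypothetical helper: pull out the total hit count and the matched messages.
def summarize_hits(json)
  hits = JSON.parse(json).fetch("hits")
  {
    total: hits.fetch("total"),
    messages: hits.fetch("hits").map { |hit| hit.dig("_source", "message") }
  }
end

summary = summarize_hits(RESPONSE_JSON)
puts "total: #{summary[:total]}"
summary[:messages].each { |message| puts message }
```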

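Do you now have a feel for how Elasticsearch itself works?

That gives a rough idea of what Elasticsearch does, but reading and writing JSON by hand is a bit painful. So it is time to make it usable from the Rails application.

Integrating Elasticsearch into the Rails application

To embed Elasticsearch in the Rails application, add the following gems to the Gemfile. They take care of calling the Elasticsearch API from the Rails application.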

(omitted)
gem 'elasticsearch-rails', '~> 0.1.7'
gem 'elasticsearch-model', '~> 0.1.7'

Install them with bundle.

$ bundle install

Next, extract the search-related code for the Commit model into a concern module.

require 'active_support/concern'
module Commit::Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    index_name "commitm"

    settings index: {
      number_of_shards: 1,
      number_of_replicas: 0
    } do
      mapping _source: { enabled: true } do
        indexes :id, type: 'integer', index: 'not_analyzed'
        indexes :repo_full_name, type: 'string'
        indexes :sha, type: 'string', index: 'not_analyzed'
        indexes :message, type: 'string'
      end
    end
  end

  module ClassMethods
    def create_index!(options={})
      client = __elasticsearch__.client
      client.indices.delete index: "commitm" rescue nil if options[:force]
      client.indices.create index: "commitm",
        body: {
          settings: settings.to_hash,
          mappings: mappings.to_hash
        }
    end
  end
end

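In the module, `include Elasticsearch::Model` mixes in a set of convenience methods.

index_name sets the name of the index, and settings holds the index settings. number_of_shards and number_of_replicas are shard and replica settings related to fault tolerance and performance; there are no particular requirements this time, so ignore them for now.

In the mapping block, write the same definition as the mapping defined earlier with curl. It feels much like writing a migration for a Rails model.

create_index! is a helper that actually creates the index. It will be run from the Rails console later. __elasticsearch__.client returns the Elasticsearch client object, through which various operations can be performed.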
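The force option is easier to follow with a small simulation. The sketch below does not talk to Elasticsearch: `FakeIndices` is a hypothetical stand-in for client.indices that only records calls and raises when asked to delete a missing index or create an existing one, and `create_index_sketch!` mirrors the control flow of the helper above with empty settings and mappings.

```ruby
# Hypothetical fake standing in for client.indices; it only tracks index names.
class FakeIndices
  attr_reader :calls

  def initialize(existing = [])
    @existing = existing.dup
    @calls = []
  end

  def delete(index:)
    @calls << [:delete, index]
    raise "no such index: #{index}" unless @existing.delete(index)
  end

  def create(index:, body:)
    @calls << [:create, index]
    raise "index already exists: #{index}" if @existing.include?(index)
    @existing << index
  end
end

# Same flow as the helper above: optionally delete first, then create.
def create_index_sketch!(indices, options = {})
  if options[:force]
    begin
      indices.delete(index: "commitm")
    rescue StandardError
      # A missing index is fine here; we only wanted it gone.
    end
  end
  indices.create(index: "commitm", body: { settings: {}, mappings: {} })
end

indices = FakeIndices.new(["commitm"])  # pretend the index already exists
create_index_sketch!(indices, force: true)
p indices.calls
```

With force: true the index is dropped and recreated, which is convenient in tests; without it, creating an index that already exists fails.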

Include the module in the model.

class Commit < ActiveRecord::Base
  include Commit::Searchable
  def self.search_message(keyword)
    if keyword.present?
      query = {
        "query": {
          "match": {
            "message": keyword
          }
        }
      }
      Commit.__elasticsearch__.search(query)
    else
      Commit.none
    end
  end
end

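The method builds a query from the keyword it receives and passes it to Commit.__elasticsearch__.search. It may be surprising that something called __elasticsearch__.search has appeared on the Commit model without our defining it, but elasticsearch-rails and elasticsearch-model take care of sending the query to Elasticsearch for us.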
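The query-building part can be looked at on its own, without Rails or Elasticsearch. The following is a minimal sketch with a hypothetical helper, `build_message_query`, that returns the same hash as the model method, or nil for a blank keyword. It uses plain Ruby checks instead of ActiveSupport's present?.

```ruby
require 'json'

# Hypothetical helper: build the match query, or return nil when the keyword is blank.
def build_message_query(keyword)
  return nil if keyword.nil? || keyword.strip.empty?

  # In Ruby 2.2 and later, { "query": ... } and { query: ... } both produce symbol keys.
  { query: { match: { message: keyword } } }
end

puts JSON.generate(build_message_query("merge"))
```

The generated JSON has the same shape as the body sent by hand with curl earlier, which is all the gem needs to forward to Elasticsearch.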

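On the controller side, the only thing I was concerned about was pagination. will_paginate appears to handle it automatically, so it works the same way as it does with ActiveRecord.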

class CommitsController < ApplicationController
  def index
    @commits = []
    @keyword = ""
  end

  def search
    @keyword = params[:keyword]
    @commits = Commit.search_message(@keyword).paginate(page: params[:page])
  end
end

The one thing that tripped me up was that, in the view, @commits.count does not return the total number of hits; you have to use @commits.total_entries instead.
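The distinction can be illustrated with a toy model of a paginated result. `ToyPage` below is hypothetical and is not will_paginate's or elasticsearch-model's actual class; it assumes that count reflects only the items on the current page while total_entries reflects every hit.

```ruby
# Hypothetical stand-in for a paginated result set.
class ToyPage
  attr_reader :total_entries

  def initialize(all_hits, page:, per_page:)
    @total_entries = all_hits.size
    @items = all_hits.each_slice(per_page).to_a[page - 1] || []
  end

  # Number of items on the current page only.
  def count
    @items.size
  end
end

page = ToyPage.new((1..45).to_a, page: 2, per_page: 30)
puts "count: #{page.count}"
puts "total_entries: #{page.total_entries}"
```

For a "N results" line in the view, the total is what you want, hence total_entries.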

<%= render 'search_form' %>
<hr>
<% unless @commits.nil? %>
    <%= pluralize(@commits.total_entries, "result") %>.
<% end %>
<% if @commits.any? %>
  <table class="table table-hover">
  (omitted)
  </table>
  <%= will_paginate @commits, :params => { :keyword => @keyword} %>
<% end %>

Once that is ready, load the data from the Rails console and try an actual search.

$ bundle exec rails c
rails> Commit.create_index!
rails> Commit.import

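This uses the create_index! helper defined earlier to create the index, then uses import to load the data from the database into Elasticsearch.

While we are at it, check that searching works from the Rails console.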

rails> Commit.__elasticsearch__.search(
  {
    "query": {
      "match": {
        "message": "merge"
      }
    }
  }
).records.to_a

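Running this returns the records that match the search query, collected into an array.

Finally, enter a search keyword in the web UI and confirm that results come back. It works! That is a happy ending. If this is enough for you, feel free to stop here.

Writing tests that use Elasticsearch

Most introductory articles end here without covering tests, so I will also add some notes on testing with RSpec.

Add the elasticsearch-extensions gem to the Gemfile.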

group :test do
    (omitted)
    gem 'elasticsearch-extensions', '~> 0.0.18'
end

$ bundle install

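Load it in the spec helper and configure Elasticsearch to start and stop before and after the Elasticsearch tests. Adjust where the helpers live to suit your setup. The helper file also defines methods for creating the index and importing data, and for deleting the index.

(omitted)
Spork.prefork do
(omitted)
  require 'elasticsearch/extensions/test/cluster'
(omitted)
  RSpec.configure do |config|
  (omitted)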
    # Elasticsearch test setting
    config.before(:all, :elasticsearch) do
      Elasticsearch::Extensions::Test::Cluster.start(nodes: 1) unless Elasticsearch::Extensions::Test::Cluster.running?
    end

    config.after(:all, :elasticsearch) do
      Elasticsearch::Extensions::Test::Cluster.stop if Elasticsearch::Extensions::Test::Cluster.running?
    end
  end

  def elasticsearch_create_index_and_import
    Commit.__elasticsearch__.create_index! force: true
    Commit.import
    sleep 1
  end

  def elasticsearch_delete_index
    Commit.__elasticsearch__.client.indices.delete index: Commit.index_name
  end
end

Then write the actual RSpec test.

require 'rails_helper'

RSpec.describe "Commits", type: :request do
  subject { page }

  describe "Root Page" do
    before { visit root_path }

    (omitted)
    describe "Search form", :elasticsearch do
      before do
        3.times { FactoryGirl.create(:commit) }
        elasticsearch_create_index_and_import
      end

      after do
        elasticsearch_delete_index
        Commit.delete_all
      end

      describe 'Click Search button' do
        before do
          fill_in "keyword", with:"Message"
          click_button "Search"
        end
        it { should have_content('3 results.') }
      end

    end
  end
  (omitted)
end

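Tagging a `describe` block with `:elasticsearch` controls the starting and stopping of Elasticsearch. `elasticsearch_create_index_and_import` creates the Elasticsearch index and imports the data, and `elasticsearch_delete_index` deletes the index.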
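The `sleep 1` in `elasticsearch_create_index_and_import` is presumably there because newly indexed documents only become searchable after the index is refreshed, which by default happens about once per second. As an alternative to a fixed sleep, one could poll until the data is visible. The sketch below is plain Ruby with a hypothetical helper name, `wait_until`; it is not part of elasticsearch-extensions or RSpec.

```ruby
# Hypothetical helper: poll the block until it returns true or the timeout elapses.
def wait_until(timeout: 5, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Usage with a stand-in condition that becomes true on the third check.
attempts = 0
ready = wait_until(timeout: 2, interval: 0.01) { (attempts += 1) >= 3 }
puts "ready=#{ready} after #{attempts} checks"
```

In the spec helper, the block would contain a check against the index (for example, that a search returns the expected number of hits); that part is left out here because it needs a running cluster.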

$ bundle exec rake spec

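Tests that use Elasticsearch can now be run like this.
This time it really is a happy ending.

Conclusion

If the model is very simple, integrating Elasticsearch into a Rails application is not very hard thanks to the gems. Once models and queries become complex it is unlikely to stay this simple, but I hope to share what I learn once I get the hang of it.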

References

RSpecでElasticsearchを使ったテストを書く