I want to do full-text search with Elasticsearch and Python

1. Download, install, and start Elasticsearch

bin/elasticsearch

The elasticsearch-head plugin is a must-have for Elasticsearch.

bin/plugin install mobz/elasticsearch-head

A plugin that enables morphological analysis (Kuromoji, for Japanese text)

bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0

2. Configure Kuromoji

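If configuring each index is too much trouble, make Kuromoji the default analyzer in the node configuration (config/elasticsearch.yml in a standard install) and restart Elasticsearch.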

index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer

To configure it per index instead:

curl -XPUT http://localhost:9200/index1/ -d '
{
  "index": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "analyzer": {
        "analyzer": {
          "type": "custom",
          "tokenizer":"kuromoji"
        }
      }
    }
  }
}'
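The same per-index settings can also be prepared from Python. The following is a minimal sketch using only the standard library, assuming Elasticsearch is listening on localhost:9200 as in the curl example. The names KUROMOJI_SETTINGS and build_create_index_request are made up for illustration and are not part of any library. The sketch only builds the request; sending it is left as a commented-out line.

```python
import json
import urllib.request

# Same settings body as the curl example: a custom analyzer named
# "analyzer" that uses the kuromoji tokenizer.
KUROMOJI_SETTINGS = {
    "index": {
        "analysis": {
            "tokenizer": {"kuromoji": {"type": "kuromoji_tokenizer"}},
            "analyzer": {
                "analyzer": {"type": "custom", "tokenizer": "kuromoji"}
            },
        }
    }
}


def build_create_index_request(index, settings=KUROMOJI_SETTINGS,
                               host="http://localhost:9200"):
    """Build (but do not send) the PUT request that creates `index`.

    `host` is an assumption taken from the curl example.
    """
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url="{}/{}/".format(host, index),
        data=body,
        method="PUT",
        # Newer Elasticsearch versions insist on this header; it is
        # harmless on older ones.
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request("index1")
print(request.get_method(), request.full_url)

# To actually create the index (needs a running server):
# urllib.request.urlopen(request)
```

Keeping the settings in a plain dict makes it easy to reuse them for other indices or to inspect them before anything is sent to the server.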

3. Verify

Check the analyzer

curl -XPOST 'http://localhost:9200/index1/_analyze?analyzer=analyzer&pretty' -d 'これはペンです'

{
  "tokens": [
    {
      "token": "これ",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "は",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "ペン",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "です",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 4
    }
  ]
}
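To read this response from Python, something like the following might work. It is a sketch that parses a copy of the response shown above with the standard json module; extract_tokens and offsets_match are hypothetical helper names. The offset check assumes the offsets count characters of the original text, which holds for this sample.

```python
import json

# Copy of the _analyze response shown above, one token per line.
ANALYZE_RESPONSE = """
{"tokens": [
  {"token": "これ", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1},
  {"token": "は", "start_offset": 2, "end_offset": 3, "type": "word", "position": 2},
  {"token": "ペン", "start_offset": 3, "end_offset": 5, "type": "word", "position": 3},
  {"token": "です", "start_offset": 5, "end_offset": 7, "type": "word", "position": 4}
]}
"""


def extract_tokens(response_text):
    """Return the token strings of an _analyze response, in order."""
    response = json.loads(response_text)
    return [entry["token"] for entry in response["tokens"]]


def offsets_match(original, response_text):
    """Check that each token equals the slice of `original` that its
    start_offset/end_offset point at."""
    response = json.loads(response_text)
    return all(
        original[entry["start_offset"]:entry["end_offset"]] == entry["token"]
        for entry in response["tokens"]
    )


tokens = extract_tokens(ANALYZE_RESPONSE)
print(tokens)
print(offsets_match("これはペンです", ANALYZE_RESPONSE))
```

The tokens come back as four separate words, which is the sign that the kuromoji tokenizer, and not a default analyzer, handled the text.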

Good. It seems to be working properly.

Index sample document 1

curl -XPUT http://localhost:9200/index1/type1/1 -d '{"text":"これはパンです"}'

Index sample document 2

curl -XPUT http://localhost:9200/index1/type1/2 -d '{"text":"これはペンです"}'

Search!

curl -XGET http://localhost:9200/index1/type1/_search -d '{"query": {"match": {"text": "ペン"}}}'
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "index1",
        "_type": "type1",
        "_id": "2",
        "_score": 0.15342641,
        "_source": {
          "text": "これはペンです"
        }
      }
    ]
  }
}
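Only document 2 comes back, because the query term ペン appears as a token in that document and not in the other one. Pulling the useful fields out of such a response in Python could look like the sketch below. It assumes the response has the shape shown above (newer Elasticsearch versions report hits.total differently), and extract_hits is a hypothetical helper name.

```python
import json

# Copy of the search response shown above.
SEARCH_RESPONSE = """
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "index1",
        "_type": "type1",
        "_id": "2",
        "_score": 0.15342641,
        "_source": {"text": "これはペンです"}
      }
    ]
  }
}
"""


def extract_hits(response_text):
    """Return (id, score, text) for every hit in a search response."""
    response = json.loads(response_text)
    return [
        (hit["_id"], hit["_score"], hit["_source"]["text"])
        for hit in response["hits"]["hits"]
    ]


hits = extract_hits(SEARCH_RESPONSE)
for doc_id, score, text in hits:
    print(doc_id, score, text)
```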

4. Python client

$ pip install elasticsearch
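Once the client is installed, the curl steps above can be repeated from Python. This is only a sketch: it assumes a client version contemporary with the server used here, where index() and search() still accept doc_type and body (newer client versions dropped document types, so the calls would need adjusting). The function names index_samples and search_text are made up for illustration, and both take the client as an argument.

```python
# Sample documents from the curl examples above, keyed by document id.
SAMPLES = {1: "これはパンです", 2: "これはペンです"}


def index_samples(es, index="index1", doc_type="type1"):
    """Index the two sample documents, then refresh so that they
    become visible to search right away."""
    for doc_id, text in SAMPLES.items():
        es.index(index=index, doc_type=doc_type, id=doc_id,
                 body={"text": text})
    es.indices.refresh(index=index)


def search_text(es, text, index="index1", doc_type="type1"):
    """Run the same match query as the curl example and return the
    "text" field of every hit."""
    body = {"query": {"match": {"text": text}}}
    response = es.search(index=index, doc_type=doc_type, body=body)
    return [hit["_source"]["text"] for hit in response["hits"]["hits"]]


# Usage against a running server (assumed to be on localhost:9200):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(["http://localhost:9200"])
# index_samples(es)
# print(search_text(es, "ペン"))
```

Passing the client in, instead of creating it inside the functions, keeps the connection details in one place and makes the functions easy to try against a stand-in object.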