I want to do full-text search with Elasticsearch and Python
1. Download, install, and start
bin/elasticsearch
An essential plugin is elasticsearch-head:
bin/plugin install mobz/elasticsearch-head
A plugin that provides morphological analysis (Kuromoji):
bin/plugin install elasticsearch/elasticsearch-analysis-kuromoji/2.5.0
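2. Configure Kuromoji
If configuring each index is too much trouble, add the following to the Elasticsearch configuration file (typically config/elasticsearch.yml) and restart Elasticsearch: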
index.analysis.analyzer.default.type: custom
index.analysis.analyzer.default.tokenizer: kuromoji_tokenizer
To configure it per index instead:
curl -XPUT http://localhost:9200/index1/ -d '
{
  "index": {
    "analysis": {
      "tokenizer": {
        "kuromoji": {
          "type": "kuromoji_tokenizer"
        }
      },
      "analyzer": {
        "analyzer": {
          "type": "custom",
          "tokenizer": "kuromoji"
        }
      }
    }
  }
}'
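The same per-index settings can also be built from Python. The following is only a minimal sketch: `build_kuromoji_settings` and `create_index` are hypothetical helper names, and `es` is assumed to be a client object from the `elasticsearch` package (in a version compatible with the server) that exposes `indices.create(index=..., body=...)`.

```python
import json


def build_kuromoji_settings(analyzer_name="analyzer"):
    """Return the same settings body as the curl command above."""
    return {
        "index": {
            "analysis": {
                "tokenizer": {
                    "kuromoji": {"type": "kuromoji_tokenizer"}
                },
                "analyzer": {
                    # The analyzer is registered under this name.
                    analyzer_name: {"type": "custom", "tokenizer": "kuromoji"}
                },
            }
        }
    }


def create_index(es, index_name="index1"):
    """Create the index through a client object (assumed API, see lead-in)."""
    return es.indices.create(index=index_name, body=build_kuromoji_settings())


# Hypothetical usage (needs a running Elasticsearch node):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()  # older clients default to localhost:9200
#   create_index(es)

# Print the request body so it can be compared with the curl payload.
print(json.dumps(build_kuromoji_settings(), indent=2))
```

Keeping the settings in a plain dictionary makes it easy to compare them with the curl payload before sending anything to the server.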
3. Verify
Check the analyzer:
curl -XPOST 'http://localhost:9200/index1/_analyze?analyzer=analyzer&pretty' -d 'これはペンです'
{
  "tokens": [
    {
      "token": "これ",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "は",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "ペン",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "です",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 4
    }
  ]
}
Good, it appears to be working.
Index sample document 1:
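To check such a response programmatically rather than by eye, something like the following might help. It is a small sketch that works on the JSON shown above once it has been parsed into a dictionary; `extract_tokens` and `offsets_match` are hypothetical helper names, not part of any library.

```python
def extract_tokens(analyze_response):
    """Return the token strings of an _analyze response in position order."""
    tokens = sorted(analyze_response.get("tokens", []),
                    key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def offsets_match(text, analyze_response):
    """Check that every token equals the slice of the text given by its offsets."""
    return all(
        text[t["start_offset"]:t["end_offset"]] == t["token"]
        for t in analyze_response.get("tokens", [])
    )


# The response from the example above, reduced to the fields used here.
sample_response = {
    "tokens": [
        {"token": "これ", "start_offset": 0, "end_offset": 2, "position": 1},
        {"token": "は", "start_offset": 2, "end_offset": 3, "position": 2},
        {"token": "ペン", "start_offset": 3, "end_offset": 5, "position": 3},
        {"token": "です", "start_offset": 5, "end_offset": 7, "position": 4},
    ]
}

print(extract_tokens(sample_response))
print(offsets_match("これはペンです", sample_response))
```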
curl -XPUT http://localhost:9200/index1/type1/1 -d '{"text":"これはパンです"}'
Index sample document 2:
curl -XPUT http://localhost:9200/index1/type1/2 -d '{"text":"これはペンです"}'
Search!
curl -XGET http://localhost:9200/index1/type1/_search -d '{"query": {"match": {"text": "ペン"}}}'
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "index1",
        "_type": "type1",
        "_id": "2",
        "_score": 0.15342641,
        "_source": {
          "text": "これはペンです"
        }
      }
    ]
  }
}
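Only document 2 matches, because the query term ペン is a token of これはペンです but not of これはパンです. As a rough illustration of how such a response could be reduced to the interesting fields in Python, here is a sketch; `summarize_hits` is a hypothetical helper name, and the response layout is assumed to be the one shown above.

```python
def summarize_hits(search_response):
    """Return (id, score, text) tuples for each hit in a search response."""
    hits = search_response["hits"]["hits"]
    return [(h["_id"], h["_score"], h["_source"]["text"]) for h in hits]


# The search response from the example above, reduced to the fields used here.
sample_search = {
    "hits": {
        "total": 1,
        "max_score": 0.15342641,
        "hits": [
            {
                "_index": "index1",
                "_type": "type1",
                "_id": "2",
                "_score": 0.15342641,
                "_source": {"text": "これはペンです"},
            }
        ],
    }
}

for doc_id, score, text in summarize_hits(sample_search):
    print(doc_id, score, text)
```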
4. Python客户端
$ pip install elasticsearch
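With the client installed, the indexing and search steps from section 3 could look roughly like this. It is a sketch under assumptions, not a definitive implementation: `index_samples` and `search_text` are hypothetical helper names, and `es` is assumed to be an `Elasticsearch` client whose version matches the server and which accepts the `doc_type` and `body` arguments (newer client releases have changed or removed them).

```python
def index_samples(es, index="index1", doc_type="type1"):
    """Index the two sample documents and refresh so they become searchable."""
    docs = {1: "これはパンです", 2: "これはペンです"}
    for doc_id, text in docs.items():
        es.index(index=index, doc_type=doc_type, id=doc_id, body={"text": text})
    # Make the new documents visible to search right away.
    es.indices.refresh(index=index)


def search_text(es, keyword, index="index1", doc_type="type1"):
    """Run a match query on the "text" field and return the matching texts."""
    body = {"query": {"match": {"text": keyword}}}
    res = es.search(index=index, doc_type=doc_type, body=body)
    return [hit["_source"]["text"] for hit in res["hits"]["hits"]]


# Hypothetical usage (needs a running Elasticsearch node):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()  # older clients default to localhost:9200
#   index_samples(es)
#   print(search_text(es, "ペン"))
```

Passing the client in as an argument keeps the helpers independent of how the connection is created, which also makes them easy to exercise with a stand-in object.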