Chinese search #3

Open
billie66 opened this issue Jun 29, 2014 · 10 comments
Comments

@billie66
Owner

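When searching Chinese text, Elasticsearch matches character by character. For example, searching for 苹果 returns every entry that contains any of its individual characters, whereas the expected result is only the entries that contain 苹果.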
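
To make the problem concrete, here is a minimal Ruby sketch. It is not Elasticsearch itself: it assumes the default analysis splits Chinese text into single-character tokens and that a query matches when any one token is shared. The sample documents, the tiny word list, and the greedy longest-match segmenter are all hypothetical stand-ins for a real dictionary-based analyzer.

```ruby
# Assumption: mimics per-character tokenization of Chinese text.
def char_tokens(text)
  text.chars
end

# Assumption: toy greedy longest-match segmenter over a small word list,
# standing in for a real dictionary-based Chinese analyzer.
def word_tokens(text, dictionary)
  tokens = []
  i = 0
  while i < text.length
    # Take the longest dictionary word starting at i, or fall back to one character.
    len = (text.length - i).downto(1).find { |n| n == 1 || dictionary.include?(text[i, n]) }
    tokens << text[i, len]
    i += len
  end
  tokens
end

# A document matches when it shares at least one token with the query (OR semantics).
# The block passed to `search` is the tokenizer to apply.
def search(docs, query)
  query_tokens = yield(query)
  docs.select { |doc| (yield(doc) & query_tokens).any? }
end

docs = ['苹果手机', '青苹', '水果', '香蕉']   # hypothetical sample entries
dictionary = %w[苹果 手机 水果 香蕉]          # hypothetical word list

by_char = search(docs, '苹果') { |t| char_tokens(t) }
by_word = search(docs, '苹果') { |t| word_tokens(t, dictionary) }

puts "per character: #{by_char.inspect}"
puts "per word:      #{by_word.inspect}"
```

With per-character tokens, entries that merely contain 苹 or 果 also match; with word-level tokens, only the entry containing 苹果 as a word does.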

@billie66
Owner Author

The fix is to use a Chinese word-segmentation plugin that matches more precisely.

http://shuminghuang.iteye.com/blog/1839760

The ik analyzer

https://github.com/medcl/elasticsearch-rtf

@billie66
Owner Author

An article on how to configure Elasticsearch:

http://obtao.com/blog/2013/10/configure-elasticsearch-on-an-efficient-way/

@billie66
Owner Author

billie66 commented Jul 1, 2014

This article explains fairly clearly how to install the Chinese ik analyzer on Ubuntu:

http://blog.segmentfault.com/lvye/1190000000448816

@happypeter
Collaborator

Elasticsearch tutorial

http://joelabrahamsson.com/elasticsearch-101/

@billie66
Owner Author

billie66 commented Jul 8, 2014

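Following the README of the repo below is enough to configure the Chinese ik analyzer. The relevant Elasticsearch settings already work on my local machine; I will write them up as an article later.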

https://github.com/medcl/elasticsearch-analysis-ik

@billie66
Owner Author

Setting the Elasticsearch analyzer in the model

An example from the documentation at https://github.com/elasticsearch/elasticsearch-rails/tree/master/elasticsearch-model:

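require 'elasticsearch/model'

class Article
  # Provides the `settings` and `mappings` class methods used below
  include Elasticsearch::Model

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      # Analyze `title` with the built-in English analyzer
      indexes :title, analyzer: 'english', index_options: 'offsets'
    end
  end
end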

@billie66
Owner Author

Creating an index through the Elasticsearch API; reference documentation:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-create-index.html

Here is the clearest example:

# the twitter index settings
# (Elasticsearch 6.0 and later require the Content-Type header)
curl -XPUT 'http://localhost:9200/twitter/' -H 'Content-Type: application/json' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }
}'

@billie66
Owner Author

http://blog.segmentfault.com/lvye/1190000000448816

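The example from the article above now runs (it mainly sets the ik analyzer for a specific index), and Chinese segmentation with ik has taken effect.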
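
As a rough illustration of per-index analyzer settings, the Ruby sketch below builds a create-index request body whose default analyzer is a Chinese one. This is not the configuration from the linked article. The helper name `chinese_index_body` is hypothetical, and the analyzer type `'ik'` is an assumption: it must match whatever name the installed ik plugin registers (newer plugin releases expose `ik_max_word` and `ik_smart` instead).

```ruby
require 'json'

# Hypothetical helper: returns a create-index body that makes `analyzer_type`
# the default analyzer for the index.
# Assumption: `analyzer_type` is a name registered by the installed ik plugin.
def chinese_index_body(analyzer_type, shards: 1)
  {
    settings: {
      index: {
        number_of_shards: shards,
        analysis: {
          analyzer: {
            default: { type: analyzer_type }
          }
        }
      }
    }
  }
end

body = JSON.pretty_generate(chinese_index_body('ik'))
puts body
```

The printed JSON could then be sent as the request body of a `curl -XPUT` create-index call, in the same way as the twitter example earlier in this thread.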

@billie66
Owner Author

The ik analyzer is now configured in this project and written up as documentation, available at:

http://happycasts.github.io/ep/use-elasticsearch-with-rails/

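Update: because reinstalling ik on the server ran into difficulties, Chinese word segmentation will not be covered in this happycasts episode. The related documentation has moved to https://github.com/billie66/esdemo/wiki/ik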

@happypeter
Collaborator

Code added in Rails: 91db269

@happypeter happypeter reopened this Sep 27, 2014
Repository owner deleted a comment from nerypy Mar 5, 2024