Search engines use indexing to store information about web pages, enabling them to quickly return relevant, high-quality results.
Indexing is the process by which search engines organize information before a search to enable super-fast responses to queries.
Scanning every individual page for keywords and topics at query time would be far too slow a way for a search engine to find relevant information. Instead, search engines (including Google) use an inverted index, also known as a reverse index, which maps each term to the pages that contain it.
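To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is an illustration only, not how any of the engines below are implemented: the function names, the simple regex tokenizer, and the sample documents are all assumptions made for this example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Lucene is a search library",
    2: "Solr is built on the Lucene library",
    3: "Sphinx is a full-text search server",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene library")))  # [1, 2]
print(sorted(search(index, "search")))          # [1, 3]
```

The index is built once, before any query arrives. Answering a query then only requires looking up each term and intersecting a few sets, instead of rereading every document.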
The following libraries, engines, and services are used for search. We will discuss each one individually, then compare them on several factors:
1 – Lucene
Apache Lucene is a free and open-source search engine software library, originally written completely in Java.
It is supported by the Apache Software Foundation and is released under the Apache License.
Lucene has been ported to other programming languages including Object Pascal, Perl, C#, C++, Python, Ruby and PHP.
2 – Solr
Solr (pronounced “solar”) is an open-source enterprise-search platform, written in Java, from the Apache Lucene project. It uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages.
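As a rough sketch of what calling Solr over HTTP can look like, the snippet below builds a query URL for Solr's select handler and asks for a JSON response. The host, the port, the core name "articles", and the field name "title" are assumptions for illustration; adjust them to your own installation.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select handler that requests a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance with a core named "articles".
url = build_solr_select_url("http://localhost:8983", "articles", "title:lucene")
print(url)
# http://localhost:8983/solr/articles/select?q=title%3Alucene&rows=10&wt=json
```

Because the request is a plain HTTP GET, any language with an HTTP client can send it. In Python, for example, the URL could be passed to `urllib.request.urlopen` once a Solr server is running.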
3 – Elasticsearch
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java. Following an open-core business model, parts of the software are licensed under various open-source licenses (mostly the Apache License), while other parts fall under the proprietary (source-available) Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine followed by Apache Solr, also based on Lucene.
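To give a feel for the HTTP interface and JSON bodies, here is a small sketch that prepares a full-text search request using a `match` query against the `_search` endpoint. The host, port, index name "articles", and field name "title" are assumptions for this example, and the helper function is hypothetical rather than part of any official client.

```python
import json


def build_match_search(base_url, index, field, text, size=10):
    """Return the URL and JSON body for a simple `match` query on one field."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = {"query": {"match": {field: text}}, "size": size}
    return url, json.dumps(body)


# Hypothetical local node and index name.
url, body = build_match_search(
    "http://localhost:9200", "articles", "title", "search engine"
)
print(url)   # http://localhost:9200/articles/_search
print(body)
```

The body would be sent with the `Content-Type: application/json` header. In practice, one of the official clients mentioned above usually handles this for you.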
4 – Sphinx
Sphinx can be used either as a stand-alone server or as a storage engine (“SphinxSE”) for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE.
If Sphinx is run as a stand-alone server, it is possible to use SphinxAPI to connect an application to it. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party plugins and modules, are also available. Other data sources can be indexed via pipe in a custom XML format.
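The custom XML format mentioned above is, as far as I know, the one Sphinx calls xmlpipe2. The sketch below generates a small document set in that style. Treat the element names as my understanding of the format and verify them against the Sphinx documentation; the function name, field names, and sample document are assumptions for illustration.

```python
from xml.sax.saxutils import escape


def to_xmlpipe2(docs, fields):
    """Render documents as an xmlpipe2-style docset string.

    docs:   mapping of integer document ID -> {field name: text}
    fields: full-text field names to declare (assumed to be valid XML names)
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    lines += [f'<sphinx:field name="{name}"/>' for name in fields]
    lines.append("</sphinx:schema>")
    for doc_id, doc in docs.items():
        lines.append(f'<sphinx:document id="{int(doc_id)}">')
        for name in fields:
            # Escape &, < and > so the field text stays well-formed XML.
            lines.append(f"<{name}>{escape(doc.get(name, ''))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


# Hypothetical sample document.
sample = {1: {"title": "Lucene & Solr", "body": "Search libraries"}}
print(to_xmlpipe2(sample, ["title", "body"]))
```

A script like this would print the XML to standard output, and the Sphinx indexer would read it through a pipe as configured in the data source settings.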
5 – Amazon CloudSearch
Amazon CloudSearch is a scalable cloud-based search service that forms part of Amazon Web Services (AWS). CloudSearch is typically used to integrate customized search capabilities into other applications. According to Amazon, developers can set a search application up and deploy it fully in less than an hour.
6 – Amazon Elasticsearch Service (Amazon ES)
With Amazon Elasticsearch Service, you pay only for what you use. There is no minimum fee or usage requirement. You are charged only for Amazon Elasticsearch Service instance hours, Amazon EBS storage (if you choose this option), and data transfer.
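The three billing components can be combined into a rough monthly estimate. The sketch below shows only the arithmetic: every rate in it is a made-up placeholder, not a real AWS price, and the 730 hours per month figure is an approximation.

```python
def estimate_monthly_cost(instance_count, hourly_rate,
                          ebs_gb=0.0, ebs_rate_per_gb=0.0,
                          transfer_gb=0.0, transfer_rate_per_gb=0.0,
                          hours=730):
    """Sum instance hours, EBS storage, and data transfer into one monthly figure."""
    instance_cost = instance_count * hourly_rate * hours
    storage_cost = ebs_gb * ebs_rate_per_gb          # zero if EBS is not used
    transfer_cost = transfer_gb * transfer_rate_per_gb
    return round(instance_cost + storage_cost + transfer_cost, 2)


# Placeholder rates for illustration only; look up current AWS pricing.
cost = estimate_monthly_cost(
    instance_count=2, hourly_rate=0.25,
    ebs_gb=100, ebs_rate_per_gb=0.125,
    transfer_gb=40, transfer_rate_per_gb=0.125,
)
print(cost)  # 382.5
```

Here the instances account for 2 × 0.25 × 730 = 365.0, storage for 12.5, and data transfer for 5.0.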
The following table compares these tools:
Feature | Lucene | Solr | Elasticsearch | Sphinx | Amazon CloudSearch | Amazon Elasticsearch |
Autocomplete | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Auto-suggestion | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Recommendation | ✔️ | ✔️ | ✔️ | – | ✔️ | ✔️ |
Arabic support | ✔️ | ✔️ | ✔️ | – | ✔️ | ✔️ |
Memory size per million documents | 125% – 150% of docs size | 125% – 150% of docs size | 125% – 150% of docs size | – | 125% – 150% of docs size | 125% – 150% of docs size |
Disk size | size of docs * 2 | size of docs * 2 | size of docs * 2 | size of docs * 2 | – | – |
Cost | Free | Free | Standard: $16/month | Free | Pay only for what you use | Pay only for what you use |
Response time depends on the hardware you use.
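The memory and disk rows of the table can be turned into a quick sizing estimate. This is only a sketch of the table's rules of thumb (memory at 125% to 150% of the documents' size, disk at twice their size); the function name and the 4 GB example are assumptions.

```python
def estimate_resources(docs_size_gb):
    """Return (min memory, max memory, disk) in GB from the table's rules of thumb."""
    memory_low = docs_size_gb * 1.25   # 125% of the documents' size
    memory_high = docs_size_gb * 1.5   # 150% of the documents' size
    disk = docs_size_gb * 2            # size of docs * 2
    return memory_low, memory_high, disk


# Hypothetical corpus of 4 GB of documents.
print(estimate_resources(4.0))  # (5.0, 6.0, 8.0)
```

Real requirements vary with the analyzers, the stored fields, and the engine's configuration, so treat these numbers as a starting point for capacity planning rather than a guarantee.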
Why is Elasticsearch paid?
The Elasticsearch code itself is open-source. What you pay for is managed hosting from elastic.co, which is priced according to several variables. You can find the pricing here.
If you want to use the open-source version, stand up your own servers and manage your own deployment, the code is available at no cost and can be found here.
AWS ELASTICSEARCH VS AWS CLOUDSEARCH
There is a useful comparison here.
Useful comparison: Amazon CloudSearch vs ElasticSearch vs Apache Solr Comparison in detail.
Conclusion
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.
Based on my research, I think the best options are Amazon CloudSearch, Amazon Elasticsearch, and possibly the paid version of Elasticsearch.
Did you know that we use these and other AI technologies in our app? See what you have just read about applied in action: try our Almeta News app. You can download it from Google Play or Apple’s App Store.