AWS provides two managed search services for adding search capability to your application. If you are an AWS enthusiast, you will have to choose between the older AWS CloudSearch service and the newer AWS ElasticSearch.
In a previous article, we explored the various features of the AWS CloudSearch service. We highly recommend reading that piece first.
In this article, we will explore the various differences between ElasticSearch and CloudSearch and when you should consider one of them instead of the other.
Scaling And Management
- AWS CloudSearch scales its instances automatically based on the received traffic, so the whole service can be fully automated. ElasticSearch, on the other hand, requires manual intervention to scale the cluster. This intervention is simplified because it can be done through the Console, the CLI or the SDKs, and because traffic data can be logged to CloudWatch, where alarms can be set to notify the system manager when a peak in usage is observed.
- Furthermore, your search service may be distributed across several instances that form a cluster. With ElasticSearch, this cluster needs to be configured and re-configured by the system manager inside the ElasticSearch framework. With CloudSearch, all of these cluster management steps are fully managed and automated.
- The same holds for availability and fail-over. CloudSearch provides fail-over capabilities out of the box, while with ElasticSearch you have to configure your search service as a cluster and manually designate certain nodes as replicas or user-nodes.
- The latency of both APIs mostly depends on the number of instances and the way the whole service is configured. We have not come across any article reporting inherent differences between the performance of ElasticSearch and CloudSearch when the same hardware was used. The only upside for ElasticSearch is that the cluster can be customized more, which gives us a better chance of fine-tuning the performance.
- Another factor in performance is the ability to host multiple inverted indexes and multiple replicas across different regions of the world. Both services support creating a replica of the service to serve a specific region using the AZ API. CloudSearch supports up to one replica, while ElasticSearch supports up to two.
- A major factor is the set of regions in which instances are available. Having the search service closer to your customers can have a big impact on request latency. CloudSearch only supports a handful of regions because it uses a restricted set of legacy instances. ElasticSearch, on the other hand, relies on normal EC2 instances and can therefore support all the regions supported by EC2. For example, the newly added Middle East region is available for ElasticSearch but not for CloudSearch.
- Finally, ElasticSearch supports query caching for cases where the same query is run against an index multiple times. Depending on the application type, this can heavily decrease latency (if the same query is repeated frequently).
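The CloudWatch alarm mentioned in the first point can be sketched roughly as follows. This is a minimal illustration, not a recommended configuration: the domain name, account ID, SNS topic ARN, threshold and evaluation window are all hypothetical values chosen for the example. The function only builds the parameters; sending them requires boto3 and AWS credentials, so that call is left as a comment.

```python
def build_cpu_alarm(domain_name, account_id, sns_topic_arn, threshold_percent=80.0):
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    All argument values are illustrative assumptions, not values from the article.
    """
    return {
        "AlarmName": f"{domain_name}-high-cpu",
        # Amazon Elasticsearch Service publishes its metrics under this namespace.
        "Namespace": "AWS/ES",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Average",
        "Period": 300,           # seconds per data point (assumed)
        "EvaluationPeriods": 3,  # consecutive periods above threshold (assumed)
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Notify the system manager through an SNS topic.
        "AlarmActions": [sns_topic_arn],
    }


params = build_cpu_alarm(
    domain_name="my-search-domain",                            # hypothetical
    account_id="123456789012",                                 # hypothetical
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops",    # hypothetical
)

# With boto3 installed and credentials configured, the alarm could be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Once such an alarm fires, the system manager still has to resize the ElasticSearch cluster by hand, which is the manual step that CloudSearch automates.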
It is usually more efficient to perform your insertions or updates to the index in batches, regardless of the service you are using. Batching reduces the computational cost of indexing, which improves performance and, in the case of CloudSearch, reduces costs. How simple it is to buffer these operations and then submit them in bulk is thus the main concern when deploying a search service.
- Data connectors: these are plugins that link your data sources with the search service in order to simplify the process of adding new documents to the index. Both services support data connectors from S3 and DynamoDB, using Lambda functions as triggers for addition. However, ElasticSearch also supports adding data from CloudWatch and AWS Kinesis.
- The maximum allowed file size in CloudSearch is 1 MB, while ElasticSearch accepts files of up to 2 GB.
- The maximum allowed bulk operation (inserting or updating documents) in CloudSearch is 5 MB or 1000 documents, whichever comes first, while in ElasticSearch it can be as high as 100 MB.
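To make the batching advice concrete, here is a minimal sketch that groups documents into batches respecting the CloudSearch limits quoted above (5 MB or 1000 documents, whichever comes first). The function name and the document shape are assumptions made for this example, and the size is measured on compact JSON, so the payload should be serialized the same way before upload.

```python
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # 5 MB limit quoted in the article
MAX_BATCH_DOCS = 1000              # 1000-document limit quoted in the article


def batch_documents(docs, max_bytes=MAX_BATCH_BYTES, max_docs=MAX_BATCH_DOCS):
    """Yield lists of documents whose compact JSON array fits both limits."""
    batch = []
    size = 2  # the enclosing "[" and "]"
    for doc in docs:
        encoded = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        if encoded + 2 > max_bytes:
            raise ValueError("a single document exceeds the batch size limit")
        # One extra byte for the comma separating this document from the previous one.
        extra = encoded + (1 if batch else 0)
        if batch and (len(batch) >= max_docs or size + extra > max_bytes):
            yield batch
            batch, size = [], 2
            extra = encoded
        batch.append(doc)
        size += extra
    if batch:
        yield batch


# Hypothetical documents, only to show the grouping behaviour.
docs = [{"id": str(i), "fields": {"title": "example"}} for i in range(2500)]
batch_sizes = [len(b) for b in batch_documents(docs)]
print(batch_sizes)
```

The same helper can be reused for ElasticSearch by passing a larger `max_bytes` and a high `max_docs`, since the article only quotes a size limit for that service.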
For the same service specifications, using AWS ElasticSearch is usually more economical. You can see a detailed description of the CloudSearch vs ElasticSearch pricing plans in our previous article about CloudSearch. In a nutshell, the main cost in both services is for running the search instances. In the case of CloudSearch, there are some small added costs for transfer and re-indexing, but they are negligible.
The main factor in favouring ElasticSearch over CloudSearch in this category is that ElasticSearch uses the cheaper and modern m5 instances, while CloudSearch still sticks to the older and more expensive m3 instances.
For example, the m5.large instance costs $0.158 per hour, while the m3.large costs $0.256. That difference of roughly 10 cents per hour can really build up if you are storing a lot of data across multiple instances and serving sizable traffic.
Furthermore, ElasticSearch offers some really big discounts of up to 52% when using the reserved instances plan. This option appeals to larger corporations with more stable budgets: by paying a small fee upfront, you can save a lot.
Please note that data transfer fees are basically negligible if the data source and the search service are in the same region; they range between $0 and $0.01 per GB.
However, when transferring data between different regions or between AWS and the Internet, you are charged $0.09/GB, which is still very low in comparison with the instance costs.
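The figures quoted above can be put together in a short calculation. This is only a back-of-the-envelope sketch: the 730 hours per month, the single instance and the 100 GB of transferred data are assumptions for illustration, and the 52% discount is treated as the best case since the article states it as an upper bound.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

M5_LARGE_HOURLY = 0.158         # USD per hour, ElasticSearch (quoted above)
M3_LARGE_HOURLY = 0.256         # USD per hour, CloudSearch (quoted above)
MAX_RESERVED_DISCOUNT = 0.52    # "up to 52%" with reserved instances
CROSS_REGION_RATE_PER_GB = 0.09 # USD per GB across regions or to the Internet


def monthly_cost(hourly_rate, instances=1, hours=HOURS_PER_MONTH):
    """On-demand cost of running `instances` instances for `hours` hours."""
    return hourly_rate * instances * hours


hourly_gap = M3_LARGE_HOURLY - M5_LARGE_HOURLY
monthly_gap = monthly_cost(M3_LARGE_HOURLY) - monthly_cost(M5_LARGE_HOURLY)
best_case_reserved = monthly_cost(M5_LARGE_HOURLY) * (1 - MAX_RESERVED_DISCOUNT)
transfer_100gb = 100 * CROSS_REGION_RATE_PER_GB  # hypothetical 100 GB per month

print(f"Hourly gap per instance:        ${hourly_gap:.3f}")
print(f"Monthly gap per instance:       ${monthly_gap:.2f}")
print(f"m5.large, best-case reserved:   ${best_case_reserved:.2f} per month")
print(f"Cross-region transfer (100 GB): ${transfer_100gb:.2f}")
```

Under these assumptions the instance cost dominates: the transfer charge for 100 GB is a small fraction of what a single instance costs per month.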
Another factor in favour of ElasticSearch when it comes to the budget is the free tier setup. In the case of CloudSearch, the free tier is restricted to 12 months from the first request you make to the API. In ElasticSearch, the free tier is not restricted by time, but rather by the amount of usage. For customers in the AWS Free Tier, Amazon Elasticsearch Service provides free usage of up to 750 hours per month of a t2.small.elasticsearch instance and 10 GB per month of optional EBS storage (Magnetic or General Purpose). This is especially helpful for new startups, which can launch an initial search service and only pay for it when the project starts attracting traffic.
- ElasticSearch is schema-free and document-oriented, while Solr, the framework on which AWS CloudSearch is built, is schema-oriented. What does that mean? In a very simple sense: with CloudSearch, you need to define a specific schema for your documents before you start adding them to the search service (indexing them), and any change to this schema requires re-indexing all the documents already added. Why is re-indexing bad for you? Because it usually translates to downtime or, in the best case, to increased costs if a replica is used. ElasticSearch, on the other hand, is free of this schema and can circumvent the issue.
- ElasticSearch supports the customization of text analyzers, tokenizers and token filters based on language, while no such option is available in CloudSearch.
- ElasticSearch is part of the ELK pipeline (Elasticsearch, Logstash, Kibana), which is increasingly being used for big data log analytics use cases such as IT security, e-commerce shopper behaviour analytics, market intelligence, risk management, and compliance.
- ElasticSearch allows queries in an SQL-like language, while no such queries are supported by CloudSearch.
- Autocorrection (the "Did you mean" feature) is supported by ElasticSearch but not by CloudSearch.
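Two of the features above, custom analyzers and "Did you mean" suggestions, can be illustrated with the request bodies ElasticSearch expects. The sketch below only builds and prints the JSON; the index name `articles`, the field `title` and the analyzer name `custom_english` are hypothetical, and the mapping layout assumes a 7.x-style cluster without mapping types, so it may need adjusting for other versions.

```python
import json

INDEX_NAME = "articles"  # hypothetical index name
TEXT_FIELD = "title"     # hypothetical text field


def build_index_settings():
    """Body for `PUT /<index>`: a custom English analyzer applied to one field."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "english_stop": {"type": "stop", "stopwords": "_english_"},
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                },
                "analyzer": {
                    "custom_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stop", "english_stemmer"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                TEXT_FIELD: {"type": "text", "analyzer": "custom_english"}
            }
        },
    }


def build_did_you_mean(text):
    """Body for `POST /<index>/_search` using the term suggester."""
    return {
        "suggest": {
            "did_you_mean": {"text": text, "term": {"field": TEXT_FIELD}}
        }
    }


settings_body = build_index_settings()
suggest_body = build_did_you_mean("elastcsearch")

print(f"PUT /{INDEX_NAME}")
print(json.dumps(settings_body, indent=2))
print(f"POST /{INDEX_NAME}/_search")
print(json.dumps(suggest_body, indent=2))
```

The analyzer chain is where language-specific behaviour lives: swapping the stop word list and the stemmer language is how the same index definition would be adapted to another language.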
Both ElasticSearch and CloudSearch provide us with cool AWS-style plug-and-play search services. Because both are managed, they encapsulate a lot of the nuances that you would need to delve into when deploying your own solution using Solr on a reserved instance.
CloudSearch provides a simplified off the shelf service that requires minimal configuration and management. This can reduce development and management costs. But on the other hand, the minimalistic approach means more restricted control over the service.
More control, greater ability to fine-tune the service and lower operational prices can be obtained when using ElasticSearch. But you will pay in the human effort to first develop and then modify your search service.
The following table illustrates the main differences between CloudSearch and ElasticSearch:

| Feature | AWS Elasticsearch | AWS CloudSearch |
|---|---|---|
| Backup | Snapshot API / custom scripts | Fully managed |
| Patch management | Manual / automated via custom scripts | Fully managed |
| Re-indexing | Not needed | Fully managed; manual option available from the management console |
| Monitoring | Amazon CloudWatch, which can be connected to Kibana | CloudSearch default metrics |
| HTTP RESTful API | Yes | Yes |
| Request format | XML, JSON | XML, JSON |
| Response format | XML, JSON | XML, JSON |
| Schema | Schema and schema-less | Schema |
| Dynamic fields support | Yes | Yes |
| Analyzers, tokenizers and token filters | Default / custom | Default |
| Did you mean | Default / custom | No |
| Scaling | Vertical scaling / horizontal scaling | Fully managed horizontal scaling |
| Failover | Yes, if set up in cluster replica mode | Fully managed |
| Fault tolerant | Yes, if set up in cluster mode | Fully managed |
| Data import | Batch upload using data connectors | Batch upload using data connectors |
| Web interface | AWS Management Console | AWS Management Console |
| Allowed Availability Zones (different region replicas) | 2 | 1 |
| Supported regions | All EC2 regions | US East (Northern Virginia), US West (Oregon), US West (N. California), EU (Ireland), EU (Frankfurt), South America (Sao Paulo) and Asia Pacific (Singapore, Tokyo, Sydney, and Seoul) |
| Maximum file size | 2 GB | 1 MB |
| Maximum bulk operations | 100 MB | 1000 documents or 5 MB, whichever comes first |
We really recommend this amazing article; while it is a bit old, it is extremely detailed and covers most aspects of the comparison between ElasticSearch and CloudSearch.