Caricature Tab is an upcoming feature of the Almeta News app. In its first version, the feature will provide the user with a stack of in-house designed caricature images to browse and enjoy.
If you’re curious about how we at Almeta handle new features like this one, this post is for you: we’ll walk through the entire decision-making process behind a set of design and deployment questions:
- Where to store the images?
- How to handle new ones?
- Do we need caching?
- What to store in the database?
- What’s the best API structure?
Here at Almeta, we love AWS and decided to serve most of our workloads using its cloud computing services. Starting from there, and to choose the storage service that best satisfies the needs of our Caricature Tab feature, we explored the available AWS storage services and compared them. The results of our exploration are summarized in the following table:
| Service | Description | Usage Pattern | Not Used For |
| --- | --- | --- | --- |
| Amazon S3 | Provides scalable and highly durable object storage in the cloud. | Store and distribute static web content and media; host entire static websites; act as a data store for computation and large-scale analytics; provide highly durable, scalable, and secure backup and archiving of critical data. | Structured data with query support; rapidly changing data. |
| Amazon Glacier | Provides low-cost, highly durable archive storage in the cloud. | Offsite archiving of enterprise information, media assets, research and scientific data, etc. | Rapidly changing data. |
| Amazon Elastic File System (Amazon EFS) | Provides scalable network file storage for Amazon EC2 instances. | Multi-threaded applications that concurrently access data from multiple EC2 instances and require substantial aggregate throughput and I/O operations per second; highly parallelized workloads such as big data and analytics, media processing, content management, web serving, and home directories. | Cannot be mounted to AWS Lambda. |
| Amazon Elastic Block Store (Amazon EBS) | Provides block storage volumes for Amazon EC2 instances. | Data that changes relatively frequently and needs to persist beyond the life of a single EC2 instance. | Temporary storage; highly durable storage; static data or web content. |
So, what fits our situation?
Recalling a few facts:
- The images can be considered nearly static media in our application, since they won’t change frequently.
- We need to serve them to the user with immediate access.
Hmmm… seems like S3 fits our situation well.
But that’s not all: S3 is also serverless, so:
- Using S3 we don’t need to plan for and allocate a specific amount of storage space to handle new images because S3 buckets scale automatically.
- We don’t need to manage or patch the servers that store the files ourselves; we just put and get our content.
Cool! Then let’s go with S3!
In its first version, our API will let us retrieve the URLs of the active images stored in the S3 bucket, along with their additional information.
The API is defined as shown below in the OpenAPI 3.0 format.
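As a rough illustration, a minimal OpenAPI 3.0 sketch of such an endpoint might look like the following (the `/images` path and field names are placeholders, not the final contract):

```yaml
openapi: 3.0.0
info:
  title: Caricature Images API
  version: 1.0.0
paths:
  /images:
    get:
      summary: List the active caricature images
      responses:
        "200":
          description: The active images and their metadata
          content:
            application/json:
              schema:
                type: object
                properties:
                  images:
                    type: array
                    items:
                      type: object
                      properties:
                        image_url:
                          type: string
                        image_title:
                          type: string
                        creation_date:
                          type: string
```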
Obviously, this API presupposes structured data with query support. That’s not what S3 offers, so we’re going to pair it with DynamoDB for this purpose.
Our database design will be quite simple, as we will only have one table called “Images” with the following schema:
| Attribute | Type | Description |
| --- | --- | --- |
| image_url | string | The URL of the image in the S3 bucket |
| creation_date | string | The creation date of the image |
| image_title | string | The title of the image |
| active | bool | Marks an image as active. Flags like this are useful for any customer-facing feature, allowing easy on/off switching of features or objects. |
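To make the schema concrete, here is a minimal sketch of the table definition as keyword arguments for boto3’s `create_table` (keying on `image_url` and using on-demand billing are assumptions of this example, not decisions from the design above):

```python
def images_table_definition():
    """Keyword arguments for creating the "Images" table.

    DynamoDB only declares key attributes up front; the other
    attributes (creation_date, image_title, active) are schemaless.
    """
    return {
        "TableName": "Images",
        "KeySchema": [{"AttributeName": "image_url", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "image_url", "AttributeType": "S"}
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }

# With boto3 available, the table would be created as:
#   boto3.client("dynamodb").create_table(**images_table_definition())
```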
Hence, the API will simply take the records of the active images from this DynamoDB table and push them into the response.
Serving Our API
Again, to take advantage of serverless, we’re going to serve our API using a combination of AWS Lambda and API Gateway.
We can invoke AWS Lambda functions over HTTP by defining our API in Amazon API Gateway and mapping the GET method to our Lambda function. This way, when an HTTP request is sent to our API endpoint, the Amazon API Gateway service invokes our Lambda function with an event that contains details about the HTTP request it received.
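As a sketch of that wiring, assuming the Python Lambda runtime, Lambda proxy integration, and a table named “Images”, the function could look like this (the response-building helper is kept pure so it can be unit-tested without AWS):

```python
import json

def build_images_response(items):
    """Shape the active-image records into an API Gateway proxy response."""
    active = [
        {
            "image_url": item["image_url"],
            "image_title": item["image_title"],
            "creation_date": item["creation_date"],
        }
        for item in items
        if item.get("active")
    ]
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"images": active}),
    }

def lambda_handler(event, context):
    # boto3 is preinstalled in the Lambda runtime; imported here so the
    # pure helper above stays testable without AWS credentials.
    import boto3
    table = boto3.resource("dynamodb").Table("Images")
    # A full scan is fine at this scale; a sparse index on "active"
    # would be the next step if the table grows.
    return build_images_response(table.scan()["Items"])
```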
Handling New Images
Since we decided to use S3 to store our images, one way to handle a new image uploaded to the S3 bucket is to take advantage of AWS Lambda, which can be triggered by S3 bucket events such as upload and delete.
The following is an example of the notification that will be sent to our handler when an event takes place in the linked S3 bucket.
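A trimmed sketch of such a notification (the bucket and key names here are made up):

```json
{
  "Records": [
    {
      "eventVersion": "2.1",
      "eventSource": "aws:s3",
      "eventTime": "2020-01-01T12:00:00.000Z",
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "almeta-caricatures" },
        "object": { "key": "funny-politics.png", "size": 102400 }
      }
    }
  ]
}
```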
By checking the “eventName” attribute, we can choose the right behavior for our function, whether to go ahead with the rest of the process if we captured the right event, or to stop otherwise.
In our situation, we will be looking for “ObjectCreated:Put” events, so that we can add any newly uploaded images to our “Images” table.
The transmitted notification also provides us with the key of the newly added object (e.g., an image), which makes it possible to construct the object’s URL, while the provided “eventTime” can act as the creation date attribute.
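Putting those pieces together, a hypothetical handler for this trigger might look as follows; deriving the title from the file name is only a placeholder:

```python
from urllib.parse import unquote_plus

def record_to_item(record):
    """Map one S3 event record to an "Images" table item (a sketch)."""
    bucket = record["s3"]["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 notifications.
    key = unquote_plus(record["s3"]["object"]["key"])
    return {
        "image_url": f"https://{bucket}.s3.amazonaws.com/{key}",
        "creation_date": record["eventTime"],
        # Placeholder: derive a title from the file name until titles
        # are supplied through an upload API.
        "image_title": key.rsplit(".", 1)[0].replace("-", " "),
        "active": True,
    }

def lambda_handler(event, context):
    import boto3  # available inside the Lambda runtime
    table = boto3.resource("dynamodb").Table("Images")
    for record in event["Records"]:
        if record["eventName"] == "ObjectCreated:Put":
            table.put_item(Item=record_to_item(record))
```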
However, since each image also needs a title, this method is limited; we should consider uploading through Lambda instead, with Lambda then writing to S3. That way, everything goes through an API, and we don’t need to deal with S3 directly.
Caching
Caching improves response time, and in the case of serverless this also translates to cost savings, as most of the technologies we use are pay-per-use. In this section, we explore a few caching choices.
Caching At CloudFront
When serving static content to the users, Amazon recommends also setting up Amazon CloudFront to work with our S3 bucket to serve and protect the content.
What is CloudFront?
CloudFront is a content delivery network (CDN) service that delivers static and dynamic web content, video streams, and APIs around the world, securely and at scale.
How does it work, and why use it?
CloudFront serves content through a worldwide network of data centers called Edge Locations. Using edge servers to cache and serve content improves performance by providing content closer to where viewers are located. CloudFront has edge servers in locations all around the world.
When a user requests content that you serve with CloudFront, their request is routed to a nearby Edge Location. If CloudFront has a cached copy of the requested file, CloudFront delivers it to the user, providing a fast (low-latency) response. If the file they’ve requested isn’t yet cached, CloudFront retrieves it from your origin – for example, the S3 bucket where you’ve stored your content. Then, for the next local request for the same content, it’s already cached nearby and can be served immediately.
By caching your content in Edge Locations, CloudFront reduces the load on your S3 bucket and helps ensure a faster response for your users when they request content. In addition, transferring data out through CloudFront is often more cost-effective than serving files directly from S3, and there is no data transfer fee from S3 to CloudFront. You only pay for what is delivered to the internet from CloudFront, plus request fees.
Caching at API Gateway
CloudFront is great, but it has its limitations. API Gateway, for example, gives you much more control over the cache key. Moreover, with API Gateway you can cache responses to any request, including POST, PUT, and PATCH. One major downside of API Gateway caching is that you switch from pay-per-use pricing to paying for uptime.
Caching at Redis
Redis is an in-memory data store that can persist on disk. This means that Redis is fast but can also be non-volatile. Redis is used as a database, cache, and message broker.
The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, and Bitmaps.
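As a sketch of how our API could use Redis as a response cache, here is a minimal cache-aside helper (the key name and TTL are arbitrary choices; `cache` is assumed to expose a redis-py-style `get`/`setex` interface):

```python
import json

def get_images_cached(cache, fetch_fn, ttl_seconds=300):
    """Cache-aside: serve from the cache, fall back to the origin, then store."""
    cached = cache.get("images")
    if cached is not None:
        return json.loads(cached)
    fresh = fetch_fn()  # e.g. the DynamoDB scan behind our API
    cache.setex("images", ttl_seconds, json.dumps(fresh))
    return fresh
```

With a real deployment, `cache` would be `redis.Redis(host=...)`; the TTL bounds how stale a response can get after a new image is added.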
Caching at The Client App
For data that seldom change, this is very effective. However, caching data on the client side still means you have to respond to at least one request per client. That alone is inefficient, so you should cache responses on the server side as well.
Regarding our caricature feature, caching may happen at different levels:
- API response: using Redis.
- Image: using CloudFront.
- App: using some Flutter widget (or the browser).
Our Plan for a Minimum Viable Product (MVP)
An MVP is all about testing your idea and discovering what will work to properly target your customer.
In mobile app development, an MVP is a development method where you develop only the core functionalities to solve a specific problem and satisfy early adopters. Essentially, an MVP is the basic model of your product that will fulfill the primary goal you want to achieve.
Following this pattern, our plan for an MVP would be as follows:
- Don’t use caching at all; it’s just a proof of concept. Pull images directly from S3 for now: no CloudFront and no Redis, neither for images nor for responses. Fetch the metadata directly from DynamoDB, since the response size is small.
- Don’t handle any add, update, or delete event via Lambda. We’ll enter the URLs from S3 into DynamoDB by hand. The API will simply take the URLs from DynamoDB and push them into the response.
And that’s simply it!
In this post, we showed the entire process we followed to make design and deployment decisions for a new feature coming soon to the Almeta News app.
Did you know that we use all this and other AI technologies in our app? Take a look at what you’re reading now applied in action: try our Almeta News app. You can download it from Google Play or Apple’s App Store.