Why do we need SNS and SQS?
Before moving on to our next container, let's discuss the purpose of AWS SNS and SQS in our task. As you know, SNS simply carries messages, and you can trigger different services whenever a new message is published to a topic. We will be relying on this behavior in our task.
Theoretically, whenever a new domain is pointed to our CDN and we record it in our database, we will publish a message to our SNS topic that says, “Hey! A new domain has just been added!”. Not literally, but that is the idea. A Python listener subscribed to SNS will be notified whenever a new domain is added. As for the allowed_domains list: this is not a list that auto-ssl manages; we use it only in the allow_domain function of our Nginx configuration. In that function, we simply check whether the domain exists in the list and, if it does, allow a certificate to be issued for it.
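The publish step could look roughly like the sketch below. Note that the topic ARN, the helper names, and the message schema (a `type` field plus the domain) are assumptions for illustration, not something fixed by our setup:

```python
import json

# Hypothetical topic ARN -- substitute your own.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:cdn-events"

def build_domain_added_message(domain):
    """Build the payload announcing that a new domain was added."""
    return json.dumps({"type": "domain_added", "domain": domain})

def publish_domain_added(domain, topic_arn=TOPIC_ARN):
    """Publish the 'domain added' event to SNS (needs AWS credentials)."""
    import boto3  # imported lazily so the payload builder stays testable offline
    sns = boto3.client("sns")
    return sns.publish(TopicArn=topic_arn,
                       Message=build_domain_added_message(domain))
```

The listener on the other side only needs to parse the same JSON payload and check the `type` field.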
Since we are building a CDN, these containers will be running in different regions. For scalability and data integrity, every instance will have listeners that receive the message and update the list in their respective Redis servers.
We will be using SNS for one more thing. Whenever a new certificate is stored in Redis, we will publish a message to SNS containing that certificate. This helps with data integrity when multiple Docker containers are running in different regions: each task listens for this message and adds the certificate to its own Redis server.
We will have one more listener that saves the certificate to our database against that domain so we don't generate it multiple times. This is useful because when an instance restarts, Redis is cleared and no certificates remain in it for any domain. We can then fetch the certificates from our database and repopulate Redis. Don't worry, I'll explain all of this in the next sections as well.
The same applies when a domain is removed. We publish a message containing the removed domain, and every instance listening to SNS removes that domain from its allowed_domains list.
We will also use SNS for cache purging. We publish a “purge all the cache you have” message to SNS; each listener then sends a PURGE request to the /purge route on its local IP, 127.0.0.1, and the cache is dropped. We do not need per-file cache purging at this point, so we kept it simple.
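The purge step on the listener side can be sketched with the standard library alone; the `/purge` path comes from the text above, while the helper name is ours:

```python
from http.client import HTTPConnection

def purge_local_cache(host="127.0.0.1", port=80, path="/purge"):
    """Send a PURGE request to the local Nginx cache; return the HTTP status."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("PURGE", path)  # custom method handled by the Nginx purge route
        return conn.getresponse().status
    finally:
        conn.close()
```

This assumes the Nginx configuration exposes a purge location that accepts the PURGE method on the loopback address.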
SNS is basically the spinal cord of our task, handling communication between all the running instances. I also mentioned SQS in the tools list, so what is SQS for? SNS alone is enough if all your instances are in the same region, but because SNS can't communicate across regions in AWS, we will use SQS for that purpose. Our SQS queue will listen for SNS triggers, and the SNS topics in other regions will listen to that queue. That way, we can deliver domains and certificates to instances running in different regions.
In simple words, we will have a separate SNS topic for each region but one SQS queue for all the regions.
All SNS topics will push to that queue, and all of them will also be listening to it. Our listeners will subscribe to SQS directly as well, so we can receive messages straight from the queue without waiting for an SNS topic to pick them up. In short, the queue listens to each SNS topic and pushes each message back out to the others; we have to do this for cross-region compatibility.
Suppose a certificate is added to Redis in region A. This pushes a message to the SNS topic of that region, which in turn triggers a message in the global SQS queue. The queue then triggers every SNS topic listening to it, including the one that just pushed the certificate. The result is that a single message may be picked up multiple times by our listeners, causing redundant operations in Redis. There are a couple of easy steps to handle this:
- Configure SNS and SQS so that messages are automatically destroyed after 2-3 minutes. Don't worry, the listeners will receive these messages before they expire.
- Keep a list in Redis containing the IDs of all processed messages. Whenever a listener picks up a message, it checks for that message's ID in Redis; if it exists, the message is skipped. Once the list grows beyond 50 or 70 items, you can simply clear it, because by the time that limit is reached, all the earlier messages will have been deleted from SQS.
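The deduplication step above can be sketched as follows. A plain Python list stands in for the shared Redis list here (in production you would use LPUSH/LRANGE against Redis); the 50-item cap is the one mentioned above:

```python
MAX_SEEN = 50  # the cap from the steps above; tune to taste

def is_duplicate(message_id, seen_ids):
    """Return True if the message was already processed; otherwise record it.

    `seen_ids` is a plain list here; in production it would be a Redis list
    shared by all listeners in the region.
    """
    if message_id in seen_ids:
        return True
    seen_ids.append(message_id)
    if len(seen_ids) > MAX_SEEN:
        # By now the older SQS messages have expired, so the history can go.
        seen_ids.clear()
        seen_ids.append(message_id)
    return False
```

Clearing the whole list is safe because SNS/SQS message retention (2-3 minutes here) guarantees the old messages can no longer be redelivered.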
Redis’s listener container
For this container, we will use the python:3.8-slim-buster image. It relies on Redis's Publish/Subscribe functionality; you can read about it here. We will subscribe to Redis's Keyspace Notifications, which notify us every time a key is added to or removed from the server. We use this to detect any certificate addition and push the new certificate to our SNS topic. As discussed above, SNS then notifies the SNS topics of other regions through SQS, as well as our listener on a separate server that adds the certificate to our database. One more thing this container does: on start-up, it checks for any keys in our Redis server, and if none are found, it publishes a message to SNS to initiate the re-populate mechanism.
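A minimal sketch of that subscriber is below. The channel format (`__keyspace@0__:<key>` with the event name as the payload) is Redis's documented keyspace-notification shape; the `publish_certificate_added` helper is hypothetical and would wrap the SNS publish call:

```python
def parse_keyspace_channel(channel):
    """Extract the Redis key from a keyspace-notification channel name.

    Channels look like '__keyspace@0__:<key>'; the message payload is the
    event name (e.g. 'set' or 'del').
    """
    prefix, _, key = channel.partition("__:")
    if not prefix.startswith("__keyspace@"):
        raise ValueError("not a keyspace channel: %s" % channel)
    return key

def run_listener():
    """Subscribe to keyspace notifications and forward new certs to SNS."""
    import redis  # pip install redis
    r = redis.Redis(host="127.0.0.1", port=6379)
    # Requires keyspace notifications to be enabled, e.g.:
    #   CONFIG SET notify-keyspace-events KEA
    p = r.pubsub()
    p.psubscribe("__keyspace@0__:*")
    for msg in p.listen():
        if msg["type"] != "pmessage":
            continue
        key = parse_keyspace_channel(msg["channel"].decode())
        event = msg["data"].decode()
        if event == "set":
            cert = r.get(key)
            # publish_certificate_added(key, cert)  # hypothetical SNS helper
```

`run_listener` needs a live Redis server with notifications enabled; the parsing helper can be exercised on its own.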
As discussed earlier, we also store certificates in our database. So if, for any reason, Redis is rebooted, our Redis listener triggers a re-populate notification. Our API server listener picks it up, fetches all the domains and their respective certificates from the database, and pushes them to SNS. Our SQS listener container then receives these messages and repopulates Redis with all the domains and certificates.
For this container's Dockerfile, we just install the redis Python package and run the subscriber script.
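A sketch of that Dockerfile might look like this; the script name `subscriber.py` is an assumption:

```dockerfile
FROM python:3.8-slim-buster
WORKDIR /app
# redis for Pub/Sub, boto3 for publishing to SNS
RUN pip install --no-cache-dir redis boto3
COPY subscriber.py .
CMD ["python", "subscriber.py"]
```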
SQS listener Docker container
This is one of the most important containers in our task. It listens for messages on our SQS queue and serves the following purposes:
- Listen for certificates from SNS. Whenever a new certificate is added, our SQS queue is triggered with that certificate. We extract the certificate from the message, get its domain, and check whether a certificate for that domain already exists. If not, we store the certificate against that domain in our Redis server.
- Listen for allow-domain messages. Whenever a new domain is added, SQS receives a message with that domain; we extract the domain and add it to our Redis server.
- Listen for cache-purge messages. Whenever such a message is picked up, we just send a purge request to 127.0.0.1.
- Same as allow-domain, but in reverse: it removes the domain whenever a remove-domain message is received.
Additionally, we maintain an already_processed list in Redis containing the IDs of all SQS messages already handled.
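The dispatch logic above can be sketched as a single handler. Here a dict and a set stand in for the Redis structures so the routing is easy to follow, and the message schema (`id`, `type`, and per-type fields) is our assumption:

```python
import json

def handle_message(body, certs, allowed, already_processed, purge=lambda: None):
    """Dispatch one SQS message body to the matching action.

    `certs` (dict) and `allowed` (set) stand in for the Redis structures;
    `already_processed` mirrors the dedup list kept in Redis.
    """
    msg = json.loads(body)
    if msg["id"] in already_processed:
        return "duplicate"
    already_processed.append(msg["id"])
    kind = msg["type"]
    if kind == "certificate_added":
        # only store if no certificate exists for that domain yet
        certs.setdefault(msg["domain"], msg["certificate"])
    elif kind == "domain_added":
        allowed.add(msg["domain"])
    elif kind == "domain_removed":
        allowed.discard(msg["domain"])
    elif kind == "purge_cache":
        purge()  # e.g. a PURGE request to the /purge route on 127.0.0.1
    return kind
```

In the real container this handler would sit inside a boto3 `receive_message`/`delete_message` polling loop and write to Redis instead of local structures.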
Those were the four main containers of our task, and building them is its most important phase. The next phase is to deploy all these containers on ECS Fargate and get them working for us.
Setting up AWS Fargate
AWS Fargate is a serverless, pay-as-you-go compute engine. It is compatible with both ECS and EKS; we will use ECS for our task. It lets you deploy your application without worrying about scalability and integrity: its services manage your application throughout its lifecycle, from starting it for the first time to shutting it down and restarting it. If your load increases, Fargate's auto-scaling service automatically creates new instances of your application to cope with it.
Before deploying your containers, you have to build all of them and push them to Docker Hub so that ECS can fetch them from there and set everything up for you automatically.
To deploy our containers to Fargate, we simply create an ECS cluster and then a Task Definition, in which we set up all our containers. Remember that you need appropriate permissions to create a task definition. The task definition role should be “ecsTaskExecutionRole”, and the network mode should be “awsvpc”.
Setup AutoSSL container
To set up the auto-ssl container, you first have to create a Load Balancer. Ours will be a Network Load Balancer with two Target Groups: one for port 80 and one for port 443. For help creating load balancers, you can use this. Always remember that the VPC and subnets used to create our ECS service and Load Balancers must be open externally; they should receive requests from outside. While creating the Target Groups for your load balancer, just skip the IP addresses part and do not add any values there. The security groups should allow requests on ports 80 and 443. For more information about ECS service load balancing, you can get it from here. Additionally, your Network Load Balancer should be internet-facing and have an Elastic IP address associated with it.
After the load balancers, let's configure the container in the task definition. Not much configuration is needed for this container: just pass in the image path from Docker Hub, and assign appropriate memory limits and CPU units (the recommended values are 256 CPU units and a 1024 memory limit). Open ports 80 and 443 for this container, add any environment variables you need, and you will be good to go.
Setup Redis container
Setting up the Redis container is very simple. Just enter the image path, assign appropriate CPU units and a memory limit, and you will be good. But remember, you will have to add a ulimit of type “nofile” with a soft limit of 1024 and a hard limit of 10064.
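In the task definition JSON, that ulimit lives on the container definition; the container name and image tag below are illustrative:

```json
{
  "name": "redis",
  "image": "redis:6",
  "ulimits": [
    { "name": "nofile", "softLimit": 1024, "hardLimit": 10064 }
  ]
}
```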
Setup Redis and SQS listener containers
Setting up the listeners is much the same as the containers above. Just add the image path, assign a memory limit, and add the appropriate environment variables that you use in the code.
After creating the task definition, you have to create a service from it. One problem we faced here is that the AWS web console doesn't let you assign load balancers while creating the service. But don't worry, the AWS CLI does. You can use CloudShell to run the AWS CLI and create the service from there; look here for references. You just have to pass the load-balancers argument, an array that takes the target group ARN, container name, and container port, assigning the appropriate target group to each port. The container name is the same one you used when setting up the auto-ssl container in the task definition.
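The CLI call could look like the sketch below; every name, ARN, subnet, and security group ID is a placeholder for your own values:

```shell
aws ecs create-service \
  --cluster cdn-cluster \
  --service-name cdn-service \
  --task-definition cdn-task:1 \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0abc],securityGroups=[sg-0abc],assignPublicIp=ENABLED}" \
  --load-balancers \
    "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/tg-80/abc123,containerName=auto-ssl,containerPort=80" \
    "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/tg-443/def456,containerName=auto-ssl,containerPort=443"
```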
After creating the service, you can find it in the ECS console. Just edit it and, in the Auto Scaling section, add appropriate auto-scaling options for your tasks. Once the service is saved, it will automatically read its configuration and set up the instances for you. You will just have to wait a few minutes while the ECS service does its magic.
Now for the final part: create an A record in Route 53 pointing to your load balancer's IP address, using any name such as “cdn.yourdomain.com”. Then, when you point any domain to cdn.yourdomain.com, watch our custom CDN do its magic: it will automatically generate a certificate, and you can access your site over HTTPS.