General practices for scaling a backend system
A client I recently met on Upwork asked me how I would scale a backend system built on Django Rest Framework (DRF), and wanted some general pointers. Most of the points below come from our experience with DRF, but they apply to any other framework or language, because what matters most in scaling is architecture.
There is no single way to achieve this. It depends heavily on the case at hand: we have to look in detail at what is causing the bottleneck and what potential bottlenecks there could be. In our experience, though, these are the first things we do as a general rule to improve API performance.
Cache, Cache, and Cache
Use an in-memory cache such as Redis, Memcached, or another caching store to cache frequent requests. The database should only be hit as a last resort, and most static and rarely changing responses should be served from the cache. The database is a major choke point in most applications, and caching is the first remedy for that. Hitting the database not only takes more time but also puts extra strain on your database server, which should be avoided wherever possible.
Denormalise the data where possible
We have always been told that normalization is good and that we should normalize data as much as possible, but is it really? When databases were gaining traction, storage was really expensive and it made sense to optimize the database to consume less space. These days storage is pretty cheap, and we should optimize for processing time instead. I am not saying you should denormalize everything if you are using a relational database, but use pre-calculated aggregations and denormalization wherever they make sense and help reduce joins, which are processing-intensive and can slow down APIs. JSON fields in modern databases such as PostgreSQL are well suited to this.
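As a rough illustration of a pre-calculated aggregation, the sketch below keeps an order_count column on the customer row so that reads do not need a join. It uses SQLite from the Python standard library only to stay self-contained. The table and column names are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        order_count INTEGER NOT NULL DEFAULT 0  -- denormalized aggregate
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
""")


def add_order(customer_id, total):
    """Insert the order and bump the counter in the same transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
            (customer_id, total),
        )
        conn.execute(
            "UPDATE customers SET order_count = order_count + 1 WHERE id = ?",
            (customer_id,),
        )


conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.commit()
for total in (10.0, 25.5, 7.25):
    add_order(1, total)

# Normalized read: needs a join and an aggregate on every request
joined_count = conn.execute(
    "SELECT COUNT(o.id) FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id WHERE c.id = 1"
).fetchone()[0]

# Denormalized read: a single-row lookup, no join
stored_count = conn.execute(
    "SELECT order_count FROM customers WHERE id = 1"
).fetchone()[0]
```

The trade-off is that every write path must keep the counter in sync, which is why the insert and the update share one transaction here.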
Keep the API responses short and concise
It may seem trivial at first, but every bit matters. Do not embed full objects in API responses where they are not required. API frameworks like Django Rest Framework can easily end up nesting all the related objects inside a response when you only need a primary key. Trim your responses so that they carry no fields that are not required. The larger the response, the longer it takes the server to transmit and the client device to download, so smaller responses save bandwidth and speed up response times.
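The difference is easy to see in a small, framework-free sketch. The Author and Article classes and the two serialize functions are hypothetical. In DRF the same choice roughly corresponds to using a nested serializer versus a PrimaryKeyRelatedField.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Author:
    id: int
    name: str
    bio: str


@dataclass
class Article:
    id: int
    title: str
    author: Author


def serialize_full(article):
    """Embeds the whole related author object in the response."""
    return asdict(article)


def serialize_compact(article):
    """Sends only the author's primary key; clients fetch details if needed."""
    return {
        "id": article.id,
        "title": article.title,
        "author_id": article.author.id,
    }


article = Article(
    id=1,
    title="Scaling a backend",
    author=Author(id=7, name="Jane Doe", bio="Long biography text. " * 20),
)

full_payload = json.dumps(serialize_full(article))
compact_payload = json.dumps(serialize_compact(article))
```

For a list endpoint returning hundreds of articles, the saving is multiplied by the number of rows.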
Indexing the DBMS
An index is a data structure that lets the database efficiently retrieve records based on the attributes that have been indexed. Indexing is an essential part of any well-performing database, and indexes should be created on the fields you filter and sort by most often. There are many index types and techniques, which are explained here: https://www.tutorialspoint.com/dbms/dbms_indexing.htm
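A quick way to see the effect of an index is to compare the query plan before and after creating one. The sketch below uses SQLite from the Python standard library. The users table and the idx_users_email index are assumptions invented for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    """Return SQLite's query plan as one string (last column is the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)

# Without an index the database has to scan the whole table
plan_before = query_plan(lookup, params)

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index it can search directly for the matching rows
plan_after = query_plan(lookup, params)

print(plan_before)
print(plan_after)
```

Other databases expose the same information through their own EXPLAIN commands. Indexes also slow down writes somewhat, so it pays to add them for the queries you actually run.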
Shard the database
In the simplest sense, sharding your database means breaking up one big database into many much smaller databases that share nothing and can be spread across multiple servers. It is also called horizontal partitioning. Sharding reduces the size of each index, which makes index access much faster. It is a really important concept and is used in almost all large-scale applications.
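One common way to decide which shard a record lives on is to hash its key. The sketch below assumes four shards with made-up names and a simple hash-modulo scheme. It is an illustration, not a recommendation for any particular database.

```python
import hashlib

# Hypothetical shard names; in practice these would map to separate servers
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key):
    """Map a key to a shard using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


# Every user id is routed to exactly one shard
placement = {user_id: shard_for(user_id) for user_id in range(1000)}
```

A stable hash such as SHA-256 is used instead of Python's built-in hash() because the latter is randomized per process for strings. Note that with plain modulo hashing, changing the number of shards moves most keys. Consistent hashing is a common way to reduce that.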
Use Read Replicas for write-intensive applications
Read replicas are not designed specifically for performance improvements, and they come with their own issues such as replica lag, but they can significantly improve performance if you have a write-intensive application, because reads can be served by the replicas while the primary concentrates on writes.
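Routing reads and writes to different servers can be sketched as a small router class. The one below is plain Python, shaped like a Django database router (db_for_read and db_for_write). The connection aliases default, replica_1 and replica_2 are assumptions for the example, and the round-robin choice is only one possible strategy.

```python
import itertools


class PrimaryReplicaRouter:
    """Sends writes to the primary and spreads reads across replicas."""

    def __init__(self, primary="default", replicas=("replica_1", "replica_2")):
        self.primary = primary
        # Round-robin over the replicas so reads are spread evenly
        self._replicas = itertools.cycle(replicas)

    def db_for_read(self, model, **hints):
        return next(self._replicas)

    def db_for_write(self, model, **hints):
        return self.primary


router = PrimaryReplicaRouter()
reads = [router.db_for_read(None) for _ in range(4)]
write_target = router.db_for_write(None)
```

Because of replica lag, a read that must see a write made a moment ago may still need to go to the primary.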
Load Balancers and Auto Scaling groups
Regardless of how well your code is written and how optimized your database schema is, without the proper infrastructure your application is bound to crash under heavy load. We always recommend using one of the top three cloud providers (AWS, Google Cloud, Azure) and their infrastructure to scale your applications. The infrastructure needs to be set up with appropriate load balancers and auto-scaling groups that scale up during a surge in traffic and scale back down when the surge is over, because you want to keep costs down as well.
These are just the broad points; each of them deserves a lot of attention and needs to be dealt with at a granular level.
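To give a feel for what an auto-scaling rule does, here is a simplified sketch of a proportional scaling calculation. It is not the actual algorithm of any cloud provider. The function name, the 60% CPU target and the instance limits are assumptions chosen for illustration.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60.0,
                      minimum=2, maximum=20):
    """Scale the fleet so average CPU moves back toward the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # If CPU is 1.5x the target, we want roughly 1.5x the instances
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Never go below the floor or above the cost ceiling
    return max(minimum, min(maximum, wanted))


surge = desired_instances(current=4, cpu_percent=90.0)    # scale up
quiet = desired_instances(current=6, cpu_percent=20.0)    # scale down
capped = desired_instances(current=10, cpu_percent=300.0)  # hits the maximum
```

The minimum keeps some capacity available for sudden traffic, and the maximum puts a ceiling on cost.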