1. Roadmap for backend from first principles

Sriniously

31m 25s · 5,368 words · ~27 min read
Auto-Generated

[0:00] Backend engineering has a very wide scope. When I say backend engineering, I mean much more than building a set of CRUD APIs. The way I think about it, backend engineering is about building reliable, scalable, fault-tolerant, and maintainable codebases and efficient systems. If you were to start learning backend development today, there are a lot of resources, at least a thousand. But how do you decide what to learn? How do you prioritize? How do you see the big picture, and when and how all these different concepts come together? This is the reason it takes people years to get their heads around a lot of these concepts and principles. People primarily start with a limited scope of training, whether it is from a college, a boot camp, or a simple CRUD API course, and they eventually build on top of that through trial and error and with the help of other developers over time. Now, I am a backend engineer, and I faced this struggle when I started out. I had to constantly search for resources and learn from other developers. I have read a lot of books on backend development, and I have studied hundreds of open source codebases to see how people in the industry build things. It was a very time-consuming process. The second problem is that people start backend development from a particular language or framework's point of view. It could be Express, Spring Boot, or Ruby on Rails. The problem with that is you look at the problems you solve through the lens of your particular language and ecosystem, and there are blind spots in that. Let's imagine you have to switch to a different language. Say you have been working with Ruby on Rails for years and one day your company decides to migrate to Golang for performance reasons. In that situation, how much of your knowledge can you transfer if you don't understand the underlying systems?
So I have decided to put together a comprehensive list of videos based on the foundational concepts of a backend system. These draw on various books that I've read over the years and the open source codebases I've studied. We'll start with a very high-level understanding of how backend systems work behind the scenes. We'll look at how a request from the browser flows through different hops and network firewalls over the internet, how it is routed to our backend server running on a remote AWS server, how the server responds to that request, and what the response looks like. That should give us a pretty vivid idea of how systems communicate: how a client communicates with a server and how the server responds. From there we'll move to understanding the HTTP protocol: what role it plays, how communication is established through HTTP, what raw HTTP messages look like, what HTTP headers are and what role they play, and the different types of headers, like request headers, representation headers, general headers, and security headers.

[3:17] And we'll look at the different HTTP methods, like GET, POST, PUT, and DELETE: when to use them, their semantics, and the principles behind them. We'll look at what the CORS flow is and how it works, how a simple request differs from a preflight request, and what a preflight request flow looks like from our browser to the server and back to our browser. We'll look at HTTP responses, their structure, the different status codes that a server returns, when to return which type of code, and the most commonly used HTTP status codes. Then we'll look at HTTP caching and the different caching techniques HTTP offers, such as ETags and max-age headers. Then we'll look at the differences between HTTP/1.1, HTTP/2, and HTTP/3. We'll look at how content negotiation works between client and server using different headers, and we'll see how persistent connections work in HTTP. We'll look at HTTP compression, the different compression techniques like gzip, deflate, and Brotli (br), and which one is most commonly used. We'll see the security aspect of it: SSL, TLS, and HTTPS. Then we'll move on to routing: how routing maps URLs to server-side logic, the connection between routing and HTTP methods, the different components of routes like path parameters and query parameters, and the different types of routes: static routes, dynamic routes, nested routes, hierarchical routes, catch-all or wildcard routes, and regular-expression-based routes. We'll cover how to do API versioning using HTTP and the different versioning techniques. We'll see the best way to deprecate a route and the best practices in the industry. We'll look at the benefits of route grouping and how it helps with versioning, permissions, and shared middleware. We'll see how to secure routes and how to optimize route matching performance.
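To make the routing idea concrete, here is a minimal sketch in Python of how a router might map an HTTP method and URL to a handler while extracting path parameters and query parameters. The `Router` class, the `{id}` template syntax, and the `get_user` handler are all hypothetical names invented for this illustration, not any framework's API, and the sketch does not escape regex characters in templates.

```python
import re
from urllib.parse import parse_qs, urlsplit


class Router:
    """Toy router: maps (method, path template) to a handler function."""

    def __init__(self):
        self._routes = []  # list of (method, compiled pattern, handler)

    def add(self, method, template, handler):
        # Turn "/users/{id}" into a regex with one named group per path parameter.
        pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
        self._routes.append((method.upper(), re.compile(f"^{pattern}$"), handler))

    def dispatch(self, method, url):
        parts = urlsplit(url)
        # Keep only the first value of each query parameter for simplicity.
        query = {key: values[0] for key, values in parse_qs(parts.query).items()}
        for route_method, pattern, handler in self._routes:
            match = pattern.match(parts.path)
            if match and route_method == method.upper():
                return handler(match.groupdict(), query)
        return 404, "not found"


def get_user(path_params, query):
    # Path parameters arrive as strings; converting them is a later step.
    return 200, {"id": path_params["id"], "fields": query.get("fields")}


router = Router()
router.add("GET", "/users/{id}", get_user)
print(router.dispatch("GET", "/users/42?fields=name"))
```

Note that the `id` extracted from the path is still the string `"42"`; turning it into a number belongs to the validation and transformation step.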
Then we'll move on to serialization and deserialization. Serialization basically means how our server translates data into a particular format before sending it over the network. When the server receives data from, let's say, a client over the internet, it translates that data into its own native format, and that is called deserialization. We will see why it is needed and how it helps with interoperability. Then there are the different formats used in serialization and deserialization: we have text-based formats, such as JSON or XML, and binary formats, such as Protobuf. We'll see the performance differences between the two and when to use which one. We'll look at how different programming languages implement serialization and deserialization. We will explore a popular text-based format, JSON: its structure, the different data types like strings, numbers, booleans, arrays, and objects, and how serialization of nested objects and collections is handled in JSON. We'll see how deserializing into native data structures works, for example into a Python dictionary, Golang structs, or a JavaScript object. We'll cover the common errors when dealing with JSON, like handling missing or extra fields, dealing with null values, and date serialization and time zone issues, and how to implement custom serialization before serializing data into JSON. We'll look at error handling in serialization and deserialization, for example invalid data, data conversion errors, and unknown fields. We'll look at the security concerns, like injection attacks, why to do validation before deserialization, and validating data with JSON Schema before processing it. We'll see the performance aspect of it, like reducing the size of serialized data through compression and eliminating unnecessary fields.
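As a small illustration of the JSON points above, the following Python sketch serializes a native object to JSON with custom date handling and deserializes it back while rejecting missing and unknown fields. The `User` type and its field names are assumptions made up for this example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class User:  # hypothetical domain object for this example
    id: int
    email: str
    created_at: datetime
    nickname: Optional[str] = None


def serialize(user: User) -> str:
    data = asdict(user)
    # Custom serialization: JSON has no date type, so emit ISO 8601 in UTC.
    data["created_at"] = user.created_at.astimezone(timezone.utc).isoformat()
    return json.dumps(data)


def deserialize(payload: str) -> User:
    raw = json.loads(payload)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(raw, dict):
        raise ValueError("expected a JSON object")
    unknown = set(raw) - {"id", "email", "created_at", "nickname"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = {"id", "email", "created_at"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return User(
        id=int(raw["id"]),
        email=str(raw["email"]),
        created_at=datetime.fromisoformat(raw["created_at"]),
        nickname=raw.get("nickname"),  # a null or absent value becomes None
    )


original = User(1, "a@example.com", datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
restored = deserialize(serialize(original))
```

Whether unknown fields should be rejected or silently ignored is a design choice; this sketch rejects them to show the error path.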
We'll compare serialization performance between text-based and binary formats, like JSON versus Protobuf, and the trade-offs between readability and performance. In a text-based format you can easily inspect the payload. That does not work the same way with binary formats, but binary formats are faster, so when to use a binary format and when to use a text-based format is a real trade-off. Then we'll move to authentication and authorization: why we use them and the different types of authentication, like stateful and stateless. We have basic authentication and bearer token authentication, and we'll look at sessions, JWTs, and cookies. We'll deep dive into the OAuth protocol and OpenID Connect, and we'll see how API keys work, how multifactor authentication works, and what salting, hashing, and the different cryptographic techniques are. In authorization, we'll explore ABAC, RBAC, and ReBAC. We'll look at security best practices like securing cookies and avoiding CSRF, XSS, and MITM attacks, and at audit logging, which basically means recording authentication and authorization events for audits and monitoring failed login attempts, privilege escalation, and access to sensitive resources. We'll look at obfuscating authentication-related error messages to prevent information leaking to attackers through detailed error messages. We'll look at handling edge cases, for example keeping responses consistent across different failure modes, like rate limiting and account lockout. We'll see how to avoid timing attacks, where attackers exploit time differences in error responses to infer valid credentials. For example, an error for a wrong password might take longer than an error for an unknown username, because to check a password you have to do some kind of hashing, or use some other cryptographic technique, which takes a little bit of time.
So attackers can measure the tiny timing difference between the two cases and use it to guess credentials. That is very difficult to pull off, but we don't want to leave any security holes. The next topic we'll explore is validation and transformation. There are different types of validation. There is syntactic validation, for example checking whether a string is a valid email, a valid phone number, or a valid date format. There is semantic validation, for example a date of birth cannot be in the future, or the age of a person should be between 1 and 120. Then we have type validation, which checks that input values match the expected types: whether a value is a string, an integer, an array, or an object. We'll see the best practices for validation, the difference between client-side and server-side validation, and the importance of server-side validation even if client-side validation is already implemented. Client-side validation improves user experience by providing instant feedback, but server-side validation is the real security boundary because it is the gateway to your business logic. We'll look at the importance of failing fast, which reduces unnecessary processing by returning early, and at why to keep frontend and backend validation consistent. Then there are transformations, for example typecasting, converting a string to a number or a number to a string. Whatever we receive in query parameters or path parameters is a string, but let's say we are expecting an ID field, which is a number. Before we send it to our handlers, we have to convert that string into a number. That step is called a transformation, and we have to take care of it in our validation pipeline.
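Going back to the timing attack point at the start of this passage, one common mitigation can be sketched as follows: do the same hashing work whether or not the username exists, and compare hashes with a constant-time comparison. This is a minimal sketch using Python's standard library; the `users` dictionary layout, the iteration count, and the dummy record are assumptions for illustration, not a production login flow.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash (PBKDF2 with HMAC-SHA256).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Dummy record used for unknown usernames so both failure paths do equal work.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("not-a-real-password", _DUMMY_SALT)


def verify_login(users: dict, username: str, password: str) -> bool:
    """users maps username -> (salt, stored_hash); a hypothetical store."""
    salt, stored = users.get(username, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)
    # compare_digest takes time independent of where the bytes differ.
    matches = hmac.compare_digest(candidate, stored)
    return matches and username in users


salt = os.urandom(16)
users = {"alice": (salt, hash_password("correct horse", salt))}
```

On failure the caller would return the same generic "invalid credentials" response for both a wrong password and an unknown username.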
The same goes for date formats: the frontend might send a different format, or we might be expecting a timestamp, so that also has to be taken care of in the validation pipeline. Then there is normalization, for example converting an email to lowercase, trimming whitespace from a string, or adding a country code to a phone number. Then there is sanitization for security, for example we have to sanitize a string submitted by the user to prevent attacks like SQL injection. Then there is complex validation logic, for example relationships between fields. Let's say a user submitted a form with two fields: one is the partner's name and the second is married, which is a boolean, true or false. The partner field might only be required if married is true. That is a conditional validation.
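The validation steps described so far (typecasting, normalization, semantic checks, and conditional validation) can be sketched as one small function. This is a hand-rolled Python illustration under assumed field names (`id`, `email`, `age`, `married`, `partner_name`); real projects usually rely on a validation library, and the email check here is deliberately naive.

```python
def validate_profile(raw: dict):
    """Return (clean, errors). Errors are collected, not raised one at a time."""
    clean, errors = {}, {}

    # Transformation: the ID arrives as a string and must become a number.
    try:
        clean["id"] = int(raw.get("id"))
    except (TypeError, ValueError):
        errors["id"] = "must be an integer"

    # Normalization, then a (very rough) syntactic check.
    email = str(raw.get("email", "")).strip().lower()
    if "@" not in email or email.startswith("@") or email.endswith("@"):
        errors["email"] = "must be a valid email address"
    else:
        clean["email"] = email

    # Type validation followed by semantic validation.
    age = raw.get("age")
    if isinstance(age, bool) or not isinstance(age, int):
        errors["age"] = "must be an integer"
    elif not 1 <= age <= 120:
        errors["age"] = "must be between 1 and 120"
    else:
        clean["age"] = age

    # Conditional validation: partner_name is required only when married is true.
    married = raw.get("married")
    partner = str(raw.get("partner_name") or "").strip()
    if not isinstance(married, bool):
        errors["married"] = "must be true or false"
    elif married and not partner:
        errors["partner_name"] = "required when married is true"
    else:
        clean["married"] = married
        clean["partner_name"] = partner or None

    return clean, errors


clean, errors = validate_profile(
    {"id": "42", "email": "  Alice@Example.COM ", "age": 30, "married": False}
)
```

Returning every error in one dictionary lets the client display all problems at once instead of one per round trip.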

[11:21] These are the kinds of checks we have to do. Then there is chained validation, like converting a string to lowercase, then removing special characters, and then checking its length. Then we'll look at error handling in validation, like sending meaningful error messages to the frontend so that the user can fix them, aggregating all validation errors in one response for client-side display, and obfuscating error messages. So instead of saying invalid password, we say invalid credentials, to prevent different types of attacks. Then we will see how to gracefully handle failed transformations, for example invalid JSON or a failed date conversion, and how to let the user know with a meaningful message. Then we'll look at the performance trade-offs of validation and how to optimize it by returning early and avoiding redundant validation. Our next topic is going to be middleware. We'll look at what a middleware is, when to use one, and the common use cases. We'll cover the role of middleware in the request cycle, for example pre-request and post-response middleware, and the flow of middleware, for example chaining, where middlewares are executed in sequence, each passing control to the next until the request reaches its final handler. We'll cover how to order middlewares appropriately. For example, we log the request, then check whether the user is authenticated, then do validation, then route handling, and then error handling. This order matters in the middleware flow. We'll see how the next function works in middlewares, how to exit a middleware early, and how middlewares can short-circuit the request pipeline, for example by handling 404 errors.
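The chaining idea can be sketched in a few lines of Python. Each middleware receives the request and a `next_handler` function, and it either passes control on or returns early to short-circuit the pipeline. The request and response shapes (plain dictionaries), the token value, and all function names are assumptions for this illustration.

```python
def chain(middlewares, handler):
    """Wrap handler so that the first middleware in the list runs first."""

    def wrap(middleware, next_handler):
        return lambda request: middleware(request, next_handler)

    for middleware in reversed(middlewares):
        handler = wrap(middleware, handler)
    return handler


def logging_middleware(request, next_handler):
    # Runs before the handler, then passes control along.
    request.setdefault("log", []).append(f"{request['method']} {request['path']}")
    return next_handler(request)


def auth_middleware(request, next_handler):
    if request.get("token") != "secret-token":  # hypothetical check
        # Short circuit: the final handler is never called.
        return {"status": 401, "body": "unauthorized"}
    request["user"] = "alice"
    return next_handler(request)


def profile_handler(request):
    return {"status": 200, "body": f"hello {request['user']}"}


app = chain([logging_middleware, auth_middleware], profile_handler)
response = app({"method": "GET", "path": "/me", "token": "secret-token"})
```

Because logging is listed before authentication, rejected requests are still logged; swapping the order would change that, which is why ordering matters.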
We'll look at some of the common middlewares, like security middlewares, which add security headers like X-Content-Type-Options, Strict-Transport-Security, or Content-Security-Policy, or middlewares which add the appropriate CORS headers to every single response. There are middlewares to prevent CSRF attacks and middlewares for rate limiting. Then we have authentication middlewares, which let us reuse route-protecting logic across our apps. Then we have logging and monitoring middlewares for request logging or structured logging, for observability and easier debugging in production. Then we have error-handling middlewares, which catch and format application-level errors for consistent API responses. Then we have compression or performance-related middlewares, which basically compress response bodies to reduce the size of data sent over the network. Then we have data parsing middlewares, which parse incoming request bodies like JSON and URL-encoded forms and handle multipart form data for file uploads. Then we'll look at the performance and scalability aspect of middleware, like the best practices for keeping middlewares lightweight and efficient, ensuring middleware is applied in the correct order, and how middleware order can affect the performance and security of the application. The next topic is going to be request context. Request context basically means the metadata that is passed through application middlewares, controllers, and services. It is a kind of request-scoped state: the state is only valid for that request. Here we'll explore the lifecycle of a request, maintaining state for the duration of a request, and sharing data across different layers of the application without coupling them. We'll see how context provides temporary request-scoped state, and we'll look at the different components of a request context. There is the request metadata, for example the HTTP method, the URL, headers, the query parameters, and the body.
And there is the session and user information. For example, in the authentication middleware, we fetch the user's information and then add it to the request context. So for the scope of that request, the user's information is injected into the context. Then we have tracking and logging information like unique request IDs or trace IDs. Then we have request-specific data, meaning custom data injected during the request lifecycle, like cached data or permission checks. We'll look at the use cases, for example authentication, rate limiting, tracing, and logging. We'll explore the connection between middlewares and request context, and we'll see the different types of timeouts: request timeouts, custom timeouts, and cancellation signals. We'll see the best practices, like keeping the context lightweight to prevent memory overhead, ensuring context data is cleaned up after the request lifecycle to prevent memory leaks, and avoiding tightly coupling components through context or over-relying on it for passing data. Then we'll move to handlers and controllers: the MVC pattern, what handlers, controllers, and services are, the responsibilities of each, and reducing code with middleware. Then we have centralized error handling in handlers, consistent success and error message formats, and how to implement them in controllers. Then we'll look at the different CRUD operations, how they map to HTTP methods, and the common APIs associated with each method. For example, the POST method is usually used for creation and submissions, and the status code is usually 201 Created, or 400 if it's a bad request. GET requests are usually associated with fetching a list of resources or a single resource, we have PUT and PATCH to update resources, and DELETE to delete resources. We'll look at how to implement pagination, how to implement a search API, and how to do sorting and filtering.
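As a rough sketch of the request context idea from earlier in this passage, Python's standard `contextvars` module can hold request-scoped state that deeper layers read without it being passed as arguments. The variable names, the `audit_log` helper, and the `handle_request` entry point are hypothetical; other languages have their own mechanisms, such as Go's `context` package.

```python
import contextvars
import uuid

# Request-scoped state: each request sets these, and they are reset afterwards.
request_id_var = contextvars.ContextVar("request_id", default=None)
current_user_var = contextvars.ContextVar("current_user", default=None)


def audit_log(message):
    # Deep in the service layer: reads the context instead of taking parameters.
    return f"[{request_id_var.get()}] user={current_user_var.get()} {message}"


def handle_request(user, request_id=None):
    # What a middleware would do: attach a request ID and the authenticated user.
    request_token = request_id_var.set(request_id or uuid.uuid4().hex)
    user_token = current_user_var.set(user)
    try:
        return audit_log("fetched profile")
    finally:
        # Clean up so state never leaks into the next request.
        current_user_var.reset(user_token)
        request_id_var.reset(request_token)


line = handle_request("alice", request_id="req-1")
```

Keeping only small identifiers in the context, rather than large objects, is in line with the advice to keep it lightweight.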
And we'll see the best practices, for example strict validation, consistent response formatting, limiting payload size, redacting sensitive fields, error handling, and authentication and authorization. Then we'll explore what a RESTful architecture is and the best practices for implementing REST APIs: the principle of designing APIs around resources, sticking to HTTP semantics, and best practices for filtering and pagination. We'll cover the different types of versioning, like URI versioning, header versioning, query string versioning, and media type versioning. We'll see how to design APIs with the OpenAPI spec in mind, and we'll see content negotiation, capturing exceptions and providing meaningful messages, supporting client-side caching with ETags, and optimizing large requests and responses. After that, we'll move on to a very important topic, which is databases. In databases, we'll look at relational and non-relational databases, what the differences are, and when to use which. We'll look at some of the theoretical concepts like ACID and the CAP theorem, and we'll take a look at basic querying and joins and database design best practices like schema design and indexing. We'll look at different optimization methods like query optimization, caching, and connection pooling. Then we have data integrity, like constraints and validations, and transactions and concurrency. We'll see how ORMs work, whether to use an ORM, and what the trade-offs are, and we'll look at what database migrations are. After that, we'll move on to the business logic layer, which is also called the BLL. What is its role, and what are the different layers of a request cycle? For example, we have the validation layer, routing, middlewares, and handlers and controllers, all of which fall under the presentation layer because they deal with the user's data, whether that is accepting the user's data or sending data back to the user. So those are part of our presentation layer.
After that, we have the business logic layer, which is the middle one and deals with our core business logic. And after that, we have the data access layer, which deals with databases and performs queries, inserts, or deletions. The business logic layer uses the data access layer behind the scenes. We'll look at different design principles like separation of concerns, single responsibility, open-closed, and dependency inversion. What are the components of a business logic layer? For example, we have services, and we have domain models, which represent core entities like a user or an order. Then we have business rules, and then we have business validation logic. We'll look at service layer design best practices. We'll look at how to handle errors properly and how to propagate those errors from our service layer to our presentation layer. After that, we have caching. We'll discuss why caching is needed and how it differs from database persistence. We'll cover the different types of caching: in-memory caching, browser caching, and database caching, and why we need both client-side and server-side caching. We'll cover the different caching strategies, for example cache-aside, write-through, write-behind or write-back, and read-through, and the different cache eviction strategies, for example LRU, LFU, TTL, and FIFO. We'll cover the need for cache invalidation, like manual invalidation, time-to-live invalidation, or event-based invalidation. We'll cover the different levels of caching: level one, which is in-memory, level two, which is network-distributed, and hierarchical caching, which combines the two, where frequently used data is stored in a fast, small cache, the level one cache, and less frequently used data is stored in a slower, larger cache, the level two cache. We'll see what caching for web apps looks like, that is, caching static assets or caching API responses using headers.
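To illustrate the cache-aside strategy with time-to-live invalidation, here is a minimal in-memory sketch in Python. On a miss the caller-supplied `loader` (standing in for a database query) is invoked and the result is stored with an expiry; on a hit the stored value is returned. The `TTLCache` class and its method names are invented for this example, and it has no size limit or eviction beyond expiry.

```python
import time


class TTLCache:
    """Cache-aside with time-to-live expiry, held in process memory."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired: go to the source of truth, then populate the cache.
        self.misses += 1
        value = loader(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Manual invalidation, for example after the underlying row is updated.
        self._store.pop(key, None)


def load_user_from_db(user_id):  # stand-in for a slow database query
    return {"id": user_id, "name": "alice"}


cache = TTLCache(ttl_seconds=30)
first = cache.get_or_load(1, load_user_from_db)   # miss, loads from "database"
second = cache.get_or_load(1, load_user_from_db)  # hit, served from memory
```

The `hits` and `misses` counters give the hit ratio, which is the number to watch when tuning the TTL.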
We'll see how to cache with databases, for example query caching, like storing the results of heavy joins in Redis, and we'll look at cache hit and cache miss ratios and how to optimize them. After that, we'll move on to transactional emails: what they are used for, the common use cases, the anatomy of a transactional email (the subject, the pre-header, the body, header, main content, CTA, and footer), and how to personalize them with dynamic parameters. Then we have task queuing and scheduling and their common use cases. For example, queuing might be used for sending emails or processing image files, for third-party integrations like payment processing or webhooks, or for offloading heavy computation like batch processing. For example, a user clicks a button to clear all their data. To clear all the user's data, we have to execute different queries against different tables, and that might take some time. So instead of blocking the request, we return the response instantly and trigger a background job by pushing it onto the task queue. We'll look at the use cases of scheduling, for example running database backups, recurring notifications and reminders, data synchronization, or maintenance tasks like clearing logs or caches. We'll cover the different components of a task queue: the producer, queue, consumer, broker, and backend, and the flow of task dependencies, for example a chain dependency or a parent-child relationship. We'll look at task groups, which execute multiple tasks concurrently and wait for all of them to complete. We'll look at how to handle errors and implement retries in task queues. We'll look at task prioritization and rate limiting, for example giving priority to tasks like payment processing before we process tasks like sending notifications.
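The prioritization and retry ideas can be sketched with a small in-process queue in Python. This is only an illustration: a real task queue has a separate broker and worker processes, whereas here the producer and consumer live in the same process, and the `TaskQueue` class and the task functions are hypothetical.

```python
import heapq
import itertools


class TaskQueue:
    """In-process priority queue with bounded retries (lower number runs first)."""

    def __init__(self, max_retries=2):
        self._heap = []
        self._counter = itertools.count()  # tie breaker keeps FIFO order per priority
        self._max_retries = max_retries
        self.failed = []  # tasks that exhausted their retries

    def enqueue(self, func, *args, priority=10, attempt=0):
        heapq.heappush(self._heap, (priority, next(self._counter), attempt, func, args))

    def run_all(self):
        results = []
        while self._heap:
            priority, _, attempt, func, args = heapq.heappop(self._heap)
            try:
                results.append(func(*args))
            except Exception as exc:
                if attempt < self._max_retries:
                    # Retry by putting the task back on the queue.
                    self.enqueue(func, *args, priority=priority, attempt=attempt + 1)
                else:
                    self.failed.append((func.__name__, args, str(exc)))
        return results


attempts = {"count": 0}


def send_email(to):
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("mail server unavailable")  # first attempt fails
    return f"email:{to}"


def charge_payment(order_id):
    return f"charged:{order_id}"


queue = TaskQueue()
queue.enqueue(send_email, "a@example.com", priority=10)
queue.enqueue(charge_payment, "order-1", priority=1)  # payments go first
results = queue.run_all()
```

A production setup would typically add a delay or backoff between retries rather than retrying immediately.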
After that, we'll move on to Elasticsearch: why we use it, how it works behind the scenes, and the different techniques it uses, for example the inverted index, term frequency and inverse document frequency, and segments and shards. We'll cover the use cases of Elasticsearch, for example providing a type-ahead experience, log analytics, or social media search, such as full-text search over user profiles, posts, and comments. We'll see how to create and manage indexes, and we'll see how to search and query with the different types of searching: basic searching, full-text search, and relevance scoring. We'll see how to optimize search performance by choosing between text and keyword fields, understanding analyzers, and using boosting and pagination. We'll take a look at some of the advanced search patterns, for example filtering, aggregation, and fuzzy search. Then we'll see how Kibana works and how to use Elasticsearch in a user-friendly way, along with different best practices, for example defining field mappings explicitly, optimizing the number of shards, indexing data in batches, and avoiding wildcards. Then we have error handling. We'll cover the different types of errors in our apps, which could be syntax errors, runtime errors, or logical errors, and the different error handling strategies, for example fail-safe or fail-fast, graceful degradation, and prevention of errors. We'll cover different practices for error handling, for example catching errors early, not swallowing errors, custom error types, failing gracefully, logging errors, and using stack traces. We'll look at how global error handlers work, and we'll see how to appropriately handle user-facing errors, like providing friendly error messages and actionable feedback. We'll see the importance of monitoring and logging in error handling, different tools like Sentry or the ELK stack, and different error alerts like email-based alerts and Slack-based alerts.
After that, we have config management: what exactly config management is, how it helps with flexibility by decoupling environment-specific settings from application logic, and what the use cases are. For example, running different environments through config management, safely managing sensitive data such as API keys, database passwords, and private certificates, and dynamically enabling and disabling features without changing the codebase. We'll cover the best practices of config management and the different types of configs, for example static configs like DB credentials and API endpoints, dynamic configs like feature flags and rate limits, and sensitive configs like credentials, tokens, and secrets. We'll cover the different sources of configs, for example a .env file or a JSON or YAML file, and the differences between using environment variables, command line flags, and static files. After that, we have logging, monitoring, and observability, a very important topic. We'll see the differences between logging, tracing, monitoring, and observability. We'll cover the different types of logs, like system, application, access, and security logs, and the different log levels, like debug, info, warn, error, and fatal. We'll see the difference between structured and unstructured logging and the best practices for logging, like centralized logging, log rotation and retention, contextual and meaningful logs, and keeping sensitive data like passwords and API keys out of logs. Then we'll look at monitoring and the different types of monitoring, like infrastructure monitoring, application performance monitoring, and uptime monitoring. We'll cover the different tools used in monitoring, like Prometheus and Grafana, and how to manage alerts and notifications by defining thresholds, creating alerts, and avoiding alert fatigue by only creating actionable alerts and ensuring that alerts are meaningful and necessary.
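As a small illustration of the config management points earlier in this passage, the Python sketch below loads settings from environment variables into one typed, immutable object, covering a sensitive value, a dynamic limit, and a feature flag. The variable names (`DATABASE_URL`, `RATE_LIMIT_PER_MINUTE`, `FEATURE_NEW_CHECKOUT`) and the defaults are assumptions made up for this example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str            # sensitive: comes only from the environment
    rate_limit_per_minute: int   # dynamic setting with a default
    new_checkout_enabled: bool   # feature flag


def load_config(env=os.environ) -> Config:
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        # Fail fast at startup instead of failing later on the first query.
        raise RuntimeError("DATABASE_URL must be set") from None
    flag = env.get("FEATURE_NEW_CHECKOUT", "false").strip().lower()
    return Config(
        database_url=database_url,
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "60")),
        new_checkout_enabled=flag in {"1", "true", "yes"},
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
config = load_config({"DATABASE_URL": "postgres://localhost/app", "FEATURE_NEW_CHECKOUT": "true"})
```

Loading everything once at startup keeps environment lookups out of the application logic, so the same code runs unchanged in development, staging, and production.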
Then we'll take a look at observability and its three pillars, which are logs, metrics, and traces, along with the best practices around it and the security and compliance of log management. After that, we'll move on to graceful shutdown: why we need it and how it works behind the scenes. We'll cover the different use cases, for example server restarts, scaling in cloud environments, microservices, or long-running jobs, and how it works, like signal handling with kill signals such as SIGTERM and SIGKILL. What are the different steps of a graceful shutdown? It starts with capturing a signal, then it stops accepting new requests, then it completes any in-flight requests, then it closes external resources like database connections or any open files, and at last it terminates the app. After that, we'll move on to security and the different aspects of security in a backend codebase: avoiding attacks like SQL injection, NoSQL injection, XSS, CSRF, broken authentication, and insecure deserialization, and the principles of secure software design, for example least privilege, defense in depth, fail-secure defaults, separation of duties, and security by design. Then we'll look at the importance of input validation and sanitization, rate limits, content security policy, CORS, SameSite cookies, and monitoring events. After that, we have scaling and performance: the different performance metrics like response time and resource utilization, identifying bottlenecks, and caching and database optimization, for example avoiding N+1 query problems, ensuring proper use of joins, and using lazy loading where appropriate. Then we have using database indexes to speed up read operations on frequently queried fields, like indexing foreign keys or search fields, and how to process data in batches to minimize database load and improve performance for large data sets.
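The graceful shutdown steps listed earlier in this passage can be sketched in Python as follows. The `Server` class is a hypothetical stand-in that only counts in-flight requests, and closing resources is reduced to setting a flag. Note that SIGKILL cannot be caught by a process at all, so only signals such as SIGTERM and SIGINT can trigger this path.

```python
import signal
import threading


class Server:
    """Hypothetical server that tracks in-flight requests."""

    def __init__(self):
        self.accepting = True
        self.in_flight = 0
        self.resources_closed = False
        self._state = threading.Condition()

    def start_request(self):
        with self._state:
            if not self.accepting:
                return False  # step 2: new requests are refused during shutdown
            self.in_flight += 1
            return True

    def finish_request(self):
        with self._state:
            self.in_flight -= 1
            self._state.notify_all()

    def shutdown(self, timeout=10.0):
        with self._state:
            self.accepting = False
            # Step 3: wait for in-flight requests, but never longer than timeout.
            drained = self._state.wait_for(lambda: self.in_flight == 0, timeout)
        # Step 4: stand-in for closing database connections and open files.
        self.resources_closed = True
        return drained


def install_signal_handlers(stop_event):
    # Step 1: capture the signal. The handler only sets a flag; the main
    # thread performs the actual shutdown work outside the handler.
    def on_signal(signum, frame):
        stop_event.set()

    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)
```

A main function would call `install_signal_handlers(stop_event)`, block on `stop_event.wait()`, call `server.shutdown()`, and then exit, which is the final step of terminating the app.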
We'll cover how to avoid memory leaks, like closing file handles and database connections or cleaning up after a long process, and minimizing network overhead by reducing payload size and using compression. We'll look at how to do performance testing and profiling. We'll look at some of the best practices for writing performant code, like focusing on writing clear and maintainable code first, without premature optimization, and writing modular code to make it easier to optimize individual components without affecting the entire system. We'll cover ensuring that if a particular resource is under load or unavailable, the system degrades gracefully without crashing, and how to offload non-critical tasks like sending emails or logging to background processes or task queues to free up resources for more critical operations. Then we have concurrency and parallelism. What is the difference between concurrency and parallelism? How does concurrency help with I/O-bound tasks, and how does parallelism help with CPU-bound tasks? Then we have object storage and large files. We'll look at some of the common use cases for object storage like AWS S3 and how to manage large files with chunking and streaming, and we'll look at multipart file uploads. Then we have real-time backend systems, where we take a look at WebSockets, server-sent events, and pub/sub architecture. After that, we have testing and code quality. Here we take a look at the different types of testing: unit testing, integration testing, end-to-end testing, functional testing, regression testing, performance testing, load and stress testing, user acceptance testing, and security testing. We'll take a look at what test-driven development is, how to automate tests in CI/CD environments, and how to manage code quality with external linting and formatting tools. We'll cover the measures of code quality and coverage, with quality metrics like cyclomatic complexity, which measures the complexity of a function by counting the number of possible paths through the code.
And we have the maintainability index, which basically quantifies how easy it is to maintain a codebase based on complexity, lines of code, and other factors. Then we'll take a look at a very interesting set of principles called the Twelve-Factor App. After that, we'll move on to OpenAPI standards: why these standards are needed, why we should stick to them, what the benefits are, what the use cases are, like documentation and automation, and the ecosystem surrounding them, like Swagger Codegen and Postman. We'll cover the history, the Swagger to OpenAPI transition, the different versions that are currently active, and the key concepts of OpenAPI documents. For example, there are API paths, the request and response definitions, parameters, and schemas. As for the structure of an OpenAPI document, there is metadata, there are paths, components, security definitions, and responses. We'll see the new features of OpenAPI 3.0 and 3.1, the tools surrounding OpenAPI, for example Swagger UI, Codegen, and Postman, and the best practices, like avoiding duplication and sticking to standards. We'll look at a very interesting development method, which is API-first development, where you write your OpenAPI spec first and then start creating the APIs. After that, we'll move on to webhooks and their use cases, like sending notifications and third-party integrations. What are the differences between an API and a webhook for the same use case? With an API, we might have to use polling, which is initiated by the client, whereas webhooks push data and are initiated by the server. What are the key components of webhooks? For example, the webhook URL, event triggers, the payload, the HTTP method, and response handling.
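To make the receiving side of a webhook a little more concrete, here is a minimal Python sketch of one common safeguard: checking an HMAC signature computed over the raw request body with a secret shared between sender and receiver. The function names, the hex-encoded SHA-256 format, and the example secret are assumptions for illustration; real providers each define their own header names and signing schemes.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    # What the sender does: HMAC-SHA256 over the exact bytes of the payload.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    # What the receiver does: recompute the signature and compare in constant time.
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())


secret = b"shared-webhook-secret"  # hypothetical; would come from config
body = b'{"event": "payment.succeeded", "order_id": "order-1"}'
signature = sign(secret, body)

accepted = verify_webhook(secret, body, signature)
tampered = verify_webhook(secret, body.replace(b"order-1", b"order-2"), signature)
```

The signature must be checked against the raw bytes as received, before any JSON parsing, because re-serializing the payload can change its byte representation.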
What are the best practices around webhooks? For example, webhook signature verification, using HTTPS, responding quickly, retry logic, and logging. We'll cover how to test webhooks with ngrok and real-world use cases like Stripe payment processing, GitHub webhooks, Slack, Discord, Twilio, etcetera. And at last, we'll take a look at some DevOps concepts that backend engineers should be familiar with. For example, core concepts like continuous integration, continuous delivery, and continuous deployment, and DevOps practices like infrastructure as code, config management, and version control. We'll cover the different tools surrounding DevOps, for example creating containers with Docker and orchestrating containers with Kubernetes, CI/CD pipelines, how to scale your service with horizontal versus vertical scaling, and different deployment strategies like blue-green deployment, rolling deployment, etcetera. And that's about it. These are all the concepts that we are going to cover in the next 30 or 40 videos. So stay tuned.
