How AWS Shared VPC can help you design a scalable cloud network
Networking is one of the most important parts of a landing zone, and it’s very difficult to modify once it’s implemented. That’s why it’s important to design a solution that can scale for years to come without constraining you.
The AWS Well-Architected Framework recommends that you split your workloads into multiple accounts to address the pillars of operational excellence, security, reliability, and cost optimization. The approach is to isolate workloads and environments into different AWS accounts.
Designing a scalable network solution for hundreds or even thousands of AWS accounts can be difficult using traditional approaches. I’ll explain all the solutions, their benefits, their limitations, and why shared VPCs might be the best solution for this use case.
Traditional approaches to multi-account networking
The first solution that comes to mind when talking about a multi-account network is one VPC per account, connected by a central transit gateway, also known as “hub and spoke”. This allows routes to be defined between different VPCs so that some accounts can communicate with others. It’s also easy to connect a datacenter directly to the transit gateway using a VPN or Direct Connect.
The downside is that each time you create a new VPC, you must assign it a non-overlapping CIDR range and size it correctly, neither too small nor too large, to avoid wasting IP addresses.
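To illustrate the bookkeeping this implies, here is a minimal sketch that carves non-overlapping IPv4 ranges out of a single pool using Python’s standard `ipaddress` module. The pool, the workload names, and the sizes are hypothetical, and a real setup would more likely rely on a dedicated IP address management tool.

```python
import ipaddress


def allocate_vpc_cidrs(pool, requests):
    """Carve non-overlapping IPv4 blocks out of one pool.

    requests maps a VPC name to the prefix length it needs (e.g. 24).
    Larger blocks are placed first so every block stays aligned on its
    own size and no addresses are lost between allocations.
    """
    pool_net = ipaddress.ip_network(pool)
    cursor = int(pool_net.network_address)
    last = int(pool_net.broadcast_address)
    allocated = {}
    # A smaller prefix length means a larger block, so sort ascending.
    for name, prefix in sorted(requests.items(), key=lambda item: item[1]):
        size = 2 ** (32 - prefix)
        if cursor + size - 1 > last:
            raise ValueError(f"pool {pool} is too small for {name}")
        allocated[name] = ipaddress.ip_network((cursor, prefix))
        cursor += size
    return allocated


# Hypothetical workloads and sizes, for illustration only.
plan = allocate_vpc_cidrs("10.0.0.0/16", {"app-a": 24, "app-b": 22, "app-c": 24})
for name, cidr in plan.items():
    print(name, cidr)
```

Every new VPC means another entry in this plan, and a prefix chosen too small or too large is hard to correct afterwards.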
This architecture also scales poorly on cost, since each attachment to the transit gateway costs about $40 per month. In the diagram below, the cost would be 7 x $40 = $280 per month, and that’s just for 6 VPCs and one on-premises connection!
The transit gateway also has data transfer charges, so if you have two applications on different accounts that need to transfer a lot of data, you’ll have to pay for that.
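The arithmetic can be sketched as a small function. The $40 attachment price comes from the figure above. The per-GB processing price is an assumption used for illustration only, so check the AWS pricing page for your Region. The second scenario, where the six accounts share a single VPC, is hypothetical.

```python
def monthly_tgw_cost(attachments, gb_processed=0.0,
                     attachment_price=40.0, price_per_gb=0.02):
    """Rough monthly transit gateway bill in USD.

    attachment_price: approximate monthly price per attachment (from the text).
    price_per_gb: ASSUMED data processing price, varies by Region.
    """
    return attachments * attachment_price + gb_processed * price_per_gb


# 6 VPCs + 1 on-premises connection, as in the hub-and-spoke example.
hub_and_spoke = monthly_tgw_cost(attachments=7)

# Hypothetical alternative: the same 6 accounts placed in one shared VPC,
# leaving 1 VPC attachment + 1 on-premises connection.
shared_vpc = monthly_tgw_cost(attachments=2)

print(hub_and_spoke, shared_vpc)
```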
On the security side, security group referencing is not available across a transit gateway at the time of writing (it’s on the AWS roadmap), so we have to put the CIDR ranges of the desired VPCs into the inbound security group rules. This is not good practice: as with IAM, networking should follow the principle of least privilege.
We can see that the best practices of the Well-Architected Framework are difficult to follow with the traditional hub and spoke architecture. Increasing the number of accounts and VPCs significantly increases both the cost and the complexity of the network architecture.
How can Shared VPCs address these issues?
Shared VPC is a feature that AWS launched in 2018. The concept is to share subnets from a parent account with child accounts. These child accounts can then launch resources by creating network interfaces in those subnets, but they cannot change the VPC, subnets, NACLs, or route tables themselves, which leaves configuration and routing to the parent account.
The Resource Access Manager (RAM) service is used to share resources between accounts. You can only share subnets with accounts in the same organization.
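As a sketch of what such a share looks like, the function below builds the arguments for the RAM `CreateResourceShare` operation as exposed by boto3 (`create_resource_share`). The account IDs, subnet IDs, Region, and share name are placeholders, and the actual API call is left commented out so that the example runs without AWS credentials.

```python
def subnet_share_request(share_name, region, owner_account_id,
                         subnet_ids, principals):
    """Build keyword arguments for ram.create_resource_share().

    principals can be account IDs, or the ARN of an organizational
    unit or of the whole organization.
    """
    subnet_arns = [
        f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"
        for subnet_id in subnet_ids
    ]
    return {
        "name": share_name,
        "resourceArns": subnet_arns,
        "principals": list(principals),
        # Subnets can only be shared inside the organization.
        "allowExternalPrincipals": False,
    }


# Placeholder identifiers, for illustration only.
request = subnet_share_request(
    share_name="shared-vpc-private-subnets",
    region="eu-west-1",
    owner_account_id="111111111111",
    subnet_ids=["subnet-0aaa1111bbbb2222c", "subnet-0ddd3333eeee4444f"],
    principals=["222222222222"],
)

# With credentials for the parent account, the call would be:
# import boto3
# boto3.client("ram", region_name="eu-west-1").create_resource_share(**request)
```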
Because all resources end up in the same VPC, they can communicate with each other without further routing configuration, since every route table in the VPC contains a default local route covering the VPC CIDR.
It’s also possible to reference security groups between accounts in a shared VPC configuration by adding <account_id>/<security_group_id> to any security group rule.
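In API terms, that account/group pair maps onto the `UserIdGroupPairs` field of an EC2 ingress rule. The sketch below parses a reference of the form account_id/security_group_id and builds one entry for the `IpPermissions` argument of `authorize_security_group_ingress`. The IDs and the port are placeholders, and the helper names are hypothetical.

```python
def parse_group_reference(reference):
    """Split '123456789012/sg-...' into (account_id, group_id)."""
    account_id, separator, group_id = reference.partition("/")
    valid_account = account_id.isdigit() and len(account_id) == 12
    if not separator or not valid_account or not group_id.startswith("sg-"):
        raise ValueError(f"invalid security group reference: {reference!r}")
    return account_id, group_id


def ingress_from_group(reference, port, protocol="tcp"):
    """Build one IpPermissions entry allowing traffic from another
    account's security group instead of from a CIDR range."""
    account_id, group_id = parse_group_reference(reference)
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"UserId": account_id, "GroupId": group_id}],
    }


# Placeholder IDs: allow PostgreSQL traffic from a group in another account.
rule = ingress_from_group("111122223333/sg-0123456789abcdef0", port=5432)

# With credentials, the rule would be applied with:
# ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=[rule])
```

Compared with a CIDR-based rule, this only admits traffic from members of the referenced group, which is closer to least privilege.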
Since traffic from one account to another does not leave the VPC, you don’t pay any additional network charges beyond inter-AZ data transfer where it applies. Note that Availability Zone names are mapped independently in each AWS account, so the zone called “a” in account A may be the zone called “b” in account B. When using a shared VPC, take this mapping into account to limit inter-AZ charges.
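One way to reason about this is to compare AZ IDs rather than AZ names, since AZ IDs (for example `use1-az1`) refer to the same physical zone in every account. The sketch below works on data shaped like the `AvailabilityZones` list returned by the EC2 `DescribeAvailabilityZones` operation. The sample mappings are invented for illustration.

```python
def az_name_translation(zones_a, zones_b):
    """Map each AZ name in account A to the name of the same physical
    zone in account B, joining on the account-independent ZoneId."""
    name_in_b = {zone["ZoneId"]: zone["ZoneName"] for zone in zones_b}
    return {
        zone["ZoneName"]: name_in_b.get(zone["ZoneId"])
        for zone in zones_a
    }


# Invented sample data in the shape of describe_availability_zones().
account_a = [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az1"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az2"},
]
account_b = [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az2"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az1"},
]

translation = az_name_translation(account_a, account_b)
print(translation)
```

In this invented example, a workload pinned to `us-east-1a` in both accounts would actually sit in two different zones and pay inter-AZ charges.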
There are two main options for designing your shared VPC solution: different subnets per account, or the same subnets for all accounts.
The first option, using different subnets for each account, allows you to separate your workloads into specific CIDR ranges and to use NACLs, since they apply at the subnet level. In the event of a network intrusion, it would be possible to determine which component was compromised by analyzing VPC flow logs or other network logs: since each account has its own CIDR ranges, you know which IP belongs to which account.
The second option, using the same subnets for every account, means that you don’t have to define a CIDR range for each workload. All accounts share the same pool of IP addresses, so all you have to do is define a CIDR range large enough for your subnets, and every workload can get the number of IP addresses it needs. The trade-off is that you can’t use NACLs to restrict traffic from one account to another, because NACLs are defined at the subnet level.
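With per-account subnets, attributing an IP address seen in a flow log to an account is a simple CIDR lookup. Here is a minimal sketch using the standard `ipaddress` module. The subnet-to-account table is hypothetical and would normally come from your own inventory.

```python
import ipaddress


def owning_account(ip, subnet_owners):
    """Return the account whose subnet contains ip, or None."""
    address = ipaddress.ip_address(ip)
    for cidr, account_id in subnet_owners.items():
        if address in ipaddress.ip_network(cidr):
            return account_id
    return None


# Hypothetical inventory: one subnet range per child account.
subnet_owners = {
    "10.0.0.0/24": "222222222222",
    "10.0.1.0/24": "333333333333",
}

suspect = owning_account("10.0.1.57", subnet_owners)
print(suspect)
```

When all accounts share the same subnets, this lookup no longer works and the owner has to be found another way, for example through the network interface that holds the address.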
The best of both worlds: Shared VPC + Transit Gateway
Why choose one architecture when you can use both together? This is the best network architecture we’ve found for companies migrating to AWS with a large number of applications. Each environment gets its own Shared VPC, so workloads are distributed across different accounts but deployed in the same VPC. An on-premises datacenter can be connected using Site-to-Site VPN or Direct Connect. Both environments can reach the datacenter because they are all attached to the transit gateway, and you can create “blackhole” routes to prevent one environment from reaching the other.
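As an illustration of the blackhole rules, the sketch below generates the arguments for the EC2 `CreateTransitGatewayRoute` operation (`create_transit_gateway_route` in boto3), which accepts a `Blackhole` flag. It assumes that each environment has its own transit gateway route table. The environment names, route table IDs, and CIDR ranges are placeholders.

```python
def blackhole_routes(environments):
    """For each environment's TGW route table, build one blackhole
    route per other environment so that traffic to it is dropped.

    environments: {name: {"route_table_id": str, "cidr": str}}
    """
    routes = []
    for name, env in environments.items():
        for other_name, other in environments.items():
            if other_name == name:
                continue
            routes.append({
                "TransitGatewayRouteTableId": env["route_table_id"],
                "DestinationCidrBlock": other["cidr"],
                "Blackhole": True,
            })
    return routes


# Placeholder values, for illustration only.
environments = {
    "production": {"route_table_id": "tgw-rtb-0aaaaaaaaaaaaaaaa", "cidr": "10.0.0.0/16"},
    "staging": {"route_table_id": "tgw-rtb-0bbbbbbbbbbbbbbbb", "cidr": "10.1.0.0/16"},
}

routes = blackhole_routes(environments)

# With credentials, each route would be created with:
# ec2.create_transit_gateway_route(**route)
```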
It is also possible to attach additional VPCs to the transit gateway, for example a shared services VPC containing resources that do not belong to either environment, such as a GitHub or GitLab instance. This VPC can be connected to both environments through the TGW routing configuration.
Limitations of Shared VPC
Shared VPCs have one major drawback: an account cannot create a VPC endpoint in a subnet that it does not own, because AWS considers creating a new route out of a VPC to be a privileged operation. Some services, such as Amazon MQ and Amazon Managed Workflows for Apache Airflow (MWAA), use VPC endpoints, which means we can’t deploy them on shared subnets.
AWS released shared VPC support for MWAA in November 2023, which allows Airflow to be deployed on shared subnets. The endpoint is created in two phases: AWS creates the service side first, then you have 72 hours to create the client side of the endpoint in the account that owns the subnets. This can be automated using EventBridge and a Lambda function.
Although AWS has released a solution for MWAA, it is still not possible to create Amazon MQ clusters in shared subnets. While waiting for similar functionality, we need to find other solutions to deploy MQ clusters in child accounts.
Since MQ clusters require a VPC that belongs to the account, the solutions start by creating another VPC in that account, with a CIDR range that does not overlap the Shared VPC. We call this new VPC an “Owned VPC” because it is owned by the account.
We can then use VPC peering to connect the Owned VPC to the Shared VPC. VPC peering has no hourly charge and supports security group referencing, so the MQ cluster becomes reachable from the entire Shared VPC. Note that since VPC peering is not transitive, the MQ cluster cannot be reached from a datacenter or from another Shared VPC connected to the transit gateway.
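The non-transitivity can be pictured with a deliberately simplified reachability model: two networks can talk if they are both attached to the transit gateway, or if they are directly peered, but a peering connection is never used as a hop toward a third network. The network names are hypothetical, and the model ignores route tables and security groups.

```python
def can_reach(source, destination, tgw_attached, peerings):
    """Simplified model: the transit gateway forwards between all of its
    attachments, while a peering only links its two endpoints."""
    if source == destination:
        return True
    if source in tgw_attached and destination in tgw_attached:
        return True
    return frozenset((source, destination)) in peerings


# Hypothetical topology matching the description above.
tgw_attached = {"datacenter", "shared-vpc-a", "shared-vpc-b"}
peerings = {frozenset(("shared-vpc-a", "owned-vpc"))}

from_shared = can_reach("shared-vpc-a", "owned-vpc", tgw_attached, peerings)
from_datacenter = can_reach("datacenter", "owned-vpc", tgw_attached, peerings)
from_other_vpc = can_reach("shared-vpc-b", "owned-vpc", tgw_attached, peerings)

print(from_shared, from_datacenter, from_other_vpc)
```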
Conclusion
In summary, AWS shared VPCs provide a scalable, cost-effective solution for network management across multiple accounts.
By centralizing control and simplifying connectivity, shared VPCs streamline network management and promote agility and scalability in cloud environments. However, it’s important to understand the limitations and plan accordingly to ensure a robust and efficient network infrastructure.