5 things you may not know about AWS IAM
When it comes to AWS IAM, I recall a time, after less than 2 years of experience, when I told myself: “that’s it, I have nothing more to learn about how it works”. Since then, I have learned new things and told myself the exact same thing, over and over again.
AWS IAM is the gift that keeps on giving, so here are the top 5 IAM facts that took me ages to figure out! Do you know them all?
NB: In this blog post, I assume that you already have a good working knowledge of AWS IAM. Terms like identity-based policies, resource-based policies, Service Control Policies (SCP), Statements, etc… will be thrown around with little to no explanation.
SCPs are not inherited like you would expect them to be
I learned this one the hard way: by breaking 200+ AWS accounts of my company’s AWS Organization, a few years ago, during the 25 very long minutes it took me to figure out my mistake.
As an illustrative example, imagine you want the AWS accounts of your organisation to use only the EC2 service, except for some AWS accounts that can also use S3. Given this use case, you could be tempted to attach an SCP allowing only EC2 at the root of your AWS Organization, and to create an Organizational Unit (OU) allowing S3 (with the appropriate SCP), like this:
NB: DO NOT do that
After all, the AWS Organizations console seems to make it very clear that there is inheritance between the OUs. After creating this configuration (please don’t), you look at the Service Control Policies of ou-allow-s3 and you see that:
Apparently, ou-allow-s3 inherits the only-ec2 SCP from Root, as expected. So you can legitimately assume that any account placed inside this OU (such as AWS account #3) will have access to both EC2 and S3, right?
But the reality is that you have access to neither:
In fact, you don’t have any access at all in this account. As is often the case, the console display is misleading: in reality, there is no inheritance at all. That is just not how SCPs work.
The evaluation of AWS Organizations Service Control Policies works as follows:
- For any given API call, consider all the SCPs directly attached to the AWS account itself and verify that the call is allowed; then
- Consider all the SCPs directly attached to the parent OU and verify that the call is allowed; then
- Repeat until there is no more parent (the root of the Organization has been reached).
Given that evaluation process, we understand why no service is accessible in the AWS account #3 of our example schema:
- If we try to access something that is not S3, the call is denied because at the level of ou-allow-s3 only S3 is allowed by the SCP;
- If we try to access something that is not EC2, the call is denied because at the level of the root only EC2 is allowed by the SCP.
And of course, any AWS API call falls into the “not S3” category, the “not EC2” category, or both, and is therefore denied.
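The evaluation described above can be sketched in a few lines of Python. This is only a toy model, not how AWS implements it: it looks at Allow patterns only (no Deny statements, no conditions), the function names are made up for the illustration, and it assumes that the default FullAWSAccess SCP is still attached to account #3 itself. The `fixed` hierarchy is one possible "restrict as you go down" layout, also an assumption.

```python
from fnmatch import fnmatchcase


def level_allows(allowed_patterns, action):
    """True if at least one Allow pattern attached at this level matches the action."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in allowed_patterns)


def is_allowed(levels, action):
    """A call passes only if every level, from the account up to the root, allows it."""
    return all(level_allows(patterns, action) for patterns in levels)


# Levels are listed from the account up to the root (assumed layout for account #3).
broken = [
    ["*"],      # account #3: FullAWSAccess assumed to be still attached
    ["s3:*"],   # ou-allow-s3: only S3
    ["ec2:*"],  # Root: only-ec2
]

# "Restrict as you go down": the root allows the union, lower levels may narrow it.
fixed = [
    ["*"],              # account #3
    ["ec2:*", "s3:*"],  # ou-allow-s3
    ["ec2:*", "s3:*"],  # Root
]

for action in ("ec2:RunInstances", "s3:GetObject"):
    print(action, is_allowed(broken, action), is_allowed(fixed, action))
```

With the `broken` hierarchy, both calls are rejected because no action can satisfy the S3-only level and the EC2-only level at the same time; with `fixed`, both pass.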
If you really want this use-case to work, you must restrict services as you go down in the hierarchy, which is very counter-intuitive:
But intuitive or not, this is just how AWS Organizations SCPs work.
Note that you now understand why AWS automatically attaches the default FullAWSAccess SCP directly to every AWS account and every OU of your AWS Organization, and then asks you to work by only adding SCPs that Deny things: it forces you to design your AWS Organization by thinking “the further down, the more restricted”, which is the only way it can work anyway.
Resource policies can give permissions by themselves
You have an AWS account with an IAM Role and an S3 bucket. The IAM role is called role-without-policy and does not have any policy attached or inlined. The S3 bucket has a bucket policy that gives role-without-policy the permission to call s3:GetObject.
Question: in that situation, can you successfully perform a GetObject on the bucket using the role?
In that particular case, yes you can!
But if I ask in general if a resource-based policy alone can give permissions, it depends. If you look at the IAM evaluation process in the AWS documentation, you find this schema:
It is not very helpful for our question because an “Allow” inside a Resource-based policy leads to an orange box. The AWS documentation provides a confusing table to explain this “orange box”, which can be translated into this decision flowchart:
First note that the behaviour depends on whether or not the Principal making the request is in the same AWS account as the Resource.
If the Principal and the Resource are in different AWS accounts, the resource-based policy of the Resource is necessary but not sufficient to give permissions.
But when the Principal and the Resource are in the same AWS account, things get interesting and depend on the nature of the Principal (IAM User, Role session or Federated session) and on how it is referenced in the resource-based policy (which Principal ARN is written in the policy). If the resource-based policy references an IAM User or an IAM Session (an assumed-role session, for example), it will even bypass the evaluation of permissions boundaries and session policies!
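As a rough illustration, the decision described above can be written as a small Python function. It is a deliberately simplified sketch: it assumes the resource-based policy already contains a matching Allow, it ignores explicit Deny statements, SCPs and federated sessions, and the function and parameter names are hypothetical.

```python
def resource_policy_grants_access(
    same_account,
    referenced_principal,        # ARN type named in the resource policy: "user", "role" or "session"
    identity_policy_allows,
    boundary_allows=True,        # True when no permissions boundary is set
    session_policy_allows=True,  # True when no session policy is passed
):
    """Outcome when the resource-based policy contains a matching Allow (no explicit Deny)."""
    if not same_account:
        # Cross-account: the resource policy is necessary but not sufficient.
        return identity_policy_allows and boundary_allows and session_policy_allows
    if referenced_principal in ("user", "session"):
        # Same account, user or session ARN: the resource policy is enough,
        # and boundary / session policies are not evaluated for this grant.
        return True
    # Same account, role ARN: no identity policy needed,
    # but boundary and session policies still limit the result.
    return boundary_allows and session_policy_allows


# The role-without-policy example: same account, role ARN in the bucket policy.
print(resource_policy_grants_access(True, "role", identity_policy_allows=False))
```

The last line mirrors the bucket example from this section: the role has no identity-based policy, yet the same-account bucket policy is enough.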
NotPrincipal evaluation may not do what you expect
You have an S3 bucket example-bucket filled with security information and you want to ensure that only the IAM role role-security is allowed to access that bucket. As you are an advanced AWS IAM user, you know that you can use NotPrincipal in a policy statement, so you write a bucket policy saying:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/role-security"
        ]
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/role-security"
        ]
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
Seems legit, right? But when you try to perform a GetObject using the role-security role, it fails with an explicit deny.
Well, when evaluating permissions for a Session (an assumed-role), the IAM engine checks for two distinct Principal ARNs:
- The role ARN:
arn:aws:iam::<account>:role/<role-name>
- The session ARN:
arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
Therefore, the Deny statement of the bucket policy with:
"NotPrincipal": {
  "AWS": [
    "arn:aws:iam::123456789012:role/role-security"
  ]
}
ignores the Role ARN, as expected, but matches the Session ARN and applies. That’s why it does not work!
If you want this use-case to work, you must choose a session name in advance, let’s say secu, and write the following bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "NotPrincipal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/role-security",
          "arn:aws:sts::123456789012:assumed-role/role-security/secu"
        ]
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:role/role-security"
        ]
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*"
    }
  ]
}
As long as you assume the role with secu as the session name, it will work as expected. Of course, it is not convenient to have to set the session name in advance, but unfortunately wildcards are not allowed in the Principal element, so NotPrincipal by itself leaves you no alternative.
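To make the mechanism concrete, here is a minimal Python model of the behaviour described in this section. It is a simplification that follows the explanation above (a request made with an assumed role carries both the role ARN and the session ARN, and a Deny with NotPrincipal applies as soon as one of them is not listed); the function names are invented for the example and this is not the actual IAM engine.

```python
ACCOUNT = "123456789012"


def request_principals(role_name, session_name):
    """The two principal ARNs checked for a request made with an assumed role."""
    return [
        f"arn:aws:iam::{ACCOUNT}:role/{role_name}",
        f"arn:aws:sts::{ACCOUNT}:assumed-role/{role_name}/{session_name}",
    ]


def deny_applies(not_principal, principals):
    """A Deny + NotPrincipal statement applies if any request principal is not listed."""
    return any(arn not in not_principal for arn in principals)


first_policy = [f"arn:aws:iam::{ACCOUNT}:role/role-security"]
second_policy = first_policy + [
    f"arn:aws:sts::{ACCOUNT}:assumed-role/role-security/secu"
]

# First policy: the session ARN is not listed, so the Deny applies.
print(deny_applies(first_policy, request_principals("role-security", "any-name")))
# Second policy: both ARNs are listed, but only for the "secu" session name.
print(deny_applies(second_policy, request_principals("role-security", "secu")))
print(deny_applies(second_policy, request_principals("role-security", "other")))
```

In this model the first policy always denies the role, while the second one lets it through only when the session is named secu.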
A permission can be granted by a combination of statements
In order to introduce the concept, let’s begin with a question. You are using an IAM role which only has this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*"
    }
  ]
}
Can you use RunInstances to launch a new EC2 instance?
Since I asked, you probably guessed that there is a catch! No, you can’t launch an EC2 instance with this policy alone. If we read the AWS documentation on EC2 actions, we can see, for each action, which resources are expected and which ones are mandatory (they are marked with a star). For RunInstances, there are multiple mandatory resources:
- image
- instance
- network-interface
- security-group
- subnet
- volume
In order for a policy to grant the RunInstances permission, it must allow it on each of these resources, therefore our previous example is not complete.
In order to grant RunInstances, the following three policies are valid:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "*"
    }
  ]
}
Of course, simply using a wildcard as Resource will work, as it matches all resource types.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:image/*",
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:volume/*"
      ]
    }
  ]
}
Listing the ARN format of every mandatory resource also works.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:image/*",
        "arn:aws:ec2:*:*:network-interface/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:volume/*"
      ]
    }
  ]
}
And finally, listing the mandatory resource ARNs in different statements also works! You could argue that this last one is just a pointless curiosity, but there is a family of use-cases where it is unavoidable: tag conditions!
Let’s imagine you want to allow RunInstances if and only if the AMI is tagged “ccoe-approved: true”. You could naively write this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/ccoe-approved": "true"
        }
      }
    }
  ]
}
But it will fail! For this statement to apply, you would need to tag every mandatory resource for RunInstances (the subnet, the security group, etc…) with “ccoe-approved: true”, but that is not what you want. You want the tag to be required only on the AMI. Therefore you have to write this policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:image/*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/ccoe-approved": "true"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:volume/*"
      ]
    }
  ]
}
And it works as expected! You need to split the permission into two separate statements:
- one with a condition to allow RunInstances on the image resource only if it is adequately tagged;
- one to allow RunInstances on the other mandatory resources without condition.
Of course, more advanced use-cases could require splitting the permission into even more statements.
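The idea that each mandatory resource is checked on its own, and that different statements can each cover part of them, can be illustrated with a small Python model. This is only a sketch under assumptions: it handles Allow statements and the StringEquals / aws:ResourceTag condition only, the ARNs, IDs and helper names (`allow`, `statement_allows`, `run_instances_allowed`) are made up for the example, and the real IAM engine is of course more involved.

```python
from fnmatch import fnmatchcase

TAG_PREFIX = "aws:ResourceTag/"


def allow(resources, tag_equals=None):
    """Build an Allow statement for ec2:RunInstances (helper for this sketch only)."""
    statement = {"Effect": "Allow", "Action": "ec2:RunInstances", "Resource": resources}
    if tag_equals:
        statement["Condition"] = {
            "StringEquals": {TAG_PREFIX + k: v for k, v in tag_equals.items()}
        }
    return statement


def statement_allows(statement, arn, tags):
    """True if the statement matches this resource ARN and its tag condition holds."""
    patterns = statement["Resource"]
    if isinstance(patterns, str):
        patterns = [patterns]
    if not any(fnmatchcase(arn, p) for p in patterns):
        return False
    conditions = statement.get("Condition", {}).get("StringEquals", {})
    return all(
        tags.get(key[len(TAG_PREFIX):]) == expected
        for key, expected in conditions.items()
    )


def run_instances_allowed(statements, request):
    """Every resource of the call must be allowed by at least one statement."""
    return all(
        any(statement_allows(s, arn, tags) for s in statements)
        for arn, tags in request.items()
    )


# Hypothetical RunInstances call: resource ARN -> tags carried by that resource.
base = "arn:aws:ec2:eu-west-1:123456789012:"
request = {
    "arn:aws:ec2:eu-west-1::image/ami-11111111": {"ccoe-approved": "true"},
    base + "instance/i-new": {},
    base + "network-interface/eni-1": {},
    base + "security-group/sg-1": {},
    base + "subnet/subnet-1": {},
    base + "volume/vol-1": {},
}

others = [f"arn:aws:ec2:*:*:{t}/*" for t in
          ("instance", "network-interface", "security-group", "subnet", "volume")]

instance_only = [allow("arn:aws:ec2:*:*:instance/*")]
naive_tag = [allow("*", {"ccoe-approved": "true"})]
split = [allow("arn:aws:ec2:*:*:image/*", {"ccoe-approved": "true"}), allow(others)]

for name, policy in (("instance_only", instance_only),
                     ("naive_tag", naive_tag),
                     ("split", split)):
    print(name, run_instances_allowed(policy, request))
```

In this model only the `split` policy lets the call through: `instance_only` misses five resource types, and `naive_tag` fails on every untagged resource such as the subnet.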
KMS grants are like detached resource policy statements
There is a particular use-case that is very enlightening on the profound nature of KMS grants. Before we dive into it, let’s do a quick reminder about grants.
Imagine that you have an EC2 instance with an EBS volume encrypted by some KMS key. I suppose you know that you must have permissions on the KMS key in order to start the instance. But there is something strange here: it is not really you who will decrypt the EBS volume, it is the EC2 instance, right? So how does it work?
When you use StartInstances, the EC2 service will forward a request to KMS in your name. But not a Decrypt: a CreateGrant! By starting the EC2 instance, you will also create a KMS Grant for the instance, giving it the permission to call Decrypt, and then the instance itself will call Decrypt. This sequence is clearly visible in CloudTrail:
It is also possible to list the grants that currently exist on a KMS key, but only through the CLI (or an SDK). We can see the grant resulting from the CreateGrant of the previous screenshot:
GranteePrincipal is the Principal being granted the Operations. Here, the EC2 instance is granted the permission to Decrypt, with a constraint on the EncryptionContext (grant constraints are like statement conditions).
Also note there is a RetiringPrincipal that has the permission to remove the grant, here it is the EC2 service.
There is also info about the IssuingAccount, which is just my account here. We will come back to this one 😉
Now it is time to dive into a specific use-case. You have a KMS key in the Security account, and you want to use it to protect EBS volumes in the Project account for EC2 instances launched by an Autoscaling Group (ASG).
Note that it is not even far-fetched: centralizing KMS keys in a central security account is not a recommendation I would make, but it is reasonably common nonetheless; and ASGs are of course used a lot, for good reasons.
So, how do we make it work? KMS keys have a resource-based policy: the key policy. We can use it to give permissions to the Project account, but we are in a cross-account scenario: a resource-based policy cannot give permissions by itself, so the identity-based policy of the ASG in the Project account must also allow KMS operations. And there we have a problem: Auto Scaling groups use an IAM service-linked role that does not have any KMS permission, and we cannot change its policies! What can we do? Grant.
Here is how we do it with the Security account being 111122223333 and the Project account 444455556666:
First, we use the Key policy to allow the Project account to manage grants on the key by adding this statement:
{
  "Sid": "Allow grant",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::444455556666:root"
  },
  "Action": [
    "kms:CreateGrant",
    "kms:RevokeGrant",
    "kms:ListGrants"
  ],
  "Resource": "*"
}
NB: We only need CreateGrant, but ListGrants and RevokeGrant make the cleanup after your test easier.
Then, we use a CLI AccessKey/SecretKey of an IAM User or Role inside the Project account that also has the CreateGrant permission in its identity-based policy. We are able to use CreateGrant cross-account because both our identity-based policy and the resource-based policy of the key allow it.
Our CLI interaction looks like this:
$> # We are in the Project account
$> aws sts get-caller-identity
{
  "UserId": "AIDAEXAMPLEEXAMPLE",
  "Account": "444455556666",
  "Arn": "arn:aws:iam::444455556666:user/jrodon"
}
$> # From the Project account we create the grant
$> aws kms create-grant --key-id arn:aws:kms:eu-west-1:111122223333:key/51b0ad7a-aaaa-bbbb-cccc-dddddddddddd --grantee-principal arn:aws:iam::444455556666:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling --operations "Decrypt" "GenerateDataKeyWithoutPlaintext" "DescribeKey" "CreateGrant"
{
  "GrantToken": "<blablabla>",
  "GrantId": "aabbccddeeff"
}
$> # We can see it
$> aws kms list-grants --key-id arn:aws:kms:eu-west-1:111122223333:key/51b0ad7a-aaaa-bbbb-cccc-dddddddddddd
{
  "Grants": [
    {
      "KeyId": "arn:aws:kms:eu-west-1:111122223333:key/51b0ad7a-aaaa-bbbb-cccc-dddddddddddd",
      "GrantId": "aabbccddeeff",
      "Name": "",
      "CreationDate": "2023-07-25T14:16:43+02:00",
      "GranteePrincipal": "arn:aws:iam::444455556666:role/aws-service-role/autoscaling.amazonaws.com/AWSServiceRoleForAutoScaling",
      "IssuingAccount": "arn:aws:iam::444455556666:root",
      "Operations": [
        "Decrypt",
        "GenerateDataKeyWithoutPlaintext",
        "CreateGrant",
        "DescribeKey"
      ]
    }
  ]
}
Pay attention to the IssuingAccount of the grant: because we created it using a principal of the Project account, the IssuingAccount is the Project Account! Therefore, from the grant perspective, it is no longer a cross-account scenario: the principal being granted is in the same AWS account as the grant itself! And that’s it: as long as this grant exists, it is sufficient to give the ASG role permissions to use the KMS key (and launch encrypted EC2 instances), even if the role itself does not have any KMS permissions!
And that was my top 5 of non-obvious facts about AWS IAM!