So, we have an AWS EKS cluster built with AWS CDK and Python (see AWS: CDK and Python – building an EKS cluster, and general impressions of CDK), and we have an idea of how IRSA works (see AWS: EKS, OpenID Connect, and ServiceAccounts).
The next step after deploying the cluster is to configure the OIDC Identity Provider in AWS IAM and to add two controllers – ExternalDNS to work with Route53, and AWS ALB Controller to create AWS Load Balancers when creating Ingress in an EKS cluster.
For authentication in AWS, both controllers will use the IRSA model (IAM Roles for ServiceAccounts): the Kubernetes Pod with the controller gets a ServiceAccount that allows it to use an IAM Role with IAM Policies granting the necessary permissions.
The WorkerNodes autoscaling controller will be added later: previously, I’ve used the Cluster AutoScaler, but this time I want to try Karpenter, so I’ll make a separate post for that.
We continue to “eat the cactus”, that is, to use the AWS CDK with Python. It will be used to create IAM resources and deploy the Helm charts with the controllers directly from the CloudFormation stack of the cluster.
I tried to deploy the controllers as a separate stack and spent an hour or so trying to figure out how to get the AWS CDK to pass values from one stack to another via CloudFormation Exports and Outputs, but finally gave up and did it all in one stack class. Maybe I will try again another time.
EKS cluster, VPC, and IAM
Creating a cluster is described in one of the previous posts – AWS: CDK and Python – building an EKS cluster, and general impressions of CDK.
What do we have now?
A class to create a stack:
...
class AtlasEksStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, stage: str, region: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # get AWS_ACCOUNT
        aws_account = kwargs['env'].account

        # get AZs from the $region
        availability_zones = ['us-east-1a', 'us-east-1b']
...
The aws_account is passed from app.py when creating an AtlasEksStack() class object:
...
AWS_ACCOUNT = os.environ["AWS_ACCOUNT"]
...
eks_stack = AtlasEksStack(app, f'eks-{EKS_STAGE}-1-26',
        env=cdk.Environment(account=AWS_ACCOUNT, region=AWS_REGION),
        stage=EKS_STAGE,
        region=AWS_REGION
    )
...
And we will continue to use it for the AWS IAM configuration.
We also have a separate VPC:
...
        vpc = ec2.Vpc(self, 'Vpc',
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
            vpc_name=f'eks-{stage}-1-26-vpc',
            enable_dns_hostnames=True,
            enable_dns_support=True,
            availability_zones=availability_zones,
            ...
        )
...
And the EKS cluster itself:
...
        print(cluster_name)
        cluster = eks.Cluster(
            self, 'EKS-Cluster',
            cluster_name=cluster_name,
            version=eks.KubernetesVersion.V1_26,
            vpc=vpc,
            ...
        )
...
Next, we need to add the creation of OIDC in IAM, and the deployment of Helm charts with controllers.
OIDC Provider configuration in AWS IAM
We’ll use boto3 (this is one of the things that I don’t really like about AWS CDK: a lot of things have to be done not with the methods/constructs of the CDK itself, but with “crutches” in the form of boto3 or other modules/libraries).
We need to get the OIDC Issuer URL and its thumbprint; then we can call create_open_id_connect_provider().
The OIDC Provider URL can be obtained using boto3.client('eks'):
...
import boto3
...
        ############
        ### OIDC ###
        ############

        eks_client = boto3.client('eks')

        # Retrieve the cluster's OIDC provider details
        response = eks_client.describe_cluster(name=cluster_name)

        # https://oidc.eks.us-east-1.amazonaws.com/id/2DC***124
        oidc_provider_url = response['cluster']['identity']['oidc']['issuer']
...
Next, with the help of the ssl and hashlib libraries, we get the thumbprint of the certificate of the oidc.eks.us-east-1.amazonaws.com endpoint:
...
import ssl
import hashlib
...
        # AWS EKS OIDC root URL
        eks_oidc_url = "oidc.eks.us-east-1.amazonaws.com"

        # Retrieve the SSL certificate from the URL
        cert = ssl.get_server_certificate((eks_oidc_url, 443))
        der_cert = ssl.PEM_cert_to_DER_cert(cert)

        # Calculate the thumbprint for the create_open_id_connect_provider()
        oidc_provider_thumbprint = hashlib.sha1(der_cert).hexdigest()
...
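The endpoint host is hardcoded above, so the code only works for us-east-1. A minimal sketch of the same steps with the host derived from the issuer URL instead, assuming the issuer always looks like https://<host>/id/<ID>. The helper names issuer_host(), cert_thumbprint() and fetch_thumbprint() are hypothetical and not part of the original stack:

```python
import hashlib
import ssl
from urllib.parse import urlparse


def issuer_host(issuer_url: str) -> str:
    """Return the host part of an OIDC issuer URL."""
    host = urlparse(issuer_url).hostname
    if host is None:
        raise ValueError(f"not a URL with a host: {issuer_url!r}")
    return host


def cert_thumbprint(pem_cert: str) -> str:
    """SHA-1 hex digest of the DER form of a PEM-encoded certificate."""
    der_cert = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha1(der_cert).hexdigest()


def fetch_thumbprint(issuer_url: str, port: int = 443) -> str:
    """Download the endpoint's certificate and compute its thumbprint (needs network access)."""
    pem_cert = ssl.get_server_certificate((issuer_host(issuer_url), port))
    return cert_thumbprint(pem_cert)
```

In the stack this would probably look like oidc_provider_thumbprint = fetch_thumbprint(oidc_provider_url), with oidc_provider_url taken from describe_cluster() as before.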
And now, with boto3.client('iam') and create_open_id_connect_provider(), we can create the IAM OIDC Identity Provider:
...
from botocore.exceptions import ClientError
...
        # Create IAM Identity Provider
        iam_client = boto3.client('iam')

        # to catch the "(EntityAlreadyExists) when calling the CreateOpenIDConnectProvider operation"
        try:
            response = iam_client.create_open_id_connect_provider(
                Url=oidc_provider_url,
                ThumbprintList=[oidc_provider_thumbprint],
                ClientIDList=["sts.amazonaws.com"]
            )
        except ClientError as e:
            print(f"\n{e}")
...
Here, we wrap everything in a try/except, because on further updates of the stack the Provider already exists, and the create_open_id_connect_provider() call fails with the EntityAlreadyExists error.
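Note that the try/except above swallows every ClientError, including things like AccessDenied. A minimal sketch of a narrower check that ignores only EntityAlreadyExists follows. The function name ensure_oidc_provider() is hypothetical, and to keep the block runnable without botocore it catches a generic Exception and reads the error code from its response attribute, which is where botocore's ClientError keeps it; in the real stack you would catch ClientError instead:

```python
def ensure_oidc_provider(iam_client, issuer_url: str, thumbprint: str):
    """Create the IAM OIDC provider; return its ARN, or None if it already exists."""
    try:
        response = iam_client.create_open_id_connect_provider(
            Url=issuer_url,
            ThumbprintList=[thumbprint],
            ClientIDList=["sts.amazonaws.com"],
        )
    except Exception as e:
        # botocore's ClientError carries the parsed error in e.response
        error_code = getattr(e, "response", {}).get("Error", {}).get("Code")
        if error_code != "EntityAlreadyExists":
            # any other error should still fail the deploy
            raise
        return None
    return response["OpenIDConnectProviderArn"]
```

With this, a second cdk deploy still passes quietly when the Provider exists, but a permissions problem is no longer hidden behind a print().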
Installing ExternalDNS
Let’s add the ExternalDNS first – it has a fairly simple IAM Policy, so we’ll test how CDK works with Helm charts.
IRSA for ExternalDNS
The first step is to create an IAM Role that the ExternalDNS ServiceAccount can assume, and which will allow ExternalDNS to manage records in the Route53 domain zone. Without such a role, ExternalDNS has a ServiceAccount but fails with an error:
msg=”records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403
Trust relationships
In the Trust relationships of this role, we must specify the ARN of the OIDC Provider created above as the Principal, sts:AssumeRoleWithWebIdentity as the Action, and a Condition that the request comes from the ServiceAccount that will be created by the ExternalDNS Helm chart.
Let’s create a couple of variables:
...
        # arn:aws:iam::492***148:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/2DC***124
        oidc_provider_arn = f'arn:aws:iam::{aws_account}:oidc-provider/{oidc_provider_url.replace("https://", "")}'

        # deploy ExternalDNS to a namespace
        controllers_namespace = 'kube-system'
...
The oidc_provider_arn is formed from the oidc_provider_url variable obtained earlier in response = eks_client.describe_cluster(name=cluster_name).
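The oidc_provider_url.replace("https://", "") expression is repeated several times in this stack, so it may be worth a small helper. A minimal sketch, assuming Python 3.9+ for str.removeprefix(); the function names and the account ID 111122223333 are placeholders, not values from the real cluster:

```python
def oidc_issuer_path(issuer_url: str) -> str:
    """Strip the scheme: 'https://host/id/XXX' -> 'host/id/XXX'."""
    return issuer_url.removeprefix("https://")


def oidc_provider_arn(account_id: str, issuer_url: str) -> str:
    """Build the ARN of the IAM OIDC Identity Provider for an issuer URL."""
    return f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer_path(issuer_url)}"


if __name__ == "__main__":
    issuer = "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    print(oidc_provider_arn("111122223333", issuer))
```

Unlike replace(), removeprefix() only touches the beginning of the string, which is all we need here.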
Describe the creation of the role using iam.Role():
...
        # Create an IAM Role to be assumed by ExternalDNS
        external_dns_role = iam.Role(
            self,
            'EksExternalDnsRole',
            # for Role's Trust relationships
            assumed_by=iam.FederatedPrincipal(
                federated=oidc_provider_arn,
                conditions={
                    'StringEquals': {
                        f'{oidc_provider_url.replace("https://", "")}:sub': f'system:serviceaccount:{controllers_namespace}:external-dns'
                    }
                },
                assume_role_action='sts:AssumeRoleWithWebIdentity'
            )
        )
...
As a result, we should get a role with the following Trust relationships:
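For reference, here is a sketch of what that trust policy document should roughly look like, built as a plain Python dict. This is my reconstruction from the iam.Role() arguments, not output copied from IAM; the helper name irsa_trust_policy() and the account and issuer IDs are placeholders:

```python
import json


def irsa_trust_policy(provider_arn: str, issuer_path: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy that lets one Kubernetes ServiceAccount assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # the 'sub' claim of the ServiceAccount token
                        f"{issuer_path}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    issuer_path = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
    arn = f"arn:aws:iam::111122223333:oidc-provider/{issuer_path}"
    print(json.dumps(irsa_trust_policy(arn, issuer_path, "kube-system", "external-dns"), indent=2))
```

If the Condition in the IAM console does not match the namespace and the ServiceAccount name exactly, the Pod gets the "Not authorized to perform sts:AssumeRoleWithWebIdentity" error shown earlier.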
The next step is an IAM Policy.
IAM Policy for ExternalDNS
With the role assumed but no policy attached to it, ExternalDNS now fails on its first Route53 call:
msg="records retrieval failed: failed to list hosted zones: AccessDenied: User: arn:aws:sts::492***148:assumed-role/eks-dev-1-26-EksExternalDnsRoleB9A571AF-7WM5HPF5CUYM/1689063807720305270 is not authorized to perform: route53:ListHostedZones because no identity-based policy allows the route53:ListHostedZones action\n\tstatus code: 403
So we need to describe two iam.PolicyStatement() objects: one for working with the domain zone, and a second one for the route53:ListHostedZones API call.
They have to be separate, because for route53:ChangeResourceRecordSets the resources must be restricted to one specific zone, while for route53:ListHostedZones the resources must be "*", i.e. “all”:
...
        # A Zone ID to create records in by ExternalDNS
        zone_id = "Z04***FJG"
        # to be used in domainFilters
        zone_name = "example.co"

        # Attach IAM Policies to that Role so ExternalDNS can perform Route53 actions
        external_dns_policy = iam.PolicyStatement(
            actions=[
                'route53:ChangeResourceRecordSets',
                'route53:ListResourceRecordSets'
            ],
            resources=[
                f'arn:aws:route53:::hostedzone/{zone_id}',
            ]
        )

        list_hosted_zones_policy = iam.PolicyStatement(
            actions=[
                'route53:ListHostedZones'
            ],
            resources=['*']
        )

        external_dns_role.add_to_policy(external_dns_policy)
        external_dns_role.add_to_policy(list_hosted_zones_policy)
...
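In plain IAM JSON, these two statements should correspond to a document like the one below. This is a sketch for comparing with what you see on the role in the IAM console; the helper name external_dns_policy_document() and the zone ID Z0000000EXAMPLE are made up, and the real document generated by CDK may order or group things differently:

```python
import json


def external_dns_policy_document(zone_ids: list) -> dict:
    """Policy with record changes limited to the given zones, and zone listing allowed on '*'."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "route53:ChangeResourceRecordSets",
                    "route53:ListResourceRecordSets",
                ],
                "Resource": [f"arn:aws:route53:::hostedzone/{zone_id}" for zone_id in zone_ids],
            },
            {
                "Effect": "Allow",
                "Action": ["route53:ListHostedZones"],
                "Resource": ["*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(external_dns_policy_document(["Z0000000EXAMPLE"]), indent=2))
```

Taking a list of zone IDs is my own choice here: it makes it easy to let ExternalDNS manage a second zone later without adding a third statement.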
Now we can add the ExternalDNS Helm chart itself.
AWS CDK and ExternalDNS Helm chart
Here we use the add_helm_chart() method of the eks.Cluster construct from aws_cdk.aws_eks.
In the values, enable the serviceAccount, and in its annotations pass 'eks.amazonaws.com/role-arn': external_dns_role.role_arn:
...
        # Install ExternalDNS Helm chart
        external_dns_chart = cluster.add_helm_chart('ExternalDNS',
            chart='external-dns',
            repository='https://charts.bitnami.com/bitnami',
            namespace=controllers_namespace,
            release='external-dns',
            values={
                'provider': 'aws',
                'aws': {
                    'region': region
                },
                'serviceAccount': {
                    'create': True,
                    'annotations': {
                        'eks.amazonaws.com/role-arn': external_dns_role.role_arn
                    }
                },
                'domainFilters': [
                    zone_name
                ],
                'policy': 'upsert-only'
            }
        )
...
Let’s deploy and look at the ExternalDNS Pod: we can see both our domain-filter and the environment variables that IRSA needs:
[simterm]
$ kubectl -n kube-system describe pod external-dns-85587d4b76-hdjj6
...
    Args:
      --metrics-address=:7979
      --log-level=info
      --log-format=text
      --domain-filter=test.example.co
      --policy=upsert-only
      --provider=aws
...
    Environment:
      AWS_DEFAULT_REGION:           us-east-1
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_ROLE_ARN:                 arn:aws:iam::492***148:role/eks-dev-1-26-EksExternalDnsRoleB9A571AF-7WM5HPF5CUYM
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
...
[/simterm]
Check the logs:
[simterm]
...
time="2023-07-11T10:28:28Z" level=info msg="Applying provider record filter for domains: [example.co. .example.co.]"
time="2023-07-11T10:28:28Z" level=info msg="All records are already up to date"
...
[/simterm]
And let’s test if it’s working.
Testing ExternalDNS
To check, let’s create a simple Service with the LoadBalancer type, and in its annotations add external-dns.alpha.kubernetes.io/hostname to trigger ExternalDNS to create a DNS record in Route53:
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "nginx.test.example.co"
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - name: nginx-http-svc-port
      protocol: TCP
      port: 80
      targetPort: nginx-http
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: nginxdemos/hello
      ports:
        - containerPort: 80
          name: nginx-http
Check ExternalDNS logs:
[simterm]
...
time="2023-07-11T10:30:29Z" level=info msg="Applying provider record filter for domains: [example.co. .example.co.]"
time="2023-07-11T10:30:29Z" level=info msg="Desired change: CREATE cname-nginx.test.example.co TXT [Id: /hostedzone/Z04***FJG]"
time="2023-07-11T10:30:29Z" level=info msg="Desired change: CREATE nginx.test.example.co A [Id: /hostedzone/Z04***FJG]"
time="2023-07-11T10:30:29Z" level=info msg="Desired change: CREATE nginx.test.example.co TXT [Id: /hostedzone/Z04***FJG]"
time="2023-07-11T10:30:29Z" level=info msg="3 record(s) in zone example.co. [Id: /hostedzone/Z04***FJG] were successfully updated"
...
[/simterm]
And check the domain itself:
[simterm]
$ curl -I nginx.test.example.co
HTTP/1.1 200 OK
[/simterm]
“It works!” (c)
All code for OIDC and ExternalDNS
All the code together now looks like this:
...
        ############
        ### OIDC ###
        ############

        eks_client = boto3.client('eks')

        # Retrieve the cluster's OIDC provider details
        response = eks_client.describe_cluster(name=cluster_name)

        # https://oidc.eks.us-east-1.amazonaws.com/id/2DC***124
        oidc_provider_url = response['cluster']['identity']['oidc']['issuer']

        # AWS EKS OIDC root URL
        eks_oidc_url = "oidc.eks.us-east-1.amazonaws.com"

        # Retrieve the SSL certificate from the URL
        cert = ssl.get_server_certificate((eks_oidc_url, 443))
        der_cert = ssl.PEM_cert_to_DER_cert(cert)

        # Calculate the thumbprint for the create_open_id_connect_provider()
        oidc_provider_thumbprint = hashlib.sha1(der_cert).hexdigest()

        # Create IAM Identity Provider
        iam_client = boto3.client('iam')

        # to catch the "(EntityAlreadyExists) when calling the CreateOpenIDConnectProvider operation"
        try:
            response = iam_client.create_open_id_connect_provider(
                Url=oidc_provider_url,
                ThumbprintList=[oidc_provider_thumbprint],
                ClientIDList=["sts.amazonaws.com"]
            )
        except ClientError as e:
            print(f"\n{e}")

        ###################
        ### Controllers ###
        ###################

        ### ExternalDNS ###

        # arn:aws:iam::492***148:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/2DC***124
        oidc_provider_arn = f'arn:aws:iam::{aws_account}:oidc-provider/{oidc_provider_url.replace("https://", "")}'

        # deploy ExternalDNS to a namespace
        controllers_namespace = 'kube-system'

        # Create an IAM Role to be assumed by ExternalDNS
        external_dns_role = iam.Role(
            self,
            'EksExternalDnsRole',
            # for Role's Trust relationships
            assumed_by=iam.FederatedPrincipal(
                federated=oidc_provider_arn,
                conditions={
                    'StringEquals': {
                        f'{oidc_provider_url.replace("https://", "")}:sub': f'system:serviceaccount:{controllers_namespace}:external-dns'
                    }
                },
                assume_role_action='sts:AssumeRoleWithWebIdentity'
            )
        )

        # A Zone ID to create records in by ExternalDNS
        zone_id = "Z04***FJG"
        # to be used in domainFilters
        zone_name = "example.co"

        # Attach IAM Policies to that Role so ExternalDNS can perform Route53 actions
        external_dns_policy = iam.PolicyStatement(
            actions=[
                'route53:ChangeResourceRecordSets',
                'route53:ListResourceRecordSets'
            ],
            resources=[
                f'arn:aws:route53:::hostedzone/{zone_id}',
            ]
        )

        list_hosted_zones_policy = iam.PolicyStatement(
            actions=[
                'route53:ListHostedZones'
            ],
            resources=['*']
        )

        external_dns_role.add_to_policy(external_dns_policy)
        external_dns_role.add_to_policy(list_hosted_zones_policy)

        # Install ExternalDNS Helm chart
        external_dns_chart = cluster.add_helm_chart('ExternalDNS',
            chart='external-dns',
            repository='https://charts.bitnami.com/bitnami',
            namespace=controllers_namespace,
            release='external-dns',
            values={
                'provider': 'aws',
                'aws': {
                    'region': region
                },
                'serviceAccount': {
                    'create': True,
                    'annotations': {
                        'eks.amazonaws.com/role-arn': external_dns_role.role_arn
                    }
                },
                'domainFilters': [
                    zone_name
                ],
                'policy': 'upsert-only'
            }
        )
...
Let’s move on to the ALB Controller.
Installing AWS ALB Controller
In general, everything is the same here. The only thing I had to mess with was the IAM Policy: ExternalDNS needs only a couple of statements, which we can describe directly in the code, but the policy for the ALB Controller is quite large, so it has to be taken from GitHub.
IAM Policy from a GitHub URL
Here we use requests (crutches again):
...
import requests
...
        alb_controller_version = "v2.5.3"
        url = f"https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/{alb_controller_version}/docs/install/iam_policy.json"

        response = requests.get(url)
        response.raise_for_status()  # Check for any download errors

        # parse the JSON body into a dict
        policy_document = response.json()

        document = iam.PolicyDocument.from_json(policy_document)
...
Here, we download the policy file, parse its JSON body into a Python dict, and then build the policy document itself from that dict.
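If you would rather not add requests as a dependency for one download, the same can be done with the standard library. A minimal sketch using urllib.request instead of requests (so it is a swapped-in technique, not what the stack above does), with a basic sanity check of the downloaded file. The function names are hypothetical:

```python
import json
import urllib.request


def alb_policy_url(version: str) -> str:
    """URL of the IAM policy file for a given controller version tag, e.g. 'v2.5.3'."""
    return (
        "https://raw.githubusercontent.com/kubernetes-sigs/"
        f"aws-load-balancer-controller/{version}/docs/install/iam_policy.json"
    )


def parse_policy(raw: str) -> dict:
    """Parse the downloaded text and check that it at least looks like an IAM policy."""
    policy = json.loads(raw)
    if not isinstance(policy, dict) or "Statement" not in policy:
        raise ValueError("downloaded file does not look like an IAM policy document")
    return policy


def fetch_policy(version: str, timeout: float = 10.0) -> dict:
    """Download and parse the policy (needs network access); urlopen raises on HTTP errors."""
    with urllib.request.urlopen(alb_policy_url(version), timeout=timeout) as resp:
        return parse_policy(resp.read().decode("utf-8"))
```

The result of fetch_policy(alb_controller_version) can be passed to iam.PolicyDocument.from_json() in the same way as policy_document above. Keep in mind that either way cdk synth and cdk deploy now need access to GitHub.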
IAM Role for the ALB Controller
Next, we create an IAM Role with Trust relationships similar to those of ExternalDNS; only the conditions change, to specify the ServiceAccount that will be created for the AWS ALB Controller:
...
        alb_controller_role = iam.Role(
            self,
            'AwsAlbControllerRole',
            # for Role's Trust relationships
            assumed_by=iam.FederatedPrincipal(
                federated=oidc_provider_arn,
                conditions={
                    'StringEquals': {
                        f'{oidc_provider_url.replace("https://", "")}:sub': f'system:serviceaccount:{controllers_namespace}:aws-load-balancer-controller'
                    }
                },
                assume_role_action='sts:AssumeRoleWithWebIdentity'
            )
        )

        alb_controller_role.attach_inline_policy(iam.Policy(self, "AwsAlbControllerPolicy", document=document))
...
AWS CDK and AWS ALB Controller Helm chart
Now, install the Helm chart itself with the necessary values: enable a ServiceAccount, in its annotations specify the ARN of the role created above, and set the clusterName:
...
        # Install AWS ALB Controller Helm chart
        alb_controller_chart = cluster.add_helm_chart('AwsAlbController',
            chart='aws-load-balancer-controller',
            repository='https://aws.github.io/eks-charts',
            namespace=controllers_namespace,
            release='aws-load-balancer-controller',
            values={
                'image': {
                    'tag': alb_controller_version
                },
                'serviceAccount': {
                    'name': 'aws-load-balancer-controller',
                    'create': True,
                    'annotations': {
                        'eks.amazonaws.com/role-arn': alb_controller_role.role_arn
                    },
                    'automountServiceAccountToken': True
                },
                'clusterName': cluster_name,
                'replicaCount': 1
            }
        )
...
Testing AWS ALB Controller
Let’s create a simple Pod, a Service, and an Ingress which must trigger the ALB Controller to create an AWS ALB LoadBalancer:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
  annotations:
    kubernetes.io/ingress.class: alb
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # must match the Service name below, not the name of its port
                name: nginx-service
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  annotations:
    external-dns.alpha.kubernetes.io/hostname: "nginx.test.example.co"
spec:
  selector:
    app: nginx
  ports:
    - name: nginx-http-svc-port
      protocol: TCP
      port: 80
      targetPort: nginx-http
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
spec:
  containers:
    - name: nginx
      image: nginxdemos/hello
      ports:
        - containerPort: 80
          name: nginx-http
Deploy and check the Ingress resource:
[simterm]
$ kubectl get ingress
NAME            CLASS    HOSTS   ADDRESS                                                             PORTS   AGE
nginx-ingress   <none>   *       internal-k8s-default-nginxing-***-***.us-east-1.elb.amazonaws.com   80      34m
[/simterm]
The only thing here that didn’t work the first time was attaching the aws-iam-token to the Pod. That’s why I’ve set 'automountServiceAccountToken': True in the values, although its default value is already true.
After several redeploys with cdk deploy, the token was created and mounted to the Pod:
...
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::492***148:role/eks-dev-1-26-AwsAlbControllerRole4AC4054B-1QYCGEG2RZUD7
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
...
In general, that’s all.
As usual with CDK it’s a pain and suffering due to the lack of proper documentation and examples, but with the help of ChatGPT and the tutorials it did work.
Also, it would probably be good to move the creation of resources into separate functions instead of doing everything in AtlasEksStack.__init__(), but that can be done later.
The next step is to launch VictoriaMetrics in Kubernetes, and then we will start working on Karpenter.