In the last post, we compared kiam and kube2iam head-to-head. While kube2iam was declared the winner of that comparison, I feel that the case for kiam is too compelling, and the setup too complicated, not to share my experience setting it up in production.
This post will cover everything you need to get kiam running in production: creating the IAM roles, setting up TLS with cert-manager, annotating namespaces and pods, and deploying the kiam server and agent components.
For an overview of the motivation behind the creation of kiam, read this blog post by its creator.
The first step to using kiam is to create IAM roles for your pods. Kiam recommends using an IAM role for the server deployment to further control access to your other roles, so we will start with that role. The steps here are mostly regurgitated from the IAM docs page in the kiam GitHub project.
Create a role named kiam_server with the following trust relationship and inline policy, where YOUR_MASTER_ROLE is replaced with the ARN of the role your master nodes run with.
Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "YOUR_MASTER_ROLE"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
Inline Policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "*"
    }
  ]
}
Get the ARN of the role you just created for the next steps. It should look something like arn:aws:iam::111111111111:role/kiam_server.
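If you prefer the command line over the console, a rough sketch of the same setup with the AWS CLI looks like the following. The trust.json and policy.json file names and the assume-all-roles policy name are just placeholders for the documents above:
# create the server role from the trust relationship above (file names are placeholders)
aws iam create-role --role-name kiam_server \
  --assume-role-policy-document file://trust.json
# attach the inline policy above
aws iam put-role-policy --role-name kiam_server \
  --policy-name assume-all-roles --policy-document file://policy.json
# print the ARN needed in the next steps
aws iam get-role --role-name kiam_server --query Role.Arn --output text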
Now create the roles for your pods. Each role should have a policy with only the permissions that the pod needs to perform its function, e.g. listing S3 objects, writing to DynamoDB, reading from SQS, etc. For each role you create, update the trust relationship so that the kiam_server role you created above can assume the individual pod roles. Replace KIAM_SERVER_ARN with the ARN you retrieved previously.
Trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    },
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "KIAM_SERVER_ARN"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
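If you are updating an existing pod role rather than creating a new one, the trust policy can also be applied with the AWS CLI. Here my-pod-role and pod-role-trust.json are placeholders:
# replace the trust relationship on an existing pod role (names are placeholders)
aws iam update-assume-role-policy --role-name my-pod-role \
  --policy-document file://pod-role-trust.json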
Next, make sure the master nodes in your cluster are able to assume the kiam_server role by attaching the following inline policy to the master role. Restricting the masters to assuming only kiam_server prevents pods running on the master from arbitrarily assuming any role.
Inline policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": "KIAM_SERVER_ARN"
    }
  ]
}
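This, too, can be attached from the CLI. The policy file name here is a placeholder, and note that --role-name takes the name of your masters' role, not its ARN:
# attach the inline policy above to the masters' role (file and policy names are placeholders)
aws iam put-role-policy --role-name YOUR_MASTER_ROLE \
  --policy-name assume-kiam-server --policy-document file://master-policy.json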
Follow the instructions on this page to install the main cert-manager resources with either regular manifests or helm.
Once cert-manager is configured in its namespace, you can then create your first CA issuer. There are instructions on the cert-manager documentation site, but I will go through them here as well to cover the parts specific to kiam’s TLS setup.
First, generate a CA private key and self-signed certificate:
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=kiam" -days 3650 -reqexts v3_req -extensions v3_ca -out ca.crt
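You can sanity-check the generated CA before loading it into the cluster; this is optional, but it catches a wrong subject or expiry early:
# print the subject and validity period of the CA certificate just created
openssl x509 -in ca.crt -noout -subject -dates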
Next, save the CA key pair as a secret in Kubernetes:
kubectl create secret tls kiam-ca-key-pair \
--cert=ca.crt \
--key=ca.key \
--namespace=cert-manager
Create a ClusterIssuer so that certificates can be issued in multiple namespaces using the CA key pair we just created:
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: kiam-ca-issuer
spec:
  ca:
    secretName: kiam-ca-key-pair
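Assuming you saved the manifest above as kiam-ca-issuer.yaml (the file name is a placeholder), apply it and confirm the issuer is ready before moving on:
kubectl apply -f kiam-ca-issuer.yaml
kubectl describe clusterissuer kiam-ca-issuer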
Issue a certificate for the kiam agent:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: kiam-agent
  namespace: kube-system
spec:
  secretName: kiam-agent-tls
  issuerRef:
    name: kiam-ca-issuer
    kind: ClusterIssuer
  commonName: kiam
Next, issue a certificate for the server. Since cert-manager does not support IP SANs at this time, we will change the cert to use localhost instead of 127.0.0.1:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: kiam-server
  namespace: kube-system
spec:
  secretName: kiam-server-tls
  issuerRef:
    name: kiam-ca-issuer
    kind: ClusterIssuer
  commonName: kiam
  dnsNames:
  - kiam-server
  - kiam-server:443
  - localhost
  - localhost:443
  - localhost:9610
You can check that everything is set up correctly by looking at the secrets created by cert-manager to ensure they exist in the correct namespace:
kubectl -n kube-system get secret kiam-agent-tls -o yaml
kubectl -n kube-system get secret kiam-server-tls -o yaml
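You can also check the Certificate resources themselves; cert-manager records whether issuance succeeded in their status and events:
kubectl -n kube-system get certificates
kubectl -n kube-system describe certificate kiam-server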
Kiam requires that you annotate namespaces and pods for roles to be assumed. The namespace configuration uses a regular expression to limit which roles can be assumed per namespace, and the default is to not allow any roles.
apiVersion: v1
kind: Namespace
metadata:
  name: default
  annotations:
    iam.amazonaws.com/permitted: ".*"
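If the namespace already exists, you can add the annotation in place with kubectl instead of re-applying the manifest:
kubectl annotate namespace default iam.amazonaws.com/permitted=".*"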
For pods, you just add an annotation in the pod metadata spec. Kiam will automatically detect the base ARN for your role using the master’s role, but you can also specify a full ARN (beginning with arn:aws:iam) if you need to assume roles in other AWS accounts.
annotations:
  iam.amazonaws.com/role: MY_ROLE_NAME
Now you are ready to deploy the kiam server component. First, configure RBAC. The reference file for the DaemonSet and service can be found here, but we need to modify it because the default configuration will not work with cert-manager.
First, we change the --cert, --key, and --ca options to point to the file names matching those created by cert-manager. Then, we must change the --server-address from 127.0.0.1:443 to localhost:443 in order for health checks to pass, because of the IP SANs issue with cert-manager mentioned earlier. Next, set the --assume-role-arn flag to the KIAM_SERVER_ARN from earlier so that the server pods use that role to get credentials for your other roles. Pick a tagged release from here to set as the image tag, since latest should not be used in production. The ssl-certs mounted volume will likely need the host path changed depending on the OS of your Kubernetes masters; since my cluster was installed using kops on Debian images, the correct hostPath for me was /etc/ssl/certs. Putting it all together, we end up with:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: kiam-server
spec:
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kiam
        role: server
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      serviceAccountName: kiam-server
      nodeSelector:
        kubernetes.io/role: master
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: tls
        secret:
          secretName: kiam-server-tls
      containers:
      - name: kiam
        image: quay.io/uswitch/kiam:b07549acf880e3a064e6679f7147d34738a8b789
        imagePullPolicy: Always
        command:
        - /kiam
        args:
        - server
        - --level=info
        - --bind=0.0.0.0:443
        - --cert=/etc/kiam/tls/tls.crt
        - --key=/etc/kiam/tls/tls.key
        - --ca=/etc/kiam/tls/ca.crt
        - --role-base-arn-autodetect
        - --assume-role-arn=arn:aws:iam::111111111111:role/kiam_server
        - --sync=1m
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs
        - mountPath: /etc/kiam/tls
          name: tls
        livenessProbe:
          exec:
            command:
            - /kiam
            - health
            - --cert=/etc/kiam/tls/tls.crt
            - --key=/etc/kiam/tls/tls.key
            - --ca=/etc/kiam/tls/ca.crt
            - --server-address=localhost:443
            - --gateway-timeout-creation=1s
            - --timeout=5s
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command:
            - /kiam
            - health
            - --cert=/etc/kiam/tls/tls.crt
            - --key=/etc/kiam/tls/tls.key
            - --ca=/etc/kiam/tls/ca.crt
            - --server-address=localhost:443
            - --gateway-timeout-creation=1s
            - --timeout=5s
          initialDelaySeconds: 3
          periodSeconds: 10
          timeoutSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: kiam-server
  namespace: kube-system
spec:
  clusterIP: None
  selector:
    app: kiam
    role: server
  ports:
  - name: grpclb
    port: 443
    targetPort: 443
    protocol: TCP
Deploying the kiam server by itself should not cause any changes in your cluster. It is the agent that modifies iptables and causes pods’ requests to the metadata API to ultimately go to the servers.
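Assuming you saved the manifests above as kiam-server.yaml (a placeholder file name) alongside the RBAC resources, deploy and verify that a server pod comes up healthy on each master:
kubectl apply -f kiam-server.yaml
kubectl -n kube-system get pods -l app=kiam,role=server -o wide
kubectl -n kube-system logs -l app=kiam,role=server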
Note: If your code uses the AWS Java SDK to make API calls, you must specify the --session-duration flag to be longer than 15 minutes (e.g. 60 minutes). This is because the AWS Java SDK will try to refresh credentials that expire within 15 minutes, and kiam’s default session duration is 15 minutes. You can keep up on this issue here and here. If this is not configured correctly, every API call using the AWS Java SDK will attempt to retrieve credentials, putting a huge amount of load on the kiam agent, the kiam server, and your pods.
Since the agent will modify iptables on your Kubernetes nodes, I would advise adding a node to your cluster that is tainted so you can do a controlled test of the agent and server together. With such a complex setup, there is a high chance that something is configured incorrectly with TLS or the IAM roles, and you will want to be able to handle that without affecting your production workload. So first add a node, and then taint it so other pods will not run on it:
kubectl taint nodes NEW_NODE_NAME kiam=kiam:NoSchedule
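You can confirm the taint took effect before deploying the agent:
kubectl describe node NEW_NODE_NAME | grep Taints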
Now we can configure the agent component using this reference file. First, we will again update the --cert, --key, and --ca options to point to the file names matching those created by cert-manager. Set the hostPath for the ssl-certs volume as you did before, and use the same image tag for the container as you did in the server config. The --host-interface argument in the command args must be updated to match the interface name for your CNI; a table of the options supported by kiam is on GitHub. Lastly, replace NEW_NODE_NAME with the name of your node so that the agent only runs on the newly added tainted node and other nodes are not affected if you have issues. You should end up with something like:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  namespace: kube-system
  name: kiam-agent
spec:
  template:
    metadata:
      labels:
        app: kiam
        role: agent
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      nodeSelector:
        kubernetes.io/role: node
      nodeName: NEW_NODE_NAME
      tolerations:
      - key: kiam
        value: kiam
        effect: NoSchedule
      volumes:
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: tls
        secret:
          secretName: kiam-agent-tls
      - name: xtables
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate
      containers:
      - name: kiam
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
        image: quay.io/uswitch/kiam:b07549acf880e3a064e6679f7147d34738a8b789
        imagePullPolicy: Always
        command:
        - /kiam
        args:
        - agent
        - --iptables
        - --host-interface=cali+
        - --json-log
        - --port=8181
        - --cert=/etc/kiam/tls/tls.crt
        - --key=/etc/kiam/tls/tls.key
        - --ca=/etc/kiam/tls/ca.crt
        - --server-address=kiam-server:443
        - --gateway-timeout-creation=1s
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: ssl-certs
        - mountPath: /etc/kiam/tls
          name: tls
        - mountPath: /var/run/xtables.lock
          name: xtables
        livenessProbe:
          httpGet:
            path: /ping
            port: 8181
          initialDelaySeconds: 3
          periodSeconds: 3
Now you can create the agent and verify that only a single agent is running on your new node. There should be no change to your pods running on other nodes.
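As with the server, apply the manifest and check that exactly one agent pod landed on the tainted node (kiam-agent.yaml is a placeholder file name):
kubectl apply -f kiam-agent.yaml
kubectl -n kube-system get pods -l app=kiam,role=agent -o wide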
At this point, you will want to test that everything works. You can do this by deploying a pod to the quarantined node and then using the AWS CLI in that pod to test access to resources. While you are doing this, check the logs of the kiam agent and server pods to debug any issues you encounter. Here’s an example of a deployment where you can specify a role and then test access:
apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: aws-iam-tester
  labels:
    app: aws-iam-tester
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: aws-iam-tester
  template:
    metadata:
      labels:
        app: aws-iam-tester
      annotations:
        iam.amazonaws.com/role: TEST_ROLE_NAME
    spec:
      nodeSelector:
        kubernetes.io/role: node
      nodeName: NEW_NODE_NAME
      tolerations:
      - key: kiam
        value: kiam
        effect: NoSchedule
      containers:
      - name: aws-iam-tester
        image: garland/aws-cli-docker:latest
        imagePullPolicy: Always
        command:
        - /bin/sleep
        args:
        - "3600"
        env:
        - name: AWS_DEFAULT_REGION
          value: us-east-1
The pod will exit after an hour, and you can use kubectl to get a TTY to the pod:
kubectl exec -it POD_NAME /bin/sh
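Inside the pod, a quick smoke test is to ask STS who you are; the role in the output should be TEST_ROLE_NAME. Further calls (the S3 listing here is just an example) should be limited to what that role’s policy allows:
# should return the assumed TEST_ROLE_NAME identity, not the node's role
aws sts get-caller-identity
# example call; success or AccessDenied depends on the permissions in your test role
aws s3 ls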
Once you are satisfied that your roles work, and that the kiam agent and server are correctly set up, you can then deploy the agent to every node.
Remove the nodeName key and the kiam:kiam toleration from your agent DaemonSet to allow it to run on every node. I also recommend modifying the server deployment to only log warning-level messages using the --level=warn command arg, or else you will end up with a very large amount of info-level logs in a production setup.
Once the agent is installed on each node, you should roll out an update to critical pods to ensure that those pods begin using kiam for authentication immediately. Check your application logs and the kiam server logs for any IAM errors. If you encounter issues, you can delete the agent from all nodes, which will automatically remove the iptables rule and allow your pods to authenticate in the way they did previously.
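A quick way to scan the server logs for problems during the rollout (adjust the tail length or the pattern to taste):
kubectl -n kube-system logs -l app=kiam,role=server --tail=500 | grep -iE "error|denied"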
As I mentioned in the previous post, I saw a performance drop using kiam. Monitor the pods that use AWS services heavily to see if there is an impact. Based on the number of calls being made between the kiam agent and servers, you may see an increase in cross-AZ traffic in AWS, which is billed. Billing totals are updated at least daily, so check for a few days to make sure there is nothing unusual on that front.
Finally, remove the tainted node that we created for testing, or remove the taint so that pods can be scheduled on it and it becomes a regular member of your cluster.
You should now have kiam running in your production cluster. The kiam setup is very long and very painful, but hopefully you were able to get through it without too much IAM or TLS debugging. I found the #kiam Slack channel useful when setting it up in my cluster, and recommend you ask specific implementation questions there.
In the next post, we will cover setting up kube2iam in production. Remember to follow the kiam and cert-manager projects on GitHub to support their efforts.