<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Implementations and Cloud]]></title><description><![CDATA[Implementanding comes from the Spanglish I use quite often. This is where I share my experiences and things I think about from time to time. If you find somethi]]></description><link>https://blog.mariano.cloud</link><generator>RSS for Node</generator><lastBuildDate>Wed, 22 Apr 2026 07:29:25 GMT</lastBuildDate><atom:link href="https://blog.mariano.cloud/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Wing it]]></title><description><![CDATA[A month ago, I got my DevOpsWeekly (I highly recommend it) newsletter Sunday email (issue #661). I went through it, reading some interesting stuff and I saw one particular link that caught my eye: a brief introduction to a new cloud language called W...]]></description><link>https://blog.mariano.cloud/wing-it</link><guid isPermaLink="true">https://blog.mariano.cloud/wing-it</guid><category><![CDATA[#IaC]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[Devops]]></category><category><![CDATA[programming languages]]></category><category><![CDATA[winglang]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Tue, 26 Sep 2023 16:34:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1695745528498/3a54fd38-cc12-4beb-b3af-035ba7d2f77d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A month ago, I got my <a target="_blank" href="https://www.devopsweekly.com/">DevOpsWeekly</a> (I highly recommend it) newsletter Sunday email (issue #661). 
I went through it, reading some interesting pieces, and one particular link caught my eye: <a target="_blank" href="https://www.youtube.com/watch?v=wzqCXrsKWbo&amp;ab_channel=WingProgrammingLanguage">a brief introduction to a new cloud language called Winglang</a>.</p>
<p>The idea behind it is very simple: how can we combine both Infrastructure and application code in the same codebase? Enter <a target="_blank" href="https://www.winglang.io/">Winglang</a>'s <strong>preflight</strong> and <strong>inflight</strong> concepts.</p>
<blockquote>
<p>New language, new syntax, <a target="_blank" href="https://www.winglang.io/docs/language-reference">some familiar, some new</a></p>
</blockquote>
<h1 id="heading-preflight-vs-inflight">Preflight vs Inflight</h1>
<p><strong>Preflight</strong> is the Infrastructure side. This code runs just once, at creation time, and not only generates all the required (Infra) resources but also wires them all up with the pertinent configurations, linkages, permissions and environment setup.</p>
<p><strong>Inflight</strong> is the application code itself: the software that will run on the Infrastructure resources defined in preflight.</p>
<blockquote>
<p>More on these and other Winglang core concepts <a target="_blank" href="https://www.winglang.io/docs/category/core-concepts">here</a>.</p>
</blockquote>
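<p>To make the split concrete, here is a minimal sketch of my own (not from the official docs; the resource and handler are illustrative) showing both phases living in one file:</p>
<pre><code class="lang-typescript">bring cloud;

// preflight: runs once, at compile/deploy time, and defines the infrastructure
let queue = new cloud.Queue() as "my_queue";

// inflight: runs in the cloud, every time a message arrives
queue.setConsumer(inflight (msg: str) =&gt; {
  log("got message: ${msg}");
});
</code></pre>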
<p><img src="https://media.giphy.com/media/cXblnKXr2BQOaYnTni/giphy.gif" alt class="image--center mx-auto" /></p>
<h1 id="heading-features">Features</h1>
<p>Some of the cool things it offers (just a couple of them):</p>
<ul>
<li><p>IaC ready (compiled code).</p>
</li>
<li><p>Web-based console.</p>
</li>
<li><p>Custom classes that wrap built-in cloud resources with your own inflight methods.</p>
</li>
<li><p>JS/TS and CDK support to import external resources/methods.</p>
</li>
</ul>
<h2 id="heading-iac-ready">IaC ready</h2>
<p>Winglang compiles your code into cloud-ready IaC, such as Terraform or CDK, that can be deployed right away.</p>
<pre><code class="lang-bash">$ wing compile --<span class="hljs-built_in">help</span>
Usage: wing compile [options] &lt;entrypoint&gt;

Compiles a Wing program

Arguments:
  entrypoint                 program .w entrypoint

Options:
  -h, --<span class="hljs-built_in">help</span>                 display <span class="hljs-built_in">help</span> <span class="hljs-keyword">for</span> <span class="hljs-built_in">command</span>
  -p, --plugins [plugin...]  Compiler plugins
  -r, --rootId &lt;rootId&gt;      App root id
  -t, --target &lt;target&gt;      Target platform (choices: <span class="hljs-string">"tf-aws"</span>, <span class="hljs-string">"tf-azure"</span>, <span class="hljs-string">"tf-gcp"</span>, <span class="hljs-string">"sim"</span>, <span class="hljs-string">"awscdk"</span>, default: <span class="hljs-string">"sim"</span>)
</code></pre>
<h2 id="heading-default-compile-mode-sim-the-wing-console">Default compile mode: sim = the Wing console</h2>
<p>By default, the local simulator (<code>sim</code>) is the compile target. When you run:</p>
<pre><code class="lang-bash">wing it &lt;your_main_file&gt;.w
</code></pre>
<p>It opens a web browser with the Console, a visual representation of (and interactive playground for) your resources. An automatic reloader is built in, monitoring your changes and applying them on the fly: just save and see the result in the browser.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695723596029/bda58039-1ae1-4e7d-a08b-5643d2b36ddb.png" alt class="image--center mx-auto" /></p>
<p>If you select one of the resources, like the <code>check</code> function, you can use the interactive console on the right to invoke it, pass parameters to it and get feedback (logs) from the execution. Same with a bucket: you can see the objects in it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695723658702/d12a9383-622e-4544-90a8-72155d5abe28.png" alt class="image--center mx-auto" /></p>
<p>Yeah, live debugger FTW.</p>
<h2 id="heading-custom-classes">Custom classes</h2>
<p>You can also define classes that override the default inflight behaviour of a given object, much like a TF module: you create a wrapper around native methods and end up with your own composable object with custom methods, outputs, etc.</p>
<h1 id="heading-sample-app">Sample app</h1>
<p>Winglang is based on JS and TypeScript, so there is still a learning curve for me, but I built a simple working app to play around with and apply some of its core concepts.</p>
<blockquote>
<p>This sample app is <a target="_blank" href="https://github.com/marianogg9/winging">here</a> as well.</p>
</blockquote>
<p>As a prerequisite, <a target="_blank" href="https://www.winglang.io/docs/start-here/installation">install</a> Winglang:</p>
<pre><code class="lang-bash">npm install -g winglang
</code></pre>
<p>(And the <a target="_blank" href="https://www.winglang.io/docs/start-here/installation#wing-vscode-extension">VSCode extension</a> if you want, but it's not required).</p>
<p>I wrote all the code in <code>main.w</code>, as follows.</p>
<pre><code class="lang-typescript">bring cloud;
bring aws;
bring <span class="hljs-string">"./classes.w"</span> <span class="hljs-keyword">as</span> customThings;

<span class="hljs-keyword">let</span> b = <span class="hljs-keyword">new</span> cloud.Bucket() <span class="hljs-keyword">as</span> <span class="hljs-string">"the_bucket"</span>; <span class="hljs-comment">// create a bucket</span>

<span class="hljs-keyword">let</span> bucket_funct = <span class="hljs-keyword">new</span> cloud.Function(inflight (data: str) =&gt; { <span class="hljs-comment">// create a sample function</span>
    b.put(<span class="hljs-string">"some-file.txt"</span>,<span class="hljs-string">"some text inside"</span>);

    log(<span class="hljs-string">"added ${data}"</span>);
}) <span class="hljs-keyword">as</span> <span class="hljs-string">"bucket_function"</span>;

<span class="hljs-keyword">let</span> s = <span class="hljs-keyword">new</span> cloud.Secret(name: <span class="hljs-string">"username1"</span>) <span class="hljs-keyword">as</span> <span class="hljs-string">"the_secret"</span>;

<span class="hljs-keyword">let</span> secret_funct = <span class="hljs-keyword">new</span> cloud.Function(inflight () =&gt; {
    <span class="hljs-keyword">let</span> sVal = s.value();
    b.put(<span class="hljs-string">"${sVal}.txt"</span>,sVal);
    log(<span class="hljs-string">"added secret"</span>);
}) <span class="hljs-keyword">as</span> <span class="hljs-string">"secret_function"</span>;

<span class="hljs-keyword">let</span> custom_bucket: customThings.CustomBucket = <span class="hljs-keyword">new</span> customThings.CustomStorage() <span class="hljs-keyword">as</span> <span class="hljs-string">"CustomBucket"</span>; <span class="hljs-comment">// create a bucket object from the CustomStorage class</span>

<span class="hljs-keyword">let</span> fput = <span class="hljs-keyword">new</span> cloud.Function(inflight () =&gt; {
    custom_bucket.store(<span class="hljs-string">"It works!"</span>);
}) <span class="hljs-keyword">as</span> <span class="hljs-string">"put"</span>;

<span class="hljs-comment">// next policy is not required (Winglang will populate policies by itself), but I had to dig a bit to find how to, so I'm adding it here for future reference</span>
<span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> putFn = aws.Function.from(fput) { 
    putFn.addPolicyStatements(
        aws.PolicyStatement {
            actions: [<span class="hljs-string">"s3:PutObject*"</span>],
            effect: aws.Effect.ALLOW,
            resources: [<span class="hljs-string">"*"</span>] <span class="hljs-comment">// could not yet find a way of referencing the target bucket ARN</span>
        }
    );
}

<span class="hljs-keyword">let</span> fcheck = <span class="hljs-keyword">new</span> cloud.Function(inflight () =&gt; { <span class="hljs-comment">// declare the "check" function</span>
    custom_bucket.check(<span class="hljs-string">"upload.txt"</span>);
    custom_bucket.check(<span class="hljs-string">"upload.json"</span>);
    custom_bucket.check(<span class="hljs-string">"unexistent.file"</span>);
}) <span class="hljs-keyword">as</span> <span class="hljs-string">"check"</span>;
</code></pre>
<p>And a custom class definition in <code>classes.w</code>:</p>
<pre><code class="lang-typescript">bring cloud;
bring aws;

<span class="hljs-keyword">interface</span> CustomBucket <span class="hljs-keyword">extends</span> std.IResource { 
  inflight store(data: str): <span class="hljs-built_in">void</span>;
  inflight check(data: str): bool;
}

<span class="hljs-keyword">class</span> CustomStorage impl CustomBucket {
    bucket: cloud.Bucket;

    init() { <span class="hljs-comment">// Create a (cloud) bucket</span>
      <span class="hljs-built_in">this</span>.bucket = <span class="hljs-keyword">new</span> cloud.Bucket() <span class="hljs-keyword">as</span> <span class="hljs-string">"custom-bucket"</span>;
    }

    pub inflight store(data: str): <span class="hljs-built_in">void</span> { <span class="hljs-comment">// create a custom store method to upload a couple example files to the bucket</span>
      <span class="hljs-keyword">let</span> file = <span class="hljs-string">"upload"</span>;

      <span class="hljs-built_in">this</span>.bucket.put(<span class="hljs-string">"${file}.txt"</span>, data);
      <span class="hljs-built_in">this</span>.bucket.putJson(<span class="hljs-string">"${file}.json"</span>, Json { <span class="hljs-string">"data"</span>: data});
    }

    pub inflight check(data: str): bool { <span class="hljs-comment">// another custom method to check the content of a given file</span>

        <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.bucket.exists(data)) { <span class="hljs-comment">// check if the file exists in the bucket</span>

            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">let</span> fileData = <span class="hljs-built_in">this</span>.bucket.getJson(data);
                assert(fileData.get(<span class="hljs-string">"data"</span>) == <span class="hljs-string">"It works!"</span>);
                log(<span class="hljs-string">"a JSON file"</span>);
            } <span class="hljs-keyword">catch</span> e {
                <span class="hljs-keyword">if</span> e.contains(<span class="hljs-string">"is not a valid JSON"</span>) {
                    <span class="hljs-keyword">let</span> fileData = <span class="hljs-built_in">this</span>.bucket.get(data);
                    assert(fileData == <span class="hljs-string">"It works!"</span>);
                    log(<span class="hljs-string">"a TXT file"</span>);
                } <span class="hljs-keyword">else</span> {
                    log(e);
                    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>; <span class="hljs-comment">// unexpected error</span>
                }
            }
            <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;

        } <span class="hljs-keyword">else</span> { <span class="hljs-comment">// if it doesn't exist, log an error and report failure</span>
            log(<span class="hljs-string">"File ${data} not found"</span>);
            <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
        }
    }
}
</code></pre>
<h3 id="heading-preflight">Preflight</h3>
<p>These are the Infrastructure resources defined:</p>
<ul>
<li><p>A standalone bucket.</p>
</li>
<li><p>Two standalone functions.</p>
</li>
<li><p>A <code>CustomStorage</code> class containing two functions and a bucket. These functions wrap native methods like <code>put</code> and <code>get</code> behind a new interface.</p>
</li>
</ul>
<h3 id="heading-inflight">Inflight</h3>
<p>For the app code:</p>
<ul>
<li><p>A function uploads a sample .txt file to the bucket.</p>
</li>
<li><p>The other function uploads a (local) secret to that same bucket.</p>
</li>
<li><p>In the <code>CustomStorage</code> class, a new bucket is created, and two functions upload some files to it and run a simple check on the file names/contents.</p>
</li>
</ul>
<h3 id="heading-wing-it-locally">Wing it, locally</h3>
<pre><code class="lang-bash">wing it main.w
</code></pre>
<p>It opens a web browser showing all the components and their relations, ready to run <em>inflight</em>.</p>
<h3 id="heading-the-compilation-into-tf-aws">The compilation into TF (AWS)</h3>
<p>Now let's compile this into Terraform-ready code, in my case for AWS.</p>
<pre><code class="lang-bash">wing compile --target tf-aws main.w
</code></pre>
<p>This command will create a <code>target</code> directory with the following layout:</p>
<pre><code class="lang-bash">└── target
    └── main.tfaws
        ├── assets
        │   ├── bucket_function_Asset_859DBBF7
        │   │   └── F4D309B0C55317888FECA9461027AD14
        │   │       └── archive.zip
        │   ├── check_Asset_BAE7D2BE
        │   │   └── 22B6A7822E4F7722DF50A1AA6A5ED16C
        │   │       └── archive.zip
        │   ├── put_Asset_3BF5C371
        │   │   └── DD164E414B5BA274E67C872B892F3B19
        │   │       └── archive.zip
        │   └── secret_function_Asset_729CDAD8
        │       └── CF5A69D348E2C2D58B2C6799DC8BD025
        │           └── archive.zip
        ├── connections.json
        ├── main.tf.json
        └── tree.json
</code></pre>
<p>As you can see in there, the inflight code (for each of the 4 defined functions) is zipped, ready to be referenced from the TF code in <code>main.tf.json</code>.</p>
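<p>If you're curious what Terraform will manage before running a plan, you can peek at the generated resource types directly (assuming you have <code>jq</code> installed; the path matches the tree above):</p>
<pre><code class="lang-bash">cd target/main.tfaws
jq '.resource | keys' main.tf.json   # list the Terraform resource types Winglang generated
</code></pre>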
<p>You can inspect the <code>main.tf.json</code> contents (Terraform resource definitions) either in plain text or via the <code>Outline</code> section in VS Code (sorry for the newbie excitement, this was new to me!).</p>
<p>In there, you can see for example the pre-populated IAM roles and policies for the Lambda functions.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695725025577/ad7b3864-fca4-43d9-8192-8f3d0296a063.png" alt class="image--center mx-auto" /></p>
<p>Now, to see what's to be created, let's run Terraform within the <code>target/main.tfaws</code> directory. Of course, you'll need your AWS credentials already set up.</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> target/main.tfaws
terraform init
terraform plan

<span class="hljs-comment"># Plan: 23 to add, 0 to change, 0 to destroy.</span>
</code></pre>
<h3 id="heading-creating-the-aws-resources">Creating the AWS resources</h3>
<p>If we want to see those resources deployed in AWS, we first need to create the <code>secret</code> in AWS Secrets Manager. In local simulation mode, the <code>secret</code> resource lives in a local FS directory, but in the cloud context it must already exist in AWS Secrets Manager, since Terraform treats it as a <code>data</code> resource to be fetched internally.</p>
<blockquote>
<p>More about <a target="_blank" href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/secretsmanager_secret">secret data resource</a> in Terraform. And <a target="_blank" href="https://developer.hashicorp.com/terraform/language/data-sources">datasources</a> in general.</p>
</blockquote>
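<p>For example, using the secret name from the sample app (<code>username1</code>), it can be created with the AWS CLI - the secret value here is just a placeholder:</p>
<pre><code class="lang-bash"># create the secret that Terraform will fetch as a data resource
aws secretsmanager create-secret \
    --name username1 \
    --secret-string "some-placeholder-value"
</code></pre>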
<p>Then, apply:</p>
<pre><code class="lang-bash">terraform apply
</code></pre>
<p>This will end up creating:</p>
<ul>
<li><p>4 Lambda functions.</p>
</li>
<li><p>3 buckets:</p>
<ul>
<li><p><code>code</code> for the Lambda(s) code (4 zip files).</p>
</li>
<li><p>The <code>the_bucket</code> standalone.</p>
</li>
<li><p>The <code>custom_bucket</code> (from the <code>CustomBucket</code> class we created).</p>
</li>
</ul>
</li>
<li><p>3 IAM roles.</p>
</li>
<li><p>3 IAM policies.</p>
</li>
</ul>
<p>You can then test the Lambda functions, which will upload files to the buckets, just as in the Web Console.</p>
<h3 id="heading-clean-up">Clean up</h3>
<p>As always, don't forget to clean up whatever you created. First, manually delete any uploaded files from the buckets. Then run:</p>
<pre><code class="lang-bash">terraform destroy
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p><a target="_blank" href="https://www.winglang.io/docs/#this-is-a-pre-release-">Winglang is in pre-release status</a>, so there's a lot still to be added. The available library toolset is minimal for now, but it looks <strong>very promising</strong>.</p>
<p>One thing to notice is the number of assumptions it makes. Take an (AWS) Lambda function: Winglang will populate all the IAM roles and policies needed to create a code repository bucket, upload the zipped code and then access it, perform any action the function needs (like fetching the secret content or putting objects into a given bucket), and grant the role <code>assume</code> permission for the AWS Lambda service.</p>
<p>It all makes sense: the language was created with "all batteries included", so these configurations and details have to be abstracted out of the code somehow. But if you are curious or concerned about low-level Infrastructure management, you should keep an eye on this. You can always attach new policies - <a target="_blank" href="https://github.com/marianogg9/winging/blob/main/main.w#L27">I added an example implementation</a>.</p>
<p>And if you are coming from a TS/JS/Java background, Winglang should be a piece of cake for you - for me...still a long way to go.</p>
<p><img src="https://media.giphy.com/media/WUPYLhJGEwREt4rcMv/giphy.gif" alt class="image--center mx-auto" /></p>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://www.winglang.io/">Winglang</a>.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/language/data-sources">TF datasources</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/marianogg9/winging">My sample app</a> in Github.</p>
</li>
<li><p><a target="_blank" href="https://github.com/winglang/examples">A lot of Winglang examples</a> in Github.</p>
</li>
<li><p><a target="_blank" href="https://winglang.slack.com/">Winglang</a> on Slack.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by!</p>
]]></content:encoded></item><item><title><![CDATA[The 3 Rs: Reduce, Recycle, Repeat]]></title><description><![CDATA[Why
One of (Software) Development's best practices is to make code reusable - meaning we want to make sure to reutilise pieces and bits of code so as not to write repetitive lines in our program.
What
A module in Terraform is a grouping of resources,...]]></description><link>https://blog.mariano.cloud/the-3-rs-reduce-recycle-repeat</link><guid isPermaLink="true">https://blog.mariano.cloud/the-3-rs-reduce-recycle-repeat</guid><category><![CDATA[Terraform]]></category><category><![CDATA[modules]]></category><category><![CDATA[reusablility]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[development]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Tue, 19 Sep 2023 18:28:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1695147603537/d71e8eee-ea37-4543-b690-1342fd579a29.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-why">Why</h1>
<p>One of (Software) Development's best practices is to make code reusable - that is, to reuse pieces of code rather than write the same repetitive lines over and over in our program.</p>
<h1 id="heading-what">What</h1>
<p>A <a target="_blank" href="https://developer.hashicorp.com/terraform/language/modules">module in Terraform</a> is a grouping of resources: an object that creates a set of resources from given input parameters, which you can then use to further manage the resulting infrastructure.</p>
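<p>As a quick sketch of what consuming a module looks like - a <code>module</code> block points at a <code>source</code> and passes the module's input variables (the names below are illustrative):</p>
<pre><code class="lang-bash">module "my_bucket" {
  source = "terraform-aws-modules/s3-bucket/aws"

  bucket_prefix = "app-1-"
  acl           = "private"
}

output "bucket_arn" {
  value = module.my_bucket.s3_bucket_arn # expose the module's output
}
</code></pre>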
<h1 id="heading-how">How</h1>
<p>There are two ways of using a module: referencing a published one or writing your own.</p>
<p>In the <a target="_blank" href="https://registry.terraform.io/">Terraform registry</a>, a lot of modules are published and available for you to use in your code. These are free, community-maintained (in most cases), tested and documented, which makes them a very good way to use and reuse code.</p>
<p>Or (and) you can write your own, based on a published one or built from scratch by declaring native resources.</p>
<h2 id="heading-lets-see-why-you-would-want-to-use-a-module">Let's see why you would want to use a module</h2>
<p><mark>The following example(s) assumes you have already used Terraform to manage Infrastructure resources and you know the basics. Also, this guide is based on a preexisting AWS EKS cluster.</mark></p>
<p>Say you have 3 services (apps) hosted in Kubernetes (EKS) that upload files to an (AWS) S3 bucket, and you want these EKS services to interact with S3 using dynamic credentials (IRSA).</p>
<blockquote>
<p>Psst, here is an <a target="_blank" href="https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge">IRSA implementation</a>. Also, all these examples are <a target="_blank" href="https://github.com/marianogg9/tf-moduling">here</a>.</p>
</blockquote>
<h3 id="heading-what-do-you-need-to-create">What do you need to create?</h3>
<p>An S3 bucket with its base ACLs, then an IAM role with its policies to both allow the application in EKS to assume the role via <a target="_blank" href="https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRoleWithWebIdentity.html">AssumeRoleWithWebIdentity</a> and grant the permissions required to upload files to the bucket. That adds up to 7 different Terraform resources per application, plus some additional code to fetch information/parameters.</p>
<p><strong>Using plain resources</strong></p>
<p><img src="https://media.giphy.com/media/FZ7lPdteeiChZutNRv/giphy.gif" alt class="image--center mx-auto" /></p>
<p>You would need to do something like:</p>
<pre><code class="lang-bash">locals {
  apps = {
    app-1 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:PutObject"</span>] },
    app-2 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObject"</span>, <span class="hljs-string">"s3:PutObject"</span>] },
    app-3 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObjectVersion"</span>] }
  }
}

<span class="hljs-comment"># S3 resources</span>
resource <span class="hljs-string">"aws_s3_bucket"</span> <span class="hljs-string">"the_bucket"</span> {
  for_each = local.apps

  bucket_prefix = join(<span class="hljs-string">""</span>, [each.key, <span class="hljs-string">"-"</span>])
}

resource <span class="hljs-string">"aws_s3_bucket_ownership_controls"</span> <span class="hljs-string">"the_bucket_oc"</span> {
  for_each = {
    app-1 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:PutObject"</span>] },
    app-2 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObject"</span>, <span class="hljs-string">"s3:PutObject"</span>] },
    app-3 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObjectVersion"</span>] }
  }

  bucket = aws_s3_bucket.the_bucket[each.key].id

  rule {
    object_ownership = <span class="hljs-string">"BucketOwnerPreferred"</span>
  }
}

resource <span class="hljs-string">"aws_s3_bucket_acl"</span> <span class="hljs-string">"the_bucket_acl"</span> {
  depends_on = [aws_s3_bucket_ownership_controls.the_bucket_oc]

  for_each = local.apps

  bucket = aws_s3_bucket.the_bucket[each.key].id

  acl = <span class="hljs-string">"private"</span>
}

resource <span class="hljs-string">"aws_s3_bucket_public_access_block"</span> <span class="hljs-string">"the_bucket_ab"</span> {
  for_each = local.apps

  bucket = aws_s3_bucket.the_bucket[each.key].id

  block_public_acls       = <span class="hljs-literal">true</span>
  block_public_policy     = <span class="hljs-literal">true</span>
  ignore_public_acls      = <span class="hljs-literal">true</span>
  restrict_public_buckets = <span class="hljs-literal">true</span>
}

<span class="hljs-comment"># IAM resources</span>

<span class="hljs-comment">## Some prerequisites</span>
data <span class="hljs-string">"aws_eks_cluster"</span> <span class="hljs-string">"this"</span> { <span class="hljs-comment"># get EKS cluster attributes to use later on.</span>
  name = <span class="hljs-string">"test-eks-cluster"</span>
}

data <span class="hljs-string">"aws_caller_identity"</span> <span class="hljs-string">"current"</span> {} <span class="hljs-comment"># get current account ID</span>

data <span class="hljs-string">"aws_iam_policy_document"</span> <span class="hljs-string">"assume-policy"</span> { <span class="hljs-comment"># create an assume policy for STS</span>
  statement {
    actions = [<span class="hljs-string">"sts:AssumeRoleWithWebIdentity"</span>]
    principals {
      <span class="hljs-built_in">type</span> = <span class="hljs-string">"Federated"</span>
      identifiers = [
        replace(
          data.aws_eks_cluster.this.identity[0].oidc[0].issuer,
          <span class="hljs-string">"https://"</span>,
          join(<span class="hljs-string">""</span>, [<span class="hljs-string">"arn:aws:iam::"</span>, data.aws_caller_identity.current.account_id, <span class="hljs-string">":oidc-provider/"</span>])
        )
      ]
    }
  }
}

<span class="hljs-comment">## Role definition</span>
resource <span class="hljs-string">"aws_iam_role"</span> <span class="hljs-string">"the_role"</span> { <span class="hljs-comment"># create the IAM role and attach both assume and inline identity based policies.</span>
  for_each = local.apps

  name = each.key
  path               = <span class="hljs-string">"/"</span>
  assume_role_policy = data.aws_iam_policy_document.assume-policy.json

  inline_policy {
    name = <span class="hljs-string">"s3-put"</span>
    policy = jsonencode({
      Version = <span class="hljs-string">"2012-10-17"</span>
      Statement = [{
        Action = [<span class="hljs-keyword">for</span> permission <span class="hljs-keyword">in</span> each.value[<span class="hljs-string">"permissions"</span>] : permission]
        Effect = <span class="hljs-string">"Allow"</span>
        Resource = aws_s3_bucket.the_bucket[each.key].arn
      }]
    })
  }

  tags = {
    <span class="hljs-comment"># always add some tags!</span>
  }
}

resource <span class="hljs-string">"aws_iam_policy"</span> <span class="hljs-string">"the_policy"</span> {
  for_each = local.apps

  name_prefix = join(<span class="hljs-string">""</span>,[each.key,<span class="hljs-string">"-"</span>])
  path        = <span class="hljs-string">"/"</span>

  policy      = jsonencode({
    Version   = <span class="hljs-string">"2012-10-17"</span>
    Statement = [
      {
        Action   = [ <span class="hljs-keyword">for</span> permission <span class="hljs-keyword">in</span> each.value[<span class="hljs-string">"permissions"</span>]: permission ]
        Effect   = <span class="hljs-string">"Allow"</span>
        Resource = join(<span class="hljs-string">""</span>,[aws_s3_bucket.the_bucket[each.key].arn,<span class="hljs-string">"/*"</span>])
      },
    ]
  })
}

resource <span class="hljs-string">"aws_iam_role_policy_attachment"</span> <span class="hljs-string">"the_policy_attachment"</span> {
  for_each = local.apps

  role       = aws_iam_role.the_role[each.key].name
  policy_arn = aws_iam_policy.the_policy[each.key].arn
}

output <span class="hljs-string">"the_bucket"</span> {
  value = values(aws_s3_bucket.the_bucket).*.arn
}
output <span class="hljs-string">"the_role"</span> {
  value = values(aws_iam_role.the_role).*.arn
}
</code></pre>
<p>Total: 131 lines of code.</p>
<p><strong>Now let's do the same using upstream (registry) modules</strong></p>
<blockquote>
<p>There are published modules for <a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/s3-bucket/aws/latest">S3</a> and <a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest">IAM</a> roles with IRSA support.</p>
</blockquote>
<pre><code class="lang-bash">locals {
  apps = {
    app-1 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:PutObject"</span>] },
    app-2 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObject"</span>, <span class="hljs-string">"s3:PutObject"</span>] },
    app-3 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObjectVersion"</span>] }
  }
}

data <span class="hljs-string">"aws_eks_cluster"</span> <span class="hljs-string">"this"</span> { <span class="hljs-comment"># get EKS cluster attributes to use later on.</span>
  name = <span class="hljs-string">"test-eks-cluster"</span>
}

data <span class="hljs-string">"aws_caller_identity"</span> <span class="hljs-string">"current"</span> {} <span class="hljs-comment"># get current account ID</span>

module <span class="hljs-string">"s3_bucket"</span> {
  for_each = local.apps

  <span class="hljs-built_in">source</span> = <span class="hljs-string">"terraform-aws-modules/s3-bucket/aws"</span>

  bucket_prefix    = join(<span class="hljs-string">""</span>,[each.key,<span class="hljs-string">"-"</span>])
  acl              = <span class="hljs-string">"private"</span>

  control_object_ownership = <span class="hljs-literal">true</span>
  object_ownership         = <span class="hljs-string">"BucketOwnerPreferred"</span>
}

module <span class="hljs-string">"iam_assumable_role_with_oidc"</span> {
    for_each = local.apps

  <span class="hljs-built_in">source</span>      = <span class="hljs-string">"terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"</span>

  create_role = <span class="hljs-literal">true</span>
  role_name   = each.key

  tags = {
    <span class="hljs-comment"># add some tags!</span>
  }
  provider_url = data.aws_eks_cluster.this.identity[0].oidc[0].issuer

  role_policy_arns = [
    aws_iam_policy.the_policy[each.key].arn,
  ]
  number_of_role_policy_arns = 1
}

resource <span class="hljs-string">"aws_iam_policy"</span> <span class="hljs-string">"the_policy"</span> {
    for_each = local.apps

  name_prefix = join(<span class="hljs-string">""</span>,[each.key,<span class="hljs-string">"-"</span>])
  path        = <span class="hljs-string">"/"</span>

  policy      = jsonencode({
    Version   = <span class="hljs-string">"2012-10-17"</span>
    Statement = [
      {
        Action   = [ <span class="hljs-keyword">for</span> permission <span class="hljs-keyword">in</span> each.value[<span class="hljs-string">"permissions"</span>]: permission ]
        Effect   = <span class="hljs-string">"Allow"</span>
        Resource = join(<span class="hljs-string">""</span>,[module.s3_bucket[each.key].s3_bucket_arn,<span class="hljs-string">"/*"</span>])
      },
    ]
  })
}

output <span class="hljs-string">"the_bucket"</span> {
    value = values(module.s3_bucket).*.s3_bucket_arn
}
output <span class="hljs-string">"the_role"</span> {
    value = values(module.iam_assumable_role_with_oidc).*.iam_role_arn
}
</code></pre>
<p>Total: 69 lines.</p>
<p>With roughly half the code, you end up with the same set of 21 resources as before.</p>
<h2 id="heading-a-module-using-modules">A module using modules</h2>
<p><img src="https://media.giphy.com/media/YvX6r2p41Ej0A/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Now, let's go a bit further and write our own module on top of two published ones, so that it better fits our use case and avoids code repetition.</p>
<p><strong>To achieve reusability you need to create code that:</strong></p>
<ul>
<li><p>Accepts a set of names/IDs for each bucket and role to be created.</p>
</li>
<li><p>Loops through that given set.</p>
</li>
<li><p>Calls the upstream modules as few times as possible.</p>
</li>
<li><p>Is flexible enough to be reused with as many inputs as needed.</p>
<blockquote>
<p>For this topic, let's think about what attributes all apps share and what is specific for each of them.</p>
<ul>
<li><p>Whatever is common will be set statically in the module, leaving only a name or identifier to be passed in (e.g. a bucket ACL or a role assume policy).</p>
</li>
<li><p>But if we had a parameter that may vary per app, we would populate it dynamically in the module (e.g. the role permissions over the bucket objects).</p>
</li>
</ul>
</blockquote>
</li>
</ul>
<p>Also, as a general guideline, we will reference the module locally, from a relative path.</p>
<p>The directory schema will look like the following:</p>
<pre><code class="lang-bash">.
├── my_awesome_module
│   ├── iam_role.tf
│   ├── outputs.tf
│   ├── s3_bucket.tf
│   └── variables.tf
└── with_local_module
    └── the_apps_local_module.tf
</code></pre>
<p>Where:</p>
<ul>
<li><p><code>the_apps_local_module.tf</code> is the main Terraform file, where we call our module with each application's input (variable) values.</p>
</li>
<li><p><code>my_awesome_module</code> is the local module directory.</p>
<ul>
<li><p><code>s3_bucket.tf</code> calls the upstream S3 bucket module.</p>
</li>
<li><p><code>iam_role.tf</code> references the upstream IAM role module and defines a standalone role policy.</p>
</li>
<li><p><code>variables.tf</code> defines the input variables expected by our local module.</p>
</li>
<li><p><code>outputs.tf</code> defines a couple of return values from the created resources; optional, but highly recommended.</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-the-module-and-its-files">The module and its files</h3>
<p><code>s3_bucket.tf</code> (required)</p>
<pre><code class="lang-bash">module <span class="hljs-string">"s3_bucket"</span> {
  <span class="hljs-built_in">source</span> = <span class="hljs-string">"terraform-aws-modules/s3-bucket/aws"</span>

  bucket_prefix    = join(<span class="hljs-string">""</span>,[var.bucket_name,<span class="hljs-string">"-"</span>])
  acl              = <span class="hljs-string">"private"</span>

  control_object_ownership = <span class="hljs-literal">true</span>
  object_ownership         = <span class="hljs-string">"BucketOwnerPreferred"</span>
}
</code></pre>
<p><code>iam_role.tf</code> (required)</p>
<pre><code class="lang-bash">data <span class="hljs-string">"aws_eks_cluster"</span> <span class="hljs-string">"this"</span> { <span class="hljs-comment"># get EKS cluster attributes to use later on.</span>
  name = <span class="hljs-string">"test-eks-cluster"</span>
}

data <span class="hljs-string">"aws_caller_identity"</span> <span class="hljs-string">"current"</span> {} <span class="hljs-comment"># get current account ID</span>

module <span class="hljs-string">"iam_assumable_role_with_oidc"</span> {
  <span class="hljs-built_in">source</span>      = <span class="hljs-string">"terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"</span>

  create_role = <span class="hljs-literal">true</span>
  role_name   = var.role_name

  tags = {
    <span class="hljs-comment"># add some tags!</span>
  }
  provider_url = data.aws_eks_cluster.this.identity[0].oidc[0].issuer

  role_policy_arns = [
    aws_iam_policy.the_policy.arn,
  ]
  number_of_role_policy_arns = 1
}

resource <span class="hljs-string">"aws_iam_policy"</span> <span class="hljs-string">"the_policy"</span> {
  name_prefix = join(<span class="hljs-string">""</span>,[var.role_name,<span class="hljs-string">"-"</span>])
  path        = <span class="hljs-string">"/"</span>

  policy      = jsonencode({
    Version   = <span class="hljs-string">"2012-10-17"</span>
    Statement = [
      {
        Action   = [ <span class="hljs-keyword">for</span> permission <span class="hljs-keyword">in</span> var.permissions: permission ]
        Effect   = <span class="hljs-string">"Allow"</span>
        Resource = join(<span class="hljs-string">""</span>,[module.s3_bucket.s3_bucket_arn,<span class="hljs-string">"/*"</span>])
      },
    ]
  })
}
</code></pre>
<p><code>variables.tf</code> (required)</p>
<pre><code class="lang-bash">variable <span class="hljs-string">"bucket_name"</span> {
    description = <span class="hljs-string">"S3 bucket name"</span>
    <span class="hljs-built_in">type</span>        = string
    default     = <span class="hljs-string">""</span>
}
variable <span class="hljs-string">"role_name"</span> {
    description = <span class="hljs-string">"IAM role name"</span>
    <span class="hljs-built_in">type</span>        = string
    default     = <span class="hljs-string">""</span>
}
variable <span class="hljs-string">"permissions"</span> {
    description = <span class="hljs-string">"List of permissions to attach to the IAM role"</span>
    <span class="hljs-built_in">type</span>        = list(string)
    default     = []
}
</code></pre>
<p><code>outputs.tf</code> (optional)</p>
<pre><code class="lang-bash">output <span class="hljs-string">"the_bucket"</span> {
    value = module.s3_bucket.s3_bucket_arn
}
output <span class="hljs-string">"the_role"</span> {
    value = module.iam_assumable_role_with_oidc.iam_role_arn
}
</code></pre>
<h3 id="heading-the-call">The call</h3>
<p>Now that we have our module, we can reference it by passing all the apps we need with their corresponding input (variable) values.</p>
<p><code>the_apps_local_module.tf</code></p>
<pre><code class="lang-bash">module <span class="hljs-string">"the_apps"</span> {
  for_each = {
    app-1 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:PutObject"</span>] },
    app-2 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObject"</span>, <span class="hljs-string">"s3:PutObject"</span>] },
    app-3 = { <span class="hljs-string">"permissions"</span> : [<span class="hljs-string">"s3:GetObjectVersion"</span>] }
  }

  <span class="hljs-built_in">source</span> = <span class="hljs-string">"../my_awesome_module"</span> <span class="hljs-comment"># reference to our local module</span>

  bucket_name = each.key
  role_name   = each.key
  permissions = each.value[<span class="hljs-string">"permissions"</span>]
}

output <span class="hljs-string">"the_apps"</span> {
  value = module.the_apps
}
</code></pre>
<p>Total: 85 lines for the same 21 resources as before.</p>
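<p>One caveat worth noting: <code>for_each</code> on a <code>module</code> block is only supported in Terraform 0.13 and later. If you want to make that explicit (the example repo doesn't pin a version; the constraint below is just a suggestion), you could add something like:</p>
<pre><code class="lang-bash">terraform {
  required_version = "&gt;= 0.13" # for_each on module blocks needs Terraform 0.13+

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}
</code></pre>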
<h2 id="heading-testing-all-up">Testing all up</h2>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/tf-moduling
<span class="hljs-built_in">cd</span> tf-moduling
<span class="hljs-built_in">cd</span> with_standalone_resources
terraform init
terraform plan
<span class="hljs-comment"># Plan: 21 to add, 0 to change, 0 to destroy.</span>

<span class="hljs-built_in">cd</span> ../with_upstream_modules/
terraform init
terraform plan
<span class="hljs-comment"># Plan: 21 to add, 0 to change, 0 to destroy.</span>

<span class="hljs-built_in">cd</span> ../with_local_module/
terraform init
terraform plan
<span class="hljs-comment"># Plan: 21 to add, 0 to change, 0 to destroy.</span>
</code></pre>
<h3 id="heading-cleanup">Cleanup</h3>
<p><img src="https://media.giphy.com/media/73j8OT8DqHGyQ/giphy.gif" alt class="image--center mx-auto" /></p>
<p>If you went ahead and created any resources, please don't forget this step; your future self will thank you.</p>
<ul>
<li><p>Delete any file uploaded to the S3 bucket(s).</p>
</li>
<li><p>Then delete TF-managed resources by running <code>terraform destroy</code> from the corresponding directory.</p>
</li>
</ul>
<h1 id="heading-conclusion">Conclusion</h1>
<p>As always, everything depends on your use case. While it is easier to use a well-documented mainstream solution, it may not make sense for you, your team or even your apps.</p>
<p>If you have a specific infrastructure need for custom resources, you can write your own module by reusing either upstream Terraform Registry modules or plain native resources.</p>
<p>And it is also true that you do not have to use modules at all; some applications' infrastructure doesn't need a complicated IaC code base, so just define plain resources and keep it as simple as possible.</p>
<p>Whatever makes sense to you: try it out, fail often, iterate, and you will surely create extensible, maintainable, reusable and overall better code.</p>
<p><img src="https://media.giphy.com/media/6FrujVG4mafRETQTal/giphy.gif" alt class="image--center mx-auto" /></p>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/iam/aws/latest">IAM module(s) in Terraform registry</a>.</p>
</li>
<li><p><a target="_blank" href="https://registry.terraform.io/modules/terraform-aws-modules/s3-bucket/aws/latest">S3 module in Terraform registry</a>.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/language/expressions/for">For expressions in Terraform</a>.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/language/meta-arguments/for_each">For_each expression in Terraform</a>.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/language/modules">Modules in Terraform</a>.</p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Code_reuse">Code reuse</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge">IRSA implementation</a>.</p>
</li>
<li><p>The examples in this blog are <a target="_blank" href="https://github.com/marianogg9/tf-moduling">here</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[The Contra questions]]></title><description><![CDATA[I participated in a few interview processes during my career and also conducted some more. It is, to me, a very exciting experience where I get to measure my knowledge and a way of gamifying the learning process. In my experience, most of the intervi...]]></description><link>https://blog.mariano.cloud/the-contra-questions</link><guid isPermaLink="true">https://blog.mariano.cloud/the-contra-questions</guid><category><![CDATA[interview]]></category><category><![CDATA[infrastructure]]></category><category><![CDATA[Solution]]></category><category><![CDATA[questions]]></category><category><![CDATA[Experience ]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sun, 10 Sep 2023 15:56:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1694360887342/48616519-29ac-4923-8512-c81ddc438661.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I participated in a few interview processes during my career and also conducted some more. It is, to me, a very exciting experience where I get to measure my knowledge and a way of gamifying the learning process. In my experience, most of the interviews where I was the answering party, were truly a trigger to either research some new tool or go deeper into what I already had experience in.</p>
<p>But the best interviews I did were the ones where I got asked why NOT to use X platform.</p>
<p><em>I am a Cloud Engineer, so most of my examples and quotes come from an Infrastructure background. But this article applies to any field to be fair.</em></p>
<h1 id="heading-traditionally">Traditionally</h1>
<blockquote>
<p>I have this setup, with these constraints and this expected performance; where do you recommend hosting this? Why?</p>
</blockquote>
<p>So one would go over the pros of a certain platform and why it is best suited for the job. And that has value in itself: given a set of requirements, being able to identify and match strong arguments with the expected outcome is, at the end of the day, what makes a good solution.</p>
<h2 id="heading-but-what-if-you-started-with-cons-instead">But what if you started with cons instead?</h2>
<blockquote>
<p>I have this setup, with these constraints and this expected performance. Where do you NOT recommend hosting this? Why is that? Can you tell me a couple of deal breakers of X platform?</p>
</blockquote>
<p>Every solution, platform, tool or any involved process depends on a specific use case. You wouldn't overcomplicate a solution just because it involves a buzzword.</p>
<p>Right? Right?</p>
<p><img src="https://media.giphy.com/media/dXFKDUolyLLi8gq6Cl/giphy.gif" alt class="image--center mx-auto" /></p>
<p>So maybe a good way of understanding how a platform would actually fit a given problem is asking what <strong>not</strong> to choose. It will show a couple of good traits.</p>
<h3 id="heading-the-pros-of-the-cons">The Pros of the Cons</h3>
<ul>
<li><p>Deep analysis of the problem.</p>
</li>
<li><p>Weighing the possible outcomes.</p>
</li>
<li><p>Trade-off discussion and affordable compromises.</p>
</li>
</ul>
<h2 id="heading-what-makes-this-way-valuable">What makes this way valuable?</h2>
<p>If you can articulate against a few solutions, it shows you have the experience to compare different platforms, which not only demonstrates actual hands-on practice but also that you can discuss approaches and disadvantages to get closer to the best solution.</p>
<h1 id="heading-example-kubernetes">Example: Kubernetes</h1>
<p><img src="https://media.giphy.com/media/WsNbxuFkLi3IuGI9NU/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Kubernetes is arguably the de facto platform to host and orchestrate most (containerised) applications. I'm not discovering anything new here.</p>
<h2 id="heading-is-it-technically-applicable-to-anything">Is it technically applicable to anything?</h2>
<p>Sure, you can use it for 99% of modern use cases and it will just work. Even for a simple API serving a JSON response with a given product's metadata? Of course, you can make it happen using a simple set of required objects (a deployment, <a target="_blank" href="https://hashnode.com/post/clheoq55700010akydtvd1073">secrets</a>, and a couple more).</p>
<blockquote>
<p><a target="_blank" href="https://blog.mariano.cloud/helm-declaration-environments-helmfile">Here</a>'s an example of a Kubernetes implementation using Helmfile, if you are feeling curious...</p>
</blockquote>
<h2 id="heading-but-is-it-worth-the-overhead">But is it worth the overhead?</h2>
<p>For <strong>every</strong> use case? Certainly not.</p>
<p>On the one hand, Kubernetes is excellent at distributed environments, HA, autoscaling, resource management, security and 3rd-party tool integration. There is lots of documentation out there; everyone is using it; it is battle-tested, reliable and mature.</p>
<p>On the other, it also introduces some questions that you will have to deal with and think about. To name a few:</p>
<ul>
<li><p>Will you self-host it? Are you using any local tooling, like Kind? Are you maintaining it yourself, or using a cloud provider instead?</p>
</li>
<li><p>Have you considered the (additional) costs involved?</p>
</li>
<li><p>Have you planned a new deployment method? Are you willing to change your current one?</p>
</li>
<li><p>Is the practical knowledge already available in your team?</p>
</li>
</ul>
<h2 id="heading-then-why-would-you-not-use-it">Then why would you not use it?</h2>
<p>Well, if this sounds like your application:</p>
<ul>
<li><p>Single instance, no need for high availability.</p>
</li>
<li><p>It serves atomic actions, with no interdependencies or moving parts.</p>
</li>
<li><p>Short-lived processing, no intensive computing requirements.</p>
</li>
</ul>
<p>And/or your team:</p>
<ul>
<li><p>Does not have Kubernetes experience.</p>
</li>
<li><p>Has a somewhat tight budget.</p>
</li>
</ul>
<p>Maybe you can consider other options to host your app. Serverless might be a good <em>candidate</em>.</p>
<blockquote>
<p>Everything is a matter of your use case, completely constrained to your app and business requirements.</p>
</blockquote>
<h1 id="heading-conclusion">Conclusion</h1>
<p>If you have the chance to conduct an interview, maybe explore this approach. I am sure it will give you much richer information about someone else's experience.</p>
<p>But you know, this is not only applicable to interviewing.</p>
<p>And as a final takeaway: do interviews. It keeps you up to date with the market and pushes your learning forward. It is also a great opportunity to discover new ways of doing things, a new technology you haven't used, or new features of a tool you use daily and did not know about (because every use case is different, remember?).</p>
<blockquote>
<p>But if you do, be prepared to consider an eventual offer: interview processes are costly ($$) and time-consuming for both parties, so please be mindful of others' work too.</p>
</blockquote>
<h1 id="heading-references">References</h1>
<ul>
<li><p>The <a target="_blank" href="https://capd.mit.edu/resources/the-star-method-for-behavioral-interviews/">STAR</a> method.</p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Socratic_method">The Socratic method</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Do you even JSONPath?]]></title><description><![CDATA[I like the CLI and a black background console when it comes to checking on things and debugging, it makes me feel more confident in what I'm seeing and closer to the basics.

A big chunk of my recent years has been all about Kubernetes. Kubernetes th...]]></description><link>https://blog.mariano.cloud/do-you-even-jsonpath</link><guid isPermaLink="true">https://blog.mariano.cloud/do-you-even-jsonpath</guid><category><![CDATA[jsonpath]]></category><category><![CDATA[json]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[kubectl]]></category><category><![CDATA[cli]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Wed, 24 May 2023 09:28:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1684864854284/ae8f7cb6-5804-4dca-8c84-22cde8eb228b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I like the CLI and a black background console when it comes to checking on <em>things</em> and debugging, it makes me feel more confident in what I'm seeing and closer to the basics.</p>
<p><img src="https://media.giphy.com/media/5SKwQMGTR1umLrKC7N/giphy.gif" alt class="image--center mx-auto" /></p>
<p>A big chunk of my recent years has been all about Kubernetes. Kubernetes this, Kubernetes that: mostly EKS professionally, playing around with GKE here and there and, of course, Minikube is my bro locally.</p>
<p>And <strong>kubectl</strong> is also a close friend of mine: the first tool pretty much everyone learns when starting with Kubernetes, and perhaps the easiest and cleanest too.</p>
<h2 id="heading-jsonpath">JSONPath</h2>
<p>But did you know <a target="_blank" href="https://kubernetes.io/docs/reference/kubectl/jsonpath/">kubectl offers JSONPath support natively</a>? Yeah, JSONPath as in a JSON query language that enables you to interact with a JSON-structured data set.</p>
<p>Using a <a target="_blank" href="https://goessner.net/articles/JsonPath/index.html#e2">set of JSONPath expressions</a>, one can query, parse and format a JSON structure easily, getting whatever information is needed in a readable and clear way.</p>
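<p>JSONPath itself is not kubectl-specific. To make the idea concrete before we touch a cluster, here is a minimal Python sketch (the <code>jsonpath_get</code> helper is illustrative, not a real JSONPath library) of what resolving a dotted path against a parsed JSON document means:</p>
<pre><code class="lang-python">import json

# A trimmed ServiceAccount document, as `kubectl get sa ... -o json` would return it.
doc = json.loads("""
{
  "kind": "ServiceAccount",
  "metadata": {
    "name": "httpbin",
    "namespace": "jsonpath-playground"
  }
}
""")

def jsonpath_get(obj, path):
    """Resolve a dotted path like 'metadata.name' against nested dicts."""
    for key in path.split("."):
        obj = obj[key]
    return obj

print(jsonpath_get(doc, "metadata.name"))       # httpbin
print(jsonpath_get(doc, "metadata.namespace"))  # jsonpath-playground
</code></pre>
<p>A real JSONPath engine adds filters, slices and recursive descent on top of this, but the core is exactly that traversal.</p>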
<h2 id="heading-enough-with-the-written-words">Enough with the written words</h2>
<p><img src="https://media.giphy.com/media/fXwd1rBNMmtJyfYFU9/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Let's query some Kubernetes thingy(s).</p>
<h3 id="heading-the-usual-stuff">The usual stuff</h3>
<ul>
<li><p>Minikube (check <a target="_blank" href="https://blog.mariano.cloud/helm-declaration-environments-helmfile#heading-installation">this article</a> on how to install it locally).</p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/tasks/tools/install-kubectl-macos/#install-with-homebrew-on-macos">Kubectl</a>.</p>
</li>
</ul>
<p>Once both are installed, let's deploy a sample pod + service + service account (for this example, I will be using the one to rule them all: <a target="_blank" href="https://httpbin.org/">httpbin</a>).</p>
<pre><code class="lang-bash"><span class="hljs-comment"># create a new namespace</span>
kubectl create ns jsonpath-playground

<span class="hljs-comment"># get httpbin's YAML (from Istio examples)</span>
curl -s https://raw.githubusercontent.com/istio/istio/master/samples/httpbin/httpbin.yaml -O

<span class="hljs-comment"># deploy httpbin resources</span>
kubectl apply -f httpbin.yaml -n jsonpath-playground
</code></pre>
<p>Now we have a set of resources created in our local cluster:</p>
<ul>
<li><p>A pod.</p>
</li>
<li><p>A service.</p>
</li>
<li><p>A service account.</p>
</li>
</ul>
<p>Check them out with:</p>
<pre><code class="lang-bash">$ kubectl get pod,svc,sa -n jsonpath-playground

NAME                          READY   STATUS    RESTARTS   AGE
pod/httpbin-bf5fc5d74-p9h2g   1/1     Running   0          65s

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/httpbin   ClusterIP   10.100.156.45   &lt;none&gt;        8000/TCP   65s

NAME                     SECRETS   AGE
serviceaccount/default   1         83s
serviceaccount/httpbin   1         65s
</code></pre>
<p><a target="_blank" href="https://kubernetes.io/docs/reference/kubectl/#formatting-output">By default</a>, <strong>kubectl</strong> outputs its data in plain-text columns: nice and readable, with a limited amount of information that gives an overview of what we are querying. This is indeed intended for a quick look at the resources, without much depth.</p>
<p>If we wanted to go beyond that, we could use the <code>-o wide</code> flag, making <strong>kubectl</strong> output a bit more information.</p>
<p>But let's go deeper. We want to see the service account's raw JSON output. Let's use <code>-o json</code> flag now.</p>
<pre><code class="lang-bash">kubectl get sa -n jsonpath-playground httpbin -o json
</code></pre>
<p>We get something like this:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"apiVersion"</span>: <span class="hljs-string">"v1"</span>,
    <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"ServiceAccount"</span>,
    <span class="hljs-attr">"metadata"</span>: {
        <span class="hljs-attr">"annotations"</span>: {
            <span class="hljs-attr">"kubectl.kubernetes.io/last-applied-configuration"</span>: <span class="hljs-string">"{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"annotations\":{},\"name\":\"httpbin\",\"namespace\":\"jsonpath-playground\"}}\n"</span>
        },
        <span class="hljs-attr">"creationTimestamp"</span>: <span class="hljs-string">"2023-05-22T11:08:34Z"</span>,
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"httpbin"</span>,
        <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">"jsonpath-playground"</span>,
        <span class="hljs-attr">"resourceVersion"</span>: <span class="hljs-string">"859"</span>,
        <span class="hljs-attr">"uid"</span>: <span class="hljs-string">"uid"</span>
    },
    <span class="hljs-attr">"secrets"</span>: [
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"httpbin-token-uid"</span>
        }
    ]
}
</code></pre>
<p>This is a very simple object, which does not have that many attributes. Let's make it a bit richer by adding an AWS IRSA annotation (<a target="_blank" href="https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge">see my IRSA article if you are curious about it</a>).</p>
<pre><code class="lang-bash">$ kubectl annotate sa -n jsonpath-playground httpbin eks.amazonaws.com/role-arn=arn:aws:iam::123456:role/irsa-role

serviceaccount/httpbin annotated
</code></pre>
<p>This will add an annotation to the <code>httpbin</code> service account, <a target="_blank" href="https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge#heading-create-a-kubernetes-job">mapping it to a dummy AWS IAM role</a>. But fear not my fellow reader, it can be any annotation for this example purpose.</p>
<p>Let's check how it looks now, with:</p>
<pre><code class="lang-bash">kubectl get sa -n jsonpath-playground httpbin -o json
</code></pre>
<p>Getting something like:</p>
<pre><code class="lang-json">{
    <span class="hljs-attr">"apiVersion"</span>: <span class="hljs-string">"v1"</span>,
    <span class="hljs-attr">"kind"</span>: <span class="hljs-string">"ServiceAccount"</span>,
    <span class="hljs-attr">"metadata"</span>: {
        <span class="hljs-attr">"annotations"</span>: {
            <span class="hljs-attr">"eks.amazonaws.com/role-arn"</span>: <span class="hljs-string">"arn:aws:iam::123456:role/irsa-role"</span>,
            <span class="hljs-attr">"kubectl.kubernetes.io/last-applied-configuration"</span>: <span class="hljs-string">"{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"annotations\":{},\"name\":\"httpbin\",\"namespace\":\"jsonpath-playground\"}}\n"</span>
        },
        <span class="hljs-attr">"creationTimestamp"</span>: <span class="hljs-string">"2023-05-22T11:08:34Z"</span>,
        <span class="hljs-attr">"name"</span>: <span class="hljs-string">"httpbin"</span>,
        <span class="hljs-attr">"namespace"</span>: <span class="hljs-string">"jsonpath-playground"</span>,
        <span class="hljs-attr">"resourceVersion"</span>: <span class="hljs-string">"1501"</span>,
        <span class="hljs-attr">"uid"</span>: <span class="hljs-string">"uid"</span>
    },
    <span class="hljs-attr">"secrets"</span>: [
        {
            <span class="hljs-attr">"name"</span>: <span class="hljs-string">"httpbin-token-uid"</span>
        }
    ]
}
</code></pre>
<p>What if we wanted to see just the service account's annotations? Enter JSONPath.</p>
<p><img src="https://media.giphy.com/media/6o0jq0G2tfv7W/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Let's modify the above command using a (JSONPath) query expression:</p>
<pre><code class="lang-bash">kubectl get sa -n jsonpath-playground httpbin -o jsonpath=<span class="hljs-string">'{.metadata.annotations}'</span>
</code></pre>
<p>Obtaining:</p>
<pre><code class="lang-json">{<span class="hljs-attr">"eks.amazonaws.com/role-arn"</span>:<span class="hljs-string">"arn:aws:iam::123456:role/irsa-role"</span>,<span class="hljs-attr">"kubectl.kubernetes.io/last-applied-configuration"</span>:<span class="hljs-string">"{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"annotations\":{},\"name\":\"httpbin\",\"namespace\":\"jsonpath-playground\"}}\n"</span>}
</code></pre>
<p>Not the prettiest, but still way more condensed than the first version.</p>
<p>Let's get just one of them. Remember you can get into any JSON structure by specifying an index (or a key in this case, since the <code>annotations</code> attribute is a dictionary).</p>
<p>Here you also need to escape <code>.</code> and <code>/</code>, as they would otherwise be interpreted as part of the JSONPath expression.</p>
<pre><code class="lang-bash">$ kubectl get sa -n jsonpath-playground httpbin -o jsonpath=<span class="hljs-string">'{.metadata.annotations.eks\.amazonaws\.com\/role-arn}'</span>

arn:aws:iam::123456:role/irsa-role
</code></pre>
<p>See? Nice and clear. I use this, for example, whenever I want to check if a specific value is set, and matches whatever I need it to be. It's a very quick way to do so.</p>
<blockquote>
<p>Also, you could use this in a script or an automated test.</p>
</blockquote>
<h4 id="heading-above-and-beyond">Above and beyond</h4>
<p>Let's do something else, like iterating over and parsing the <code>httpbin</code> pod structure for its <code>tolerations</code>. Given the pod's <code>tolerations</code> attribute is a list, we can traverse it and extract its elements. Then we can also add some extra text to make the output more readable.</p>
<p>Let's first check the pod name:</p>
<pre><code class="lang-bash">$ kubectl get pod -n jsonpath-playground

NAME                      READY   STATUS    RESTARTS   AGE
httpbin-bf5fc5d74-zgvtl   1/1     Running   0          34m
</code></pre>
<p>And then use its name for the next query:</p>
<pre><code class="lang-bash">$ kubectl get pod -n jsonpath-playground httpbin-bf5fc5d74-zgvtl -o jsonpath=<span class="hljs-string">'{range .spec.tolerations[*]}{"key name: "}{.key}{"\n"}{end}'</span>

key name: node.kubernetes.io/not-ready
key name: node.kubernetes.io/unreachable
</code></pre>
<p>Feeling the power of the CLI on your side already? I am.</p>
<p>We have done three things:</p>
<ul>
<li><p>Traversed the <code>.spec.tolerations</code> list and extracted only the field named <code>key</code> from every item, using the <code>{range .items}{.key_name}{end}</code> pattern.</p>
</li>
<li><p>Added the prefix <code>key name:</code> to make it more readable, by adding a <code>{"key name: "}</code> in between query parameters.</p>
</li>
<li><p>Printed the output in different lines to make it even more readable, adding <code>{"\n"}</code>.</p>
</li>
</ul>
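<p>Conceptually, the <code>range</code> expression is just a loop over a list. As a rough Python equivalent (the tolerations below are trimmed to the single field we care about), it boils down to:</p>
<pre><code class="lang-python"># The pod's default tolerations, as they appear in `kubectl get pod ... -o json`
# (trimmed to the field used here).
tolerations = [
    {"key": "node.kubernetes.io/not-ready"},
    {"key": "node.kubernetes.io/unreachable"},
]

# Equivalent of {range .spec.tolerations[*]}{"key name: "}{.key}{"\n"}{end}
lines = ["key name: " + t["key"] for t in tolerations]
print("\n".join(lines))
</code></pre>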
<p><img src="https://media.giphy.com/media/DoPnrkasVpMCZIpzD9/giphy.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-cleaning-up">Cleaning up</h3>
<p>Don't forget to delete Minikube's cluster from your local with:</p>
<pre><code class="lang-bash">minikube delete
</code></pre>
<h3 id="heading-also-online-because-its-not-always-about-kubernetes">Also online, because it's not always about Kubernetes</h3>
<p>And, if you don't want to spin up a local cluster, or you are already tired of <strong>kubectl</strong> and Kubernetes, you can play around with one of the many online evaluators, like <a target="_blank" href="https://jsonpath.com/">this one</a>. There is a sample JSON input provided, but feel free to use the ones above as well.</p>
<blockquote>
<p>Keep in mind the syntax is slightly different from the one we saw before.</p>
</blockquote>
<h2 id="heading-conclusion">Conclusion</h2>
<p>I really like <strong>kubectl</strong>. That's pretty obvious at this point.</p>
<p>Though I checked other tools, like <a target="_blank" href="https://k9scli.io/">k9s</a>, I found myself always coming back to it.</p>
<p>When I discovered it has JSONPath support, it was not that useful to me at first, as I was not using <strong>kubectl</strong> in much depth. But after a while, it became clear that it is a pretty neat tool if you like JSON better than YAML.</p>
<blockquote>
<p>Kubectl also supports YAML output with the <code>-o yaml</code> flag.</p>
</blockquote>
<p>Of course, <a target="_blank" href="https://goessner.net/articles/JsonPath/index.html#e2">there are way more options</a> than the ones we used in this example overview, and you can combine them to create much simpler or richer outputs.</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://kubernetes.io/docs/reference/kubectl/jsonpath/">JSONPath support in Kubernetes (kubectl) documentation</a>.</p>
</li>
<li><p><a target="_blank" href="https://jsonpath.com/">Online JSONPath evaluator</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/istio/istio/blob/master/samples/httpbin/httpbin.yaml">Httpbin</a> YAML.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[All right then, keep your secrets in Git with SOPS]]></title><description><![CDATA[Wait, what? Don't panic, SOPS will encrypt them for you.
Big NO NO

Secrets (a.k.a. sensitive values), such as passwords or any other potentially harmful information must be protected, we all know that.
That's why you should avoid sharing those or pu...]]></description><link>https://blog.mariano.cloud/all-right-then-keep-your-secrets-in-git-with-sops</link><guid isPermaLink="true">https://blog.mariano.cloud/all-right-then-keep-your-secrets-in-git-with-sops</guid><category><![CDATA[Kubernetes]]></category><category><![CDATA[secrets]]></category><category><![CDATA[Git]]></category><category><![CDATA[KMS]]></category><category><![CDATA[SOPS]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Mon, 08 May 2023 10:13:22 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683373798272/ae315606-83b7-4600-9816-066f2f12fdb1.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Wait, what? Don't panic, SOPS will encrypt them for you.</p>
<h1 id="heading-big-no-no">Big NO NO</h1>
<p><img src="https://media.giphy.com/media/8vUEXZA2me7vnuUvrs/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Secrets (a.k.a. sensitive values), such as passwords or any other potentially harmful information, must be protected; we all know that.</p>
<p>That's why you should avoid sharing those or pushing them to a public (or even private) code repository. Right? Well, yes and no.</p>
<h3 id="heading-but-where-do-you-store-them">But where do you store them?</h3>
<p>There are many vault-like services such as <a target="_blank" href="https://www.vaultproject.io/">Hashicorp Vault</a>, <a target="_blank" href="https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html">AWS Secrets Manager</a>, <a target="_blank" href="https://cloud.google.com/security-key-management">GCP KMS</a>, <a target="_blank" href="https://learn.microsoft.com/en-us/azure/key-vault/general/basic-concepts">Azure Key Vault</a> and the well-known credential storages out there like <a target="_blank" href="https://1password.com/">1Password</a>.</p>
<p>All of the above offer a separate secure location that isolates the secrets from the code referencing them, but they all need an extra fetch step. How do you get the actual value? You add a call to the secret store using the corresponding method, and the secret value is exported to your app.</p>
<p>Wanna save a step? Enter SOPS.</p>
<h1 id="heading-why-sops">Why SOPS?</h1>
<p>Because it offers a simpler way of keeping sensitive information. You can store the values directly in your code and use one of AWS KMS, PGP, GCP KMS, Azure Key Vault, HashiCorp Vault and <a target="_blank" href="https://github.com/FiloSottile/age">age</a> (I did not know this one) to encrypt and decrypt them. No need to push/pull or store the secrets somewhere else.</p>
<h1 id="heading-example">Example</h1>
<p><img src="https://media.giphy.com/media/BpGWitbFZflfSUYuZ9/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Let's sample this in a Kubernetes local scenario, using:</p>
<ul>
<li><p>Helm (Helmfile) to deploy a sample chart + Minikube hosting a simple Kubernetes cluster on your local machine.</p>
<ul>
<li>Please refer to <a target="_blank" href="https://blog.mariano.cloud/helm-declaration-environments-helmfile">this article about Helmfile + Minikube for a full installation walkthrough</a>. For this tutorial, I will assume both are installed in your environment.</li>
</ul>
</li>
<li><p>AWS KMS key.</p>
</li>
<li><p>SOPS.</p>
</li>
</ul>
<blockquote>
<p><a target="_blank" href="https://github.com/marianogg9/playing-with-sops">Here</a> you can find the reference code and a few Terraform templates to create KMS and IAM resources. I will be using the AWS console for this tutorial, but I added a walkthrough for Terraform <a target="_blank" href="https://github.com/marianogg9/playing-with-sops/tree/main/terraform#via-terraform">here</a>.</p>
</blockquote>
<h2 id="heading-kms">KMS</h2>
<ul>
<li><p>Manually in the AWS console:</p>
<ol>
<li><p>Access AWS KMS service.</p>
</li>
<li><p>Go to <strong>Customer-managed</strong> keys and click on <strong>Create key</strong>.</p>
</li>
<li><p>Select <strong>Symmetric</strong> + <strong>Encrypt and decrypt</strong> options, then <strong>Next</strong>.</p>
</li>
<li><p>Give it an alias and <strong>Next</strong>.</p>
</li>
<li><p>Select a <strong>Key administrator</strong> and <strong>Next</strong>.</p>
</li>
<li><p>Select a <strong>Key user</strong> (this step can be done later, after creating the IAM user), <strong>Next</strong>.</p>
</li>
<li><p><strong>Finish</strong>.</p>
</li>
</ol>
</li>
</ul>
<h2 id="heading-iam">IAM</h2>
<ul>
<li><p>Manually in the AWS console:</p>
<ol>
<li><p>Access IAM service &gt; <strong>Users</strong>.</p>
</li>
<li><p><strong>Add Users</strong>.</p>
</li>
<li><p>Give it a name &gt; <strong>Next</strong>.</p>
</li>
<li><p>Select <strong>Attach policies directly</strong> &gt; and then <strong>Create policy</strong> (this will open the new policy wizard).</p>
<ol>
<li><p>In the new policy wizard, select <strong>JSON</strong> and then paste the following JSON policy definition &gt; <strong>Next</strong>:</p>
<pre><code class="lang-json"> {
     <span class="hljs-attr">"Version"</span>: <span class="hljs-string">"2012-10-17"</span>,
     <span class="hljs-attr">"Statement"</span>: [
         {
             <span class="hljs-attr">"Sid"</span>: <span class="hljs-string">"VisualEditor0"</span>,
             <span class="hljs-attr">"Effect"</span>: <span class="hljs-string">"Allow"</span>,
             <span class="hljs-attr">"Action"</span>: [
                 <span class="hljs-string">"kms:Decrypt"</span>,
                 <span class="hljs-string">"kms:Encrypt"</span>,
                 <span class="hljs-string">"kms:DescribeKey"</span>
             ],
             <span class="hljs-attr">"Resource"</span>: <span class="hljs-string">"arn:of:your:kms:key"</span>
         }
     ]
 }
</code></pre>
</li>
<li><p>Give the new policy a name and then click on <strong>Create policy</strong>.</p>
</li>
</ol>
</li>
<li><p>Now go back to the IAM user creation wizard (previous tab) and click the refresh icon to update the available policies list. Then enter the new policy name in the search field and select it &gt; <strong>Next</strong> &gt; <strong>Create user</strong>.</p>
</li>
<li><p>Now go back to the KMS service, select the key you created previously and select <strong>Add</strong> in the <strong>Key Users</strong> section of the key.</p>
<ol>
<li>Select the IAM user created before &gt; <strong>Add</strong>.</li>
</ol>
</li>
</ol>
</li>
</ul>
<h2 id="heading-baseline-deployment-with-helmfile">Baseline deployment with Helmfile</h2>
<p>First, clone the repo:</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/playing-with-sops.git
</code></pre>
<p><a target="_blank" href="https://blog.mariano.cloud/helm-declaration-environments-helmfile#heading-installation">Once you have Minikube running and Helmfile installed</a>, let's deploy the sample <code>httpbin</code> app in the cluster:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> playing-with-sops/helmfile/
helmfile -e example apply
</code></pre>
<p>This will deploy the <code>httpbin</code> app in your local Kubernetes environment, including a pod, a service, a service account and a secret (we will use it later).</p>
<p>Let's see how environment variables look in the freshly created pod (we will use this to compare against a later step, trust me):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1683372610085/27a01790-ac37-47c2-957e-f7b38faafe42.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-and-now-the-secrets">And now: THE SECRETS</h2>
<p><img src="https://media.giphy.com/media/YQk8nXloVftzW/giphy.gif" alt class="image--center mx-auto" /></p>
<p>Helmfile supports secrets by using the <code>helm-secrets</code> plugin to manage and inject sensitive information into a given release's values.</p>
<blockquote>
<p>The caveat here is those secret values are going to be shown in plain text in the <code>helmfile diff</code> or <code>helmfile apply</code> outputs, so be careful!</p>
<p><strong>Keep reading, there is a workaround for that.</strong></p>
</blockquote>
<h3 id="heading-helm-secrets-plugin">Helm-secrets plugin</h3>
<p>You will need to <a target="_blank" href="https://helmfile.readthedocs.io/en/latest/#secrets">install</a> <code>helm-secrets</code> plugin:</p>
<pre><code class="lang-bash">helm plugin install https://github.com/jkroepke/helm-secrets
</code></pre>
<h3 id="heading-sops">SOPS</h3>
<p>Installing:</p>
<pre><code class="lang-bash">brew install sops
</code></pre>
<p>Let's use it now! First, we need to create a <code>.sops.yaml</code> with the following content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">creation_rules:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">path_regex:</span> <span class="hljs-string">\.yaml$</span>
    <span class="hljs-attr">kms:</span> <span class="hljs-string">'arn:of:your:kms:key'</span>
</code></pre>
<p>This is a global configuration file that SOPS will use as a default when encrypting/decrypting files. You can set a regex for file names and locations, and a KMS key ARN to use. That KMS key is the one we created before.</p>
<p>The above example will match any file ending in <code>.yaml</code>, anywhere, so it doesn't matter where you create this specific config file.</p>
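<p>If you prefer to scope encryption to specific files instead, <code>path_regex</code> can be narrowed; for example (hypothetical path, reusing the KMS ARN placeholder from above):</p>
<pre><code class="lang-yaml">creation_rules:
  - path_regex: helmfile/secrets\.yaml$
    kms: 'arn:of:your:kms:key'
</code></pre>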
<blockquote>
<p>Remember always to configure your local AWS CLI to use the KMS Key User we created before. See <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html">this article</a> on how to set up and use it.</p>
</blockquote>
<p>Next, create <code>secrets.yaml</code> file within <code>helmfile/</code> directory, by running the following:</p>
<pre><code class="lang-bash">sops secrets.yaml
</code></pre>
<p>This will open a text editor with prefilled sample values. Replace the content with:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">my_var:</span> <span class="hljs-string">a</span> <span class="hljs-string">top</span> <span class="hljs-string">secret</span> <span class="hljs-string">value</span>
<span class="hljs-attr">my_other_var:</span> <span class="hljs-string">not</span> <span class="hljs-string">that</span> <span class="hljs-string">sensitive,</span> <span class="hljs-string">but</span> <span class="hljs-string">still</span> <span class="hljs-string">please</span> <span class="hljs-string">don't</span> <span class="hljs-string">tell</span> <span class="hljs-string">anyone!</span>
</code></pre>
<p>When you save and close this file, SOPS will encrypt it automatically. Have a look at its content now: SOPS has added metadata referencing the KMS key (ARN) used, a timestamp and more.</p>
<h3 id="heading-helmfile-takes-over">Helmfile takes over</h3>
<p>Now for the Helmfile part, let's add a <code>secrets:</code> section to the release in <code>helmfile.yaml</code>, as follows:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="a372b99b85f76175ab363a44477750b7"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/marianogg9/a372b99b85f76175ab363a44477750b7" class="embed-card">https://gist.github.com/marianogg9/a372b99b85f76175ab363a44477750b7</a></div><p> </p>
<p>As a result, Helmfile will merge <code>values.yaml</code> and <code>secrets.yaml</code> into one and then use it to populate the chart's templates to be deployed (in <code>charts/httpbin/templates/</code>).</p>
<p>Let's modify the deployment template to make use of the new secrets, by replacing the environment variables values with references to secrets in <code>secrets.yaml</code>:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="60498790876f7835e61bd5993f8c7ec9"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/marianogg9/60498790876f7835e61bd5993f8c7ec9" class="embed-card">https://gist.github.com/marianogg9/60498790876f7835e61bd5993f8c7ec9</a></div><p> </p>
<p>Now let's see what Helmfile tries to do (<code>helmfile diff</code>):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1683370219796/e65e661b-0fbb-4876-99ef-2959afde5028.png" alt class="image--center mx-auto" /></p>
<p>As you can see in the output, it is first decrypting the secrets file we specified: <code>Decrypting secret /your_local_path/playing-with-sops/helmfile/secrets.yaml</code>.</p>
<p>Ok, let's apply (<code>helmfile apply</code>) and then see what changed in the deployment:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1683370354036/2a9a6bfd-387a-4d46-b5ba-18168381bcea.png" alt class="image--center mx-auto" /></p>
<p>Now both environment variable values were changed to the values coming from the (SOPS-encrypted) secrets file.</p>
<p><img src="https://media.giphy.com/media/tlGD7PDy1w8fK/giphy.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-a-step-further-for-cicd">A step further for CI/CD</h3>
<p>What if you are trying to implement this in a CI/CD pipeline where logs are printed on stdout and so are your secrets?</p>
<p>Well, then you might want to use a Kubernetes Secret definition. In this case, <code>helm-secrets</code> will avoid showing secret values in plain text.</p>
<blockquote>
<p>Use a Kubernetes secret object to manage a sensitive value that you do not want to be printed in plain text in a CI/CD pipeline log.</p>
</blockquote>
<p>You can create a new <code>secret.yaml</code> template definition in <code>helmfile/charts/httpbin/templates/</code> (I included an example in <a target="_blank" href="https://github.com/marianogg9/playing-with-sops/blob/main/helmfile/secrets.yaml">the repo</a> as well):</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="643d844ed1f557b0e5270667f1f72156"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/marianogg9/643d844ed1f557b0e5270667f1f72156" class="embed-card">https://gist.github.com/marianogg9/643d844ed1f557b0e5270667f1f72156</a></div><p> </p>
<p>The <code>data</code> section contains both secrets (from <code>secrets.yaml</code>), base64-encoded so that the <a target="_blank" href="https://kubernetes.io/docs/concepts/configuration/secret/#restriction-names-data">Kubernetes API accepts their format</a>.</p>
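<p>If you are curious about what ends up in <code>data</code>, you can reproduce the encoding yourself. Note this is encoding, not encryption: it is trivially reversible, which is why you still want SOPS on top of it:</p>
<pre><code class="lang-bash"># Encode a value the way a Kubernetes Secret's `data` field expects it.
printf '%s' 'a top secret value' | base64
# YSB0b3Agc2VjcmV0IHZhbHVl
</code></pre>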
<p>Then you can reference these secret keys in <code>deployment.yaml</code> template:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="fbbfecf845f0c6a3f1ddbe76d41fc474"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/marianogg9/fbbfecf845f0c6a3f1ddbe76d41fc474" class="embed-card">https://gist.github.com/marianogg9/fbbfecf845f0c6a3f1ddbe76d41fc474</a></div><p> </p>
<p>If we check what Helmfile does:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1683370820865/621c645d-d77b-4c9f-a53a-3073ac1724a9.png" alt class="image--center mx-auto" /></p>
<p>Now the values are effectively taken from a Kubernetes secret instead of directly from secrets defined in the Helmfile release.</p>
<p>Go ahead and run <code>helmfile -e example apply</code> to deploy the changes.</p>
<p>Let's now modify the <code>secrets.yaml</code> values:</p>
<pre><code class="lang-bash">sops secrets.yaml
</code></pre>
<p>I removed the last character from both <code>my_var</code> and <code>my_other_var</code> in <code>secrets.yaml</code>; the change itself doesn't really matter, it is just an example. Save and close the file to have SOPS re-encrypt it.</p>
<p>And now let's see what Helmfile tries to do:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1683371114420/09073ff0-c7e6-4691-92c6-c04c56874f7e.png" alt class="image--center mx-auto" /></p>
<p>There are no secrets shown in plain text anymore!</p>
<p><img src="https://media.giphy.com/media/TNnyxINX87VAKbNYmZ/giphy.gif" alt class="image--center mx-auto" /></p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>You can manage secret values directly in Git now, without extra steps to fetch them from a remote store. It is a simple concept, and it is practical if you don't have enough sensitive values to justify setting up an external platform for them. Of course, it is one possible tool for a given use case; it all depends on the context.</p>
<p>I found out about this way of managing secrets when I started working with Helmfile, it is pretty well integrated and just works. Simplicity, I like it.</p>
<p><a target="_blank" href="https://github.com/mozilla/sops#sops-secrets-operations">Please have a look at all SOPS features</a>.</p>
<blockquote>
<p>Helmfile also offers native <a target="_blank" href="https://github.com/roboll/helmfile/blob/master/docs/remote-secrets.md">remote secrets fetching</a>.</p>
</blockquote>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://github.com/mozilla/sops">SOPS</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/helm-declaration-environments-helmfile">Helmfile + Minikube in action</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/marianogg9/playing-with-sops">Full code example on Github</a>.</p>
</li>
<li><p><a target="_blank" href="https://helm.sh/docs/topics/plugins/">Helm plugin</a>.</p>
</li>
<li><p><a target="_blank" href="https://helmfile.readthedocs.io/en/latest/">Helmfile</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[A non-green view of (tech) Conferences]]></title><description><![CDATA[I attended KubeCon EU 2023 in Amsterdam and I have been having this recurring thought of how it looked like from my not-so-interactive side.
Money, money, money
Even though I am a huge fan of listening to and seeing new stuff, more so if they are abo...]]></description><link>https://blog.mariano.cloud/a-non-green-view-of-tech-conferences</link><guid isPermaLink="true">https://blog.mariano.cloud/a-non-green-view-of-tech-conferences</guid><category><![CDATA[Kubecon]]></category><category><![CDATA[conference]]></category><category><![CDATA[technology]]></category><category><![CDATA[money]]></category><category><![CDATA[Open Source]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sat, 22 Apr 2023 13:45:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1682170225097/67207c0f-5551-4bce-9277-a6cf272cbbd6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I attended <a target="_blank" href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/">KubeCon EU 2023 in Amsterdam</a> and I have been having this recurring thought of how it looked like from my not-so-interactive side.</p>
<h1 id="heading-money-money-money">Money, money, money</h1>
<p>Even though I am a huge fan of listening to and seeing new stuff, more so if it is about IT and Cloud, I often feel awkward with sales pitches. Mostly because I have never considered myself the person who calls the shots moneywise in a company. I love playing with tools and implementations and fixing problems; give me a pain point and I will eventually make it better.</p>
<p>(Big) Conferences are all about sales pitching and badge scanning, so someone can send you an email afterwards or keep your information in a vendor or third-party database. Which is expected and completely OK; don't get me wrong, it is a business after all. Open source is backed by big and not-so-big corporations that spend <strong>a lot</strong> of money to set up these events and sponsor organizations like the Linux Foundation or the CNCF.</p>
<p>So, all in all, anyone who puts up a booth at a conference is paying a fee, hoping to get something out of it. How do they do it? Basically, with your information and, in <strong>many</strong> cases, a sales pitch that <strong>eventually</strong> leads to a sales meeting and in <strong>some cases</strong> to a product demo or purchase. They will talk about how their product could help your software lifecycle, make it more secure or completely take over your infrastructure overhead.</p>
<p>And then you get a sticker. Who doesn't love stickers, right?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682094528215/143e2856-d281-4d60-aea5-4253bb00534e.jpeg" alt class="image--center mx-auto" /></p>
<blockquote>
<p>Hint: they put up a sticker wall for a lot of Open Source projects☝️</p>
</blockquote>
<h1 id="heading-colour-system">Colour system</h1>
<p>But what happens with people who are not so into this (or any) interaction? Well, in this particular case, the CNCF implemented a colour-based system of pins that you could stick to your badge, giving anyone quick info about what kind of interaction you are comfortable with:</p>
<ul>
<li><p>Green is "talk to me".</p>
</li>
<li><p>Yellow means "only people I know".</p>
</li>
<li><p>Red means "please, don't".</p>
</li>
</ul>
<p>Pretty good and well thought out, in my opinion.</p>
<h1 id="heading-my-experience"><strong>My experience</strong></h1>
<p>How did I like it? It was great: I got to spend some time with my teammates out of the office, met and had a drink with the folks from the <a target="_blank" href="https://www.cncf.io/blog/2022/08/19/cloud-native-glossary-the-spanish-version-is-live/">CNCF Glossary Spanish</a> translation group (to which I recently started contributing), attended very interesting talks and ended up with my phone's Chrome tabs full of new terms and tools to try out.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682167642106/05c301a2-a0ab-4a9c-b919-d52f0974bfe7.jpeg" alt class="image--center mx-auto" /></p>
<p>I also shortlisted 4 new things, some of which I heard of for the first time:</p>
<ul>
<li><p><a target="_blank" href="https://linkerd.io/">Linkerd</a>.</p>
</li>
<li><p><a target="_blank" href="https://k3s.io/">k3s/k3d</a>.</p>
</li>
<li><p><a target="_blank" href="https://www.cncf.io/blog/2022/11/17/better-together-a-kubernetes-and-wasm-case-study/">Web Assembly</a>.</p>
</li>
<li><p><a target="_blank" href="https://sched.co/1HyZ9">How to treat a maliciously hijacked pod and apply forensics to it</a>.</p>
</li>
</ul>
<p>And lastly, I (re)confirmed that I have A LOT to learn. There are tons of tools and technologies I do not know about (remember <a target="_blank" href="https://en.wikipedia.org/wiki/Impostor_syndrome">impostor syndrome</a>?), and it's OK, that's where the fun is - you will probably never get bored working in IT.</p>
<h3 id="heading-happy-surprises">Happy surprise(s)</h3>
<p>One of the speakers at a WASM workshop overheard a comment from a colleague of mine and came by our table to talk about it. This guy was so passionate about the topic that it was fantastic to hear his arguments: not only the evident reduced overhead, but reasoning based on low-level OS and containerisation architecture processing times.</p>
<blockquote>
<p>"Forget about &lt;company this person works at&gt;, I want you to implement WASM because it will reduce your deployment and operational times by 10x and...". It was a delight to be there listening.</p>
</blockquote>
<p>My point is: you never know what you will see or who you will meet. And I was not even an active participant in the conversation 😂.</p>
<h1 id="heading-my-suggestions-for-non-green-people">My suggestions for non-green people</h1>
<p>If you, like me, are somewhere between green and yellow, how can you enjoy it the most? (At least, this is what I did.)</p>
<ul>
<li><p>Go to a talk on a topic you have never heard of, or maybe heard about but never worked with.</p>
</li>
<li><p>Even though we are all kinda doing the same thing, it is never the same. You can always find someone doing something different and you can learn from that.</p>
</li>
<li><p>It is always better if you attend with someone you know, but if that's not the case, it is still worth the experience.</p>
</li>
</ul>
<h1 id="heading-some-takeout-links">Some takeout links</h1>
<ul>
<li><p><a target="_blank" href="https://www.cncf.io/blog/2022/08/19/cloud-native-glossary-the-spanish-version-is-live/">CNCF glossary</a> (<a target="_blank" href="https://glossary.cncf.io/contribute/">get involved!</a>).</p>
</li>
<li><p><a target="_blank" href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/">KubeCon EU</a>.</p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Impostor_syndrome">Impostor syndrome</a>.</p>
</li>
<li><p><a target="_blank" href="https://www.cncf.io/projects/">CNCF Open Source projects landscape</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Have you attended tech Conferences? Please let me know in the comments, I am always up to reading about different experiences.</p>
]]></content:encoded></item><item><title><![CDATA[Helm + declaration + environments = Helmfile]]></title><description><![CDATA[There's this lame Argentinian saying: "What's the difference between Flores and Floresta?" it's kind of a dad joke and it feels like I'm now closer to that humour.

Know when you read a lot about a tool that is kind of the standard but for some reaso...]]></description><link>https://blog.mariano.cloud/helm-declaration-environments-helmfile</link><guid isPermaLink="true">https://blog.mariano.cloud/helm-declaration-environments-helmfile</guid><category><![CDATA[Helm]]></category><category><![CDATA[helmfile]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[minikube]]></category><category><![CDATA[Declarative]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Mon, 10 Apr 2023 11:58:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1681124975475/bae8d2c6-f886-4f30-9152-20087991c7da.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote>
<p>There's this lame Argentinian saying: "What's the difference between Flores and Floresta?" it's kind of a dad joke and it feels like I'm now closer to that humour.</p>
</blockquote>
<p>You know when you read a lot about a tool that is kind of the standard, but for some reason you have never used it? That was me a few years ago, right after I got my <a target="_blank" href="https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/">CKA</a>, ready to start working on real production Kubernetes implementations, and Helm kept popping up wherever I looked.</p>
<p><img src="https://media.giphy.com/media/3o7WIQ4FARJdpmUni8/giphy-downsized-large.gif" alt class="image--center mx-auto" /></p>
<p>But I had not had the chance to use it in my <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/">playground</a> yet; that was mostly about declaring resources in YAML, no templating... very newbie stuff.</p>
<p>A few months forward, I joined a new company where we used AWS and got to work on a super exciting migration to EKS from on-premises. We had multiple environments and a few applications with customised configurations to move, which required some kind of automation not only to deploy but also to make it easier to manage, support and further extend.</p>
<h1 id="heading-here-comes-helmfile">Here comes Helmfile</h1>
<p><img src="https://media.giphy.com/media/Nn5iedMSx14F4ZEb1M/giphy.gif" alt class="image--center mx-auto" /></p>
<p><a target="_blank" href="https://helm.sh/">Helm</a> is a Kubernetes release management tool built in Go that facilitates the creation and management of objects (grouped in charts) via templating and generation of Kubernetes manifests.</p>
<p>And <a target="_blank" href="https://github.com/helmfile/helmfile">Helmfile</a> is a declarative implementation of Helm. It offers a way of defining a set of (Helm) charts or standalone objects deployment along with their dependencies, secrets and values file separation.</p>
<p>It also offers a <code>diff</code> feature to preview the changes to be applied, <code>template</code> to render a given release, <code>write-values</code> to write out environment-specific values and <a target="_blank" href="https://helmfile.readthedocs.io/en/latest/#cli-reference">a lot more</a> (many based on Helm's native methods and plugins).</p>
<h1 id="heading-quick-walkthrough">Quick walkthrough</h1>
<p>As always, the theory is nice and all but we are here to see it working. So let's do that.</p>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p>A Kubernetes cluster.</p>
<ul>
<li>Can be anywhere, for this overview I will use <a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/">Minikube</a> deployed locally.</li>
</ul>
</li>
</ul>
<h2 id="heading-installation">Installation</h2>
<p>I am running this on a Mac, so the commands below are scoped to that. If you are using a different OS, all of these tools have versions for other systems.</p>
<ul>
<li><p>Clone the <a target="_blank" href="https://github.com/marianogg9/helmfiling.git">examples</a> repository.</p>
<pre><code class="lang-bash">  git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/helmfiling.git
  <span class="hljs-built_in">cd</span> helmfiling
</code></pre>
</li>
<li><p><a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/"><strong>Minikube</strong></a><strong>.</strong></p>
<pre><code class="lang-bash">  brew install minikube
</code></pre>
</li>
<li><p>And start a Minikube cluster.</p>
<pre><code class="lang-bash">  minikube start
</code></pre>
<p>  You can customise it a bit (e.g. by specifying a Kubernetes version) but defaults are enough (at the moment <code>v1.22.3</code>).</p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/tasks/tools/"><strong>Kubectl</strong></a><strong>.</strong></p>
<pre><code class="lang-bash">  curl -LO <span class="hljs-string">"https://dl.k8s.io/release/<span class="hljs-subst">$(curl -L -s https://dl.k8s.io/release/stable.txt)</span>/bin/darwin/amd64/kubectl"</span>
</code></pre>
</li>
<li><p><a target="_blank" href="https://helmfile.readthedocs.io/en/latest/"><strong>Helmfile</strong></a><strong>.</strong></p>
<pre><code class="lang-bash">  brew install helmfile
</code></pre>
</li>
</ul>
<h2 id="heading-general-overview">General overview</h2>
<p>I added an MVP installation <a target="_blank" href="https://github.com/marianogg9/helmfiling">here</a>.</p>
<pre><code class="lang-bash">.
├── charts
│   └── httpbin
│       ├── Chart.yaml
│       └── templates
│           ├── deployment.yaml
│           ├── service.yaml
│           └── serviceaccount.yaml
├── default-values.yaml
├── environments.yaml
├── example-environment-values.yaml
├── helmfile.yaml
└── values.yaml.gotmpl
</code></pre>
<ul>
<li><p><code>charts/</code> contains a minimal <code>httpbin</code> Helm chart definition and all the resources' templates.</p>
</li>
<li><p><code>default-values.yaml</code> is the <code>default</code> environment set of values.</p>
</li>
<li><p><code>environments.yaml</code> defines both <code>default</code> and <code>example</code> environments with their respective values files.</p>
</li>
<li><p><code>example-environment-values.yaml</code> is the <code>example</code> environment set of values.</p>
</li>
<li><p><code>helmfile.yaml</code> is where all releases are defined.</p>
</li>
<li><p><code>values.yaml.gotmpl</code> uses Go templating to reference environment-specific values. This template acts as a centralised definition shared by all environments.</p>
</li>
</ul>
<h2 id="heading-creating-a-release">Creating a release</h2>
<p>A release is where we declare metadata for the deployment. In a very minimal version:</p>
<ul>
<li><p>Name.</p>
</li>
<li><p>Values (list of single values or files).</p>
</li>
<li><p>Chart (path to a valid local or remote chart).</p>
</li>
<li><p>Namespace (where to deploy the resources defined in the above chart).</p>
</li>
<li><p>Labels (optional, as a way of targeting a specific release object).</p>
</li>
</ul>
<p>Please have a look at <a target="_blank" href="https://github.com/marianogg9/helmfiling/blob/main/helmfile.yaml">this</a> example.</p>
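<p>For reference, a minimal release definition could look along these lines (a sketch; the names mirror this walkthrough but are illustrative):</p>
<pre><code class="lang-yaml">releases:
  - name: example-release
    namespace: example-ns
    chart: ./charts/httpbin
    values:
      - values.yaml.gotmpl
    labels:
      app: httpbin
</code></pre>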
<h2 id="heading-adding-a-values-file">Adding a value(s) file</h2>
<p>There are many ways of passing values to a release, one of them being a values file. This contains a set of parameters that will be referenced as <code>{{ .Values.parameterX }}</code> in a given chart resource template definition.</p>
<p>We could also set values in a standalone way as in:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">values:</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">parameter_1:</span> <span class="hljs-string">"something meaningful"</span>
<span class="hljs-bullet">-</span> <span class="hljs-attr">parameter_2:</span> <span class="hljs-number">13</span>
</code></pre>
<h3 id="heading-using-a-custom-environment">Using a custom environment</h3>
<p>By default, Helmfile will assume a <code>default</code> environment. So if we do not specify one, Helmfile will use a set of values defined under a <code>default</code> environment.</p>
<p>But what if we have more than one environment, logical separation or cluster, and want to reuse as much as possible? We can make use of Go templating and a template values file.</p>
<p>Let's use one of those bad boys for this example then, creating a <code>values.yaml.gotmpl</code> file, referencing values by <code>{{ .Values.parameterX }}</code> notation (same as within a chart resource template).</p>
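<p>A sketch of what such a template could contain (the parameter names <code>service_port</code> and <code>replica_count</code> are hypothetical):</p>
<pre><code class="lang-yaml"># values.yaml.gotmpl
service:
  port: {{ .Values.service_port }}
replicaCount: {{ .Values.replica_count }}
</code></pre>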
<p>Once we run Helmfile specifying a different environment, it will pick up that custom environment's set of values and populate the <code>values.yaml.gotmpl</code> with them, instead of using the <code>default</code> ones.</p>
<p>We use an <code>environments.yaml</code> file to tell Helmfile what values we want to use for that specific environment; see an example <a target="_blank" href="https://github.com/marianogg9/helmfiling/blob/main/environments.yaml">here</a>.</p>
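<p>Illustratively, the shape of that file is (a sketch based on this repo's file names):</p>
<pre><code class="lang-yaml"># environments.yaml
environments:
  default:
    values:
      - default-values.yaml
  example:
    values:
      - example-environment-values.yaml
</code></pre>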
<h2 id="heading-running-helmfile">Running Helmfile</h2>
<p>The custom environment I will be using is called <code>example</code>, as defined <a target="_blank" href="https://github.com/marianogg9/helmfiling/blob/main/environments.yaml">here</a>. If you want to use a different one, declare it in <code>environments.yaml</code> and use it in the following steps.</p>
<p>The <code>-e</code> flag tells Helmfile which environment to use. If none is specified, it will assume <code>default</code>.</p>
<h3 id="heading-dry-run">Dry-run</h3>
<pre><code class="lang-bash">$ helmfile -e example diff

Building dependency release=example-release, chart=charts/httpbin
Comparing release=example-release, chart=charts/httpbin
********************

    Release was not present <span class="hljs-keyword">in</span> Helm.  Diff will show entire contents as new.

********************
example-ns, example-httpbin, Deployment (apps) has been added:
-
+ <span class="hljs-comment"># Source: example-httpbin/templates/deployment.yaml</span>
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: example-httpbin
+ spec:
+   replicas: 1
+   selector:
+     matchLabels:
+       app: example-httpbin
+       version: v1
+   template:
+     metadata:
+       labels:
+         app: example-httpbin
+         version: v1
+     spec:
+       serviceAccountName: example-httpbin
+       containers:
+       - image: docker.io/kong/httpbin
+         imagePullPolicy: IfNotPresent
+         name: example-httpbin
+         ports:
+         - containerPort: 80
example-ns, example-httpbin, Service (v1) has been added:
-
+ <span class="hljs-comment"># Source: example-httpbin/templates/service.yaml</span>
+ apiVersion: v1
+ kind: Service
+ metadata:
+   name: example-httpbin
+   labels:
+     app: example-httpbin
+     service: example-httpbin
+ spec:
+   ports:
+   - name: http
+     port: 8080
+     targetPort: 80
+   selector:
+     app: example-httpbin
example-ns, example-httpbin, ServiceAccount (v1) has been added:
-
+ <span class="hljs-comment"># Source: example-httpbin/templates/serviceaccount.yaml</span>
+ apiVersion: v1
+ kind: ServiceAccount
+ metadata:
+   name: example-httpbin
</code></pre>
<h3 id="heading-applying">Applying</h3>
<pre><code class="lang-bash">helmfile -e example apply
</code></pre>
<p>You can also check the status of the release deployment by:</p>
<pre><code class="lang-bash">$ helmfile status

Getting status example-release
NAME: example-release
LAST DEPLOYED: Fri Apr  7 13:58:58 2023
NAMESPACE: example-ns
STATUS: deployed
REVISION: 1
TEST SUITE: None
</code></pre>
<p>Let's port forward port 8080 from the service we just deployed to our localhost.</p>
<p>First, check the deployed service:</p>
<pre><code class="lang-bash">$ kubectl get svc -n example-ns

NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
example-httpbin   ClusterIP   10.96.41.153   &lt;none&gt;        8080/TCP   4m46s
</code></pre>
<p>Port forward to <code>localhost:7000</code>:</p>
<pre><code class="lang-bash">$ kubectl port-forward -n example-ns svc/example-httpbin 7000:8080

Forwarding from 127.0.0.1:7000 -&gt; 80
Forwarding from [::1]:7000 -&gt; 80
</code></pre>
<p>Now access <code>localhost:7000</code> in a browser and you will see the <code>httpbin</code> UI:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1680869233884/617cfc7c-b497-4183-b2ce-7e05e5c987ae.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-updating-and-deleting">Updating and deleting</h3>
<p>Let's say we want to update the default service port.</p>
<p>First, modify its value in <a target="_blank" href="https://github.com/marianogg9/helmfiling/blob/main/example-environment-values.yaml#L2">the values file</a> to <code>8081</code>.</p>
<p>Then we can see the modification to be applied by running:</p>
<pre><code class="lang-bash">$ helmfile -e example diff

Building dependency release=example-release, chart=charts/httpbin
Comparing release=example-release, chart=charts/httpbin
example-ns, example-httpbin, Service (v1) has changed:
  <span class="hljs-comment"># Source: example-httpbin/templates/service.yaml</span>
  apiVersion: v1
  kind: Service
  metadata:
    name: example-httpbin
    labels:
      app: example-httpbin
      service: example-httpbin
  spec:
    ports:
    - name: http
-     port: 8080
+     port: 8081
      targetPort: 80
    selector:
      app: example-httpbin
</code></pre>
<p>Now let's apply that and verify port-forwarding again:</p>
<pre><code class="lang-bash">$ helmfile -e example apply

Building dependency release=example-release, chart=charts/httpbin
Comparing release=example-release, chart=charts/httpbin
example-ns, example-httpbin, Service (v1) has changed:
  <span class="hljs-comment"># Source: example-httpbin/templates/service.yaml</span>
  apiVersion: v1
  kind: Service
  metadata:
    name: example-httpbin
    labels:
      app: example-httpbin
      service: example-httpbin
  spec:
    ports:
    - name: http
-     port: 8080
+     port: 8081
      targetPort: 80
    selector:
      app: example-httpbin

Upgrading release=example-release, chart=charts/httpbin
Release <span class="hljs-string">"example-release"</span> has been upgraded. Happy Helming!
NAME: example-release
LAST DEPLOYED: Fri Apr  7 14:09:33 2023
NAMESPACE: example-ns
STATUS: deployed
REVISION: 2
TEST SUITE: None

Listing releases matching ^example-release$
example-release    example-ns    2           2023-04-07 14:09:33.374019 +0200 CEST    deployed    example-httpbin-1.0.0


UPDATED RELEASES:
NAME              CHART            VERSION
example-release   charts/httpbin
</code></pre>
<p>Rechecking the service, we can see its port changed to <code>8081</code>:</p>
<pre><code class="lang-bash">$ kubectl get svc -n example-ns

NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
example-httpbin   ClusterIP   10.96.41.153   &lt;none&gt;        8081/TCP   11m
</code></pre>
<p>And finally, let's port forward using the new port:</p>
<pre><code class="lang-bash">$ kubectl port-forward -n example-ns svc/example-httpbin 7000:8081
Forwarding from 127.0.0.1:7000 -&gt; 80
Forwarding from [::1]:7000 -&gt; 80
</code></pre>
<p>This lets us see the UI on the same localhost port as before (<code>7000</code>).</p>
<p><img src="https://media.giphy.com/media/Rl9Yqavfj2Ula/giphy.gif" alt class="image--center mx-auto" /></p>
<p><strong>Helmfile also reconciles deletions</strong>: <code>apply</code> will remove any values/resources that are no longer defined (present) in the release YAML.</p>
<p>E.g. let's delete (or comment out all content in) the <a target="_blank" href="https://github.com/marianogg9/helmfiling/blob/main/charts/httpbin/templates/deployment.yaml">deployment template</a>.</p>
<p>If we run <code>apply</code>, it will delete the deployment from the cluster:</p>
<pre><code class="lang-bash">$ helmfile -e example apply

Building dependency release=example-release, chart=charts/httpbin
Comparing release=example-release, chart=charts/httpbin
example-ns, example-httpbin, Deployment (apps) has been removed:
- <span class="hljs-comment"># Source: example-httpbin/templates/deployment.yaml</span>
- apiVersion: apps/v1
- kind: Deployment
- metadata:
-   name: example-httpbin
- spec:
-   replicas: 1
-   selector:
-     matchLabels:
-       app: example-httpbin
-       version: v1
-   template:
-     metadata:
-       labels:
-         app: example-httpbin
-         version: v1
-     spec:
-       serviceAccountName: example-httpbin
-       containers:
-       - image: docker.io/kong/httpbin
-         imagePullPolicy: IfNotPresent
-         name: example-httpbin
-         ports:
-         - containerPort: 80
+

Upgrading release=example-release, chart=charts/httpbin
Release <span class="hljs-string">"example-release"</span> has been upgraded. Happy Helming!
NAME: example-release
LAST DEPLOYED: Fri Apr  7 14:17:01 2023
NAMESPACE: example-ns
STATUS: deployed
REVISION: 2
TEST SUITE: None

Listing releases matching ^example-release$
example-release    example-ns    2           2023-04-07 14:17:01.64583 +0200 CEST    deployed    example-httpbin-1.0.0


UPDATED RELEASES:
NAME              CHART            VERSION
example-release   charts/httpbin
</code></pre>
<p>Alternatively, we can run <code>destroy</code> to <strong>delete all resources</strong> defined in a release YAML.</p>
<pre><code class="lang-bash">helmfile -e &lt;name_of_the_environment&gt; destroy
</code></pre>
<blockquote>
<p>Keep in mind the above destroy command <strong>will NOT</strong> ask for confirmation!</p>
</blockquote>
<h2 id="heading-cleaning-up">Cleaning up</h2>
<p>As usual, don't forget to clean up!</p>
<p><img src="https://media.giphy.com/media/l0MYOw8BXSIoeXVFC/giphy.gif" alt class="image--center mx-auto" /></p>
<p>First Helmfile release(s):</p>
<pre><code class="lang-bash">$ helmfile -e example destroy

Building dependency release=example-release, chart=charts/httpbin
Listing releases matching ^example-release$
example-release    example-ns    2           2023-04-07 14:17:01.64583 +0200 CEST    deployed    example-httpbin-1.0.0

Deleting example-release
release <span class="hljs-string">"example-release"</span> uninstalled


DELETED RELEASES:
NAME
example-release
</code></pre>
<p>Then Minikube cluster(s).</p>
<pre><code class="lang-bash">$ minikube delete --all

🔥  Deleting <span class="hljs-string">"minikube"</span> <span class="hljs-keyword">in</span> hyperkit ...
💀  Removed all traces of the <span class="hljs-string">"minikube"</span> cluster.
🔥  Successfully deleted all profiles
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Much like any IaC tool, Helmfile gives you a way of declaring a desired state of things. You define your Kubernetes objects, their dependencies, how they should behave and the values you need them to have; every time you need to modify or recreate your stack, just run Helmfile.</p>
<p>Helmfile is a declarative layer on top of Helm, giving you a single place to see all involved releases and their dependencies.</p>
<p>Reproducibility, versioning, templating, easy environment separation and a toolset of functions. A lot to play around with. Helmfile also implements <code>secrets</code> definitions from a variety of sources, <a target="_blank" href="https://helmfile.readthedocs.io/en/latest/#environment-secrets">have a look</a>!</p>
<p>There are also other similar tools, <a target="_blank" href="https://kustomize.io/">Kustomize</a> probably being the one I read about most but have not yet used.</p>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://helm.sh/">Helm</a>.</p>
</li>
<li><p><a target="_blank" href="https://helmfile.readthedocs.io/en/latest/">Helmfile</a>.</p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/tasks/tools/">Kubectl</a>.</p>
</li>
<li><p><a target="_blank" href="https://minikube.sigs.k8s.io/docs/start/">Minikube</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/marianogg9/helmfiling">My example implementation</a>.</p>
</li>
<li><p><a target="_blank" href="https://www.cncf.io/certification/cka/">CKA</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Troposphere: make CloudFormation legible again]]></title><description><![CDATA[Let's go back a few years. I'd just joined a new team in a new company, all the infrastructure was in AWS, and I had very little IaC experience at the time.

Some background
CloudFormation is AWS's Infrastructure as Code service by default and many o...]]></description><link>https://blog.mariano.cloud/troposphere-make-cloudformation-legible-again</link><guid isPermaLink="true">https://blog.mariano.cloud/troposphere-make-cloudformation-legible-again</guid><category><![CDATA[cloudformation]]></category><category><![CDATA[AWS]]></category><category><![CDATA[Python]]></category><category><![CDATA[#IaC]]></category><category><![CDATA[troposphere]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sat, 18 Mar 2023 11:20:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1678651291807/ab243932-2ecd-43e7-88ac-546be22945a4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let's go back a few years. I'd just joined a new team in a new company, all the infrastructure was in AWS, and I had very little <a target="_blank" href="https://glossary.cncf.io/infrastructure-as-code/">IaC</a> experience at the time.</p>
<p><img src="https://media.giphy.com/media/oaRG0HAau2X1SU7BTw/giphy.gif" alt class="image--center mx-auto" /></p>
<h2 id="heading-some-background">Some background</h2>
<p><a target="_blank" href="https://aws.amazon.com/cloudformation/">CloudFormation</a> is AWS's default Infrastructure as Code service, and many other internal services use it behind the scenes to perform configuration tasks and deploy their own components. It is a very powerful tool, fully integrated with a lot of AWS APIs and services, and a no-brainer if you are managing your infrastructure via code (and you really should be) without resorting to 3rd-party tools.</p>
<p>But CloudFormation can be a bit too much when it comes to creating big stacks (infrastructure resources grouped in a logical way that makes sense application-wise, e.g. a load balancer, security groups, EC2 Auto Scaling groups, RDS databases and S3 buckets all serving a single application).</p>
<p>As CloudFormation defines resources in either a JSON or YAML file, the template will grow as you add more AWS objects. This, as you may imagine, gets messy: time-consuming to maintain, error-prone and, above all, difficult to read.</p>
<p>Of course, one could argue: why not decouple resources into functional stacks? Sure, a valid point, but everything depends on the use case. Even though a tool may be perfect for 98% of infrastructure topologies and almost all the examples you read about, there is always something a bit different. And that's where the fun begins.</p>
<blockquote>
<p>It is worth mentioning, these were <a target="_blank" href="https://aws.amazon.com/blogs/developer/introducing-the-aws-cdk-public-roadmap/">pre-CDK</a> times and we had not fully implemented Terraform because..reasons.</p>
</blockquote>
<p>So how could we define resources in a programmatic, readable (short!) and maintainable way? Yes, maybe you already guessed from the article title, <a target="_blank" href="https://github.com/cloudtools/troposphere">Troposphere</a>.</p>
<h2 id="heading-troposphere">Troposphere</h2>
<p>This is a Python library that creates AWS CloudFormation definitions (templates). These are then sent to the CloudFormation API as descriptors to create and manage resources.</p>
<p>In short, you define a resource using a given class, and the library generates a definition (template) you can then pass to a CloudFormation API method (like <code>CreateStack</code>). Sounds pretty familiar, huh? CDK vibes?</p>
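<p>As a quick taste of the API, here is a minimal sketch (assuming <code>troposphere</code> is installed via <code>pip</code>; the VPC id, CIDR and names are placeholders):</p>
<pre><code class="lang-python">from troposphere import Tags, Template
from troposphere.ec2 import Subnet

t = Template()
t.add_resource(Subnet(
    "SubnetA",                     # logical resource name
    VpcId="vpc-abcdefgh",          # placeholder VPC id
    CidrBlock="172.30.124.80/28",  # placeholder CIDR
    MapPublicIpOnLaunch=True,
    Tags=Tags(Name="subnet-a"),
))
print(t.to_json())  # the rendered CloudFormation template
</code></pre>
<p>The property and type checks mentioned below kick in at object creation time, so a typo in a property fails fast instead of surfacing as a CloudFormation deployment error.</p>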
<p><img src="https://media.giphy.com/media/L3ERvA6jWCd0qO4NdX/giphy.gif" alt class="image--center mx-auto" /></p>
<p>A couple of great things about Troposphere:</p>
<ul>
<li><p>Active community.</p>
</li>
<li><p><a target="_blank" href="https://troposphere.readthedocs.io/en/latest/index.html#examples-of-the-error-checking-full-tracebacks-removed-for-clarity">Property and type check built in</a> (meaning it will output errors if a resource is malformed or badly initialized).</p>
</li>
<li><p>It is written in (and uses) Python.</p>
</li>
<li><p>There are a lot of implementation <a target="_blank" href="https://github.com/cloudtools/troposphere/tree/main/examples">examples</a>.</p>
</li>
</ul>
<p>But not everything is roses: this is not an official AWS tool, which means it depends on community contributions to support AWS service updates, so it can lag behind, as expected.</p>
<h2 id="heading-quick-walkthrough">Quick walkthrough</h2>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>An AWS account with at least a VPC in any region.</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html">AWS CLI</a> configured with IAM credentials allowing to create/delete EC2 resources (security groups, subnets and instances).</p>
</li>
<li><p>(optional) An existing <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html">EC2 KeyPair</a>, if you want to test an SSH connection to the created instances.</p>
</li>
<li><p>Python &gt;= 3.11.2 (that's the one I tested this with).</p>
</li>
</ul>
<h3 id="heading-installation">Installation</h3>
<p>I always recommend using a (Python) <a target="_blank" href="https://docs.python.org/3/library/venv.html">virtual environment</a> to avoid messing with any other locally installed versions and dependencies you may have... but yeah, it's a personal preference, not required.</p>
<pre><code class="lang-bash">python3 -m venv new_env
<span class="hljs-built_in">source</span> new_env/bin/activate
</code></pre>
<pre><code class="lang-bash">pip install troposphere
</code></pre>
<h3 id="heading-example-templates">Example template(s)</h3>
<p>I added a couple of <a target="_blank" href="https://github.com/marianogg9/troposphering">example scripts in this repo</a> to showcase the functionalities.</p>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/troposphering.git
</code></pre>
<p>There are two folders:</p>
<pre><code class="lang-bash">.
├── README.md
├── instances
│   ├── instances.py
│   └── instances_input
└── subnets
    ├── subnets.py
    └── subnets_input
</code></pre>
<p>Each folder contains a script and an input file, where you will add your current AWS account values, such as:</p>
<ul>
<li><p>VPC id.</p>
</li>
<li><p>Resources names.</p>
</li>
<li><p>CIDRs.</p>
</li>
<li><p>Region.</p>
</li>
<li><p>(optional) Common tags.</p>
</li>
<li><p>EC2 KeyPair name.</p>
</li>
<li><p>(optional) Your local IP CIDR (if you want to test out the SSH connection).</p>
</li>
<li><p>Etc.</p>
</li>
</ul>
<p>Once you have added the corresponding values, rename the input files to <code>subnets_input.json</code> and <code>instances_input.json</code>.</p>
<blockquote>
<p>One important detail: "instances" stack is dependent on "subnets" stack as it references the subnet names from the latter.</p>
</blockquote>
<p>Run the scripts to generate both templates:</p>
<pre><code class="lang-bash">cd subnets
python3 subnets.py

cd ../instances
python3 instances.py
</code></pre>
<p>The first script will create a template along these lines, defining a set of subnets in a given input VPC and exporting the values to be used by the second template later on:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">Outputs:</span>
  <span class="hljs-attr">a:</span>
    <span class="hljs-attr">Export:</span>
      <span class="hljs-attr">Name:</span> <span class="hljs-type">!Sub</span> <span class="hljs-string">'${AWS::StackName}-a'</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'a'</span>
  <span class="hljs-attr">b:</span>
    <span class="hljs-attr">Export:</span>
      <span class="hljs-attr">Name:</span> <span class="hljs-type">!Sub</span> <span class="hljs-string">'${AWS::StackName}-b'</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'b'</span>
<span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">a:</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">AvailabilityZone:</span> <span class="hljs-string">a</span>
      <span class="hljs-attr">CidrBlock:</span> <span class="hljs-string">some-cidr</span>
      <span class="hljs-attr">MapPublicIpOnLaunch:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">Tags:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Name</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">subnet-a</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Description</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Playing</span> <span class="hljs-string">around</span> <span class="hljs-string">with</span> <span class="hljs-string">CloudFormation</span> <span class="hljs-string">and</span> <span class="hljs-string">Troposphere</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">CommongTag2</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Just</span> <span class="hljs-string">to</span> <span class="hljs-string">add</span> <span class="hljs-string">one</span> <span class="hljs-string">more</span>
      <span class="hljs-attr">VpcId:</span> <span class="hljs-string">vpc-abcdefgh</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::Subnet</span>
  <span class="hljs-attr">b:</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">AvailabilityZone:</span> <span class="hljs-string">b</span>
      <span class="hljs-attr">CidrBlock:</span> <span class="hljs-number">172.30</span><span class="hljs-number">.124</span><span class="hljs-number">.80</span><span class="hljs-string">/28</span>
      <span class="hljs-attr">MapPublicIpOnLaunch:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">Tags:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Name</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">subnet-b</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Description</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Playing</span> <span class="hljs-string">around</span> <span class="hljs-string">with</span> <span class="hljs-string">CloudFormation</span> <span class="hljs-string">and</span> <span class="hljs-string">Troposphere</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">CommongTag2</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Just</span> <span class="hljs-string">to</span> <span class="hljs-string">add</span> <span class="hljs-string">one</span> <span class="hljs-string">more</span>
      <span class="hljs-attr">VpcId:</span> <span class="hljs-string">vpc-abcdefgh</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::Subnet</span>
</code></pre>
<p>And this will be the second generated file (creating an EC2 instance, in one of the above to-be-created subnets, and a set of security groups):</p>
<pre><code class="lang-yaml"><span class="hljs-attr">Outputs:</span>
  <span class="hljs-attr">firstIntanceInstanceID:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'firstIntance'</span>
  <span class="hljs-attr">firstIntancePrivateIP:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">'firstIntance.PrivateIp'</span>
  <span class="hljs-attr">firstIntancePublicIP:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">'firstIntance.PublicIp'</span> <span class="hljs-comment"># we will use this value to test SSH connection</span>
  <span class="hljs-attr">instanceNumberTwoInstanceID:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'instanceNumberTwo'</span>
  <span class="hljs-attr">instanceNumberTwoPrivateIP:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">'instanceNumberTwo.PrivateIp'</span>
  <span class="hljs-attr">instanceNumberTwoPublicIP:</span>
    <span class="hljs-attr">Value:</span> <span class="hljs-type">!GetAtt</span> <span class="hljs-string">'instanceNumberTwo.PublicIp'</span> <span class="hljs-comment"># we will use this value to test SSH connection</span>
<span class="hljs-attr">Resources:</span>
  <span class="hljs-attr">defaultSG:</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">GroupDescription:</span> <span class="hljs-string">This</span> <span class="hljs-string">is</span> <span class="hljs-string">the</span> <span class="hljs-string">default</span> <span class="hljs-string">Security</span> <span class="hljs-string">Group</span>
      <span class="hljs-attr">SecurityGroupEgress:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">CidrIp:</span> <span class="hljs-string">someOtherCidr</span>
          <span class="hljs-attr">FromPort:</span> <span class="hljs-number">80</span>
          <span class="hljs-attr">IpProtocol:</span> <span class="hljs-string">tcp</span>
          <span class="hljs-attr">ToPort:</span> <span class="hljs-number">80</span>
      <span class="hljs-attr">SecurityGroupIngress:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">CidrIp:</span> <span class="hljs-string">someCidr</span> <span class="hljs-comment"># You can add your local public IP CIDR to test the SSH connection</span>
          <span class="hljs-attr">FromPort:</span> <span class="hljs-number">22</span>
          <span class="hljs-attr">IpProtocol:</span> <span class="hljs-string">tcp</span>
          <span class="hljs-attr">ToPort:</span> <span class="hljs-number">22</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">CidrIp:</span> <span class="hljs-string">someOtherCidr</span>
          <span class="hljs-attr">FromPort:</span> <span class="hljs-number">123</span>
          <span class="hljs-attr">IpProtocol:</span> <span class="hljs-string">tcp</span>
          <span class="hljs-attr">ToPort:</span> <span class="hljs-number">123</span>
      <span class="hljs-attr">Tags:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Name</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">defaultSG</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Description</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Playing</span> <span class="hljs-string">around</span> <span class="hljs-string">with</span> <span class="hljs-string">CloudFormation</span> <span class="hljs-string">and</span> <span class="hljs-string">Troposphere</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">CommongTag2</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Just</span> <span class="hljs-string">to</span> <span class="hljs-string">add</span> <span class="hljs-string">one</span> <span class="hljs-string">more</span>
      <span class="hljs-attr">VpcId:</span> <span class="hljs-string">vpc-abcdefgh</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::SecurityGroup</span>
  <span class="hljs-attr">firstIntance:</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">ImageId:</span> <span class="hljs-string">ami-123</span>
      <span class="hljs-attr">InstanceType:</span> <span class="hljs-string">t2.micro</span>
      <span class="hljs-attr">KeyName:</span> <span class="hljs-string">YourExistingKeyPair</span>
      <span class="hljs-attr">SecurityGroupIds:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'defaultSG'</span>
      <span class="hljs-attr">SubnetId:</span> <span class="hljs-type">!ImportValue</span> <span class="hljs-string">'AddingSubnetsWithTroposphere-someregiona'</span>
      <span class="hljs-attr">Tags:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Name</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">firstIntance</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Description</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Playing</span> <span class="hljs-string">around</span> <span class="hljs-string">with</span> <span class="hljs-string">CloudFormation</span> <span class="hljs-string">and</span> <span class="hljs-string">Troposphere</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">CommongTag2</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Just</span> <span class="hljs-string">to</span> <span class="hljs-string">add</span> <span class="hljs-string">one</span> <span class="hljs-string">more</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::Instance</span>
  <span class="hljs-attr">instanceNumberTwo:</span>
    <span class="hljs-attr">Properties:</span>
      <span class="hljs-attr">ImageId:</span> <span class="hljs-string">ami-123</span>
      <span class="hljs-attr">InstanceType:</span> <span class="hljs-string">t2.micro</span>
      <span class="hljs-attr">KeyName:</span> <span class="hljs-string">YourExistingKeyPair</span>
      <span class="hljs-attr">SecurityGroupIds:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-type">!Ref</span> <span class="hljs-string">'defaultSG'</span>
      <span class="hljs-attr">SubnetId:</span> <span class="hljs-type">!ImportValue</span> <span class="hljs-string">'AddingSubnetsWithTroposphere-someregionb'</span>
      <span class="hljs-attr">Tags:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Name</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">instanceNumberTwo</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">Description</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Playing</span> <span class="hljs-string">around</span> <span class="hljs-string">with</span> <span class="hljs-string">CloudFormation</span> <span class="hljs-string">and</span> <span class="hljs-string">Troposphere</span>
        <span class="hljs-bullet">-</span> <span class="hljs-attr">Key:</span> <span class="hljs-string">CommongTag2</span>
          <span class="hljs-attr">Value:</span> <span class="hljs-string">Just</span> <span class="hljs-string">to</span> <span class="hljs-string">add</span> <span class="hljs-string">one</span> <span class="hljs-string">more</span>
    <span class="hljs-attr">Type:</span> <span class="hljs-string">AWS::EC2::Instance</span>
</code></pre>
<p>These YAML templates use the CloudFormation <code>ImportValue</code>, <code>GetAtt</code> and <code>Ref</code> <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html">intrinsic functions</a> to fetch and reference values defined either externally (like the subnet names exported by the first stack) or locally.</p>
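<p>As a quick illustration of how those cross-stack references resolve: the subnet stack exports values named <code>StackName-LogicalId</code>, and <code>Fn::ImportValue</code> looks them up by that exact string. A minimal pure-Python sketch (the helper names are hypothetical; only the export-name pattern comes from the templates above):</p>

```python
import json

def import_value(export_name):
    # JSON form of the !ImportValue short syntax used in the YAML above
    return {"Fn::ImportValue": export_name}

def subnet_export(stack_name, logical_id):
    # Export names in this setup follow "StackName-LogicalId"
    return f"{stack_name}-{logical_id}"

snippet = {"SubnetId": import_value(subnet_export("AddingSubnetsWithTroposphere", "someregiona"))}
print(json.dumps(snippet))
```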
<p>Now use those templates to create the resources:</p>
<pre><code class="lang-bash">$ <span class="hljs-built_in">cd</span> subnets
$ python3 subnets.py <span class="hljs-comment"># to create the YAML template</span>
$ aws cloudformation create-stack --stack-name AddingSubnetsWithTroposphere --template-body file://subnets_template.yaml
</code></pre>
<p>The stack name <code>AddingSubnetsWithTroposphere</code> is part of <code>instances_input.json</code>, so if you want to use a different stack name, remember to update that value before running:</p>
<pre><code class="lang-bash">$ <span class="hljs-built_in">cd</span> instances
$ python3 instances.py <span class="hljs-comment"># to create the YAML template</span>
$ aws cloudformation create-stack --stack-name AddingInstancesWithTroposphere --template-body file://instances_template.yaml
</code></pre>
<p>Each of the above will output the CFN stack id, something like:</p>
<pre><code class="lang-bash">{
    <span class="hljs-string">"StackId"</span>: <span class="hljs-string">"arn:aws:cloudformation:&lt;your_region&gt;:&lt;your_aws_account_id&gt;:stack/AddingInstancesWithTroposphere/&lt;CFN_stack_id&gt;"</span>
}
</code></pre>
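<p>The returned <code>StackId</code> is a full ARN; if you ever need its parts (region, account, stack name) in a script, plain string handling is enough. A hypothetical helper, just to show the ARN layout:</p>

```python
def parse_stack_arn(arn):
    # Layout: arn:aws:cloudformation:region:account-id:stack/stack-name/unique-id
    parts = arn.split(":", 5)
    _, name, unique_id = parts[5].split("/")
    return {"region": parts[3], "account_id": parts[4], "stack_name": name, "stack_id": unique_id}

example = "arn:aws:cloudformation:eu-west-1:123456789012:stack/AddingInstancesWithTroposphere/abc-123"
print(parse_stack_arn(example)["stack_name"])
```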
<p><strong>(optional)</strong> After a few minutes (to give some time for the instances to be ready), you can test the SSH connection by:</p>
<pre><code class="lang-bash">ssh -i your_key_pair ec2-user@&lt;InstanceXPublicIP&gt;
</code></pre>
<p>Where <code>&lt;InstanceXPublicIP&gt;</code> is an output value available in the <code>AddingInstancesWithTroposphere</code> CFN stack. You can check all output values in CloudFormation console &gt; <code>AddingInstancesWithTroposphere</code> stack &gt; <strong>Outputs</strong> tab.</p>
<p><img src="https://media.giphy.com/media/MBVemoHuyw9Ik/giphy.gif" alt class="image--center mx-auto" /></p>
<h3 id="heading-dont-forget-to-clean-up">Don't forget to clean up!</h3>
<p>Delete both CloudFormation stacks, either via the console or using the CLI:</p>
<ul>
<li><p><code>aws cloudformation delete-stack --stack-name &lt;Stack Name&gt;</code></p>
<ul>
<li>It does not print any output, but you can always check in the console.</li>
</ul>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>This was my first proper deep dive into an IaC tool, and although my teammates were the ones who implemented it from scratch, it was a great first step.</p>
<p>Looking back, I think Terraform and especially CDK have taken over this approach, but again, this is a very simple concept and pretty easy to use once you get the hang of it. The community is super active and open to getting help. If you know Python and like the tool, don't think twice and <a target="_blank" href="https://github.com/cloudtools/troposphere#community">open an issue</a>!</p>
<p>This implementation could use a couple more iterations, like automating CFN stack creation directly from a parent script and adding more code reuse to the scripts for mapping regions, AMIs and instance types. I will be updating <a target="_blank" href="https://github.com/marianogg9/troposphering">the repo</a>, stay tuned!</p>
<p>Last but not least, a suggestion: if you are working with CloudFormation or IaC on AWS resources, always check the CloudFormation reference documentation for a given resource, e.g. for an <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ec2-instance.html">EC2 instance</a>. It explains which attributes can be updated in place without interruption and which require a replacement - it has saved me more than once.</p>
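<p>That update-behaviour information is essentially a per-property lookup. As an illustrative sketch only (these sample values are my reading of the <code>AWS::EC2::Instance</code> reference - always confirm against the current docs):</p>

```python
# Illustrative entries only, taken by hand from the AWS::EC2::Instance
# resource reference -- always confirm against the current documentation.
UPDATE_BEHAVIOR = {
    "ImageId": "Replacement",
    "SubnetId": "Replacement",
    "InstanceType": "Some interruptions",
    "Tags": "No interruption",
}

def update_requires(prop):
    # Fall back to a reminder for properties not covered here
    return UPDATE_BEHAVIOR.get(prop, "check the docs")

print(update_requires("ImageId"))
```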
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://troposphere.readthedocs.io/en/latest/index.html">Troposphere</a> official docs.</p>
<ul>
<li>And its <a target="_blank" href="https://github.com/cloudtools/troposphere">repository</a>.</li>
</ul>
</li>
<li><p>CNCF glossary: <a target="_blank" href="https://www.cncf.io/blog/2020/06/29/infrastructure-as-code-a-non-boring-guide-for-the-clueless/">Infrastructure as Code</a>.</p>
</li>
<li><p>AWS <a target="_blank" href="https://aws.amazon.com/cdk/">CDK</a>.</p>
</li>
<li><p><a target="_blank" href="https://boto.cloudhackers.com/en/latest/">Boto</a> documentation.</p>
</li>
<li><p>AWS CloudFormation <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference.html">intrinsic functions</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/marianogg9/troposphering">Example implementation repo</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[AWS estimated charges - get notified in a few steps!]]></title><description><![CDATA[Whenever we deploy resources on AWS, we need to keep in mind that costs are something to monitor closely.

AWS offers a Free Tier, with the most used services for you to start exploring. And some of them are always free (within a given quota)!

If yo...]]></description><link>https://blog.mariano.cloud/aws-estimated-charges-get-notified-in-a-few-steps</link><guid isPermaLink="true">https://blog.mariano.cloud/aws-estimated-charges-get-notified-in-a-few-steps</guid><category><![CDATA[shorts]]></category><category><![CDATA[AWS]]></category><category><![CDATA[costs]]></category><category><![CDATA[billing]]></category><category><![CDATA[#CloudWatch]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sun, 12 Mar 2023 16:19:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1678651182545/8a07749d-c282-423d-95ef-bffa800cb3f8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Whenever we deploy resources on AWS, we need to keep in mind that costs are something to monitor closely.</p>
<p><img src="https://media.giphy.com/media/3oKIPm3BynUpUysTHW/giphy-downsized-large.gif" alt class="image--center mx-auto" /></p>
<p>AWS offers a Free Tier, with the most used services for you to start exploring. And some of them are always free (within a given quota)!</p>
<blockquote>
<p>If you are curious, here is the <a target="_blank" href="https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&amp;all-free-tier.sort-order=asc&amp;awsf.Free%20Tier%20Types=*all&amp;awsf.Free%20Tier%20Categories=*all">free tier offering vs quotas vs services</a>.</p>
</blockquote>
<p>AWS also offers a quick way of monitoring your usage and costs, which consists of <a target="_blank" href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/monitor_estimated_charges_with_cloudwatch.html">enabling an alert whenever your forecasted costs reach a certain threshold</a> using CloudWatch alarms and the AWS SNS service (to send email notifications).</p>
<p>To add this feature, you will need access to your <a target="_blank" href="https://console.aws.amazon.com/billing/">AWS Billing console</a> and to be logged in as a user with the proper permissions, or use the root account (not recommended).</p>
<h2 id="heading-enabling-notifications">Enabling notifications</h2>
<p>In <strong>Billing Preferences</strong>, under <strong>Cost Management Preferences</strong>, tick the <strong>Receive Billing Alerts</strong> box and save your preferences. Also tick <strong>Receive Free Tier Usage Alerts</strong> if you want to be notified about Free Tier usage.</p>
<h2 id="heading-creating-an-alarm">Creating an alarm</h2>
<blockquote>
<p>Before starting, change your current region to <strong>us-east-1 (N. Virginia)</strong>. This is where the Billing information is stored and represents worldwide charges.</p>
</blockquote>
<p>In <a target="_blank" href="https://console.aws.amazon.com/cloudwatch/">CloudWatch console</a>, go to <strong>Alarms</strong> in the left pane and choose <strong>All Alarms</strong>.</p>
<ul>
<li><p><strong>Create alarm</strong> (top right).</p>
</li>
<li><p>Then <strong>Select metric</strong>.</p>
</li>
<li><p>In <strong>Browse</strong>, select <strong>Billing</strong> and then <strong>Total Estimated Charge</strong>.</p>
</li>
<li><p>Select the <strong>EstimatedCharges</strong> metric row and then <strong>Select metric</strong>.</p>
</li>
<li><p>For <strong>Statistics</strong>, select <strong>Maximum</strong>.</p>
</li>
<li><p>For <strong>Period</strong>, select 6 hours.</p>
</li>
<li><p>Next, for <strong>Threshold type</strong>, choose <strong>Static.</strong></p>
</li>
<li><p>For <strong>Whenever EstimatedCharges is...</strong>, choose <strong>Greater</strong>.</p>
</li>
<li><p>For <strong>than...</strong> enter a threshold that will trigger this alarm. For this example, I used 2 USD as the value threshold.</p>
<ul>
<li>For reference, I have mine set to 5 USD and it is more than enough to play around with my implementations.</li>
</ul>
</li>
<li><p>In <strong>Additional configuration</strong>, leave <strong>Datapoints to alarm</strong> default 1 out of 1 and <strong>Treat missing data as missing.</strong></p>
</li>
<li><p>Next.</p>
</li>
<li><p>In <strong>Notification</strong> section:</p>
<ul>
<li><p>under <strong>Alarm</strong> state trigger, select <strong>In Alarm</strong>.</p>
</li>
<li><p><strong>Send a notification to the following SNS topic</strong>, you can select an existing topic or create a new one (or use a topic ARN in a different account).</p>
<ul>
<li><p>(In my case, I created a new topic).</p>
</li>
<li><p>Give it a descriptive name.</p>
</li>
<li><p>Add an email account to send emails to.</p>
</li>
<li><p><strong>Create topic</strong>.</p>
<ul>
<li><p>You will get an email asking to confirm the subscription. If you don't receive it in 5 minutes, check your Spam folder (mine got filtered by Gmail).</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678198021145/6565f91a-b6bc-43d8-ab89-b50baebf1b5a.png" alt class="image--center mx-auto" /></p>
</li>
<li><p>If for some reason you cannot confirm it via email, you can do it manually in the SNS console using the link generated after you click on <strong>Create topic</strong>.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Next.</p>
</li>
<li><p>Add a name for the alarm, and an optional description.</p>
</li>
<li><p>Preview the alarm and click <strong>Create alarm</strong>.</p>
</li>
</ul>
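<p>For reference, the console steps above map onto a single CloudWatch <code>PutMetricAlarm</code> call. A hedged sketch that only builds the request parameters (the alarm name and topic ARN are placeholders; you would pass the dict to boto3's <code>cloudwatch.put_metric_alarm(**params)</code> yourself):</p>

```python
def billing_alarm_params(threshold_usd, topic_arn):
    # Mirrors the console choices above: EstimatedCharges metric, Maximum
    # statistic, 6-hour period, static threshold, 1/1 datapoints,
    # missing data treated as missing.
    return {
        "AlarmName": "estimated-charges",  # placeholder name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 6 * 60 * 60,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [topic_arn],
    }

params = billing_alarm_params(2.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(params["Period"])
```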
<p>Then you will be able to see the alarm.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678197763862/ed35e8f1-2471-414f-9c9e-15e68b20ec94.png" alt class="image--center mx-auto" /></p>
<p>First in an <strong>Insufficient data</strong> status, then <strong>OK</strong> in a few minutes.</p>
<p>If you click on the alarm name you will be able to see its metrics over time. Keep in mind it is fetching data every 6 hours.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678197889842/0b885704-dc5a-4da8-a20c-7fc464f8fa60.png" alt class="image--center mx-auto" /></p>
<p><strong>FIN!</strong> You now have a reliable alarm checking every 6 hours for your estimated charges.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Keep in mind that the value the alarm compares against your threshold is a forecast, based on current and past usage in your account. AWS calculates it as an estimation of what the final number on your current month's bill could be.</p>
<p>The actual charge could be way less. To check on the actual vs forecast, please visit your <a target="_blank" href="https://console.aws.amazon.com/billing/">Billing console</a>.</p>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/monitor_estimated_charges_with_cloudwatch.html">Create Billing alarms with CloudWatch</a> on estimated charges.</p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&amp;all-free-tier.sort-order=asc&amp;awsf.Free%20Tier%20Types=*all&amp;awsf.Free%20Tier%20Categories=*all">AWS Free Tier</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Airflow in ECS with Redis - Part 3: docker compose]]></title><description><![CDATA[Previously in Deploying Airflow in ECS using S3 as DAG storage via Terraform, I described how to deploy all components in AWS ECS using a hybrid EC2/Fargate launch type and S3 as DAG storage.
Now let's do the same, but with three main differences:


...]]></description><link>https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-3-docker-compose</link><guid isPermaLink="true">https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-3-docker-compose</guid><category><![CDATA[AWS]]></category><category><![CDATA[ECS]]></category><category><![CDATA[airflow]]></category><category><![CDATA[Docker compose]]></category><category><![CDATA[fargate]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Fri, 03 Mar 2023 09:36:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1677437508938/66d81f7c-4a8d-4dda-ba75-595210220fd0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously in <a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-2-hands-on">Deploying Airflow in ECS using S3 as DAG storage via Terraform</a>, I described how to deploy all components in AWS ECS using a hybrid EC2/Fargate launch type and S3 as DAG storage.</p>
<p>Now let's do the same, but with three main differences:</p>
<p><img src="https://media.giphy.com/media/C6JQPEUsZUyVq/giphy.gif" alt="James Franco same same" class="image--center mx-auto" /></p>
<ul>
<li><p>Using <code>docker compose</code> <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/">integration with AWS ECS</a>, which uses AWS CloudFormation behind the scenes, instead of doing it via Terraform.</p>
<ul>
<li>Fewer hybrid tools, you just need <a target="_blank" href="https://docs.docker.com/desktop/">Docker Desktop</a> installed locally.</li>
</ul>
</li>
<li><p>All components running on Fargate ECS launch type.</p>
</li>
<li><p>AWS EFS as DAG storage instead of S3.</p>
<ul>
<li>No need to worry about S3 mount drivers, <code>docker compose</code> natively <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#volumes">integrates with AWS EFS</a>.</li>
</ul>
</li>
</ul>
<h1 id="heading-components">Components</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677836137173/7ac0d1cb-affc-4930-b27b-2764aa7b6a7a.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-deploy">Deploy</h1>
<h2 id="heading-prerequisites">Prerequisites</h2>
<ul>
<li><p><a target="_blank" href="https://docs.docker.com/desktop/">Docker Desktop</a>.</p>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html">AWS CLI configured</a> in your local environment, using a set of credentials with:</p>
<ul>
<li><p>Baseline <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#requirements">required permissions</a>.</p>
</li>
<li><p>Additional permissions:</p>
<ul>
<li><p><code>ec2:DescribeVpcAttribute</code></p>
</li>
<li><p><code>elasticfilesystem:DescribeFileSystems</code></p>
</li>
<li><p><code>elasticfilesystem:CreateFileSystem</code></p>
</li>
<li><p><code>elasticfilesystem:DeleteFileSystem</code></p>
</li>
<li><p><code>elasticfilesystem:CreateAccessPoint</code></p>
</li>
<li><p><code>elasticfilesystem:DeleteAccessPoint</code></p>
</li>
<li><p><code>elasticfilesystem:CreateMountTarget</code></p>
</li>
<li><p><code>elasticfilesystem:DeleteMountTarget</code></p>
</li>
<li><p><code>elasticfilesystem:DescribeAccessPoints</code></p>
</li>
<li><p><code>elasticfilesystem:DescribeMountTargets</code></p>
</li>
<li><p><code>elasticfilesystem:DescribeFileSystemPolicy</code></p>
</li>
<li><p><code>elasticfilesystem:DescribeBackupPolicy</code></p>
</li>
<li><p><code>logs:TagResource</code></p>
</li>
<li><p><code>iam:PutRolePolicy</code></p>
</li>
<li><p><code>iam:DeleteRolePolicy</code></p>
</li>
</ul>
</li>
</ul>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html">VPC</a> + <a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/working-with-subnets.html#create-subnets">subnets</a>.</p>
</li>
<li><p>Security Group to be used in this deployment.</p>
<ul>
<li><p>Create the Security Group.</p>
<pre><code class="lang-bash">  aws ec2 create-security-group --group-name Airflow --description <span class="hljs-string">"Airflow traffic"</span> --vpc-id <span class="hljs-string">"&lt;your_vpc_id&gt;"</span>
</code></pre>
<p>  It creates a default egress rule to <code>0.0.0.0/0</code>. Take note of the output <code>GroupId</code>.</p>
<pre><code class="lang-bash">  {
      <span class="hljs-string">"GroupId"</span>: <span class="hljs-string">"take note of this value"</span>
  }
</code></pre>
</li>
<li><p>Add internal rules to it.</p>
<ul>
<li><p>Self traffic.</p>
<pre><code class="lang-bash">  aws ec2 authorize-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --protocol all --source-group <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span>
</code></pre>
</li>
<li><p>Internal VPC traffic.</p>
<pre><code class="lang-bash">  aws ec2 authorize-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --ip-permissions IpProtocol=-1,FromPort=-1,ToPort=-1,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_vpc_cidr&gt;,Description='Allow VPC internal traffic'}]"</span>
</code></pre>
</li>
<li><p>(optional) Add a rule for your public IP to access ports <code>5555</code> (Flower service) and <code>8080</code> (Webserver service).</p>
<pre><code class="lang-bash">  aws ec2 authorize-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --ip-permissions IpProtocol=tcp,FromPort=5555,ToPort=5555,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_public_CIDR&gt;,Description='Allow Flower access'}]"</span> IpProtocol=tcp,FromPort=8080,ToPort=8080,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_public_CIDR&gt;,Description='Allow Webserver access'}]"</span>
</code></pre>
<p>  <strong>NOTE:</strong> there is <a target="_blank" href="https://github.com/docker/compose-cli/issues/1783">currently no native way</a> of preventing CloudFormation from creating a <code>0.0.0.0/0</code> rule in the SG for the ports exposed by declared services. If you need to narrow down this access, you will have to delete the additional rules from the SG once <code>docker compose</code> creates the ECS services.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
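<p>The CLI rules above can also be expressed as the <code>IpPermissions</code> structures the EC2 API expects, which is handy if you script this part. A sketch with a placeholder group id and CIDR:</p>

```python
def self_rule(group_id):
    # All traffic from the security group itself
    return {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": group_id}]}

def cidr_rule(protocol, port, cidr, description):
    # Single-port rule from a CIDR, e.g. your public IP
    return {
        "IpProtocol": protocol,
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }

rules = [
    self_rule("sg-0123456789abcdef0"),  # placeholder group id
    cidr_rule("tcp", 5555, "203.0.113.10/32", "Allow Flower access"),
    cidr_rule("tcp", 8080, "203.0.113.10/32", "Allow Webserver access"),
]
print(len(rules))
```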
<h2 id="heading-steps">Steps</h2>
<ul>
<li><p>Clone the repo:</p>
<pre><code class="lang-bash">  $ git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/airflow-in-ecs-with-compose local_dir
  $ <span class="hljs-built_in">cd</span> local_dir
</code></pre>
</li>
<li><p>Set required variables in <code>docker-compose.yaml</code>:</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">x-aws-vpc:</span> <span class="hljs-string">"your VPC id"</span>
  <span class="hljs-attr">networks:</span>
    <span class="hljs-attr">back_tier:</span>
      <span class="hljs-attr">external:</span> <span class="hljs-literal">true</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span>
</code></pre>
</li>
<li><p>(optional) If you want to use a custom password for the Webserver admin user (default user <code>airflow</code>):</p>
<ul>
<li><p>This password will be created as an AWS Secrets Manager <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#secrets">secret</a> and its ARN will be passed as an environment variable with the following format:</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">secrets:</span>
    <span class="hljs-attr">name:</span> <span class="hljs-string">_AIRFLOW_WWW_USER_PASSWORD</span>
    <span class="hljs-attr">valueFrom:</span> <span class="hljs-string">&lt;secrets_manager_secret_arn&gt;</span>
</code></pre>
<ul>
<li><p>Add a custom password in a local file:</p>
<pre><code class="lang-bash">  <span class="hljs-built_in">echo</span> <span class="hljs-string">'your_custom_password'</span> &gt; ui_admin_password
</code></pre>
</li>
<li><p>Add a <code>secrets</code> definition block in <code>docker-compose.yaml</code>:</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">secrets:</span>
    <span class="hljs-attr">ui_admin_password:</span>
      <span class="hljs-attr">name:</span> <span class="hljs-string">_AIRFLOW_WWW_USER_PASSWORD</span>
      <span class="hljs-attr">file:</span> <span class="hljs-string">./ui_admin_password.txt</span>
</code></pre>
</li>
<li><p>Add a <code>secrets</code> section in each service to mount to:</p>
<pre><code class="lang-yaml">  <span class="hljs-attr">secrets:</span>
    <span class="hljs-bullet">-</span> <span class="hljs-string">_AIRFLOW_WWW_USER_PASSWORD</span>
</code></pre>
</li>
<li><p>Add the following required AWS Secrets Manager permissions to the IAM credentials you configured <code>docker context</code> to use.</p>
<ul>
<li><p><code>secretsmanager:CreateSecret</code>.</p>
</li>
<li><p><code>secretsmanager:DeleteSecret</code>.</p>
</li>
<li><p><code>secretsmanager:GetSecretValue</code>.</p>
</li>
<li><p><code>secretsmanager:DescribeSecret</code>.</p>
</li>
<li><p><code>secretsmanager:TagResource</code>.</p>
<ul>
<li><p>and please narrow down the above permissions to the secret ARN:</p>
<p>  <code>arn:aws:secretsmanager:&lt;your_aws_region&gt;:&lt;your_aws_account_id&gt;:secret:AIRFLOWWWWUSERPASSWORD*</code></p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Create a new <code>docker</code> context, selecting a preferred method of obtaining IAM credentials (either via environment variables, a named local profile or a set of <code>key:secret</code>):</p>
<pre><code class="lang-bash">  docker context create ecs new-context-name
</code></pre>
</li>
<li><p>Use the newly created context:</p>
<pre><code class="lang-bash">  docker context use new-context-name
</code></pre>
</li>
<li><p>(optional) Review the CloudFormation template to be applied:</p>
<pre><code class="lang-bash">  docker compose convert
</code></pre>
</li>
<li><p>Deploy:</p>
<pre><code class="lang-bash">  docker compose up
</code></pre>
</li>
</ul>
<p>Once the deployment starts, <code>docker compose</code> will show updates on screen. You can also follow the resource creation in the AWS CloudFormation console.</p>
<h3 id="heading-web-access">Web access</h3>
<ul>
<li><p>Get the NLB DNS name:</p>
<pre><code class="lang-bash">  aws elbv2 describe-load-balancers | grep DNSName | awk <span class="hljs-string">'{print$2}'</span> | sed -e <span class="hljs-string">'s|,||g'</span>
</code></pre>
<p>  Or if you have <code>jq</code> installed:</p>
<pre><code class="lang-bash">  aws elbv2 describe-load-balancers | jq .LoadBalancers[].DNSName
</code></pre>
</li>
<li><p>Webserver: <code>http://NLBDNSName:8080</code>. Login with <code>airflow:airflow</code> or <code>airflow:your_custom_created_password</code>.</p>
</li>
<li><p>Flower: <code>http://NLBDNSName:5555</code>.</p>
</li>
</ul>
<h1 id="heading-running-an-example-pipeline">Running an example pipeline</h1>
<p>I have included an <a target="_blank" href="https://github.com/marianogg9/airflow-in-ecs-with-compose/blob/main/example-dag/process-employees.py">example DAG</a> (from <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/tutorial/pipeline.html">Airflow's examples</a>) in the repo. This DAG is also fetched by the <code>airflow-scheduler</code> task at startup, so it will be available in the Webserver UI.</p>
<p>As a prerequisite, create a PostgreSQL connection (to be used by the DAG): in the Webserver UI &gt; Admin &gt; Connections &gt; Add <code>+</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677755729394/6a0403a5-b50f-4495-9151-df48892e109f.png" alt class="image--center mx-auto" /></p>
<p>With the following parameters:</p>
<ul>
<li><p>Connection Id: <code>tutorial_pg_conn</code>.</p>
</li>
<li><p>Connection Type: <code>postgres</code>.</p>
</li>
<li><p>Host: <code>postgres</code>.</p>
</li>
<li><p>Schema: <code>airflow</code>.</p>
</li>
<li><p>Login: <code>airflow</code>.</p>
</li>
<li><p>Password: <code>airflow</code>.</p>
</li>
<li><p>Port: <code>5432</code>.</p>
</li>
</ul>
<p>Then you can Test the connection, and if it passes, you can Save it.</p>
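<p>The same connection can also be expressed as an Airflow connection URI, e.g. to inject it through an <code>AIRFLOW_CONN_TUTORIAL_PG_CONN</code> environment variable instead of the UI. A sketch with the values above:</p>

```python
from urllib.parse import quote

def pg_conn_uri(login, password, host, port, schema):
    # Airflow connection URI form: conn-type://login:password@host:port/schema
    return f"postgres://{quote(login)}:{quote(password)}@{host}:{port}/{schema}"

print(pg_conn_uri("airflow", "airflow", "postgres", 5432, "airflow"))
# postgres://airflow:airflow@postgres:5432/airflow
```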
<p>Back to DAGs list, unpause the DAG <code>process-employees</code> and it will automatically run:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677776807899/10b09f10-6bd8-4a1f-bfb1-8ec8af106782.png" alt class="image--center mx-auto" /></p>
<p>We can check the tasks being run (click the link in the Last Run column):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677776858558/96d71214-252c-4b23-bd11-93a8c5617840.png" alt class="image--center mx-auto" /></p>
<p>And have a look at the Flower UI to follow up on tasks vs workers:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677776894369/fef77f6c-8748-468f-9f7b-7b626b884fe2.png" alt class="image--center mx-auto" /></p>
<p><strong>FIN!</strong> Now you can start adding or fetching DAGs from other sources as well, by modifying the fetch step in the <code>airflow-scheduler</code> startup script and running <code>docker compose up</code> again to apply the changes.</p>
<h1 id="heading-clean-up">Clean up</h1>
<p>Once you are done, you can delete all created resources with:</p>
<pre><code class="lang-bash">docker compose down
</code></pre>
<blockquote>
<p><strong>Important: remember to delete EFS volumes manually!!</strong></p>
<p><code>docker compose</code> integration creates EFS volumes with <code>retain</code> policy, so that whenever a new deployment occurs, it can reuse them.</p>
<p>Please <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#volumes">see the official documentation</a> for more info.</p>
</blockquote>
<p>Also, remember to delete the SG and custom rules.</p>
<ul>
<li><p>First the rules.</p>
<pre><code class="lang-bash">  aws ec2 revoke-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --security-group-rule-ids &lt;self_internal_SG_rule_above_created_id&gt;
</code></pre>
<pre><code class="lang-bash">  aws ec2 revoke-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --ip-permissions IpProtocol=tcp,FromPort=5555,ToPort=5555,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_public_CIDR&gt;,Description='Allow Flower access'}]"</span> IpProtocol=tcp,FromPort=8080,ToPort=8080,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_public_CIDR&gt;,Description='Allow Webserver access'}]"</span>
</code></pre>
<pre><code class="lang-bash">  aws ec2 revoke-security-group-ingress --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span> --ip-permissions IpProtocol=-1,FromPort=-1,ToPort=-1,IpRanges=<span class="hljs-string">"[{CidrIp=&lt;your_vpc_cidr&gt;,Description='Allow VPC internal traffic'}]"</span>
</code></pre>
</li>
<li><p>Then the SG.</p>
<pre><code class="lang-bash">  aws ec2 delete-security-group --group-id <span class="hljs-string">"&lt;airflow_SG_above_created_id&gt;"</span>
</code></pre>
</li>
</ul>
<h1 id="heading-gotchas-andamp-comments">Gotchas &amp; comments</h1>
<ul>
<li><p><code>docker compose</code> outputs are not very descriptive and only show one error at a time; to understand AWS access errors, I used CloudTrail (warning: heavy S3 usage).</p>
</li>
<li><p><code>docker compose</code> sometimes fails silently using <code>ecs</code> context. If you happen to face this, go back to default context with <code>docker context use default</code>, try to debug the errors (now they will be shown on screen) and then go back to <code>ecs</code> context <code>docker context use new-context-name</code>. See <a target="_blank" href="https://github.com/docker/compose-cli/issues/2068">this issue</a> for more info.</p>
</li>
<li><p><code>servicediscovery:*</code> permissions refer to CloudMap.</p>
</li>
<li><p>The default <code>.25 vCPU | .5 GB</code> for the Workers and Webserver is too little. Bump it up to 2 GB.</p>
</li>
<li><p>Don't forget to delete EFS volumes manually!! This <code>docker compose</code> ECS integration will define Docker volumes with <code>retain</code> policy, so they <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#volumes">will <strong>not</strong> be deleted automatically</a> with <code>docker compose down</code>.</p>
</li>
<li><p><code>service.deploy.restart_policy</code> is <a target="_blank" href="https://github.com/docker/compose-cli/issues/2153"><strong>not</strong> supported</a> even though the <a target="_blank" href="https://docs.docker.com/cloud/ecs-compose-features/">documentation says it is</a>.</p>
</li>
<li><p>You can configure POSIX permissions on the AWS EFS access points by adding <code>volumes.your-volume.driver_opts</code> as described in the <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/#volumes">volumes section</a>. While figuring out how to mount the <code>/dags</code> directory from AWS EFS correctly on the containers, I also found that <a target="_blank" href="https://github.com/docker/compose-cli/issues/1459">setting the root directory and permissions</a> is allowed and supported.</p>
</li>
</ul>
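As an illustrative sketch of the per-volume options mentioned above, a compose file could carry the EFS access-point settings like this (all values are placeholders, and the exact option names should be checked against the linked volumes documentation):

```yaml
volumes:
  dags:
    driver_opts:
      # Hypothetical values; applied to the EFS access point by the ECS integration.
      uid: "50000"            # POSIX user the access point maps to
      gid: "0"                # POSIX group
      permissions: "755"      # directory permissions
      root_directory: "/dags" # subtree exposed to the containers
```

Running `docker compose convert` afterwards shows how these options end up in the generated CloudFormation template.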
<h1 id="heading-conclusion">Conclusion</h1>
<p>This <code>docker compose</code> ECS integration allows rolling updates as well. Since it relies on AWS CloudFormation, you could get a working baseline version (e.g. PostgreSQL, Redis, Scheduler) and then add the Webserver or Worker services without having to start from scratch, or modify existing resources.</p>
<p>Simply updating your <code>docker-compose.yaml</code> locally and running <code>docker compose up</code> again will apply any new changes. This new run translates to a <a target="_blank" href="https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks.html">CloudFormation Stack update</a>.</p>
<p>This integration gives a lot of flexibility and offers an abstraction layer if you don't want to deal with external dependencies. In doing so, it will assume and configure a lot of settings for you, which for some use cases will not be ideal. If this is your case, then it might make more sense to tweak the underlying CloudFormation template being generated.</p>
<p>Overall a great experience, with lots of unknowns and learnings. The documentation is not very extensive; it is practical and, while it won't go into deep detail, it gets the job done. There is still some way to go, so please check out this integration's <a target="_blank" href="https://github.com/docker/compose-cli/issues">issues and enhancements</a>.</p>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html">Run Airflow in Docker</a>.</p>
</li>
<li><p><a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/">Docker compose ECS integration</a>.</p>
</li>
<li><p>This implementation <a target="_blank" href="https://github.com/marianogg9/airflow-in-ecs-with-compose/blob/main/example-dag/process-employees.py">repository</a>.</p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/blogs/containers/running-airflow-on-aws-fargate/">Airflow on Fargate (AWS blog)</a>.</p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/blogs/containers/deploy-applications-on-amazon-ecs-using-docker-compose/">Run apps in ECS using docker compose (AWS blog)</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview">Part 1 of this series</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-2-hands-on">Part 2 of this series</a>.</p>
</li>
</ul>
<h1 id="heading-improvements">Improvements</h1>
<ul>
<li><p>Add a reverse proxy <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/howto/run-behind-proxy.html">https://airflow.apache.org/docs/apache-airflow/stable/howto/run-behind-proxy.html</a>.</p>
<ul>
<li>HTTPS support.</li>
</ul>
</li>
<li><p>Explore <a target="_blank" href="https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/ec2-and-fargate-spot.html">Fargate Spot</a>.</p>
</li>
<li><p>Deploy in Kubernetes.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Airflow in ECS with Redis - Part 2: Hands On]]></title><description><![CDATA[Previously on How to set up a containerised Airflow installation in AWS ECS using Redis as its queue orchestrator, I gave an overview of the infrastructure and Airflow components.
Now let's deploy all that.

This deployment will incur charges!!

Base...]]></description><link>https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-2-hands-on</link><guid isPermaLink="true">https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-2-hands-on</guid><category><![CDATA[airflow]]></category><category><![CDATA[ECS]]></category><category><![CDATA[AWS]]></category><category><![CDATA[ETL]]></category><category><![CDATA[workflow]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Fri, 24 Feb 2023 17:09:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1677154357750/57c09c74-0ced-48f3-b30a-968a6325fc2f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Previously on <a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview">How to set up a containerised Airflow installation in AWS ECS using Redis as its queue orchestrator</a>, I gave an overview of the infrastructure and Airflow components.</p>
<p>Now let's deploy all that.</p>
<blockquote>
<p>This deployment <strong>will incur charges</strong>!!</p>
</blockquote>
<h1 id="heading-baseline">Baseline</h1>
<ul>
<li><p>AWS ECS, with 6 services: Scheduler, Webserver, Workers and (Celery) Flower on the EC2 launch type, using a mix of on-demand (scheduler, webserver) and spot instances (workers, flower); PostgreSQL and Redis on the Fargate launch type.</p>
</li>
<li><p>AWS S3 bucket as DAGs repository.</p>
</li>
<li><p>AWS EFS, as a backing storage service for PostgreSQL.</p>
</li>
<li><p>AWS NLB for Airflow Webserver and Flower services.</p>
</li>
<li><p>Terraform to create all resources.</p>
<ul>
<li>I did not use any modules for this version, but it's completely doable.</li>
</ul>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677231789792/e067a43a-7fbc-4063-8527-9a33cf34f1f8.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-prerequisites">Prerequisites</h1>
<ul>
<li><p>Already created AWS resources:</p>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-getting-started.html">VPC</a> + subnets (at least one).</p>
</li>
<li><p>(optional) EC2 <a target="_blank" href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html">key pair</a>.</p>
</li>
</ul>
</li>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html">AWS CLI configured</a> in your local using a set of credentials with permissions to create (and delete) resources in your account.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli">Terraform</a> (I'm using <code>1.3.0</code> at the time of writing this article).</p>
</li>
</ul>
<h1 id="heading-deploy">Deploy</h1>
<p>I have uploaded all the code I worked with to <a target="_blank" href="https://gitlab.com/marianogg9/airflow-in-ecs">this</a> repo.</p>
<blockquote>
<p><strong>Please keep in mind this will create resources in your AWS account that will consume credits and will be billed at the end of the month!</strong></p>
</blockquote>
<h3 id="heading-clone-to-your-local">Clone to your local</h3>
<pre><code class="lang-bash">git <span class="hljs-built_in">clone</span> https://github.com/marianogg9/airflow-in-ecs.git local_dir
</code></pre>
<h3 id="heading-fill-in-the-required-vars">Fill in the required vars</h3>
<p><strong>local_dir/locals.tf</strong></p>
<ul>
<li><p><code>instance-type</code> (for the EC2 ASGs).</p>
</li>
<li><p><code>ecs-ami</code> (also for the EC2 ASGs launch templates, taken from <a target="_blank" href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-ami-versions.html">AWS official documentation</a>).</p>
</li>
<li><p><code>default_vpc_id</code> (your existing VPC).</p>
</li>
<li><p><code>subnetsIDs</code> (the existing subnets you want to deploy to, at least 1).</p>
</li>
<li><p><code>custom_cidr</code> (your public CIDR, used to access the Webserver and Flower services and to grant SSH access to the ECS EC2 instances).</p>
</li>
<li><p><code>user_data_vars.region</code> (AWS region where the S3 bucket is being created, used for <code>s3fs-fuse</code> mount configuration).</p>
</li>
<li><p><code>log_configuration.options.region</code> (same as above, but for CloudWatch logs).</p>
</li>
<li><p><code>aws_key_pair_name</code> (an existing EC2 key pair name. If left empty, SSH traffic is not allowed from outside the default VPC).</p>
</li>
</ul>
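As a sketch, the locals file could end up looking roughly like this (every value below is a placeholder, not a recommendation; the names come from the list above):

```hcl
locals {
  instance-type  = "t3.medium"             # EC2 ASG instance type
  ecs-ami        = "ami-0123456789abcdef0" # ECS-optimized AMI for your region
  default_vpc_id = "vpc-0123456789abcdef0"
  subnetsIDs     = ["subnet-0123456789abcdef0"]
  custom_cidr    = "198.51.100.10/32"      # your public IP

  user_data_vars = {
    region = "eu-west-1"                   # region of the S3 bucket (s3fs-fuse mount)
  }
  # log_configuration.options.region is set the same way, for CloudWatch logs.

  aws_key_pair_name = ""                   # leave empty to disallow external SSH
}
```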
<p><strong>local_dir/terraform-backend.tf</strong></p>
<ul>
<li>Fill in your S3 Terraform state bucket, state file name and DynamoDB lock table.</li>
</ul>
<p><strong>local_dir/provider.tf</strong></p>
<ul>
<li><code>provider["aws"].region</code>, the region you are deploying resources to.</li>
</ul>
<h3 id="heading-run-terraform">Run terraform</h3>
<blockquote>
<p><strong>Again, please check the resources being created and the billing details!</strong></p>
</blockquote>
<pre><code class="lang-bash">$ <span class="hljs-built_in">cd</span> local_dir
$ terraform init
$ terraform apply
</code></pre>
<p>Once all of the above is deployed, you can get the NLB DNS name from the <code>nlb_cname</code> Terraform output.</p>
<h2 id="heading-accessing-webserver-ui">Accessing Webserver UI</h2>
<pre><code class="lang-bash">http://nlb_cname:8080
</code></pre>
<p>Give it a couple of minutes for the health checks to pass and authenticate with:</p>
<ul>
<li><p>Username: <code>airflow</code>.</p>
</li>
<li><p>Password: you can check it in AWS Secrets Manager using Terraform output <code>airflow_ui_password_secret_arn</code>.</p>
</li>
</ul>
<h2 id="heading-accessing-flower-celery-queue-ui">Accessing Flower (Celery queue UI)</h2>
<pre><code class="lang-bash">http://nlb_cname:5555
</code></pre>
<h1 id="heading-running-an-example-dag">Running an example DAG</h1>
<p>Now, let's run an example DAG (from <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/tutorial/pipeline.html">Airflow's pipeline examples</a>), but instead of downloading the data from GitHub, we will create a local file with the input data and have the DAG import it into the DB. I included the <code>example-dag.py</code> DAG file <a target="_blank" href="https://raw.githubusercontent.com/marianogg9/airflow-in-ecs/main/example-dag/example-dag.py">here</a> as well.</p>
<p>Let's populate the input data first, running the following in your local console:</p>
<pre><code class="lang-bash">curl -s  https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/tutorial/pipeline_example.csv &gt; employees.csv
</code></pre>
<p>Then upload the data to the S3 bucket created before (use <code>s3_bucket</code> Terraform output):</p>
<pre><code class="lang-bash">aws s3 cp employees.csv s3://&lt;s3_bucket&gt;/files/employees.csv
</code></pre>
<p>Now upload the example DAG to the same S3 bucket, making it available to all Airflow components:</p>
<pre><code class="lang-bash">aws s3 cp example-dag/example-dag.py s3://&lt;s3_bucket&gt;/example-dag.py
</code></pre>
<p>After around <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#dag-dir-list-interval">5 minutes</a>, the DAG should be available in the UI. You might want to reload the browser to have it listed.</p>
<p>In the meantime, as a prerequisite, create a PostgreSQL connection (to be used by the DAG later): in the Webserver UI &gt; Admin &gt; Connections &gt; Add <code>+</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677230286684/63368ad2-31f3-456c-bca8-0b4af30971fb.png" alt class="image--center mx-auto" /></p>
<p>With the following parameters:</p>
<ul>
<li><p>Connection Id: <code>tutorial_pg_conn</code>.</p>
</li>
<li><p>Connection Type: <code>postgres</code>.</p>
</li>
<li><p>Host: <code>postgres.airflow.local</code>.</p>
</li>
<li><p>Schema: <code>airflow</code>.</p>
</li>
<li><p>Login: <code>airflow</code>.</p>
</li>
<li><p>Password: <code>airflow</code>.</p>
</li>
<li><p>Port: <code>5432</code>.</p>
</li>
</ul>
<p>Then you can Test the connection, and if it passes, you can Save it.</p>
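If you prefer the CLI over the UI, Airflow 2 can also create the same connection with `airflow connections add` (run inside the webserver container). This sketch only assembles and prints the command as a dry run; the values mirror the form fields above:

```bash
# Same parameters as the UI form above (this tutorial's defaults).
CONN_ID="tutorial_pg_conn"
CONN_URI="postgresql://airflow:airflow@postgres.airflow.local:5432/airflow"

# Dry run: print the command instead of executing it.
echo "airflow connections add ${CONN_ID} --conn-uri ${CONN_URI}"
```

Run the printed command inside the webserver container (e.g. via an ECS Exec or SSH session) to register the connection without touching the UI.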
<p>Now we need to unpause the DAG in the UI (and it will automatically run):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677152968615/aa4e0c18-2ced-46ee-9548-52869f9543f8.png" alt class="image--center mx-auto" /></p>
<p>In the Flower UI, we will see the DAG task(s) being executed on the worker(s):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677154000980/123ac2ed-3505-49a6-a576-8c1f75651233.png" alt class="image--center mx-auto" /></p>
<p>Also, we can have a look at the DAG tasks running in Webserver UI (clicking on the Last Run column link):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1677154053582/f8e36333-4db9-48d4-8e48-9ee577b9315c.png" alt class="image--center mx-auto" /></p>
<p><strong>FIN!</strong> Now you can play around with <a target="_blank" href="https://github.com/apache/airflow/tree/main/airflow/example_dags">other DAGs</a> or start running your own pipelines.</p>
<h1 id="heading-some-gotchas-and-comments">Some gotchas and comments</h1>
<p>While (re)implementing this solution, there were some forgotten details I (re)found along the way.</p>
<ul>
<li><p>Modify the health check retries in the Redis container definition to at most 10.</p>
</li>
<li><p>Modify all containers' environment variables to point to local service discovery entries (<code>service.airflow.local</code>).</p>
<ul>
<li>In the original Airflow <code>docker-compose</code>, there is a sidecar container handling service discovery resolution for local networking and DNS. I replaced this container with a hostname fix, from <code>service</code> to <code>service.airflow.local</code>, in each container's environment variables.</li>
</ul>
</li>
<li><p>Add internal traffic rule (VPC CIDR, all ports) to the default Security Group.</p>
<ul>
<li>Also a big one! In the beginning, the NLB health checks against webserver/flower services were not passing. After a while of debugging, I realized there were no internal traffic rules in the Security Group attached to the ECS service(s) allowing the NLB to connect to the tasks. Yeah, it happens.</li>
</ul>
</li>
<li><p>Give the Webserver task definition at least 512 CPU units and 1024 MB of RAM.</p>
<ul>
<li>Otherwise, it will start shutting down workers out of the blue, with no OOM errors (at least that I could see).</li>
</ul>
</li>
<li><p>Add <code>ECS_INSTANCE_ATTRIBUTES</code> environment variable in the ASG launch template UserData to set a custom value and differentiate core (on-demand) and worker (spot) instances.</p>
<ul>
<li>And then use that attribute as a <code>placement_constraint</code> filter in each ECS service definition.</li>
</ul>
</li>
<li><p>Ah, yes, <code>s3fs-fuse</code> driver. When trying to use <code>iam_role</code> flag, it only works by compiling from source using a specific version/commit: <a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse/issues/1162#issuecomment-536864032">https://github.com/s3fs-fuse/s3fs-fuse/issues/1162#issuecomment-536864032</a> + <a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse/wiki/Installation-Notes#amazon-linux">https://github.com/s3fs-fuse/s3fs-fuse/wiki/Installation-Notes#amazon-linux</a>.</p>
<ul>
<li><p>This one took a while. It turns out there is a bug (<a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse/issues/1196">https://github.com/s3fs-fuse/s3fs-fuse/issues/1196</a>, closed in Feb 2020) preventing the <code>-o iam_role</code> flag from working, due to a recursive call to the <code>CheckIAMCredentials</code> method.<br />  The problem is that running the <code>s3fs-fuse</code> mount command with the <code>-f -o curldbg</code> flags will just hang and not show any useful information. After a while of trying different flags, I gave it a try using a set of IAM credentials (<code>access:secret</code>); it worked instantly. This implementation is now using version <code>1.84</code>.<br />  <code>v1.86</code> was released in Feb 2020, which is why I had never faced this issue until now.</p>
</li>
<li><p>The working set of steps is included in the UserData script for both core and workers ASG launch templates.</p>
</li>
</ul>
</li>
<li><p>In both the core and worker ASG launch template UserData, I used Terraform's <a target="_blank" href="https://developer.hashicorp.com/terraform/language/functions/templatefile">templatefile</a> function to pass in external vars (like the S3 bucket name and IAM instance profile name).</p>
</li>
<li><p>As done with the UI admin user, PostgreSQL connection credentials can be parametrized using AWS Secrets Manager secrets + env vars in the task definitions:</p>
<ul>
<li><p>For PostgreSQL tasks: <code>POSTGRES_USER</code> &amp; <code>POSTGRES_PASSWORD</code>.</p>
</li>
<li><p>All Airflow's container definitions: <code>AIRFLOW__CORE__SQL_ALCHEMY_CONN</code> by passing in a Secrets Manager secret ARN.</p>
</li>
</ul>
</li>
<li><p>Many more Airflow configurations can be overwritten using <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html">environment variables</a>.</p>
</li>
<li><p>To have a guideline for the resources and dependencies I had to create, I used the <code>docker compose</code> version of this implementation. <a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/">This native Docker Compose integration with AWS using CloudFormation</a> will give you a way of deploying directly in AWS from <code>docker compose</code> command line.<br />  It also comes with a dry-run (<code>docker compose convert</code>) type of feature where you can get a look at the actual CloudFormation template that is to be applied. I used this template to create all the Terraform resources.</p>
</li>
<li><p>The official <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/tutorial/pipeline.html">Airflow documentation example</a> uses a DAG fetching the CSV input data from a GitHub repo. When I was running this implementation, I faced an issue where every task run would be (rendered and) passed to workers replacing <code>airflow</code> with <code>***</code>. Since the <code>get_data</code> task uses an airflow repository to fetch data from, the used URL would get rendered with <code>***</code> instead of the word <code>airflow</code>, making it invalid, which then made the task hang and fail.<br />  In favour of practicality, I decided to replace that fetch with a local CSV as input.<br />  You can always check the task logs in <code>/opt/airflow/logs/</code> where all components will add specific run information.</p>
</li>
</ul>
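The instance-attribute trick from the list above can be sketched roughly like this (the attribute name `instance-role` and its values are hypothetical, not taken from the repository):

```hcl
# In the ASG launch template UserData (core instances would set "core" instead):
#   echo 'ECS_INSTANCE_ATTRIBUTES={"instance-role": "worker"}' >> /etc/ecs/ecs.config

resource "aws_ecs_service" "worker" {
  # ... cluster, task_definition, desired_count, etc. ...

  # Pin worker tasks to the spot instances carrying the custom attribute.
  placement_constraints {
    type       = "memberOf"
    expression = "attribute:instance-role == worker"
  }
}
```

`ECS_INSTANCE_ATTRIBUTES` is read by the ECS agent at startup, so the attribute is visible on the container instance as soon as it registers with the cluster.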
<h1 id="heading-cleaning-up">Cleaning up</h1>
<p>When you are ready to clean this all up, you will need to delete any files and folders in the S3 bucket manually first (otherwise, Terraform will complain about the bucket not being empty and fail with <code>BucketNotEmpty: The bucket you tried to delete is not empty</code>).</p>
<p>Then run:</p>
<pre><code class="lang-bash">$ terraform destroy
</code></pre>
<p>It will take a while to finish up, mostly because of the connection drain times on the NLB target groups vs ECS services and the EFS volume/mountpoints.</p>
<h1 id="heading-improvements">Improvements</h1>
<ul>
<li><p>Add a reverse proxy <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/howto/run-behind-proxy.html">https://airflow.apache.org/docs/apache-airflow/stable/howto/run-behind-proxy.html</a>.</p>
<ul>
<li>HTTPS support.</li>
</ul>
</li>
<li><p>Deploy in Kubernetes.</p>
</li>
<li><p>Use EFS as DAG storage instead of S3.</p>
</li>
</ul>
<h1 id="heading-references">References</h1>
<ul>
<li><p><a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html">Running Airflow with docker compose</a>.</p>
</li>
<li><p><a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html">Airflow configuration reference</a>.</p>
</li>
<li><p><a target="_blank" href="https://github.com/s3fs-fuse/s3fs-fuse">S3fs-fuse</a>.</p>
</li>
<li><p><a target="_blank" href="https://docs.docker.com/cloud/ecs-integration/">docker compose ECS integration</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview">Airflow in ECS with Redis - Part 1: Overview</a>.</p>
</li>
<li><p>This implementation <a target="_blank" href="https://github.com/marianogg9/airflow-in-ecs">repository</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-3-docker-compose">Part 3 of this series</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview">Part 1 of this series</a>.</p>
</li>
</ul>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Airflow in ECS with Redis - Part 1: Overview]]></title><description><![CDATA[How to set up a containerised Airflow installation in AWS ECS using Redis as its queue orchestrator.
A bit of background
A few years ago I joined a Data team where we processed a lot of analytics information coming from online search engines. This ET...]]></description><link>https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview</link><guid isPermaLink="true">https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-1-overview</guid><category><![CDATA[airflow]]></category><category><![CDATA[ECS]]></category><category><![CDATA[Redis]]></category><category><![CDATA[containers]]></category><category><![CDATA[workflow]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sat, 28 Jan 2023 18:19:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1674929376319/a300b72e-2377-4245-943f-b283c0d58238.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>How to set up a containerised Airflow installation in AWS ECS using Redis as its queue orchestrator.</p>
<h2 id="heading-a-bit-of-background">A bit of background</h2>
<p>A few years ago I joined a Data team where we processed a lot of analytics information coming from online search engines. This ETL process consisted of three main stages: fetch raw data from external APIs, transform it into something meaningful for our applications and load that data into a database.</p>
<p>All this was carried out with in-house developed DAG scripts that were orchestrated via Airflow. The problem was that this intake was so slow and unstable that it was becoming really painful for our daily performance.</p>
<p>Having previous ECS experience, I thought we could try improving the process using two main changes:</p>
<ul>
<li><p>Decouple the Airflow controller from the runners, using ECS.</p>
</li>
<li><p>Add a queue orchestrator for improved parallelism that could shorten the processing times, using Airflow's native <code>Celery</code> executor (integration) with Redis.</p>
</li>
</ul>
<h2 id="heading-what-is-airflow">What is Airflow?</h2>
<p>Airflow is a DAG orchestrator, a platform that provides management for workflows.</p>
<p>You can schedule, monitor and even create workflows to execute a set of tasks in a predefined order, dependent on each other. Airflow controls these tasks' execution while giving insights and a UI where it can all be monitored, among many other features.</p>
<blockquote>
<p>Have a look at <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/index.html#what-is-airflow">Airflow</a> documentation for more information.</p>
</blockquote>
<h2 id="heading-what-is-ecs">What is ECS?</h2>
<p>AWS ECS is a container orchestrator to deploy, manage and scale your containerised applications. It integrates with a lot of AWS services, which makes it very flexible and easy to use if your infrastructure is already in AWS.</p>
<blockquote>
<p>Check out a few ECS <a target="_blank" href="https://aws.amazon.com/ecs/">use cases</a>.</p>
</blockquote>
<h2 id="heading-what-is-redis">What is Redis?</h2>
<p>Redis is an in-memory data store, used as a database, message broker, cache, etc.</p>
<p>AWS <a target="_blank" href="https://aws.amazon.com/elasticache/">ElastiCache</a> offers Redis as one of its available engines.</p>
<blockquote>
<p>A bit <a target="_blank" href="https://redis.io/docs/getting-started/">more</a> on Redis.</p>
</blockquote>
<h2 id="heading-how-everything-glues-together">How does everything glue together?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1674668908525/eba14cc6-ea23-4944-91f6-d26375d1c616.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-remote-execution-using-redis">Remote execution using Redis</h3>
<p>Airflow has many <code>executor</code> <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html">integrations</a>. An <code>executor</code> is the mechanism by which tasks get run, either locally or remotely. You can configure Airflow to execute locally, for small standalone installations, or remotely if you are planning to run a considerable number of tasks per workflow and have access to a pool of resources, like in this case, ECS.</p>
<p>One of the remote executors is <code>Celery</code>, which allows you to span out to many workers and supports different backends, such as RabbitMQ and Redis.</p>
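In practice, switching to the Celery executor with Redis mostly comes down to a few Airflow settings, which can be supplied as environment variables on each container. The hostnames below are placeholders matching the service discovery names used later in this series:

```bash
# Airflow reads configuration from AIRFLOW__<SECTION>__<KEY> environment variables.
export AIRFLOW__CORE__EXECUTOR="CeleryExecutor"
# Redis as the Celery message broker.
export AIRFLOW__CELERY__BROKER_URL="redis://redis.airflow.local:6379/0"
# PostgreSQL as the Celery result backend (note the db+ prefix).
export AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://airflow:airflow@postgres.airflow.local:5432/airflow"
```

The same keys can of course live in <code>airflow.cfg</code>; environment variables simply take precedence and fit container deployments better.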
<h3 id="heading-dags-storage">DAGs storage</h3>
<p>In this decentralised pattern, every worker has to have access to the scripts (DAGs) to be able to run their assigned task. To do so, you can mount an S3 bucket - containing all the scripts - as a local volume to all ECS tasks, using drivers such as <code>s3fs</code> or <code>s3fs-fuse</code>, <code>rexray/s3fs</code>, etc.</p>
<p>This S3 access is possible by adding the required IAM permissions to the ECS tasks, allowing both scheduler and worker to interact with the mounted bucket.</p>
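As a sketch of what the mount step in the instance bootstrap could look like with <code>s3fs-fuse</code> (bucket, mountpoint and role names are placeholders; this only assembles and prints the command as a dry run):

```bash
# Placeholders; in a real setup these would be injected into the UserData script.
BUCKET="my-dags-bucket"
MOUNTPOINT="/opt/airflow/dags"
ROLE="my-ecs-instance-role"

# Dry run: print the mount command that the bootstrap would execute.
# iam_role picks up credentials from the instance profile; allow_other lets
# the non-root airflow user read the mounted DAGs.
echo "s3fs ${BUCKET} ${MOUNTPOINT} -o iam_role=${ROLE} -o allow_other"
```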
<h3 id="heading-decoupling-airflow-scheduler-and-the-workers">Decoupling Airflow scheduler and the workers</h3>
<p>The whole idea of this pattern is to separate the brain from the muscle. Using ECS services, you can create a service for the scheduler (controller) and another for the worker tasks, controlled by the executor.</p>
<p>The former runs with a minimum of 1 task to keep things going, along with access to its webserver (UI). The latter scales up and down depending on the workload sent by the scheduler.</p>
<h3 id="heading-some-additional-comments">Some additional comments</h3>
<ul>
<li><p>The IAM credentials are fetched from AWS Secrets Manager or Parameter Store and mounted as environment variables on each task.</p>
</li>
<li><p>The S3 mounting driver installation is added to each ECS EC2 instance's <code>userData</code> so it gets installed, initialised and mounted.</p>
</li>
</ul>
<h2 id="heading-improvements">Improvements</h2>
<ul>
<li><p>Use a git repository as the DAG datastore and have each runner download that repo at startup time.</p>
</li>
<li><p>Docker added an ECS integration for its <code>docker compose</code>. This is done by first configuring a Docker <code>context</code> for ECS.</p>
<ul>
<li><p>With this integration, there is no need to create ECS task definitions; you can work directly with Airflow's <code>docker-compose.yaml</code> <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/2.5.1/docker-compose.yaml">file</a> and create all infrastructure resources in AWS (via the native CloudFormation integration).</p>
<ul>
<li>One of the key differences is that it uses AWS EFS as the filesystem instead of S3, so there is no need to mount buckets as task volumes, and no need for external drivers either.</li>
</ul>
</li>
<li><p>This integration was added in 2020 (way after my experience), so I will also include this new deployment approach in the next posts of this series.</p>
</li>
</ul>
</li>
<li><p>Explore Airflow's <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/kubernetes.html">Kubernetes executor</a>.</p>
</li>
</ul>
<h2 id="heading-references">References</h2>
<ul>
<li><p><a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#running-airflow-in-docker">Airflow in Docker</a>.</p>
</li>
<li><p><a target="_blank" href="https://aws.amazon.com/ecs/">ECS</a>.</p>
</li>
<li><p><a target="_blank" href="https://redis.io/docs/getting-started/">Redis</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-2-hands-on">Part 2 of this series</a>.</p>
</li>
<li><p><a target="_blank" href="https://blog.mariano.cloud/airflow-in-ecs-with-redis-part-3-docker-compose">Part 3 of this series</a>.</p>
</li>
</ul>
<h2 id="heading-whats-next">What's next?</h2>
<p>I will be sharing a working example of the original implementation and an improved version using <code>docker compose</code> ECS integration.</p>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item><item><title><![CDATA[Dev Retro 2022: a Cloud Engineer perspective]]></title><description><![CDATA[This year has been pretty intense in terms of learning, from moving countries to getting used to a new way of working with (almost) the same tools.
Join me in my 2022 journey, when I discovered you do not have to dive into some new technology to keep...]]></description><link>https://blog.mariano.cloud/dev-retro-2022-a-cloud-engineer-perspective</link><guid isPermaLink="true">https://blog.mariano.cloud/dev-retro-2022-a-cloud-engineer-perspective</guid><category><![CDATA[#DevRetro2022]]></category><category><![CDATA[Cloud Engineering ]]></category><category><![CDATA[journey]]></category><category><![CDATA[AWS]]></category><category><![CDATA[mentoring]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Thu, 29 Dec 2022 10:45:39 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7cd9b3d82244b0657094e423589093fd.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This year has been pretty intense in terms of learning, from moving countries to getting used to a new way of working with (almost) the same tools.</p>
<p>Join me in my 2022 journey, when I discovered you do not have to dive into some new technology to keep on learning.</p>
<h3 id="heading-jan-feb-march">Jan-Feb-March</h3>
<p>2022 started on a high for me: I was living in Kraków, had spent the holidays in Argentina and had the good news that the Kubernetes migration I worked on last year had shipped successfully to Production. I was ready for the next adventure!</p>
<p>I am a person who really enjoys what I do professionally, and I feel privileged for that. I like it so much that I did a Big Tech Company round of interviews on my birthday (March), one of those five-interviews-in-one-day affairs. It felt pretty awesome; this is a company I am very curious about and would like to work at someday. I got an offer from them, but I decided to decline in favour of moving to Amsterdam.</p>
<h3 id="heading-apr-may-jun">Apr-May-Jun</h3>
<p>So I started to look for an Amsterdam-based job, participated in some interview processes and finally decided to join <a target="_blank" href="https://quin.md/">Quin</a>. The offer was excellent, the people I met were super kind and professional, and the product/industry was new and attractive to me; everything went perfectly.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672307128438/50907fff-320f-44b9-9246-cbb573dbfb5f.jpeg" alt="Photo by &lt;a href=&quot;https://unsplash.com/@philipmyr?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText&quot;&gt;Philip Myrtorp&lt;/a&gt; on &lt;a href=&quot;https://unsplash.com/photos/iiqpxCg2GD4?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText&quot;&gt;Unsplash&lt;/a&gt;   " class="image--center mx-auto" /></p>
<p>My family and I moved in June; it was a bit stressful at the beginning, as is naturally expected in such situations, with all the coordination, paperwork, flights, housing, etc. Fortunately, we had help from a group of fantastic professionals.</p>
<h3 id="heading-jul-aug-sept">Jul-Aug-Sept</h3>
<p>These were the highest-learning months, with a lot to take in during the first three months in a new job: people to meet, teams to get familiar with, the new product and the infrastructure behind it all.</p>
<blockquote>
<p>I know it can be difficult to keep up and sometimes it becomes stressful. If you feel it is too much, <strong>please relax for a bit</strong>. Take a walk, go grab a tea. Nobody should expect you to be 100% productive in your first months.</p>
</blockquote>
<p>The most challenging part for me was getting used to the new approach to handling application deployments, specifically with ArgoCD. I had never worked with it before; it is a platform to manage Kubernetes deployments declaratively, among many other capabilities.</p>
<p>I was used to handling all deployments with Helm (<a target="_blank" href="https://helmfile.readthedocs.io/en/latest/">Helmfile</a>) and no additional orchestrator. ArgoCD, however, offers an abstraction layer on top of Kubernetes: it structures declarative deployments and k8s resources, along with application lifecycles, resource pruning and a set of deployment policies to pick from. This gives you a wide range of possibilities to adopt, depending on your use case.</p>
<blockquote>
<p>If you want to know more, please check out <a target="_blank" href="https://argo-cd.readthedocs.io/en/stable/getting_started/">ArgoCD documentation</a>.</p>
</blockquote>
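<p>To give an idea of what this looks like in practice, here is a minimal sketch of an ArgoCD <code>Application</code> manifest (the repository URL, chart path and names are hypothetical placeholders, not the ones we actually use):</p>
<pre><code class="lang-yaml"># Declarative deployment: ArgoCD keeps the cluster in sync with this source
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.com/your-group/deployments.git
    targetRevision: main
    path: charts/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true     # remove resources that were deleted from the repo
      selfHeal: true  # revert manual drift in the cluster
</code></pre>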
<p>Along with that, I learned a new way of managing Terraform modules in a structured way. I plan to write about that as well!</p>
<p>I was also able to add value to the product by implementing security best practices for Kubernetes on AWS. That first contribution always feels awesome; it is like repaying a bit of the trust the company put in you.</p>
<blockquote>
<p>I wrote my first blog post on IRSA. <a target="_blank" href="https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge">Check it out!</a></p>
</blockquote>
<h3 id="heading-oct-nov-dec">Oct-Nov-Dec</h3>
<p>I had always been curious about sharing content on what I do, mostly because I enjoy helping people, and I have been googling and reading a lot (as we all do) to solve stuff or implement fixes in my daily job. So, why not give back?</p>
<p>I got this idea from my current Manager - hence the importance of having people around you who push you forward, even more so if they are in charge. I found Hashnode while googling about IRSA, so I decided to create an account and start getting familiar with it.</p>
<p>I gathered a list of topics I had specifically worked on in the past, plus things I wanted to share. I read a lot of advice from content creators on Hashnode, which is a really good starting point; I think one of the huge advantages of being in IT is the number of resources, the information and the people willing to help.</p>
<p>And I finally started to write blog posts. It is a pretty good feeling to share experiences with peers, and even more so with people who have had no contact with technology.</p>
<p>I started my <a target="_blank" href="https://mariano.cloud/">website</a> as well. I wanted one as a way to present what I do, who I am and how to contact me.</p>
<blockquote>
<p><a target="_blank" href="https://blog.mariano.cloud/your-website-in-minutes-gitlab-hugo-blowfish">This</a> is my blog post about creating a site with Hugo &amp; Blowfish on Gitlab Pages, check it out!</p>
</blockquote>
<p>I also started a <a target="_blank" href="https://www.udemy.com/course/learn-go-the-complete-bootcamp-course-golang/">Udemy course on Go</a> that I recommend. It is very well explained for beginners (in Go) like myself, and there are a lot of exercises and projects to go through.</p>
<h3 id="heading-whats-next">What's next?</h3>
<ul>
<li><p>I would like to explore public speaking. There is a lot of good advice out there and many people share their experiences with it, so I think I will start writing about topics I want to talk about and getting reviews from my peers. That way I can get a sense of my confidence level before committing, and start practising with my family 😁.</p>
</li>
<li><p>I would also like to get Certifications in AWS and GCP.</p>
</li>
<li><p>Keep on learning Go.</p>
</li>
<li><p>Start mentoring people willing to learn or start in Cloud Engineering. I have created a profile in CodingCoach. If you are interested in talking about Infrastructure <strong>for free</strong>, <a target="_blank" href="https://mentors.codingcoach.io/u/63ad5c864474770664cbfb18">please have a look!</a></p>
</li>
</ul>
<h3 id="heading-conclusion">Conclusion</h3>
<p>I am happy someone suggested blogging. It was something I already did internally, with documentation and presentations, but I had never published anything publicly. It feels scary at first, but I am getting used to being out there.</p>
<p>Even if the technology stack does not change when you change jobs, there are plenty of learning opportunities: a new approach, a new way of managing secrets, a different authorisation method, or even a new way of doing peer review. There is always an opportunity to go deeper into stuff you think you already know.</p>
<p>I am always open to discussing and talking about technology, Cloud, football or even living abroad.</p>
<p>Thank you for reading. Reach out, I would love to help in your Cloud journey!</p>
]]></content:encoded></item><item><title><![CDATA[Your website in minutes: Gitlab, Hugo & Blowfish]]></title><description><![CDATA[I was looking for a way to create my first website as a way to tell a bit about myself and my IT journey when I stumbled first on Gitlab Pages templates, and then Blowfish - a Hugo theme that will give you a flexible running website in a couple of mi...]]></description><link>https://blog.mariano.cloud/your-website-in-minutes-gitlab-hugo-blowfish</link><guid isPermaLink="true">https://blog.mariano.cloud/your-website-in-minutes-gitlab-hugo-blowfish</guid><category><![CDATA[blowfish]]></category><category><![CDATA[GitLab]]></category><category><![CDATA[Hugo]]></category><category><![CDATA[ssg]]></category><category><![CDATA[deployment]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Tue, 27 Dec 2022 19:44:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1672161915133/852bf606-0930-4e80-88e5-4180b41e504a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was looking for a way to create my first website as a way to tell a bit about myself and my IT journey when I stumbled first on <a target="_blank" href="https://docs.gitlab.com/ee/user/project/pages/">Gitlab Pages</a> templates, and then <a target="_blank" href="https://nunocoracao.github.io/blowfish/">Blowfish</a> - a <a target="_blank" href="https://gohugo.io/">Hugo</a> theme that will give you a flexible running website in a couple of minutes.</p>
<p>Let's create a sample website that can be later customised. I will be using my <a target="_blank" href="https://mariano.cloud">site</a> as a guiding example for all the steps.</p>
<h1 id="heading-some-background">Some background</h1>
<h2 id="heading-what-is-hugo">What is Hugo?</h2>
<p>Hugo is an SSG (static site generator): a free framework for creating static websites using CSS, JS and HTML. The best part is that you do not need any knowledge of those; adding your content and configuring its appearance with themes is all it takes.</p>
<p>Please have a look at the various <a target="_blank" href="https://gohugo.io/">features</a> Hugo offers.</p>
<h2 id="heading-what-is-blowfish">What is Blowfish?</h2>
<p>Blowfish is one of the many themes for Hugo. Built on Tailwind CSS, it gives you a range of customisations including a hero view, background images, dark mode, a pages tree, rich content and <strong>a lot</strong> more.</p>
<p>Have a look at <a target="_blank" href="https://nunocoracao.github.io/blowfish/">Blowfish</a> variations and possibilities.</p>
<h2 id="heading-gitlab-managed">Gitlab managed</h2>
<p>In my case I was looking for a Gitlab-managed deployment, so I looked into <a target="_blank" href="https://about.gitlab.com/stages-devops-lifecycle/pages/">Gitlab Pages</a>, a way to deploy a static website using a simple <code>.gitlab-ci.yml</code> pipeline that includes a custom <code>pages:</code> job definition.</p>
<p>There are several ways of deploying a website with Gitlab Pages: you can deploy it from scratch, use a template, or fork an existing project.</p>
<p>Here is a lot more on <a target="_blank" href="https://docs.gitlab.com/ee/user/project/pages/#getting-started">how you can deploy yours</a>.</p>
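<p>To give a feel for what that pipeline involves, here is a minimal sketch of a <code>.gitlab-ci.yml</code> for Hugo (the image name is an assumption; the template we use below ships a more complete, ready-made version):</p>
<pre><code class="lang-yaml">image: registry.gitlab.com/pages/hugo:latest

# The job must be named "pages" and publish a "public" artifact
# for Gitlab to pick it up as a Pages deployment.
pages:
  script:
    - hugo
  artifacts:
    paths:
      - public
</code></pre>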
<h1 id="heading-tutorial">Tutorial</h1>
<p>I chose to use an existing project template, and there are a lot of them <a target="_blank" href="https://docs.gitlab.com/ee/user/project/pages/getting_started/pages_new_project_template.html">here</a>. I went with Hugo as it looked promising and I wanted to try it out, and I liked one of its <a target="_blank" href="https://themes.gohugo.io/">themes</a> (Blowfish).</p>
<h2 id="heading-create-using-a-project-template">Create using a project template</h2>
<p>From your Gitlab home page, select <code>New Project</code> and choose the <code>Create from template</code> option. You can select any of the 31 available templates; let's use <code>Pages/Hugo</code> and click <code>Use template</code>.</p>
<p>This will open a Gitlab new project screen where you can add your new project <strong>name</strong>, <strong>slug</strong> (URL friendly name) and its <strong>visibility level</strong> (given this is a standalone project - if this is part of a group, then it will inherit its group visibility).</p>
<p>It will take some time to import. Once created, the project contains a copy of the original template files and is almost ready to use. Now it is time to set your project name and the base URL for Hugo's default theme (where all assets will be served from).</p>
<h3 id="heading-change-your-project-name">Change your project name</h3>
<p>In your project <code>Settings</code> &gt; <code>General</code> &gt; <code>Advanced</code> &gt; <code>Change path</code> you will need to change your project's path to match the <a target="_blank" href="https://docs.gitlab.com/ee/user/project/pages/getting_started_part_one.html#gitlab-pages-default-domain-names">Gitlab pages URL</a> convention. In this example, I created a project named <code>hugo-blowfish-website</code>. I will change its path to be <code>hugo-blowfish-website.gitlab.io</code> as I am using Gitlab.com to create it.</p>
<p>Keep in mind, if you are creating a group project (like me), then the modified path is going to look like this: <code>https://gitlab.com/your-group/your-project.gitlab.io</code>, and the final <strong>published URL</strong> will be <code>https://your-group.gitlab.io/your-project.gitlab.io</code>.</p>
<p>If you are creating a standalone project, then the project path will be <code>https://gitlab.com/your-username/your-project.gitlab.io</code> and the <strong>published URL</strong> <code>https://your-username.gitlab.io/your-project.gitlab.io</code>.</p>
<p>One of the above <strong>published URLs</strong> is the one to be set in the next step.</p>
<h3 id="heading-change-the-current-themes-base-url">Change the current theme's base URL</h3>
<p>Once you have changed the project path, update the <code>baseurl</code> parameter in <code>config.toml</code> at your project root with the published URL. By default, it is set to <a target="_blank" href="https://pages.gitlab.io/hugo/"><code>https://pages.gitlab.io/hugo/</code></a>; change it to <code>https://your-group.gitlab.io/your-project.gitlab.io</code> or <code>https://your-username.gitlab.io/your-project.gitlab.io</code> depending on whether this is a group project or a standalone one.</p>
<h3 id="heading-ready-to-deploy">Ready to deploy</h3>
<p>The last step can be done either on your local workstation (after cloning the project) or directly in Gitlab's web IDE. Either way, when you push the change, it will automatically start a CI/CD pipeline. To access it, open <code>CI/CD</code> in your project's left pane to see the pipelines view.</p>
<p>If you click on the most recent run, you will see two stages: <code>test</code>, which performs a quick check on the files for eventual errors, and <code>deploy</code>, the Gitlab custom job that runs <code>hugo</code> and uses the <code>public</code> folder as the content source. You can have a look at the pipeline definition in the <code>.gitlab-ci.yml</code> file.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672162514186/52e4ab49-cd62-4c54-8cb1-4516ea495cac.png" alt class="image--center mx-auto" /></p>
<p>If the pipeline finishes successfully, the site has been deployed and you can access it at the configured <code>baseurl</code>. Going to <code>Settings</code> &gt; <code>Pages</code> in your project (left pane) opens the Pages menu, where you can see the website link.</p>
<p>If this is the first time you deploy a page on Gitlab, it will require a credit card to prevent abuse of the Pages functionality. Don't worry, you will not be charged anything: there is a $1 verification charge by default, but it is reverted within a few minutes.</p>
<p>Now you can access your website! It is a vanilla theme; by default it uses <a target="_blank" href="https://themes.gohugo.io/themes/gohugo-theme-ananke/">Ananke</a>, but we will customise it later on.</p>
<h2 id="heading-use-a-custom-theme">Use a custom theme</h2>
<p>Let's say you (like me) would like to use a different theme, how can we do that?</p>
<p>Hugo has a lot of themes to pick and use, the one I chose is called <a target="_blank" href="https://nunocoracao.github.io/blowfish/">Blowfish</a>.</p>
<p>To change the default theme, you can follow the Blowfish theme <a target="_blank" href="https://nunocoracao.github.io/blowfish/">installation steps</a> in combination with the default instructions to <a target="_blank" href="https://gitlab.com/pages/hugo#use-a-custom-theme-using-a-hugo-module">use a custom theme</a> from Hugo's Gitlab Pages template, where you have a quick-start tutorial. Let's go through those steps.</p>
<p>My recommendation is to clone the project locally so you can work more easily.</p>
<ul>
<li><p>Change the <code>hugo mod get</code> command within <code>.gitlab-ci.yml</code> to <code>hugo mod get -u github.com/nunocoracao/blowfish</code>.</p>
</li>
<li><p>Create <code>config/_default/module.toml</code> (new directory/file) in your project's root, containing the following:</p>
<pre><code class="lang-toml">  [[imports]]
    path = "github.com/nunocoracao/blowfish"
</code></pre>
</li>
<li><p>Commit and push these changes.</p>
</li>
</ul>
<p>This will create a new pipeline run, and when it finishes you can access your site again. It will show an updated Blowfish theme, but still with default content.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672142875139/45a82406-e47e-4464-aa63-b437662a6b40.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-add-your-content">Add your content</h2>
<p>Let's customise the theme and add some content.</p>
<h3 id="heading-basics">Basics</h3>
<p>First of all, we need to create a default set of configurations. We already created a <code>config/_default</code> directory, now we can start populating it.</p>
<p>Start by deleting <code>config.toml</code> from the root directory.</p>
<p>Then download a copy of the required files from Blowfish's <a target="_blank" href="https://github.com/nunocoracao/blowfish/tree/main/config/_default">repo</a> and copy all <code>*.toml</code> into your <code>config/_default/</code> directory.</p>
<blockquote>
<p>Remember not to override the <code>config/_default/module.toml</code> created before.</p>
</blockquote>
<p>Now update <code>config/_default/config.toml</code>:</p>
<pre><code class="lang-toml">baseURL = "https://your_domain.com/"
languageCode = "en"
</code></pre>
<p>Here <code>baseURL</code> is the same one we configured before and <code>languageCode</code> is the default language we will show content in.</p>
<p>Let's now configure the language settings in <code>config/_default/languages.en.toml</code>:</p>
<pre><code class="lang-toml">title = "Your website"

[author]
  name = "Your name"
  image = "img/author.jpg"
  headline = "A generally awesome human"
  bio = "Just someone sharing content"
  links = [
    { twitter = "https://twitter.com/username" }
  ]
</code></pre>
<p>These values are available across the whole site, in particular on the Home page (depending on the home layout we use). The <code>image</code> is picked up from files in the <code>assets</code> directory; if left commented out, a theme default is used. The <code>links</code> list is also configurable: you can add more if you like (there are a lot to choose from in the default file).</p>
<p>In addition, if you would like to serve content in a different language, you can rename this file to match the language ISO code you would like to use.</p>
<h3 id="heading-content">Content</h3>
<p>Now for the content: there is a <code>content</code> directory in the root of your project. That is where we will create pages (sections) to organise content into. It is as simple as creating subdirectories, e.g. <code>blog</code> or <code>about</code>. These become new pages, accessible via <code>baseurl/page-name</code> (<code>baseurl/blog</code> or <code>baseurl/about</code>).</p>
<p>To add content, we add an <code>index.md</code> within each subfolder and Hugo renders it as an "article" - our well-known <code>README.md</code> type of documentation. It supports markdown, rich content and many other features you can check in the <a target="_blank" href="https://nunocoracao.github.io/blowfish/samples/">documentation</a>. Let's add two pages: <code>about</code> and <code>blog</code>.</p>
<p>There is a special type of file, <code>_index.md</code>, that injects default content wherever you place it. E.g. to add default content to the home page, create <code>content/_index.md</code>. The same applies to subdirectories. You can find a deeper explanation and usage examples <a target="_blank" href="https://nunocoracao.github.io/blowfish/docs/front-matter/">here</a>.</p>
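<p>Putting it together, the layout for our two pages would look something like this:</p>
<pre><code>content/
├── _index.md        # default content for the Home page
├── about/
│   └── index.md     # rendered at baseurl/about
└── blog/
    └── index.md     # rendered at baseurl/blog
</code></pre>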
<h3 id="heading-menus">Menus</h3>
<p>Lastly, for basic customisation, we have the menus (<code>menus.en.toml</code>). This is the file where we define the menu items for the header and footer. Here we can reference local pages (defined in <code>content/page-name</code>) or external links:</p>
<pre><code class="lang-toml">[[main]]
  name = "Blog"
  pageRef = "blog"
  weight = 10

[[main]]
  name = "External"
  url = "https://github.com/nunocoracao/blowfish"
  weight = 20

[[footer]]
  name = "About"
  pageRef = "about"
  weight = 10
</code></pre>
<p>Finally, remove default (Ananke theme) language content directories: <code>content/en</code> and <code>content/fr</code>.</p>
<p>Now we are ready to deploy a more personalised site, so let's commit and push our changes. After the pipeline run, it will look something like this.</p>
<p>Home page:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672149423585/bf16bd8e-6b49-4228-8d95-e07fd71fef94.png" alt="HomePage" class="image--center mx-auto" /></p>
<p>Blog page (<code>baseurl/blog</code>):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672149492149/6aa54bae-cdea-45bf-867b-2902c80a7c48.png" alt class="image--center mx-auto" /></p>
<p>We can see both the header and footer menus; accessing any of those items takes us to the corresponding page and its content. The <code>author</code> section is added by default on every page (except Home), and the <code>External</code> header menu item takes us to whatever link we added in its <code>url</code> parameter.</p>
<h3 id="heading-further-customisations">Further customisations</h3>
<p>There is <strong>a lot</strong> more that can be done by customising Blowfish config files.</p>
<p>The main ones:</p>
<ul>
<li><p><code>config/_default/params.toml</code> to control pretty much everything that is shown on each page, along with the theme hero feature, the sections, taxonomy, background, etc.</p>
</li>
<li><p><code>config/_default/config.toml</code> general customisation and behaviour.</p>
</li>
<li><p><code>config/_default/menus.&lt;lang_iso_code&gt;.toml</code> menu customisation.</p>
</li>
<li><p><code>config/_default/languages.&lt;lang_iso_code&gt;.toml</code> language and content settings.</p>
</li>
</ul>
<p>And much more. Please refer to <a target="_blank" href="https://nunocoracao.github.io/blowfish/docs/configuration/">Blowfish documentation</a> for more.</p>
<h2 id="heading-visibility">Visibility</h2>
<p>Remember this page is still under its project default visibility settings, meaning that if it is set to private, then nobody can access it unless they are members of your project/group.</p>
<p>To make it publicly accessible, there is an option in your project <code>Settings</code> &gt; <code>General</code> &gt; <code>Visibility, project features, permissions</code> &gt; <code>Pages</code>. Setting that to <code>Everyone</code> will do.</p>
<h2 id="heading-optional-build-locally">(optional) Build locally</h2>
<p>For debugging purposes, I find it very helpful to build the website locally and then push changes to deploy in Gitlab.</p>
<p>To do so, and after you cloned your Gitlab project, you will need:</p>
<ul>
<li><p><a target="_blank" href="https://gohugo.io/installation/">Hugo</a>.</p>
</li>
<li><p>Initialise Hugo:</p>
<pre><code class="lang-bash">  hugo mod init gitlab.com/pages/hugo
</code></pre>
</li>
<li><p>Add Blowfish theme:</p>
<pre><code class="lang-bash">  hugo mod get -u github.com/nunocoracao/blowfish
</code></pre>
</li>
<li><p>Add <code>config/_default</code> directories and files (as above described).</p>
</li>
<li><p>Comment out <code>baseurl</code> from <code>config/_default/config.toml</code>.</p>
</li>
</ul>
<p>Then run:</p>
<pre><code class="lang-bash">hugo server
</code></pre>
<p>It will build all static files (compiled from any <code>assets</code> added and from <code>content</code>), creating the <code>public</code> and <code>resources</code> directories, and start a server (by default at <code>localhost:1313</code>) where you can access the built site. You can then modify/add/delete content and configs, and it will auto-reload the site.</p>
<p>Once you are happy with the changes, remember to add a <code>.gitignore</code> so you do not push the <code>resources</code> and <code>public</code> directories (or any other files Hugo created) to Gitlab.</p>
<h2 id="heading-bonus-tls-dns-and-custom-domain">Bonus: TLS, DNS and custom domain</h2>
<p>Gitlab offers the possibility to set up a custom domain for your website. E.g. let's say you want to have this Gitlab Page accessible under your domain <code>your.domain</code> - it is doable.</p>
<p><em>You will need a custom domain you own and control (meaning, you can modify its DNS records).</em></p>
<p>Gitlab offers TLS out of the box, so you won't need to worry about getting a certificate for HTTPS traffic either.</p>
<h3 id="heading-how-to-set-it-all-up">How to set it all up?</h3>
<p>First, let's create a custom domain in the Gitlab Pages section of your project <code>Settings</code> &gt; <code>Pages</code> &gt; <code>New Domain</code>.</p>
<p>Here you will add your domain, leave the <code>Certificate</code> feature enabled and click on <code>Create New Domain</code>.</p>
<p>Gitlab will then ask for your domain to be verified: it generates a <code>TXT</code> record that you have to add to your domain's DNS. This is how Gitlab checks you actually own the domain you are adding.</p>
<p>Once you add the <code>TXT</code> record to your DNS, the domain will become active in the Pages section and a certificate will be generated. HTTPS redirection is enabled by default for this Page.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1672155871001/5b1a57a2-3aa5-4666-879a-6528076f407c.png" alt class="image--center mx-auto" /></p>
<p>Finally, you will need to add the <code>ALIAS</code> DNS record for the domain, also provided by Gitlab in the same Pages section.</p>
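<p>The resulting DNS records look roughly like this (the record names and values here are placeholders; use exactly the ones Gitlab shows you in the Pages section):</p>
<pre><code>_gitlab-pages-verification-code.your.domain.  TXT    "gitlab-pages-verification-code=..."
your.domain.                                  ALIAS  your-group.gitlab.io.
</code></pre>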
<blockquote>
<p>Remember to update <code>baseurl</code> in <code>config/_default/config.toml</code> with your newly added custom domain. Otherwise, the site's links will keep pointing to the outdated URL (the former gitlab.io one).</p>
</blockquote>
<h2 id="heading-conclusion-amp-references">Conclusion &amp; references</h2>
<p>Now you have your website, running on Gitlab Pages, built using Hugo + Blowfish, populated with your content, in a custom domain you own. Mostly for free (depending on DNS service).</p>
<p>It is a straightforward process involving a few moving parts, and hopefully one you can complete fairly quickly.</p>
<h3 id="heading-references">References</h3>
<ul>
<li><p><a target="_blank" href="https://nunocoracao.github.io/blowfish/">Blowfish</a>.</p>
</li>
<li><p><a target="_blank" href="https://gohugo.io/">Hugo</a>.</p>
</li>
<li><p><a target="_blank" href="https://docs.gitlab.com/ee/user/project/pages/">Gitlab Pages</a>.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[IRSA in EKS: a Kubernetes - AWS bridge]]></title><description><![CDATA[This guide assumes you manage your infrastructure with Terraform.
Here is an example IRSA implementation using Terraform and kubectl.
Background
Let's say you want to allow your EKS-hosted app to access an AWS service. You have a couple of options to...]]></description><link>https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge</link><guid isPermaLink="true">https://blog.mariano.cloud/irsa-in-eks-a-kubernetes-aws-bridge</guid><category><![CDATA[irsa]]></category><category><![CDATA[AWS]]></category><category><![CDATA[OIDC]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Mariano González]]></dc:creator><pubDate>Sat, 17 Dec 2022 17:37:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671294070436/_BCz1XYMm.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>This guide assumes you manage your infrastructure with Terraform.</strong></p>
<p>Here is an example IRSA implementation using Terraform and kubectl.</p>
<h2 id="heading-background">Background</h2>
<p>Let's say you want to allow your EKS-hosted app to access an AWS service. You have a couple of options to do so, depending on the application:</p>
<ul>
<li><p>Using AWS IAM credentials (key/secret) injected into your pods as Kubernetes secrets or via environment variables.</p>
</li>
<li><p>Have your pods use the AWS IAM EC2 instance profile (EKS nodes).</p>
</li>
</ul>
<p>But how can you achieve the same in a secure and scalable way? IRSA is your friend.</p>
<h3 id="heading-what-is-irsa">What is IRSA?</h3>
<p>IRSA stands for IAM Roles for Service Accounts. It is a method of linking an AWS IAM role to a Kubernetes <strong>service account</strong> attached to a pod. This method offers some advantages:</p>
<ul>
<li><p>You specify the Kubernetes <strong>service account</strong> (and namespace) that has access and trust to assume the corresponding IAM role. No other <strong>service account</strong> will be able to do so.</p>
</li>
<li><p>You can easily track the access events in AWS CloudTrail for a specific combination of IAM role and <strong>service account</strong>.</p>
</li>
<li><p>You can make the access as specific and granular as you need via IAM (role) policies.</p>
</li>
</ul>
<p>Every EKS cluster natively hosts a unique, public OIDC discovery endpoint that lets your workloads authenticate and access other AWS services via AWS STS.</p>
<h3 id="heading-ok-cool-what-is-oidc-then">Ok cool, what is OIDC then?</h3>
<p>OIDC (OpenID Connect) is an authentication protocol based on OAuth 2.0. It adds an identity layer that allows authenticating users and services without having to maintain credentials yourself.</p>
<p><a target="_blank" href="https://openid.net/connect/">Here</a> you can find more info about OIDC.</p>
<p>Kubernetes supports various <a target="_blank" href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#authentication-strategies">authentication strategies</a>, and <a target="_blank" href="https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens">OpenID Connect</a> is one of them.</p>
<h3 id="heading-what-does-the-workflow-look-like">What does the workflow look like?</h3>
<h4 id="heading-involved-parties">Involved parties</h4>
<p>The authentication is done using three main components:</p>
<ul>
<li><p>SDK (client app) running in a pod.</p>
</li>
<li><p>OIDC discovery endpoint (hosted by EKS cluster).</p>
</li>
<li><p>(Kubernetes) <strong>service account</strong>.</p>
<ul>
<li><p>When a <strong>service account</strong> is created with an IRSA annotation, a JWT token is generated as a Kubernetes secret.</p>
<ul>
<li>This annotation is the linkage with the IAM role it needs to assume: <code>annotations: eks.amazonaws.com/role-arn: arn:aws:iam::&lt;aws_account_id&gt;:role/your-role</code></li>
</ul>
</li>
<li><p>When that <strong>service account</strong> is attached to a pod, EKS automatically injects two environment variables:</p>
<ul>
<li><p><code>AWS_ROLE_ARN</code> - the IAM role ARN.</p>
</li>
<li><p><code>AWS_WEB_IDENTITY_TOKEN_FILE</code> - the path to the JWT token.</p>
<ul>
<li>and mounts the JWT token as a volume in that pod.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
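<p>To make the projected token more concrete, here is a small Python sketch that builds and then decodes a toy, <em>unsigned</em> JWT whose claims mimic what EKS projects into the pod. All values are made up for illustration; a real token is signed by the cluster's OIDC issuer.</p>
<pre><code class="lang-python">import base64
import json

def b64url(data: bytes) -> str:
    # JWT segments use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

# Toy header and payload mimicking the claims in a projected token.
header = {"alg": "RS256", "kid": "example-key-id"}
payload = {
    "iss": "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE",
    "sub": "system:serviceaccount:irsa-sample-ns:irsa-test",
    "aud": "sts.amazonaws.com",
}
token = ".".join([
    b64url(json.dumps(header).encode()),
    b64url(json.dumps(payload).encode()),
    "fake-signature",  # a real token carries an RS256 signature here
])

# Decoding the middle segment (the payload) reveals the identity claims
# that AWS STS later checks against the IAM role trust policy.
seg = token.split(".")[1]
claims = json.loads(base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4)))
print(claims["sub"])
</code></pre>
<p>Note how the <code>sub</code> claim encodes the namespace and <strong>service account</strong> name — exactly the string the trust policy condition will match on later.</p>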
<h4 id="heading-the-process">The process</h4>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671295547200/LfAaPha3y.png" alt /></p>
<ol>
<li><p>When a Kubernetes <strong>service account</strong> (with the IAM role ARN annotation) is attached to a pod, the EKS control plane's mutating admission webhook injects two environment variables, <code>AWS_ROLE_ARN</code> and <code>AWS_WEB_IDENTITY_TOKEN_FILE</code>, and mounts a volume containing an OIDC-issued JWT token.</p>
</li>
<li><p>When the client SDK (running in a pod with the <strong>service account</strong> attached) performs an AWS API call (e.g. <code>eks:ListClusters</code>), under the hood it sends an <code>AssumeRoleWithWebIdentity</code> request to the AWS STS service, including both the JWT token and the role ARN it wants to assume (taken from the <strong>service account</strong> annotation).</p>
</li>
<li><p>AWS STS validates the IAM role trust (assume) policy condition, which contains both the OIDC discovery ID and a combination of the Kubernetes namespace and <strong>service account</strong> name where the original API call is coming from.</p>
</li>
<li><p>If everything checks out, AWS STS sends back a set of temporary IAM role credentials to the client SDK.</p>
</li>
<li><p>Now the client SDK is using temporary credentials for an AWS IAM role, which allows access to a set of AWS resources defined in its policy.</p>
</li>
</ol>
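<p>Step 1 above can be sketched from the pod's point of view. The following Python snippet simulates the wiring the webhook performs (the ARN, token content and paths are placeholders) and shows the two values an AWS SDK reads before building its <code>AssumeRoleWithWebIdentity</code> request:</p>
<pre><code class="lang-python">import os
import tempfile

# Simulate what the EKS mutating webhook sets up for a pod whose
# service account carries the IRSA annotation. Everything below is
# illustrative; in a real pod the token volume is projected for you.
token_path = os.path.join(tempfile.mkdtemp(), "token")
with open(token_path, "w") as f:
    f.write("header.payload.signature")  # placeholder for the projected JWT

os.environ["AWS_ROLE_ARN"] = "arn:aws:iam::123456789012:role/irsa-role"
os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"] = token_path

# An AWS SDK inside the pod reads exactly these two variables to know
# which role to assume and which token proves the pod's identity:
role_arn = os.environ["AWS_ROLE_ARN"]
with open(os.environ["AWS_WEB_IDENTITY_TOKEN_FILE"]) as f:
    web_identity_token = f.read()
print(role_arn)
</code></pre>
<p>With both values in hand, the SDK can exchange the token for temporary credentials without any static keys being stored in the pod.</p>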
<h2 id="heading-implementation">Implementation</h2>
<h3 id="heading-you-will-need">You will need</h3>
<ul>
<li><p>EKS cluster.</p>
</li>
<li><p><a target="_blank" href="https://kubernetes.io/docs/tasks/tools/#kubectl">kubectl</a> CLI.</p>
</li>
<li><p><a target="_blank" href="https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli">Terraform</a>.</p>
</li>
</ul>
<h3 id="heading-what-we-will-create">What we will create</h3>
<ul>
<li><p>IAM side.</p>
<ul>
<li><p>role.</p>
</li>
<li><p>assume policy.</p>
</li>
<li><p>identity-based policy.</p>
</li>
</ul>
</li>
<li><p>Kubernetes side.</p>
<ul>
<li><p>namespace.</p>
</li>
<li><p>service account.</p>
</li>
<li><p>job.</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-create-an-iam-role-policies">Create an IAM role + policies</h3>
<p>Add the following Terraform code and run <code>terraform apply</code>. It will create an IAM role with two policies attached.</p>
<pre><code class="lang-hcl">locals {
    namespace = "irsa-sample-ns" # this is the namespace we will create within the EKS cluster.
    serviceaccount = "irsa-test" # this will be the service account the job will use.
}

data "aws_eks_cluster" "this" { # get EKS cluster attributes to use later on.
    name = "your-eks-cluster-name"
}

data "aws_iam_policy_document" "assume-policy" { # create an assume policy for STS + a combination of namespace + service account.
    statement {
      actions = ["sts:AssumeRoleWithWebIdentity"]
      condition {
        test = "StringEquals"
        variable = "${replace(data.aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:sub"
        values = [
            join(":",["system:serviceaccount",local.namespace,local.serviceaccount])
        ]
      }
      condition {
        test = "StringEquals"
        variable = "${replace(data.aws_eks_cluster.this.identity[0].oidc[0].issuer, "https://", "")}:aud"
        values = ["sts.amazonaws.com"]
      }
      principals {
        type        = "Federated"
        identifiers = [data.aws_eks_cluster.this.identity[0].oidc[0].issuer]
      }
    }
}

resource "aws_iam_role" "irsa-role" { # create the IAM role and attach both assume and inline identity based policies.
    name = "irsa-role"
    path = "/"
    assume_role_policy = data.aws_iam_policy_document.assume-policy.json
    inline_policy {
      name = "eks-list"
      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
            Action = ["eks:ListClusters"]
            Effect = "Allow"
            Resource = "*"
        }]
      })
    }
}
</code></pre>
<p>As described, the <code>assume</code> policy will contain two conditions:</p>
<ul>
<li><p><code>data.aws_eks_cluster.this.identity[0].oidc[0].issuer:aud: sts.amazonaws.com</code> allowing AWS STS interaction.</p>
</li>
<li><p><code>data.aws_eks_cluster.this.identity[0].oidc[0].issuer:sub: system:serviceaccount:irsa-sample-ns:irsa-test</code> matching the correct namespace + <strong>service account</strong> name.</p>
</li>
</ul>
<p>And a <code>federated principal</code> pointing to the EKS OIDC provider. These three elements combined make up the logic behind the final <code>assume</code> action.</p>
<p>The second policy (identity-based) allows the role to perform an <code>eks:ListClusters</code> API call.</p>
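<p>To double-check the trust policy logic, here is the same string manipulation the Terraform performs, written as a small Python sketch (the issuer URL is a made-up example):</p>
<pre><code class="lang-python"># Reproduce, in plain Python, the condition keys and values the
# Terraform above computes for the role's trust policy.
issuer = "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
namespace, serviceaccount = "irsa-sample-ns", "irsa-test"

# Mirrors replace(issuer, "https://", "") in the Terraform:
oidc_host = issuer.replace("https://", "")

conditions = {
    # Mirrors join(":", ["system:serviceaccount", namespace, serviceaccount]):
    f"{oidc_host}:sub": ":".join(["system:serviceaccount", namespace, serviceaccount]),
    f"{oidc_host}:aud": "sts.amazonaws.com",
}
print(conditions)
</code></pre>
<p>If the <code>sub</code> claim in the pod's token differs from this exact string - a different namespace, a different <strong>service account</strong> name - STS refuses the assume request.</p>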
<h3 id="heading-create-a-test-namespace">Create a test namespace</h3>
<pre><code class="lang-bash">kubectl create ns irsa-sample-ns
</code></pre>
<h3 id="heading-create-a-kubernetes-service-account">Create a Kubernetes service account</h3>
<pre><code class="lang-bash">cat &lt;&lt;EoF&gt; serviceAccount-eks.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: irsa-test
  namespace: irsa-sample-ns
EoF

kubectl apply -f serviceAccount-eks.yaml
</code></pre>
<h3 id="heading-create-a-kubernetes-job">Create a Kubernetes job</h3>
<pre><code class="lang-bash">cat &lt;&lt;EoF&gt; job-eks.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: eks-iam-test-eks
  namespace: irsa-sample-ns
spec:
  template:
    metadata:
      labels:
        app: eks-iam-test-eks
    spec:
      serviceAccountName: irsa-test
      containers:
      - name: eks-iam-test
        image: amazon/aws-cli:latest
        args: [<span class="hljs-string">"eks"</span>, <span class="hljs-string">"list-clusters"</span>]
      restartPolicy: Never
  backoffLimit: 0
EoF

kubectl apply -f job-eks.yaml
</code></pre>
<p>This job will create a pod using the official aws-cli image, in <code>irsa-sample-ns</code> namespace, with an <code>app=eks-iam-test-eks</code> label.</p>
<p>It will then run <code>aws eks list-clusters</code> and finish. In this first iteration, the job should fail, as there is no explicit policy or permission allowing this job (pod) to make any AWS API calls.</p>
<p>Let's check its logs to see the error:</p>
<pre><code class="lang-bash">$ kubectl logs -n irsa-sample-ns -l app=eks-iam-test-eks

An error occurred (AccessDeniedException) when calling the ListClusters operation: User: arn:aws:sts::&lt;your_aws_account_id&gt;:assumed-role/&lt;your_eks_node_group_id&gt;/&lt;ec2_instance_id&gt; is not authorized to perform: eks:ListClusters on resource: arn:aws:eks:&lt;your_aws_account_region&gt;:&lt;your_aws_account_id&gt;:cluster/*
</code></pre>
<p><em>I have redacted the AWS account, region, EKS node group and EC2 instance IDs.</em></p>
<p>The aws-cli session running in the pod (the SDK, in this case Python's <code>botocore</code>) has no linked IAM role granting permissions for the requested API call. It is using the <code>irsa-test</code> <strong>service account</strong> we created in this namespace, which has no IRSA annotation, so it falls back to the EKS node (instance profile) IAM role.</p>
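<p>This fallback behaviour can be sketched as a deliberately simplified precedence function. Real SDK credential chains check several more sources (static keys in environment variables, shared config files, etc.), but for our purposes only two matter - the values below are placeholders:</p>
<pre><code class="lang-python">def credential_source(env: dict) -> str:
    """Toy illustration: web identity (IRSA) is preferred over the
    EC2 instance profile fallback when both env variables are present."""
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "web-identity (IRSA)"
    return "ec2-instance-profile (node role)"

# Without the IRSA annotation, nothing is injected -> node role is used:
print(credential_source({}))

# After annotating the service account and recreating the pod:
print(credential_source({
    "AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/irsa-role",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token",
}))
</code></pre>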
<p>Now let's annotate the <code>irsa-test</code> <strong>service account</strong> we created, with the IAM role that has the permission(s) the job needs.</p>
<pre><code class="lang-bash">kubectl annotate sa -n irsa-sample-ns irsa-test eks.amazonaws.com/role-arn=arn:aws:iam::&lt;your_aws_account_id&gt;:role/irsa-role
</code></pre>
<p>This annotation is the linkage between the <strong>service account</strong> and the IAM role we created before.</p>
<p>Now, let's delete the job:</p>
<pre><code class="lang-bash">kubectl delete -f job-eks.yaml
</code></pre>
<p>And redeploy:</p>
<pre><code class="lang-bash">kubectl apply -f job-eks.yaml
</code></pre>
<p>Let's check its logs to see what happened:</p>
<pre><code class="lang-bash">$ kubectl logs -n irsa-sample-ns -l app=eks-iam-test-eks
{
    <span class="hljs-string">"clusters"</span>: [
        <span class="hljs-string">"&lt;your_eks_cluster_name&gt;"</span>
    ]
}
</code></pre>
<p>Now that we annotated the <code>irsa-test</code> <strong>service account</strong>, the aws-cli running in the pod successfully assumed the <code>irsa-role</code> IAM Role that has the proper permissions to perform the required API call <code>eks:ListClusters</code>.</p>
<h3 id="heading-optional-see-related-events-in-cloudtrail">(optional) See related events in CloudTrail</h3>
<p>If you have CloudTrail enabled, you can also see these events in Event history. Search for <code>User name = system:serviceaccount:irsa-sample-ns:irsa-test</code> and you will find an entry for this <strong>service account</strong> performing the required <code>sts:AssumeRoleWithWebIdentity</code> API call.</p>
<p>And searching for <code>Event name = ListClusters</code> you will find:</p>
<ul>
<li><p>A first entry for the EKS node (EC2 instance ID) trying to do <code>eks:ListClusters</code> via <code>arn:aws:sts::&lt;your_aws_account_id&gt;:assumed-role/&lt;your_eks_node_group_id&gt;/&lt;ec2_instance_id&gt;</code> with an <code>AccessDenied</code> error code.</p>
</li>
<li><p>And then, a second entry for the aws-cli session (botocore) successfully performing <code>eks:ListClusters</code> by assuming <code>arn:aws:sts::&lt;your_aws_account_id&gt;:assumed-role/irsa-role/botocore-session-&lt;id&gt;</code>.</p>
</li>
</ul>
<h3 id="heading-cleanup">Cleanup</h3>
<ul>
<li><p>Delete the job</p>
<pre><code class="lang-bash">  kubectl delete -f job-eks.yaml
</code></pre>
</li>
<li><p>Delete the <strong>service account</strong></p>
<pre><code class="lang-bash">  kubectl delete -f serviceAccount-eks.yaml
</code></pre>
</li>
<li><p>Delete the namespace</p>
<pre><code class="lang-bash">  kubectl delete ns irsa-sample-ns
</code></pre>
</li>
<li><p>Delete the IAM role by removing the Terraform code and running <code>terraform apply</code>.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>IRSA helps secure your workloads by allowing them to use temporary credentials instead of using static IAM keys created for users or roles.</p>
<p>Also, if you were to use the default EKS node IAM role (EC2 instance profile) instead, you would have to include every permission needed by every app (pod) that might run on that node - each application running on EKS may need to access different AWS services, and therefore needs different permissions. That makes access difficult to track and harder to maintain.</p>
<p>If you do not want to worry about mixed service policies or maintaining IAM credentials, consider implementing IRSA.</p>
<hr />
<p>Thank you for stopping by! Do you know other ways to do this? Please let me know in the comments, I always like to learn how to do things differently.</p>
]]></content:encoded></item></channel></rss>