Lessons learned after losing the Terraform state file

Terraform is an infrastructure-as-code (IaC) tool: you describe the pieces of infrastructure you need in some config files, and Terraform creates them in your cloud provider of choice. Or deletes them, or modifies them, because Terraform compares the actual cloud infrastructure with the expected state. And the expected state is kept in… a state file, right. So if the state file gets lost, Terraform will think it never created those resources in the first place and will try to duplicate everything. Do you see the problem here? Not only will your infrastructure cost double, you’ll also get all kinds of nasty overlaps and cross-pollination between the previous and the fresh resources and versions. We definitely don’t want that.

That is, of course, only if you don’t pay attention to what terraform plan and terraform apply are trying to do. And only if you’ve lost your state file in the first place, which can happen quite easily if, like me, you keep the file in the project directory and then wipe that directory in order to clone the project again (unrelated story). Terraform recommends keeping the state remotely, be it in their (free) Terraform Cloud, in an AWS S3 bucket or anywhere else; it’s still not 100% safe there, but it has much better chances of recovery.
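A minimal sketch of what keeping the state in S3 could look like, assuming the bucket already exists. The `backend "s3"` block is standard Terraform; the bucket name, key and region are hypothetical placeholders. In recent Terraform versions, running `terraform init -migrate-state` after adding the block offers to copy an existing local state file into the bucket.

```hcl
terraform {
  backend "s3" {
    # Hypothetical values: replace with your own bucket, key and region.
    bucket = "my-terraform-state-bucket"
    key    = "my-project/terraform.tfstate"
    region = "eu-central-1"

    # Encrypt the state object at rest in S3.
    encrypt = true
  }
}
```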

How did the day go? After I noticed terraform plan wanted to create a few dozen resources where I expected three at most… ew, I said, and it didn’t take long to figure out what had happened. A quick Google search revealed there’s an official tool for such cases called terraform import, which imports the current state of existing resources into the state file. Nice stuff, apparently. Been there yesterday, done that, got these takeaways:
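For reference, a sketch of importing a single resource, assuming an Elasticsearch domain managed as `aws_elasticsearch_domain.logs` with the domain name `my-logs` (both hypothetical). The classic CLI form is `terraform import aws_elasticsearch_domain.logs my-logs`; since Terraform 1.5 the same can also be declared as an `import` block, which shows up in `terraform plan` before anything is written to the state.

```hcl
# The resource block must describe the existing domain (settings elided here).
resource "aws_elasticsearch_domain" "logs" {
  domain_name = "my-logs" # hypothetical domain name
  # ... the rest of the existing domain's configuration ...
}

# Terraform 1.5+: adopt the existing domain into state on the next apply.
# For aws_elasticsearch_domain the import ID is simply the domain name.
import {
  to = aws_elasticsearch_domain.logs
  id = "my-logs"
}
```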

  • always name your resources, even when names are optional. Also fill in comments or descriptions where the resource offers them (not many do). The more info, the easier it is to recognize what comes from where.
  • you’ll notice you always have more AWS components than you thought. AWS API Gateway in particular is a can of self-multiplying, fast-growing worms.
  • tagging is seriously overrated, at least in AWS. Not everything can be tagged, searching by tags doesn’t really exist, and the AWS Tag Editor won’t export (aka find) all your resources anyway, so you still need to go through each service console and fetch the resource IDs to import. That’s because terraform import needs the IDs of the existing resources in order to import their current state (which makes perfect sense).
  • in about half of the cases, those IDs are simply the resource’s name. The other half follow all kinds of crazy naming schemes, which I suspect AWS itself requires. Sometimes it’s ID/name/scope, sometimes it includes the HTTP method, sometimes it’s the ARN… all nicely documented by Terraform, but if you have to search the documentation for EACH resource type and you have dozens, you’re in for a less than pleasant afternoon.
  • surprise surprise, not all resources are supported by terraform import. Not the S3 bucket objects, not the permissions… and some things I simply wasn’t able to find in the AWS console (where’s the API GW deployment ffs???). So you can’t get to 100% coverage anyway.
  • after a while you’ll realize it’s easier to delete everything that can be recreated and focus on importing only the (hopefully) few stateful components, basically those containing data. So I kept only my Elasticsearch domain and said bye to everything else. It’s a bit of AWS console work, but orders of magnitude simpler than the steps above.
  • speaking of stateful components, it’s good advice to protect them from accidental deletion anyway when you apply Terraform changes. Databases come to mind immediately. Tell Terraform, inside the resource block:
    lifecycle {
      prevent_destroy = true
    }

    and Terraform will reject any plan that would destroy that resource, so you’ll be safe(r) against misplaced command line actions.
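To illustrate the ID zoo from the list above, here is a sketch of import blocks (Terraform 1.5+ syntax) for a few API Gateway resource types, following the import ID formats given in the AWS provider documentation. All IDs and resource addresses are hypothetical, and matching resource blocks are assumed to exist in the configuration.

```hcl
# REST API: the import ID is the API's own ID (hypothetical value).
import {
  to = aws_api_gateway_rest_api.api
  id = "a1b2c3d4e5"
}

# Resource (path): REST-API-ID/RESOURCE-ID
import {
  to = aws_api_gateway_resource.items
  id = "a1b2c3d4e5/xyz123"
}

# Method: REST-API-ID/RESOURCE-ID/HTTP-METHOD
import {
  to = aws_api_gateway_method.get_items
  id = "a1b2c3d4e5/xyz123/GET"
}

# Stage: REST-API-ID/STAGE-NAME
import {
  to = aws_api_gateway_stage.prod
  id = "a1b2c3d4e5/prod"
}
```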

The best way is still to NOT LOSE your state file. Put it in another directory, sync it to OneDrive, upload it to Terraform Cloud or S3, do something with it, don’t be me! Just don’t commit it to your version control system, or you’ll get a zillion useless versions of your software and your CI/CD (Jenkins or whatever) will go crazy.
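If you go the S3 route, here is a sketch of the bucket itself, assuming AWS provider v4 or later (where versioning is configured through the separate `aws_s3_bucket_versioning` resource). Versioning keeps older copies of the state object, so an overwritten or deleted state file can still be restored. The bucket name is hypothetical, and the bucket is probably best managed in its own small configuration, so it isn’t tracked by the very state it stores.

```hcl
# Bucket that holds the Terraform state (hypothetical name).
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-terraform-state-bucket"

  # Same trick as for databases: reject any plan that would destroy it.
  lifecycle {
    prevent_destroy = true
  }
}

# Keep every previous version of the state file (AWS provider v4+).
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}
```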

