I'm one of the creators of Terratest. Happy to answer questions.
The main question I've seen so far seems to be how Terratest compares with various "spec" tools (e.g., inspec, serverspec). Most of the spec tools focus on checking the properties of a single server or resource. For example, is httpd installed and running? Terratest is largely for end-to-end, acceptance style testing, where you deploy your real infrastructure, in a real environment (e.g., AWS), and test the infrastructure actually works as expected.
For example, let's say you wanted to test a module for running Vault (https://github.com/hashicorp/terraform-aws-vault), which is a distributed secret store. With a spec tool, you might test a single Vault node to check that Vault is installed and the process is running. With Terratest, you'd check that the whole Vault cluster deployed correctly and bootstrapped itself (including auto-discovery of the other nodes), and that you can initialize the cluster, unseal it, store data, retrieve data, and so on.
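To make that concrete, a typical Terratest test is just Go: deploy the module for real, assert against the live infrastructure, and tear it down at the end. Here's a minimal sketch of the pattern (the example directory and the cluster_url output are invented for illustration, not taken from the Vault module):

    package test

    import (
        "fmt"
        "testing"
        "time"

        http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
        "github.com/gruntwork-io/terratest/modules/terraform"
    )

    func TestClusterEndToEnd(t *testing.T) {
        opts := &terraform.Options{
            // Hypothetical example dir; point this at your own module.
            TerraformDir: "../examples/my-cluster",
        }

        // Tear everything down at the end of the test, pass or fail.
        defer terraform.Destroy(t, opts)

        // Actually runs `terraform init` and `terraform apply` against AWS.
        terraform.InitAndApply(t, opts)

        // Assumes the module exposes a `cluster_url` output; retry until
        // the cluster responds, proving the whole thing works end to end.
        url := fmt.Sprintf("http://%s", terraform.Output(t, opts, "cluster_url"))
        http_helper.HttpGetWithRetry(t, url, nil, 200, "OK", 30, 5*time.Second)
    }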
I use Packer and Terraform extensively for my consulting company. I appreciate the work on trying to make Terraform/Packer testable, but I wish this tool were Golang-agnostic and written in HCL or a markup language.
The biggest concern with Terraform is that small changes (e.g., changing Terraform variable names) often cause rebuilds of large chunks of resources, which is scary in production and sometimes cannot be applied without downtime. Point being, I'm not sure testing (expected results) is the core problem. The problem is confidence that destroying and editing resources will not produce unexpected downtime (modifying ELBs, RDS, IAM, or security groups are examples).
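One cheap guardrail for that fear is to fail CI whenever a plan wants to destroy anything. A rough sketch, just shelling out to `terraform plan -detailed-exitcode` (a real flag: exit code 0 means no changes, 1 an error, 2 changes present); the destroy check greps the human-readable summary line, so treat it as a heuristic:

    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "regexp"
    )

    func main() {
        // Exit code 2 (changes present) comes back as an error from
        // CombinedOutput; we deliberately ignore it and just read the summary.
        cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-no-color")
        out, _ := cmd.CombinedOutput()

        // Heuristic: parse "Plan: X to add, Y to change, Z to destroy."
        re := regexp.MustCompile(`(\d+) to destroy`)
        if m := re.FindSubmatch(out); m != nil && string(m[1]) != "0" {
            fmt.Printf("refusing to apply: plan wants to destroy %s resource(s)\n", m[1])
            os.Exit(1)
        }
        fmt.Println("plan contains no destroys")
    }

For resource renames specifically, `terraform state mv` moves a resource to its new address in the state file so Terraform doesn't destroy and recreate it.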
It's weird to think of infrastructure/deploy automation in terms of "testing". Yes, you are always testing things, but not always in the "func test" sort of way. There's automation testing, there's testing of automation, and there are tests that are part of automation. Terratest seems to be the second, but it's the third that I think is most useful.
If you build your infrastructure/deploy automation correctly, you should be able to redeploy your full stack all the time, and when it succeeds, throw traffic at it, and if you don't detect any anomalies, make that the new production service. On the detection of anomalies you simply move the traffic back to the previously deployed incarnation (or re-deploy the old incarnation, if necessary). For sufficiently large systems this gets more complicated as you can't just duplicate your resources, but the smaller pieces that have actually changed can be shifted around.
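As a sketch, that whole flow is just a control loop. Everything below is hypothetical glue: deployStack, shiftTraffic, anomaliesDetected, and retireStack are stand-ins for your provider-specific automation (e.g., weighted DNS or load balancer target groups plus your monitoring):

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // Hypothetical stand-ins for real automation.
    type stack struct{ version string }

    func deployStack(v string) (*stack, error) { return &stack{v}, nil }
    func shiftTraffic(s *stack, pct int)       { fmt.Printf("%d%% -> %s\n", pct, s.version) }
    func anomaliesDetected(s *stack) bool      { return false } // query your monitoring here
    func retireStack(s *stack)                 {}

    func release(blue *stack, newVersion string) error {
        // Redeploy the full stack from scratch.
        green, err := deployStack(newVersion)
        if err != nil {
            return err
        }
        // Shift traffic over gradually; back out at the first anomaly.
        for _, pct := range []int{1, 10, 50, 100} {
            shiftTraffic(green, pct)
            time.Sleep(time.Second) // in reality: minutes, between real checks
            if anomaliesDetected(green) {
                shiftTraffic(blue, 100) // previous incarnation takes over again
                return errors.New("rolled back after anomaly")
            }
        }
        retireStack(blue) // green is now the production service
        return nil
    }

    func main() {
        if err := release(&stack{"v1"}, "v2"); err != nil {
            fmt.Println(err)
        }
    }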
The idea of a "rollback" is really just "return to a previously known good state", but it's misleading: it was good before now, things have changed since, and it might not still be good. So just as much as you test newly deployed changes before making them the production service, you should probably also test the previously deployed changes to make sure they will work again if pressed back into service. So, regression testing for infrastructure, I guess. (You'd do this if you were building a physical product like a network appliance, to make sure your old appliances still work with newer software releases, but we rarely think of software-derived infrastructure this way.)
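In practice you could approximate that by pointing one acceptance suite at both the candidate and the last known-good release. A sketch in Terratest terms, where the directory layout is an assumption (say CI checks the old tag out into ./previous and the new code into ./current):

    package test

    import (
        "testing"

        "github.com/gruntwork-io/terratest/modules/terraform"
    )

    func TestCurrentRelease(t *testing.T)  { testCluster(t, "./current") }
    func TestPreviousRelease(t *testing.T) { testCluster(t, "./previous") }

    // Same deploy-and-assert cycle for both versions, so the old release
    // keeps getting exercised after it stops being the new one.
    func testCluster(t *testing.T, dir string) {
        opts := &terraform.Options{TerraformDir: dir}
        defer terraform.Destroy(t, opts)
        terraform.InitAndApply(t, opts)
        // ... the same end-to-end assertions for both versions ...
    }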
I think the fear of changing infrastructure code is usually not rooted in bad practices, but rather in immature tools and practices.
Speaking from experience, it's really hard to get infrastructure-as-code deployments as bulletproof as application deployments, because you're so dependent on the toolchain and its interaction with the provider (AWS, Azure, etc.).
And in some cases, it's impossible to actually do a clean infrastructure deployment without some manual steps, which leaves you wondering what new changes might need manual steps as well. 'Which problems have I not encountered yet?'
I'm not sure I agree about the dependency on the toolchain & provider; I am always able to read what the API does (vendor side) and how it has been implemented (tool side) and make an educated decision. Caveat: I do not use bleeding-edge features of any cloud provider, and unless engineering requires it, I do not use PaaS features of cloud providers, which I find have the roughest edges for infra-as-code.
Secondly, I think part of the issue is that the wrong toolchain is being used, e.g., trying to use HCL (HashiCorp Configuration Language, used in Terraform) as if it were Turing-complete (and I've seen this before).
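When the logic genuinely fights HCL, one escape hatch is to compute it in a general-purpose language and hand Terraform plain data. Terraform auto-loads a terraform.tfvars.json file, so a small generator like this keeps the HCL declarative (the subnets variable is invented for illustration):

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    func main() {
        // Do whatever HCL is bad at (loops, conditionals, lookups) here,
        // then emit plain data for Terraform to consume.
        subnets := map[string]any{}
        for i := 0; i < 3; i++ {
            subnets[fmt.Sprintf("az%d", i)] = fmt.Sprintf("10.0.%d.0/24", i)
        }

        f, err := os.Create("terraform.tfvars.json") // auto-loaded by Terraform
        if err != nil {
            panic(err)
        }
        defer f.Close()
        if err := json.NewEncoder(f).Encode(map[string]any{"subnets": subnets}); err != nil {
            panic(err)
        }
    }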
I've not run into an issue where manual steps are required to cleanly deploy infrastructure that I cannot automate away. I can agree on dependency chains, where you may need to run your infrastructure deployment in sequence so that, e.g., your network is up before you provision instances.
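That kind of sequencing is easy to script. For example, with something like Terratest you can apply the network layer first and feed its outputs into the instance layer (the directory names and the vpc_id output/variable here are assumptions):

    package test

    import (
        "testing"

        "github.com/gruntwork-io/terratest/modules/terraform"
    )

    func TestLayeredDeploy(t *testing.T) {
        // 1. Bring the network up first.
        network := &terraform.Options{TerraformDir: "../network"}
        defer terraform.Destroy(t, network)
        terraform.InitAndApply(t, network)

        // 2. Pass its outputs into the instance layer.
        instances := &terraform.Options{
            TerraformDir: "../instances",
            Vars: map[string]interface{}{
                "vpc_id": terraform.Output(t, network, "vpc_id"),
            },
        }
        defer terraform.Destroy(t, instances)
        terraform.InitAndApply(t, instances)
    }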
All this being said, I mirror my production environment in staging, so 99% of issues are found there.
I think we are mostly on the same page, I just took "scared" more literally than you.
The missing piece in all of this is the possibility of a clean rollback. If all infrastructure could be described "as code" and version controlled in Git or something, you could roll back. But that doesn't take into account the omnipresent state in databases and so on. I have yet to see a system or environment that doesn't save state in databases or similar systems.
Well, we are moving towards more statelessness in infrastructure, meaning the state is only stored in databases and the rest of the infrastructure does not (or should not) contain any state. So you can have immutable servers which you can replace and upgrade/downgrade at will with much less worry about state.
You still need to care about the databases but it is much more confined.