> clear problems when the original implementation is stretched in ways that weren't anticipated -- which is a common argument in favor of testing.
On this point in particular, I'm not sure this happens very often in data engineering (especially with data transforms), since I don't think tables experience the kind of scope creep that app components or APIs might. Once a table is shipped, almost every subsequent change is either adding/removing columns (which I think should be written in a way that makes it impossible to change the grain), fixing bugs (in which case tests are not relevant), or internally refactoring for performance (tests can help, but usually only cover very basic issues).
The last case is actually one where I think automated, generic testing would be helpful, but I'm not aware of any existing tools that make this easy. Ideally, I would want a test suite to run the new and old versions of the code in parallel and confirm that the outputs are unchanged.
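To make the refactoring case concrete, here is a rough sketch of the kind of check I have in mind, in plain Python. Everything in it is made up for illustration: `old_daily_totals`, `new_daily_totals`, `diff_outputs`, and the sample rows are hypothetical names, not part of any existing tool. It assumes both versions take the same input rows and return rows as dicts with hashable values, and it compares the outputs as multisets so that row order doesn't matter.

```python
from collections import Counter, defaultdict


def old_daily_totals(rows):
    # Hypothetical "old" version: rescans every input row once per distinct day.
    days = sorted({r["day"] for r in rows})
    return [
        {"day": d, "total": sum(r["amount"] for r in rows if r["day"] == d)}
        for d in days
    ]


def new_daily_totals(rows):
    # Hypothetical refactor for performance: a single pass over the input.
    totals = defaultdict(int)
    for r in rows:
        totals[r["day"]] += r["amount"]
    return [{"day": d, "total": t} for d, t in totals.items()]


def freeze(row):
    # Turn a dict row into a hashable, key-order-independent tuple.
    return tuple(sorted(row.items()))


def diff_outputs(old, new, rows):
    # Run both versions on the same input and compare outputs as multisets,
    # so row order is ignored but duplicate rows still count.
    old_rows = Counter(freeze(r) for r in old(rows))
    new_rows = Counter(freeze(r) for r in new(rows))
    return old_rows - new_rows, new_rows - old_rows


sample = [
    {"day": "2024-01-01", "amount": 10},
    {"day": "2024-01-01", "amount": 5},
    {"day": "2024-01-02", "amount": 7},
]

only_in_old, only_in_new = diff_outputs(old_daily_totals, new_daily_totals, sample)
assert not only_in_old and not only_in_new, (only_in_old, only_in_new)
```

The point is that nothing in `diff_outputs` knows anything about the table itself, which is why I think this could be generic. In practice the sample rows would need to be a realistic snapshot of production input for the comparison to mean much.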