I find it interesting that one of their major pain points was data schema. Having worked at places that used plain JSON and places that used Protobuf, I can highly recommend that anyone starting an even mildly complex data engineering project (complex in its data or in its number of stakeholders) use something like Protobuf or Apache Arrow, or a columnar format if you need one.
Having a clearly defined schema that can be shared between teams (we had a dedicated repo for all Protobuf definitions, with enforced pull requests) significantly reduces the number of headaches down the road.
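To make the difference from plain JSON concrete, here is a rough sketch in stdlib Python of what a shared schema buys you. It is not Protobuf, just a stand-in for the idea; the `PageView` type, its fields, and the `parse_event` helper are all made up for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PageView:
    # Hypothetical event type; the field names are illustrative only.
    user_id: int
    url: str
    duration_ms: int


def parse_event(raw: str) -> PageView:
    """Parse a JSON string into a PageView, rejecting schema violations."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("event must be a JSON object")

    # The dataclass definition is the single source of truth for the schema.
    expected = {f.name: f.type for f in fields(PageView)}

    missing = expected.keys() - data.keys()
    unknown = data.keys() - expected.keys()
    if missing or unknown:
        raise ValueError(
            f"missing fields: {sorted(missing)}, unknown fields: {sorted(unknown)}"
        )

    # Exact type match, so "7" is not accepted as an int and True is not either.
    for name, expected_type in expected.items():
        actual_type = type(data[name])
        if actual_type is not expected_type:
            raise ValueError(
                f"{name}: expected {expected_type.__name__}, "
                f"got {actual_type.__name__}"
            )

    return PageView(**data)


if __name__ == "__main__":
    print(parse_event('{"user_id": 7, "url": "/home", "duration_ms": 1200}'))
```

With plain JSON, a producer that renames `user_id` to `userId` or starts sending it as a string tends to break consumers silently, somewhere far downstream. With a definition like this living in a shared repo, the same change has to go through a pull request, and a bad record fails loudly at the boundary.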