Same experience back at Netflix. Elasticsearch (we didn't use the L or K) plus a query engine on S3 with a catalog was more versatile and way cheaper than Splunk. Nowadays we have a slew of performant OLAP stores that can be used for log analysis as well, which further renders Splunk unnecessary.
My experience at a big fintech I won't name: we had our own highly engineered in-house metrics system staffed by a big team. Custom pipeline, integrations in multiple languages, high resolution, custom aggregation and rollups. It was nice.
We also had in-house logging, exception tracing, alerting, service discovery, metrics dashboards, etc. It was all actually pretty good. All engineered by xooglers.
Someone (not to name names) got bitten by the "anti-weirdware" bug and started shifting us off of all our custom-built solutions. Every team got hit with major distractions from their roadmaps for each of these changes. None of the headcount dedicated to staffing the internal systems was freed up - they had to run the new integrations.
The decision was made one day to migrate all of our observability stuff over to SignalFx. Observability wasn't our "core competency" and our systems were "weirdware".
We had to rewrite our instrumentation and all of our reporting dashboards, and all of our alerting DSLs changed. Nothing was replaced 1:1 for every system and metric, so we emerged in a much worse, much less visible situation across the board. Outages happened or went unreported.
Splunk acquired SignalFx and dramatically raised prices. We scrambled to migrate yet again, which disrupted roadmaps and led to more outages.
Leadership was changed.
There's something to be said against NIH, but when you've already written systems that are working, inexpensive, and easy to maintain, you shouldn't throw them out because you're worried analytics isn't your "core competency". Yes, it is your core competency, because you're selling uptime to your customers.
Agreed. Costs plummet when you use S3 as the storage medium for these massive log datasets. I think S3 is much faster to query than most people realize; you just have to be smart about how you organize things.
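One common way to "organize things" is a Hive-style `key=value` prefix layout partitioned by service, date, and hour, so a query for a time window only touches the matching prefixes instead of scanning the whole bucket. The sketch below is a minimal illustration under that assumption; the `logs/` layout and both helper functions are hypothetical, not what either commenter actually ran. Partition-aware query engines can prune on these keys, and even a plain S3 `list_objects_v2` call with a `Prefix` benefits from the same layout.

```python
from datetime import datetime, timedelta, timezone


def object_key(service: str, ts: datetime, part: int) -> str:
    """Build a Hive-style partitioned S3 key for one batch of log lines.

    Hypothetical layout: logs/service=<svc>/dt=<YYYY-MM-DD>/hour=<HH>/part-NNNNN.json.gz
    """
    ts = ts.astimezone(timezone.utc)  # partition on UTC so hours never overlap
    return (
        f"logs/service={service}/dt={ts:%Y-%m-%d}/hour={ts:%H}/"
        f"part-{part:05d}.json.gz"
    )


def prefixes_for_range(service: str, start: datetime, end: datetime) -> list[str]:
    """Return the hour-level prefixes covering the half-open window [start, end).

    A reader only lists and fetches objects under these prefixes, which is
    what keeps queries over a small time window cheap.
    """
    # Truncate the start to the top of its hour so a partial first hour is included.
    hour = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end = end.astimezone(timezone.utc)
    prefixes = []
    while hour < end:
        prefixes.append(f"logs/service={service}/dt={hour:%Y-%m-%d}/hour={hour:%H}/")
        hour += timedelta(hours=1)
    return prefixes


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 13, 10, tzinfo=timezone.utc)
    end = datetime(2024, 1, 1, 15, 30, tzinfo=timezone.utc)
    for p in prefixes_for_range("api", start, end):
        print(p)
    # logs/service=api/dt=2024-01-01/hour=13/
    # logs/service=api/dt=2024-01-01/hour=14/
    # logs/service=api/dt=2024-01-01/hour=15/
```

Hour-level partitions are just one choice: finer partitions prune more aggressively but produce many small objects, which tends to hurt scan throughput, so the granularity is usually tuned to typical query windows and ingest volume.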