2500 instances could cost millions per month on AWS. Even the smallest instances, with some disks and bandwidth fees, can push $100k a month.
Spending a fraction of that to monitor that sort of infrastructure is absolutely justified. I can tell you from experience that Datadog gives discounts even at 100+ hosts. I don't know what they can do for 2500, but if it were me I wouldn't accept anything less than 50% off.
Honestly, you need to forget about your salary; it's irrelevant when it comes to running a company. Imagine a driver at a shipping company deciding to deliver on a scooter rather than a truck because the truck is worth more than his yearly salary.
It could also be less than $100k a month (e.g. 2500 c5a.large with a 1-year reservation). At that point you'd wonder why your monitoring bill was 40% of your compute bill.
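For anyone wanting to check that 40% figure, here is a rough sketch of the arithmetic. It assumes the $15 per host per month list price quoted elsewhere in this thread, and a compute bill of $93,750 a month, which is an illustrative number picked to sit under $100k, not a real AWS quote. The function name is made up for the example.

```python
# Back-of-the-envelope check of monitoring spend vs. compute spend.
# All inputs are assumptions taken from this thread, not vendor quotes.

def monitoring_share(hosts, per_host_usd, compute_bill_usd):
    """Return monthly monitoring spend as a fraction of the monthly compute bill."""
    if compute_bill_usd <= 0:
        raise ValueError("compute bill must be positive")
    return hosts * per_host_usd / compute_bill_usd

HOSTS = 2500
PER_HOST_USD = 15          # list price per host per month, as quoted in the thread
COMPUTE_BILL_USD = 93_750  # assumed monthly compute bill, "less than $100k"

share = monitoring_share(HOSTS, PER_HOST_USD, COMPUTE_BILL_USD)
print(f"Monitoring is {share:.0%} of compute")
```

The ratio only shrinks if the per-host price is discounted or the compute bill is larger than assumed, which is really what the two sides here are disagreeing about.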
Also, of course their salary is relevant. The cost of an engineer's time is an important factor to consider when making build vs buy decisions. Usually it's one that argues in favor of "buy", but not always.
It's not millions, but it is many multiples of hundreds of thousands (and it's mainly GCP/bare metal).
I guess the point I am driving at here is that there's such a thing as "business critical costs" (i.e. can we ship our product or not), which make up the majority of the infra costs we have today, and then there are "optimisation costs".
Usually when we discuss things like optimisation costs, it's along the lines of: "Will this product save us enough time to justify its expense?" Often, sadly, the answer is no.
Terraform Enterprise is an example of a time where we said yes, because the API allows us to deploy CI/CD jobs which provision little versions of our infrastructure, saving us many man-days of provisioning and testing every year.
As alluded to in the sibling thread, there's almost no way that we can save 3 or more people's worth of time every year. We're 3 people right now and we already have metrics collection, log tracing and alerting, so it's a hard sell to the business types.
Great choice on GCP! It's probably half the price of AWS for the same thing.
Monitoring and logging are business critical. They're an integral part of infrastructure and it is very normal to spend 10% there. It's really not possible to operate stably and efficiently at a large scale like that without a trove of tooling.
Tools usually justify their costs by letting you optimize the infra and by helping to prevent or fix outages, though not all companies care about stability or hardware costs.
And it's not a choice of free vs paid. Open source software costs a lot of money too. The pairs of large instances needed to run it don't come cheap; they're probably more than a salary too if the company wants any sort of redundancy or geographic distribution.
May I ask what you use for logging? I guess you must be screaming in horror at the price of Elasticsearch/Kibana/Splunk :D
Looking at the pricing page, the cost is $15/instance, so $37,500 a month for 2500 instances.
A middle-of-the-road developer salary is around $35k a year in most of Europe, outside of the capitals.
Although it does say: "Volume discounts available (500+ hosts/mo). Contact us." at the bottom, so I guess 500 is a lot.
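To make the numbers concrete, a quick sketch of the bill at list price and with a volume discount. The $15 per host per month and the $35k yearly salary come from this comment; the 50% discount is only the negotiating target someone mentioned in the thread, not a published rate, and the function name is made up for the example.

```python
# Rough monthly monitoring bill, with an optional volume discount.
# The discount value is hypothetical: the pricing page only says "Contact us".

def monthly_bill(hosts, per_host_usd, discount=0.0):
    """Return the monthly bill in USD after applying a fractional discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return hosts * per_host_usd * (1.0 - discount)

HOSTS = 2500
PER_HOST_USD = 15        # list price per host per month
SALARY_USD = 35_000      # assumed yearly developer salary from this comment

list_price = monthly_bill(HOSTS, PER_HOST_USD)
half_off = monthly_bill(HOSTS, PER_HOST_USD, discount=0.5)

for label, bill in (("list price", list_price), ("50% off", half_off)):
    yearly = bill * 12
    print(f"{label}: ${bill:,.0f}/month, ${yearly:,.0f}/year, "
          f"about {yearly / SALARY_USD:.1f} salaries")
```

Comparing the yearly figure to salaries is the build vs buy framing used above; it ignores the cost of the machines and the time needed to run a self-hosted stack.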