> It wasn't an "we do this at scale" talk, but I'd love to see more experiments ...

awj · on Feb 14, 2017

> I can't upgrade my Elasticsearch 1.7.5 to the newest version, because then I would need to upgrade this small(ish) 4MB Kibana 3.x to a monstrosity that weihgs more than whole Elasticsearch engine

...is that really a good reason to reinvent this whole solution, though? You're basically saying you're going to spend the time to replace your entire log storage/analysis system because you object to the disk size of Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).

The rest of your complaints seem to stem from not having upgraded elasticsearch, aside from possibly hitting query scenarios that continue to be slower-than-grep after the upgrade.

Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.

dozzie · on Feb 14, 2017

> ...is that really a good reason to reinvent this whole solution, though?

The system being dependency-heavy and pulling an operationally awful stack (Node)? Yes, this alone is enough of a reason for me. And I haven't mentioned yet other important reasons, like memory requirements and processing speed (less than satisfactory), elasticity of processing (ES is mostly query-based tool, and whatever pre-defined aggregations it has, it's too constrained paradigm for processing streams of logs), and me wanting to take a shot at log storage, because our industry actually doesn't have any open source alternative to Elasticsearch.

> Kibana. (Which, without knowing your platform specifically, looks like it safely sits under 100 megs).

Close, but missed. It's 130MB unpacked.

> Maybe I'm misunderstanding your explanation, but if I'm not this sounds like a lot of effort to save yourself tens of megs of disk space.

I'm fed up with the outlook of the whole thing. Here ridiculous disk space for what the thing does, there slower-than-grep search speed, another place that barely keeps up with the rate I'm throwing data at it (single ES instance should not loose its breath under just hundreds of megabytes per day), upgrade that didn't make things faster or less memory-consuming, but failed to accept my data stream (I was ready to patch Kibana 3.x for ES 5.x, but then I got bitten twice in surprising, undocumented ways and gave up, because I lost my trust that it won't bite me again).

Sorry, but no, I don't see Elasticsearch as a state-of-the-art product. I would gladly see some competition for log storage, but all our industry has now is SaaS or paid software. I'm unhappy with this setting and that's why I want to write my own tool.

einhverfr · on Feb 14, 2017

The problem you run into is "we need some more information that is in the logs but we didn't thin to parse before." Here PL/Perl is awesome because you can write a function, index the output, and then query against the function output.

One reason I always store full source data in the db.

dozzie · on Feb 14, 2017

> The problem you run into is "we need some more information that is in the logs but we didn't thin to parse before."

Agreed, though with liblognorm rules you just shove every single variable field into JSON field and that mostly does the job. And in the case you were talking about logs with no matching rules, liblognorm reports all unparsed logs, and my logdevourer sends them along the properly parsed logs, so no data is actually omitted.

renesd · on Feb 14, 2017

Thanks for the tip about liblognorm. Looks quite useful!

dozzie · on Feb 14, 2017

Oh yes it is. The rules syntax is nice and is a big improvement over regexps that are popular with almost every other log parser out there, but the best thing is that if your rules fail, liblognorm reports precisely what part of the log could not be consumed, not just the fact that none of the rules matched.

Liblognorm has only one major user: rsyslog, for which it was written, but at some point I thought that it would be nice to have a separate daemon that only parses logs, so I wrote logdevourer (https://github.com/korbank/logdevourer).