
Is Spark still used anywhere? I haven't heard about it in a loooong time. At the time it seemed like a very nice abstraction, but while I played around with it, I never had a problem that needed such a complex solution.


Spark probably powers 90% of FANG batch data pipelines. There are still lots of Hive and Hadoop clusters out there that are just being maintained for reporting until they get funding to swap to some fancy dbt setup.

Spark is pretty much ubiquitous and the default solution for batch processing right now (especially PySpark). There are also a lot of AWS Glue users running Spark.


PySpark rather than Spark itself, but I had a job a couple of years back using it in Glue to generate financial reports. I'm no longer on the project, but I'm pretty sure they're still using it.

Honestly, it wasn't that bad of a model. But then again, the job didn't actually need Spark; someone just sold it that way before I was on the project. Fun to work with, though.


All over the place in cloud infra (I've worked for two large cloud companies). I keep hearing of more and more stuff being built around it and Kafka.


Well, Databricks is likely to either go public or be bought by Microsoft for $40B or so in the next 12 months, so … yes?



