Exactly, this is because of the coupling with DAGShub as the remote.
Full disclosure: I'm associated with Iterative (the DVC creators). We have nothing to do with DAGShub, I'm afraid. They don't consult or collaborate with us at all (to this date) about how to build or optimize their server or workflows for their "hub".
Looks like in those benchmarks Oxen.AI makes a misguided assumption that benchmarking DVC is (roughly...?) the same as benchmarking DVC<>DAGShub (a server built by a different company). To my understanding, DAGShub is the bottleneck there. They didn't bother to benchmark DVC against an S3 bucket or similar cloud storage, which is far more widely used. I wonder if that's because DAGShub makes this whole setup wayyy slower.
Oxen dev here - let me add some benchmarks for DVC backed by an S3 bucket. I ran that comparison a while back and we were still faster, but I agree it's a good benchmark to have.
Fundamentally, even adding and committing data locally is slower, even before the push. But I agree the remote matters too.
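For anyone wanting to reproduce the S3-backed comparison being discussed, here is a minimal sketch of timing DVC's local add plus a push to a plain S3 remote, keeping DAGShub out of the loop. It assumes `dvc[s3]` is installed, AWS credentials are configured, and `s3://my-benchmark-bucket` is a hypothetical bucket you own; the dataset path `data/` is also just a placeholder.

```shell
# Sketch: benchmark DVC against a plain S3 remote (no DAGShub in the path).
git init bench && cd bench
dvc init
# "s3remote" and the bucket name are assumptions -- substitute your own.
dvc remote add -d s3remote s3://my-benchmark-bucket/dvcstore

# Place the dataset under ./data first, then time the two phases separately:
time dvc add data   # local hashing + cache step (slow even before any network I/O)
time dvc push       # upload to S3 -- the part a hosted remote would otherwise mediate
```

Timing `dvc add` and `dvc push` separately makes it easier to see how much of the total is local overhead versus the remote, which is the distinction being debated above.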
This looks pretty cool! I'm gonna keep my eyes on this for more deployment options. The world needs a (good) open-source swiss-army-knife for ML serving
Iterative.ai (Series A, US based) | REMOTE, WORLDWIDE | FULL-TIME | OPEN-SOURCE
We're building open-source (and SaaS) dev tools for ML engineers and data scientists (MLOps).
We're the folks behind DVC.org (9K+ stars on GH) and CML.dev (3K+ stars on GH), have a SaaS product (studio.iterative.ai), and have more tools in the works. We're going for "the HashiCorp for ML and MLOps"!
We are looking for senior Python & Go engineers (backend or systems programming experience) and senior front-end engineers.
Please apply via these links (so we know you came from HN):