Local development and huge databases?
1 point by mdomans on March 6, 2014 | 1 comment
A question for anyone working on data analytics projects: how do you handle local development and feature work when the database you touch is huge? For some time now I have been working with huge datasets in relational databases, where keeping a local copy of the DB is simply not possible. I work around the problem either with a mix of a local editor and remote IPython, or with mocked data. That of course causes problems with performance, since it is quite easy to write slow code without noticing. Any ideas?

TL;DR: How do you develop code when the work needs a huge database (more than 40GB)?



There are two approaches that I would consider. The first is simply to ask: does your development work absolutely require large test data, or could you reduce its size through more precise selection? And more to the point, if that is possible, can it be done with a reasonable amount of effort? This is such a common problem that my guess is you're not the only one trying to solve it.
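
As a rough sketch of what "more precise selection" could look like: take a deterministic slice of one parent table and copy only the child rows that belong to it, so the small copy stays internally consistent. The schema (customers, orders), the function names and the one-in-ten rule are all made up for illustration, and in-memory SQLite stands in for the real server.

```python
import sqlite3

# Hypothetical schema, used for both the "big" source and the small dev copy.
SCHEMA = """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
"""


def build_source():
    """Create a stand-in for the huge database (in memory, tiny on purpose)."""
    src = sqlite3.connect(":memory:")
    src.executescript(SCHEMA)
    src.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, f"customer-{i}") for i in range(1, 1001)],
    )
    src.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 1000) + 1, float(i)) for i in range(1, 5001)],
    )
    src.commit()
    return src


def extract_sample(src, keep_one_in=10):
    """Copy every Nth customer, plus only the orders of those customers."""
    dev = sqlite3.connect(":memory:")
    dev.executescript(SCHEMA)

    # Deterministic selection: the same customers are picked on every run.
    customers = src.execute(
        "SELECT id, name FROM customers WHERE id % ? = 0", (keep_one_in,)
    ).fetchall()
    dev.executemany("INSERT INTO customers VALUES (?, ?)", customers)

    # Follow the foreign key so no order in the sample is orphaned.
    orders = src.execute(
        "SELECT id, customer_id, total FROM orders WHERE customer_id % ? = 0",
        (keep_one_in,),
    ).fetchall()
    dev.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    dev.commit()
    return dev


if __name__ == "__main__":
    dev = extract_sample(build_source(), keep_one_in=10)
    print(dev.execute("SELECT COUNT(*) FROM customers").fetchone()[0], "customers kept")
    print(dev.execute("SELECT COUNT(*) FROM orders").fetchone()[0], "orders kept")
```

Whether a slice like this is good enough depends on how skewed the real data is; a sample that hides the large customers will also hide the slow queries.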

The second: if the size of the data set is essential to your development, could you set up a development database server on a dedicated machine (it could be an old workstation or laptop) on the local network, to act as common infrastructure for the whole team? This only works if your application supports connecting to a remote database, but it would move the database off your own machines and let the team share one resource rather than each of you running a big DB locally.
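
For the shared server idea, the application side can be as small as reading the connection target from the environment, so each developer points at the team machine without touching code. A minimal sketch, assuming a PostgreSQL-style URL; the variable name DEV_DATABASE_URL, the default URL and the function name are all hypothetical.

```python
import os
from urllib.parse import urlparse

# Hypothetical fallback, used when no shared server is configured.
DEFAULT_URL = "postgresql://dev:dev@localhost:5432/app_dev"


def database_settings(environ=None):
    """Return connection settings parsed from DEV_DATABASE_URL (assumed name)."""
    environ = os.environ if environ is None else environ
    url = urlparse(environ.get("DEV_DATABASE_URL", DEFAULT_URL))
    return {
        "host": url.hostname,
        "port": url.port or 5432,  # fall back to the usual PostgreSQL port
        "user": url.username,
        "password": url.password,
        "dbname": url.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Each developer exports DEV_DATABASE_URL once, pointing at the team server.
    print(database_settings())
```

The resulting dictionary can then be handed to whatever database driver the project already uses.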



