Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Python's Dask out of core dataframe can also do that.


Dasks' out of core dataframes are just a thin wrapper around pandas dataframes (aided by the recent improvement in pandas to release the GIL on a bunch of operations)


Uh, no they are not. They lazy- scale pandas to on disk and distributed files.

http://dask.pydata.org/en/latest/dataframe.html

"Dask dataframes look and feel like pandas dataframes, but operate on datasets larger than memory using multiple threads."

http://blaze.pydata.org/blog/2015/09/08/reddit-comments/


Why doesn't Pandas have anything to save the entire workspace to disk (like .RData). There are all these cool file formats like Castra, HDF5, even the vanilla pickle - but I don't see anything with a one shot save of the workspace (something like Dill)

Is this an antipattern for Pandas?


You haven't refuted anything I said. Internally the dask dataframe operations sit on top of pandas dataframes. All dask does is automatically handle the chunking into in-memory pandas dataframes and interpret dask workflows as a series of pandas operations.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: