
sfsylvester on March 4, 2018 | on: Pandas on Ray – Make Pandas faster

This is exactly my go-to move as well. pandas.read_hdf has beaten ray.dataframe.read_csv on speed for the few files I've tested so far. But I imagine the programmable flexibility CSVs have over HDF files (I've never used a Unix command to edit an HDF file, for example) is why this new approach could get some traction.

tavert on March 4, 2018

Try Parquet if your data is tabular. pyarrow and related tools are getting Parquet up to speeds pretty comparable to HDF5, with arguably more flexibility and a better multithreading story.
