
This was a case of convergent evolution: both projects ended up working on similar ideas at the same time.

One issue with using Arrow directly in NumPy is that PyArrow exposes an immutable 1D array, while NumPy exposes a mutable ND array.
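To make the mismatch concrete, here is a rough sketch, not taken from either project. It assumes NumPy 2.0+ for `StringDType` (falling back to `object` on older versions, purely for illustration) and treats PyArrow as optional. The variable names are made up for the example.

```python
import numpy as np

try:
    from numpy.dtypes import StringDType  # available in NumPy 2.0+
    string_dtype = StringDType()
except ImportError:
    string_dtype = object  # fallback for older NumPy, illustration only

try:
    import pyarrow as pa  # optional; the Arrow half is skipped without it
except ImportError:
    pa = None

# NumPy side: a 2D array of variable-width strings, modified in place.
grid = np.array([["a", "bb"], ["ccc", "dddd"]], dtype=string_dtype)
grid[0, 1] = "a much longer replacement string"

# Arrow side: a flat (1D) array that does not support item assignment.
arrow_rejected_write = None  # stays None if PyArrow is not installed
if pa is not None:
    flat = pa.array(["a", "bb", "ccc"])
    try:
        flat[1] = "changed"
        arrow_rejected_write = False
    except TypeError:
        arrow_rejected_write = True
```

The in-place write on the 2D NumPy array succeeds, while the same kind of write on the PyArrow array is rejected, which is the gap any Arrow-backed NumPy dtype would have to bridge.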

See also https://numpy.org/neps/nep-0055-string_dtype.html#related-wo...



Are the pandas people considering this as the default string type? Seems like it would be a slam dunk.


That is something I’d like to see, but I don’t want to wade into the already very complicated discussion around Arrow strings in pandas. If a pandas developer wanted to take this on, I think that would make things easier, since there’s so much complexity around strings in pandas.

That said, there is a branch that gets most of the way there: https://github.com/pandas-dev/pandas/pull/58578. The remaining challenges are mostly about reaching consensus on how to introduce this change.

If NumPy had had StringDType in 2019 instead of 2024, I think pandas might have had an easier time. Sadly, the timing didn’t quite work out.



