Hacker News

Python packaging’s greatest challenge is 10 competing tools and standards.


Python packaging gets a lot of criticism. It's a meme. The thing is, it's actually improved dramatically over the years and continues to improve.

The problems it solves are very complex if one looks a little below the surface. It is solving different problems from the ecosystems it's often compared to: Go, Rust, Java, JS.


This is still true: everyday things that should be straightforward, or pythonic if you will, are way too convoluted and complicated.

As a Python dev with experience since 2.6, I agree it has gotten better, but it is also rotten at its core. The problems Python has to solve here are hard, but that is exactly why it should be priority number one to solve them elegantly, in such a fashion that 99% of Python users never have to worry about them.

Right now, packaging and dependency management are my number one concern when I think about recommending the language to beginners. The amount I needed to learn just to figure out a good way of developing something with dependencies and deploying it on another machine was way too much. When I was going through this, there was no official "this is how it is done" guide.

Now there is poetry for example. Poetry is good. But there are still hard edges when you need to deploy to a system without poetry on it for example.
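For the "system without Poetry" case, one common workaround (a sketch; depending on the Poetry version, `export` may require the poetry-plugin-export plugin) is to export a plain requirements.txt and deploy with stock pip:

```shell
# On the dev machine, in a Poetry project (pyproject.toml + poetry.lock):
poetry export -f requirements.txt --output requirements.txt

# On the target machine, only Python and pip are needed:
python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt
```

The lockfile pinning survives the export, so the target machine gets the same resolved versions without Poetry installed.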


Well... The setuptools package was around for way too long.

I never managed to get data files packaged into a dist elegantly... annoys me to this day.
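For what it's worth, with modern setuptools this is at least declarable statically in pyproject.toml; a sketch, where the package name and data globs are made up:

```toml
# pyproject.toml fragment for setuptools >= 61; "mypkg" is hypothetical.
[tool.setuptools.package-data]
mypkg = ["data/*.json", "templates/*.html"]
```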


I wonder to what degree these complex problems are self-inflicted due to more than a decade of flip-flopping between a myriad of packaging and environment management solutions. In such an environment, you are bound to have so many different approaches between important projects that trying to bring them all under one umbrella becomes next to impossible. What constitutes a Python package is relatively well defined, but how you build one is completely up to you.

Edit: I've just read another comment which I think pointed out the most pertinent fact - that python has often served as mere glue code for stuff written in lower level languages. This then results in an explosion of complexity as a python package not only has to package the python part, but has to be able to build and package the lower level stuff.


I'd have to look at numbers to be sure, but I think that the number of popular packages that include compiled libraries is dramatically higher than it was 10 years ago, and that's about when wheels were taking over. The Data/Science stack has really exploded in that time, and it's very heavily skewed towards packaging c/fortran/rust libraries. Previously there was PIL, some database drivers, and numpy was just getting started. I think more of the big packages were pure python. The earlier egg/easy_install/msi/exe installers have all faded away, and now it's really just wheels and sdists.


What made it click for me is "Python packaging" means "how to install everything that I want to use from Python."

I wouldn't have considered "how to get BLAS onto a system" to be a "Python packaging" issue, but for people who want to rely on it via scipy/numpy/whatever, it is.


How is it very different from NodeJS? Cause I find npm way easier to deal with than Python packaging, and it's also dealing with native code. I used Python heavily for 6 years and still have no idea how the packages work as a user trying to install libs; I used to just thrash around till it works. I don't use it anymore at my new job.

The one thing I understand is npm installs everything locally by default (unless you -g), and in Python it's hard to stay local even if you use a venv.
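To be fair, staying local with the standard-library venv module is workable; a minimal sketch (`requests` is just an example dependency):

```shell
python3 -m venv .venv        # create an isolated environment in ./.venv
. .venv/bin/activate         # activation puts .venv/bin first on PATH
pip install requests         # lands in .venv/lib/.../site-packages, not system-wide
python -c "import requests"  # resolved from the venv, not the system interpreter
```

The catch compared to npm is that this is opt-in per shell session, whereas `npm install` is local by default.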


> How is [Python] very different from NodeJS?

Unlike Node, Python is essentially older than modern package management. When Python developers first decided to tackle distributing their code, `apt-get` did not yet exist.

Early approaches which stuck around way too long let any package do more or less anything at install time, and didn't bother with static metadata (can't figure out what your deps are except by attempting an install!). Subsequent solutions have struggled to build consensus in an already vast and mature ecosystem. Backwards compatibility means compatibility with bad conventions.


That makes sense. Backwards compatibility can be a large burden. I'm still sad that they haven't managed to make a new thing that just works. NodeJS is repurposing a language and runtime originally meant for hacky web scripting, but it ended up ok in the end.

In general, if a community doesn't agree on how to fix a problem, someone else will provide a solution at a higher layer. Now it's common to install a whole Docker image to run some Python code instead of a few Python deps.


Not really. CPAN was already a thing when Python itself was released, long before there were any Python packages to share.


What did cpan.pm actually look like in 1993? Did packages ('distributions') have static metadata? Could the installer perform recursive dependency resolution? Was the resolver complete?

My impression is that none of that resembled contemporary CPAN, and isn't really what I have in mind with the admittedly ambiguous phrase 'modern package management'.

But I'd love to hear more! The history of package management is very interesting to me. Tales of ancient but sophisticated package management systems are very welcome. :)


Fundamentally the problems are the same, but the communities have very different priorities. The vast majority of Node is still web dev, and the community focuses on dependencies that are deployed as JavaScript files (including transpiled code); this allows NPM to mostly ignore issues related to linking to native binaries and put more focus on those user-facing features you see and like. Python, on the other hand, has a lot at stake in interacting with native binaries (scientific computing, automation, etc.) and with systems older than Node's existence. This consumes a ton of the community's resources, and every new UX improvement also needs to take account of those use cases. If you're ever tasked with keeping projects with native dependencies working over a moderate period of time in both languages, you'd gain a lot of respect for how well Python packaging works.


I don't know how common this is, but some NPM packages involve native binaries, so it must be doable. pg-native for example.


pg-native is a good example actually. Its readme lays out how you need to first get a compiler and libpq, and have certain commands in your PATH. With psycopg2 (Python's equivalent), the most common scenario is 'pip install psycopg2-binary' and you're good to go.


So the difference is psycopg2-binary* bundles the libpq native code and pg-native doesn't? I'm no expert, but I think npm packages can include native code if they want thanks to node-gyp, only the node-libpq (which pg-native relies on) author seemingly decided not to package in libpq itself.

* Back when I used this, there was just psycopg2 which had the bin included.


Correct, but the challenges are to compile the dynamic library correctly so it runs on another machine, and, on a given target machine, to choose the correct compiled artifact to load. Python has extremely good support for this compared to other similar ecosystems. The fact that most Python packages decide to do bundling across a comparably broad set of platforms, while packages in some other languages do not, is a window into how much easier the ecosystem makes the operation.
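Concretely, the "choose the correct compiled artifact" part is encoded in the wheel filename, and pip can show which tags the current machine accepts (the psycopg2 filename below is illustrative):

```shell
# A wheel filename encodes interpreter, ABI, and platform compatibility:
#   psycopg2_binary-2.9.9-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
#   i.e. name-version-{python tag}-{abi tag}-{platform tag}.whl
# pip prints the tags the running interpreter will accept, in priority order:
python3 -m pip debug --verbose | grep -i -m 1 "compatible tags"
```

At install time pip intersects that tag list with the wheels on PyPI and downloads the best match, which is why 'pip install psycopg2-binary' just works on most platforms.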


That's ... completely missing the point of the article. There's nothing about competing standards that solves the problem of the C-ABI and how to package non-python library dependencies.


At this point, there are 10 competing tools but no longer so many competing standards: the standards for Python packaging from 2015 onwards are PEP 517 (build system standardization), PEP 518 (using pyproject.toml to configure the build system), and PEP 621 (storing project metadata, previously standardized, in pyproject.toml). These standards build on top of each other, meaning that they don't offer conflicting advice.

The TL;DR for Python packaging in 2022 is that, unless you're building CPython extensions, you can do everything through pyproject.toml with PyPA-maintained tooling.
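As a sketch of what that looks like in practice (the project name and dependency below are hypothetical), the entire build configuration fits in one pyproject.toml:

```toml
# Build system (PEP 517/518):
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

# Project metadata (PEP 621):
[project]
name = "mypkg"
version = "0.1.0"
requires-python = ">=3.8"
dependencies = ["requests>=2.0"]
```

From there, PyPA's build frontend (`python -m build`) produces the sdist and wheel.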


The problem is, as long as old Python versions continue to exist, the competing standards will also continue to exist. From a user-experience standpoint it is horrible: depending on the Python, pip, or setuptools version on a system, the install command might do something drastically different in each case. Often it's not even clear what's happening under the hood.


Many of the tools don't attempt to handle the challenges mentioned here.


And 2 competing Python versions.



