Hacker News
S6: A standalone JIT compiler library for CPython (github.com/deepmind)
102 points by tvrvt on Sept 17, 2022 | 36 comments


If someone lands here seeking a maintained compiler for Python, there are a lot. Off the top of my head:

- Pythran (https://pythran.readthedocs.io) (ahead-of-time compiler)
- mypyc (https://mypyc.readthedocs.io/en/latest/) (ahead-of-time compiler)
- Cython (ahead-of-time compiler)
- Numba (JIT)
- PyPy (an interpreter doing JIT compiling)
- Nuitka (ahead of time, IIRC)

At this point we should build an "Awesome Python Compilers" repo ... oh wait, it obviously already exists: https://github.com/pfalcon/awesome-python-compilers


Do any of these JITs/compilers help with machine learning model building? (Mentioned explicitly in the s6 repo)

e.g. PyPy doesn't work with PyTorch https://github.com/pytorch/pytorch/issues/17835



that's a good point, but it doesn't have full language support like the ones listed above


Yes, try Numba.


We are working on a Python runtime as well: https://github.com/oracle/graalpython


Would recommend checking out https://github.com/facebookincubator/cinder


mypyc is alpha at best, I wouldn't put it in a list of recommendations.


Mypyc is stable for what it supports. Mypy itself is built with it. Black is built with it. I use it in production for some things, but use Nuitka for others. That decision is mostly down to a project's dependencies, not the project's own code. The more magic a Python module has, the harder it is to build AOT.


numba can now also support ahead-of-time compilation, in some cases


(tl;dr: it's now archived and not under active development)

> We have stopped working on S6 internally. As such, this repository has been archived and we are not accepting pull requests or issues. We open-sourced the code and provided a design overview below to spur conversations within the Python community and inspire future work on improving Python.


I wonder if anyone at DeepMind can share why they stopped working on this. Technical issues or a business-led decision?


I'm not sure, but there are several active parallel efforts to speed up Python (faster-cpython, Cinder, Pyston). The faster-cpython project is doing 100% upstream development, and it seems like Cinder and Pyston are trying to align with it to avoid becoming unmaintainable forks.


Numba does this, at least for JIT-compiling Python and releasing the GIL. I haven't experimented with other use cases. It is actively maintained.


I'd guess it has something to do with the introduction of JAX.


Nope


> S6 was aiming to do for Python what V8 did for Javascript

> Python is slow, and a lot of researchers use Python as their primary interface to build models.

Why not just build models in JS?


python has a large ecosystem of scientific, mathematical, and machine learning libraries. there is no real replacement for that in the js world.


Other than the fact those libraries aren't written in Python, they are only bindings to native code libraries that can be used from any language with FFI.


This is a common trope that does not really match reality.

This quote from the PyTorch README

  PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python.

could be applied to many key libraries of this ecosystem.

For sure, a lot of those libraries rely on tons of compiled code in C/C++/Fortran, but the python code is not mere wrapping on top of those libs. And most of the compiled code inside those repos has a lot of internal logic as well.

Some examples (using cloc on git main branch, as tokei does not understand cython):

  * sklearn: 150 kloc Python, ~10 kloc Cython, not much in other languages
  * NumPy: 200 kloc Python, 350 kloc C/C++
  * SciPy: 800 kloc C, 200 kloc Python (the 800 kloc surprised me; it looks like a lot of it is C code generated from Cython)


PyTorch might be the exception, except not quite, as proven by the bindings selection on the website, where C++ and Java are also an option.


But PyTorch is not an exception. scikit-learn, scikit-image, statsmodels, and matplotlib are not mostly implemented in compiled languages.


It's a massive amount of work to reimplement, say, NumPy and pandas in JavaScript, in part because the language is different and in part because the API is huge. It's technically possible, but not something that's available right now (nor does there seem to be any real interest or serious effort going on).

It's more likely that something like Julia ends up replacing Python, even though I still believe it's a low probability thing.


More work than JIT compiling Python?


There's over 200k loc of Python in numpy alone, so yeah, it's the same order of magnitude of work to end up with a credible alternative.


hmm, there's 1.6M lines of C++ in V8. And that's not a collection of simple mathematical functions: it's an entire compiler with many interdependent components. I think there's this weird but popular idea that it's easier to build a fast language than to port a couple hundred independent functions.


Do you think this will start to change? I often see posts on this site bemoaning the performance of Python (often referencing the order of magnitude performance differences of similarly dynamic languages like JavaScript).

I'm hopeful that honest efforts to port over useful bits of the scientific/mathematical ecosystem to JS will start to take off someday soon.


I'm betting on julia as the replacement personally, it seems to be attracting a growing fraction of the scientific community.


Have you considered why CS101 and data scientists may be adopting Python over JS?

And have a look at, say, Julia.


>Why not just build models in JS?

Because ML/Data source has particular numerical requirements. JavaScript has poor mathematical defaults and still misses useful datatypes like float16*.

* Yes, this can be emulated. Emulation is slow. You've gained some speed and then paid it back for emulation.


Non-trivial numerics are done through FFI in python. Python doesn’t support fp16, but a native library behind it does. The exact same premise applies to JavaScript
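To make the fp16 point concrete, here is a minimal sketch using only the Python standard library. It assumes nothing about any ML library; the helper names `to_fp16_and_back` and `fp16_add` are made up for illustration. The `struct` module can store a value as IEEE 754 binary16 via the `e` format, but the arithmetic itself still happens on ordinary 64-bit Python floats, so this is emulation rather than native fp16 support.

```python
import struct


def to_fp16_and_back(x: float) -> float:
    """Round x to IEEE 754 binary16 by packing with the 'e' format, then unpack."""
    (rounded,) = struct.unpack("<e", struct.pack("<e", x))
    return rounded


def fp16_add(a: float, b: float) -> float:
    """Emulated half-precision add: compute in float64, round the result to binary16."""
    return to_fp16_and_back(to_fp16_and_back(a) + to_fp16_and_back(b))


if __name__ == "__main__":
    # A half-precision value occupies two bytes when packed.
    print(struct.calcsize("<e"))
    # 0.1 is not exactly representable in binary16, so the round trip changes it.
    print(to_fp16_and_back(0.1))
```

Every emulated operation pays for a pack and an unpack on top of the float64 arithmetic, which is the cost the parent comments are arguing about.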


If you're doing all the interesting bits via FFI, Javascript JIT does not give you much aside from a lot of extra work. Why should we reimplement the glue code for marginal improvements? If you are doing stuff without FFI, then JS's inadequacies do matter.

IMHO, you can't beat Python for ML by being just a slightly faster Python (there aren't enough newcomers for that; the Python ecosystem is big enough to assimilate them before other languages can). You need to offer something more significant to make migration worthwhile. Julia tries to do that by allowing you to implement the main stack in Julia itself. If it had JS speed and DX, it would have eaten Python's lunch.


err, how is JS inadequate for that stuff? Python does not have fp16 support natively, and neither does JS. Python also doesn't have fp32, int64, int8, int32, uint32, etc. JavaScript does.

JavaScript is 43 times faster for this random workload:

    (env) bwasti@bwasti-mbp ~ % time python bleh.py
    python bleh.py  9.27s user 0.06s system 99% cpu 9.352 total
    (env) bwasti@bwasti-mbp ~ % time bun bleh.ts
    bun bleh.ts  0.16s user 0.02s system 83% cpu 0.216 total
    (env) bwasti@bwasti-mbp ~ % cat bleh.py
    import random
    v = [0 for i in range(1000000)]
    for i in range(int(1e8)):
        v[i % len(v)] += 1
    (env) bwasti@bwasti-mbp ~ % cat bleh.ts
    const v = new Float32Array(1000000)
    for (let i = 0; i < 1e8; ++i) {
      v[i % v.length] += 1
    }

Try as many as you want and you'll see that it just blows python out of the water
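For anyone who wants to rerun a scaled-down version of the Python side, here is a minimal sketch that assumes nothing beyond the standard library. The function names and the sizes are made up for illustration. `array("f")` is only the closest stdlib analogue to `Float32Array` for storage; the loop is still interpreted, so this is a way to check the comparison, not a claim that it closes the gap.

```python
import time
from array import array


def bump_list(n_slots: int, n_iters: int) -> list:
    """Same loop as bleh.py, on a plain list of Python ints."""
    v = [0] * n_slots
    for i in range(n_iters):
        v[i % n_slots] += 1
    return v


def bump_array(n_slots: int, n_iters: int) -> array:
    """Same loop on a typed array of 32-bit floats (storage like Float32Array)."""
    v = array("f", [0.0]) * n_slots
    for i in range(n_iters):
        v[i % n_slots] += 1
    return v


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are far smaller than the original 1e6 slots / 1e8 iterations.
    for fn in (bump_list, bump_array):
        _, seconds = timed(fn, 1_000, 1_000_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```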


I understand that V8 JIT is way faster than raw python. The proper comparison is against python+FFI though. My point was that rewriting mere glue code is apparently not considered cost effective by the community.


Correction: Data science.


To those downvoting brrrrm, he's asking in earnest and is working on his own JS ML library that looks very promising.



