New breakthrough brings matrix multiplication closer to ideal (quantamagazine.org)
192 points by bertman on March 8, 2024 | hide | past | favorite | 28 comments


As expected: closer, but not much closer. FTA: “we give a new bound of ω<2.371866 […]. Our result breaks the lower bound of 2.3725”

AFAIK the current theoretical lower bound still is 2, so that’s only 600-ish more such steps to get there.

On the positive side, I don’t think anybody believes that 2 even remotely is a tight bound.


> AFAIK the current theoretical lower bound still is 2 ... I don’t think anybody believes that 2 even remotely is a tight bound.

It's not "still". It's trivial to prove that 2 is the lowest possible value. Simply because you can't even read all the inputs in less than n^2 operations. So it's a tight lower bound for sure.


It's not a tight lower bound unless it's proven that no greater value could be a lower bound, and that's very much an open problem if I understand correctly.


What's the difference between doing that and just straight up proving what the exact value is?


Depending on the nature of the problem and the proof technique, it can sometimes be (anywhere from a little to much) easier to prove that something is no greater than, or no less than, a certain bound than to prove its exact value. For example, it's trivial to prove using triangles that the square root of two lies somewhere between 7/5 and 3/2, but proving the exact value of root two is quite a lot harder.
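The sandwich bound above can be checked with nothing but arithmetic, no square roots required:

```python
# Verifying 7/5 < sqrt(2) < 3/2 without computing sqrt(2):
# squaring both candidate bounds and comparing against 2 suffices,
# since squaring is monotone on positive numbers.
lower, upper = 7 / 5, 3 / 2
assert lower**2 < 2 < upper**2  # 1.96 < 2 < 2.25
print(f"{lower} < sqrt(2) < {upper}")
```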


Indeed - a classic «upping the game» is to prove two bounds, then determine an exact value because you somehow get the bounds to intersect. https://en.wikipedia.org/wiki/Squeeze_theorem?wprov=sfti1


Hell yeah, squeeze theorem! I was so enchanted when I first learned this one that everyone I knew also learned about the squeeze theorem.

Sorry, nothing substantial to add, I just got excited about it again


It cannot be harder, and could be a lot easier.

Lots (well, relatively lots; this isn’t a research area with millions of practitioners) of people must have been looking at that problem ever since Strassen showed the exact value is less than 3 in 1969 (https://en.wikipedia.org/wiki/Strassen_algorithm), and there hasn’t been progress on moving that needle, so I doubt it’s even remotely easy, though.


To prove the exact value, you generally have to provide an algorithm that (provably) fulfills that value, and also prove that no other algorithm can possibly do better


It's very distressing to see this insightful and helpful comment[1] grayed out because of a pedantic argument about what "tight" means. The HN community used to be so much better than this.

[1] I'm sure everyone who clicked the downvote button thought it was obvious, but not everyone here has that level of intuition about big-O analysis. This isn't a math forum, we're all hackers here, right?


What is positive about the best-case scenario being worse than we thought?


It would mean our best algorithms are closer to optimal than otherwise, which at least says positive things about our researchers.


Who knows? It could be n^2 log n, which when we're playing with the exponent basically is n^2.


Currently: 2.37187.

Not remotely close to 2.

What is the limit you would set for remotely close to 2?


The previous state of the art was 2.371866, this improvement brings it down by 0.000314. If you're moving increments that small 2 looks quite far away.


Related ongoing thread:

New Bounds for Matrix Multiplication: From Alpha to Omega - https://news.ycombinator.com/item?id=39630949 - March 2024 (22 comments)


I suspect the limit will be O(n^2 log n). It might also be easier to work out the complexity by taking the size of the problem to be the number of elements in the matrix instead, giving O(n log n), which would then be equivalent to O(n^2 log n) using the n we use today. I also find it interesting that the current exponents seem to be closing in on 2 + 1/e.
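For a sense of how far apart the conjectured n^2 log n limit and today's exponent are, here is a quick comparison at a few (hypothetical, illustrative) sizes:

```python
import math

# Comparing the current exponent (~2.3716) against the conjectured
# n^2 * log(n) limit. Note that n^2 * log(n) has an "effective
# exponent" of 2 + log(log n)/log n, which drifts back toward 2 as
# n grows -- so in exponent terms it "basically is n^2".
for n in [10**3, 10**6, 10**9]:
    current = n**2.3716
    conjectured = n**2 * math.log(n)
    print(f"n = {n:>10}: current/conjectured ratio = {current / conjectured:.3g}")
```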


Can anybody who understands this better say how applicable this new technique is to a GEMM implementation in a BLAS library in mainstream numerical libraries?


It's been a great many years since I've touched this stuff but two rules of thumb probably haven't changed in the meantime:

1. in general, algorithms tend to slow down on small problems in exchange for the asymptotic speedup -- so you probably don't want to use this algorithm for anything that fits in memory (or perhaps the solar system)

2. in specific, asymptotically fast matrix multiplication algorithms tend to be numerically unstable. So you probably don't want to use this algorithm unless you're working with infinite precision.
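For reference, here is a minimal sketch of Strassen's algorithm (the first sub-cubic method), in pure Python for square matrices whose size is a power of two. It shows the basic shape of the trade-off: 7 recursive multiplies instead of 8, bought with many extra additions, which is part of why tuned cubic kernels win at ordinary sizes:

```python
# Strassen's algorithm: multiply two n x n matrices (n a power of 2)
# with 7 recursive products per level instead of the naive 8,
# giving O(n^2.807) instead of O(n^3).

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both matrices into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Strassen's 7 products (naive blocking would need 8).
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Recombine into the quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return [C11[i] + C12[i] for i in range(h)] + \
           [C21[i] + C22[i] for i in range(h)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Even this, the simplest of the sub-cubic algorithms, already shows the instability concern: the sums and differences of blocks can cancel catastrophically in floating point, and the newer laser-method algorithms are far more elaborate.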


> 1. in general, algorithms tend to slow down on small problems in exchange for the asymptotic speedup -- so you probably don't want to use this algorithm for anything that fits in memory (or perhaps the solar system)

This is not necessarily true and depends very much on the specifics of the algorithm. Some improvements can be very applicable at quite normal tensor sizes. For example, the Winograd algorithm is quite often a routine choice for 3x3xc convolutions.

> 2. in specific, asymptotically fast matrix multiplication tend to be numerically unstable.

This should be investigated for each particular algorithm, but numerical stability is indeed a major concern for a practitioner. I skimmed the article very superficially and didn't notice any mention of it.


These tend to be galactic algorithms. Furthermore, I don't know that they do any work involving parallelism, or considering the narrowing gap between time needed for addition vs multiplication - typically only multiplies are counted.


The presumption that additions are free, when, in current hardware, there's not a dramatic difference between the cost of an add and a multiply, seems particularly damning.

These algorithms reduce the number of multiplies, but seem to increase the total number of floating point ops (including both adds and multiplies), unless I'm mistaken.
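That can be made concrete for Strassen, where the operation counts follow directly from its recurrence (7 recursive products plus 18 block additions per level):

```python
# Counting scalar operations for Strassen on n x n matrices
# (n a power of two): multiplies follow M(n) = 7*M(n/2), additions
# follow A(n) = 7*A(n/2) + 18*(n/2)^2. The naive algorithm uses n^3
# multiplies and n^2*(n-1) additions. Strassen cuts multiplies but
# piles on additions, so in total-flop terms it only wins for n in
# the high hundreds -- and only multiplies are usually counted.

def strassen_ops(n):
    if n == 1:
        return 1, 0  # (multiplies, additions)
    m, a = strassen_ops(n // 2)
    return 7 * m, 7 * a + 18 * (n // 2) ** 2

for n in [64, 512, 1024]:
    m, a = strassen_ops(n)
    naive = n**3 + n**2 * (n - 1)
    print(f"n = {n}: strassen {m + a:.3g} total flops, naive {naive:.3g}")
```

With these counts, Strassen's total flops only dip below the naive total somewhere between n = 512 and n = 1024, even though its multiply count is lower much earlier.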


I know nothing but read the article. These improvements are about analytic solutions and have no practical use. At least that's what it said.


Exact mat-mat-mul is fine and all (congrats to the authors), but I'm much more interested in approximate mat-vec-mul. Specifically:

(1) Pre-process an nxn matrix m into f(m). Any transformation is allowed, but to work well on all matrices and input vectors it's not possible to rely solely on compression techniques (random sampling, low-rank+sparse approximation, ...).

(2) Compute an approximation of m@v via some computation g(f(m), v) in less than O(n^2) time (ideally being able to trade off accuracy for speed). Above a certain performance threshold, this necessarily means you only use a part of f(m) for any one vector v.

There's a zero-error O(n) solution that computes a hash table of all possible matrix-vector products, so I'll be picky and add the constraint that pre-processing time/space are "reasonable".
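To illustrate the (1)/(2) interface being asked for, here is a sketch of the classic column-sampling baseline (in the spirit of Drineas–Kannan–Mahoney randomized matvec). Note this is exactly the kind of pure sampling approach the comment says cannot work well on all matrices and vectors; it only shows the shape of the API, with `f` and `g` named after the comment's own notation:

```python
import random

# f: pre-process the matrix once. g: answer an approximate matvec
# query using `samples` sampled columns, costing O(samples * n)
# instead of O(n^2) -- sub-quadratic whenever samples << n.

def f(m):
    # Store columns together with their Euclidean norms.
    n = len(m)
    cols = [[m[i][j] for i in range(n)] for j in range(n)]
    norms = [sum(x * x for x in c) ** 0.5 for c in cols]
    return cols, norms

def g(fm, v, samples):
    # Unbiased estimate of m @ v = sum_j v_j * col_j, sampling
    # column j with probability proportional to ||col_j|| * |v_j|.
    cols, norms = fm
    n = len(v)
    weights = [norms[j] * abs(v[j]) for j in range(n)]
    total = sum(weights)
    out = [0.0] * n
    if total == 0:
        return out
    for _ in range(samples):
        j = random.choices(range(n), weights=weights)[0]
        scale = v[j] * total / (weights[j] * samples)
        for i in range(n):
            out[i] += cols[j][i] * scale
    return out
```

The variance of this estimator blows up for adversarial matrix/vector pairs, which is precisely why the comment asks for a pre-processing transformation f that works well on all inputs rather than compression alone.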


What would happen to the world if we found a way to do matrix multiplication in zero time with zero resources?


Well, zero isn't really possible, but I'll assume you're asking "what would the impact on the world be if we made matrix multiplication trivial", in the same way people ask what a room-temperature superconductor would do for us.

The answer, in computing terms, is quite a lot. Matrix multiplication is used everywhere; most notably at the moment, neural networks consist almost entirely of matrix multiplications, so their power consumption would drop almost to nothing, and correspondingly we could scale them up enormously. Your phone could run GPT5 locally as long as it had the storage space, high-fidelity computer vision would become trivial everywhere, and Google Glass might even become useful.

Previously very limited engineering simulations like weather forecasting would improve by leaps and bounds.

Basically everything would change all at once, because we'd effectively have made P = NP for any problem you can turn into a matrix multiplication (so basically most maths problems).

At the moment we use hardware like GPUs, and TPUs in the case of AI, to make matrix multiplication much quicker, and the companies that make them have recently become some of the biggest in the world, because being able to multiply matrices quickly is so important to everything we do now.


We would have broken some fundamental laws, because to multiply two n x n matrices, it takes n^2 time to write down the answer.

So the better question to ask is: What happens if we find a practical algorithm (and a theoretical approach) that lets us do matmul in only n^2 time? And the answer is - well, some important things get faster and we're able to solve larger problems in things like optimization, simulation, deep learning, etc., or save a lot of time and money doing them.

We'd go from about n^2.8 to n^2, which, let's say for a 1M x 1M matrix, is about 64k times faster. That's really nice. But the speedup for more common sized matrices is smaller - 256x faster for a 1024x1024 matrix.
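The arithmetic behind those figures: dropping the exponent from 2.8 to 2 saves a factor of n^0.8.

```python
# Speedup from O(n^2.8) -> O(n^2) is n^0.8:
# 1024^0.8 = 256, and (10^6)^0.8 ~ 63,000 ("about 64k").
for n in [1024, 10**6]:
    print(f"n = {n}: ~{n**0.8:,.0f}x faster")
```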

It would be a very important thing that would cause us to re-examine the use of matrix multiplication as a primitive for more things, and would have very important practical implications - and at the same time, would also kind of look like 15 years of Moore's law, not that we're guaranteed to have 15 years of that. So, "big" but not necessarily "totally reshape the world".


Well, since matrix multiplication also contains regular multiplication (multiplication of 1x1 matrices) and addition (multiplication of 2x2 matrices of the form

  1 x
  0 1

), you could then do basic arithmetic with zero resources.
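That embedding is easy to check: multiplying two such upper unitriangular matrices adds their off-diagonal entries.

```python
# [1 a]   [1 b]   [1 a+b]
# [0 1] @ [0 1] = [0  1 ]
# so free 2x2 matmul would give free addition (and free 1x1
# matmul is just free scalar multiplication).

def unitriangular(x):
    return [[1, x], [0, 1]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

print(matmul2(unitriangular(3), unitriangular(4)))
# [[1, 7], [0, 1]] -- the off-diagonal entries added: 3 + 4 = 7
```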



