> Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
which is trivial to do, and pretty much a must when bringing up a cluster to make sure it's working properly; so much so that most clusters do this on every maintenance, along with a bunch of other benchmarks;
and that they say this:
> cost equivalent versus Nvidia GPU, Tesla claims they can achieve 4x the performance, 1.3x higher performance per watt, and 5x smaller footprint.
but have no MLPerf results, tells you everything you need to know about it.
The list of long-term hype-only AI-hardware companies with billions of dollars of VC investment and literally nothing to show is incredible and keeps growing.
Every MLPerf round, the list of companies that want to submit is "huge", and one week before the deadline, 99.999% of them announce "we'll submit next round", as they have for years.
It's as if people spent billions creating an F1 team, then noticed during pre-season testing that the car can't even finish a lap. And then failed to even start a lap at every race of the season. And then did this again, year after year, for a decade. Burning billions and billions...
What's often overlooked is that just because you have a shit-ton of compute nodes doesn't mean you could make it into the TOP500. You might have the compute power, but the system most likely doesn't have the connectivity. E.g. the first slide says this is distributed over more than three locations, which essentially guarantees that the system doesn't have supercomputer-like connectivity as a whole. Worth pointing out that the networking in a supercomputer is a very significant chunk of the cost and power.
And if it is tailored for AI, it might not even do 32-bit float, or only at a fraction of the "AI FLOPS".
The more 3D renders in a presentation, the more skepticism I develop. I've noticed that the quality of the 3D renders in a company's presentations/PR events is a direct leading indicator of its stock price. This is of course only anecdotal evidence, but I'm pretty sure the hypothesis can hold its ground against 50% of "ML papers" today.
> Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
That line is referring to their current Nvidia A100-powered supercomputer, which they set up earlier this year with 5,760 A100 GPUs [0].
Read the line preceding the one you've quoted from the post:
> Tesla has been expanding the size of their GPU clusters for years. Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
Had to look into MLPerf as I don't follow supercomputing.
But I don't see them not running that test as an issue, as they made it quite clear that their entire system is tailor-made, like an ASIC, to focus on neural nets and the specific compute pipelines relevant to what they care about. I would imagine a generic, broad-based ML benchmark would not only perform suboptimally but also not be representative of what they're trying to do. It's not meant to be a general-purpose ML supercomputer; it's supposed to be a supercomputer that solves a narrowish niche of problems.
I only work on a lowly private cluster, but running standard benchmarks is utterly routine here (it is in fact automated). As others with HPC experience pointed out, running benchmarks is pretty much mandatory when bringing a new system up, not just to ensure it actually performs as promised, but also to weed out bad and marginal components. You do one or two weeks of intense benchmarking and testing, and you can be sure numerous nodes will fail and need parts replaced. When you apply patches, you benchmark again. Why? Because the supplier's fix for "code XYZ crashes nodes" is probably "let's, uh, just reduce power limits by a few percent". Or because of unintentional issues limiting performance. When a node crashes, you benchmark it again. Why? Because Linpack and friends are good at making marginal hardware fail.
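To give a flavor of what "automated" means here, a minimal sketch, assuming a Slurm cluster and a site-specific `run_hpl.sh` wrapper that prints achieved GFLOPS (both hypothetical names); the real tooling is of course far more involved:

```python
#!/usr/bin/env python3
"""Run a single-node HPL pass on every node and flag the stragglers.

Assumes Slurm ('sinfo', 'srun') and a hypothetical site-specific
'run_hpl.sh' wrapper that prints the achieved GFLOPS on stdout.
"""
import subprocess

BASELINE_GFLOPS = 4500.0   # per-node figure from acceptance testing
TOLERANCE = 0.95           # flag anything below 95% of baseline

def node_list():
    # 'sinfo -N -h -o %N' prints one node name per line, no header
    out = subprocess.run(["sinfo", "-N", "-h", "-o", "%N"],
                         capture_output=True, text=True, check=True)
    return sorted(set(out.stdout.split()))

def run_hpl(node):
    # Pin the benchmark job to exactly this node
    out = subprocess.run(["srun", "-w", node, "-N", "1", "./run_hpl.sh"],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

if __name__ == "__main__":
    for node in node_list():
        gflops = run_hpl(node)
        status = "OK" if gflops >= BASELINE_GFLOPS * TOLERANCE else "FLAG"
        print(f"{node}: {gflops:.0f} GFLOPS [{status}]")
```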
So it is literally unbelievable that Tesla not only stood up a cluster, but created their own hardware to do so, and yet didn't run any quotable benchmark and has only theoretical FLOPS numbers for marketing.
> So it is literally unbelievable that Tesla not only stood up a cluster, but created their own hardware to do so, and yet didn't run any quotable benchmark and has only theoretical FLOPS numbers for marketing.
They haven't gotten to this point yet.
They have a single tile (a node). It wouldn't surprise me if the tile they have is just a prototype as well.
You have to have a cluster before you can start running cluster benchmarks.
I don't get it - they're using it for actual work, rather than burning power to run useless benchmarks for bragging rights - and you think that makes it hype?! Surely it's the opposite - running benchmarks rather than doing something useful is hype.
You have to connect thousands of cables and hundreds of nodes, with thousands of components, everything interconnected, and if you connect one wrong, the computer outputs incorrect results. You have to routinely update the software, and if a software upgrade introduces a 20% perf regression (which happens), then your 10 MW cluster starts burning 2 MW for nothing. Or maybe your cooling system sucks, and after a minute of running at full capacity, you need to throttle your cluster to 0.1% of peak to keep it cool enough that it runs "something".
That's why all the Top500 systems I've been involved with (15 or so) run these benchmarks as integration tests on every single cluster maintenance (node updates, servicing, OS updates, etc.).
Submitting these results to the Top500 costs you nothing... if your cluster actually works. When you submit to the Top500, they ask for access so that they can re-run them themselves, which happens typically during / after the next maintenance to avoid impacting any users.
If they haven't submitted, I'm 100% sure their cluster does not deliver what they say it should deliver on paper. Maybe it delivers 1% of it, or 0.01% of it (I've seen both cases in real life). If they haven't fixed it, then maybe it can't be fixed.
HPL, MLPerf, Spec, Stream, OSU.... these are not "benchmarks for bragging rights", these are tests that show that your system works.
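For example, here's a toy Python/numpy version of the STREAM "triad"; the real STREAM is carefully written C with OpenMP, so treat this as an illustration of what the test measures, not a substitute:

```python
import time
import numpy as np

# Toy STREAM "triad" (a = b + scalar*c), the classic memory bandwidth test.
# numpy's two-pass evaluation makes the traffic accounting approximate.
N = 50_000_000                       # ~400 MB per array, far beyond any cache
b, c = np.random.rand(N), np.random.rand(N)
a = np.empty_like(b)

best = float("inf")
for _ in range(5):                   # best-of-5 runs, as STREAM reports
    t0 = time.perf_counter()
    np.multiply(c, 3.0, out=a)       # a = scalar * c
    np.add(a, b, out=a)              # a += b
    best = min(best, time.perf_counter() - t0)

bytes_moved = 3 * N * 8              # STREAM convention: read b, read c, write a
print(f"Triad: {bytes_moved / best / 1e9:.1f} GB/s")
```

If a node's number comes out far below what its memory subsystem should sustain, something is wrong with the hardware or its configuration; that, not bragging, is what these tests are for.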
> run these benchmarks as integration tests on every single cluster maintenance
Why run someone else's benchmark and not your own application to test performance? And what's the point of submitting to Top500? Why do you care how your system ranks? What's the business or technical purpose in that?
For the same reason that we don't test engines during a race. Only what's under test should change, with the rest being fixed.
It would be much more difficult to adapt one of your own applications as a test. If the results are bad, how would you even know whether it's the application's fault or a problem with the cluster? LINPACK, on the other hand, is well understood, and there has been tons of work to make sure it uses all the power your cluster can deliver.
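For intuition, a single-node toy version of what LINPACK measures, using numpy; HPL does the same dense solve, distributed across the whole machine and tuned to saturate it:

```python
import time
import numpy as np

n = 8000                              # problem size; HPL sizes this to fill RAM
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)             # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2 / 3) * n**3 + 2 * n**2     # standard LINPACK operation count
print(f"{flops / elapsed / 1e9:.1f} GFLOPS")

# Sanity-check the answer, much as HPL does with a scaled residual:
residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
print(f"residual: {residual:.2e}")
```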
When a company claims that their new vaccine cures cancer, but is against applying for FDA approval because "no reasons given", you seem to be the kind of person that camps in front of their offices overnight to get the first 10 shots, all in the neck.
Tesla is saying their hardware is 10x better than the competition.
I don't believe it. They don't publish any numbers, and everybody else does.
You believe them. Good for you.
I don't. I think not doing that is extremely suspicious, because if their cluster can be turned on, they have the results. So I'm all but certain that their results just suck, and they don't publish them to save face.
You believe their shit is so good, they didn't even test it while installing it. Good for you.
You think there is a masterplan to keep the results secret. Good for you.
I think their results are just horrible, and that's why they don't show them.
With any serious company, this wouldn't even be a discussion, because they would just show the verified facts about what their cluster can do, or shut up about it if it's really so secret.
You don't see the NSA or the DOD bragging around about being 100x better than "the competition".
To me, they look like they're forced to sell the idea that they're doing something with these clusters to... drive hype and get the stock back up, and since their numbers are actually horrible, this is what we get.
But they've got no obligation to prove anything to random people on the internet. They aren't asking for peer review. They probably don't want to waste time and energy running a benchmark to compete in someone else's top N list. What's in it for them?
> But they've got no obligation to prove anything to random people on the internet.
They are a publicly traded company.
I'm a Tesla investor and deserve to know.
If they are right and their hardware is 10x better than the competitor's, then I'm happy. If they are wrong and they could have gotten 10x better outcomes by spending 100x less money, I'll be very pissed off.
> They probably don't want to waste time and energy running a benchmark to compete in someone else's top N list.
It costs no time. It's the first thing anybody does when building any cluster. It's part of accepting the materials from your suppliers to check that they didn't sell you shit.
I'm not an expert in securities law, but I don't believe being an investor entitles you to demand arbitrary proprietary information you want from a company. Otherwise people would buy one Apple share and find out what specs the new iPhone will have.
HPL is a stress test with a notionally useful output measurement - just about the most effective way of pushing CPU load to the limit, and it tells you what fraction of the theoretical maximum FLOPS you can actually achieve given the other system constraints like memory and network throughput.
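A worked example of that arithmetic, using the 5,760-A100 cluster mentioned earlier in the thread for the peak, and a made-up HPL result for Rmax:

```python
# Rpeak falls straight out of the spec sheet; efficiency is measured/peak.
gpus = 5760
fp64_tflops_per_gpu = 9.7        # A100 peak FP64 (non-tensor-core)

rpeak = gpus * fp64_tflops_per_gpu / 1000.0   # in PFLOPS
rmax = 38.0                      # hypothetical HPL measurement (made up)

print(f"Rpeak: {rpeak:.1f} PFLOPS")
print(f"Efficiency: {rmax / rpeak:.0%}")      # good systems land around 60-80%
```

If the measured fraction comes out at 5% instead of 70%, the spec-sheet FLOPS are marketing, not capability.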
Running benchmarks is not only useful but necessary. It's a point where planning meets the real world. Private clusters are also routinely benchmarked, often as part of validation.
In the end if someone just tells you their supercomputer "would" be in the TOP500, they just tell you that it is definitely not in the TOP500. There might be good reasons for that, but still, it's like claiming that you would win Olympic gold if only you'd bother to participate.
There might be some level of showmanship involved, but Tesla isn't selling those things, it is using them. Quite a different situation in comparison to a random startup which tries to convince investors to finance their products.
Taking everything at face value: Dojo is overall a very impressive project!
- Communication speed is one of the biggest bottlenecks for large models, so their 4 TB/s bandwidth is a very smart focus.
- They claim a 1.3x perf/watt improvement, which is not really that great for ASICs compared to GPUs. Perf/watt is probably the most important number in datacenters.
- They only use SRAM, no DRAM. This is a huge mistake, which limits their model size: you can only fit a ~10GB model inside a single tile, versus 80GB models on a single A100 GPU (see the sizing sketch below).
- The software/compiler stack is as important as, or more important than, the hardware itself, because it dictates how much real performance you can squeeze out of the chips. I think Tesla will need to focus heavily on this area before getting anywhere close to real-world GPU performance.
Overall, I imagine the project will have similar pitfalls to Cerebras.
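For the SRAM point above, the back-of-the-envelope is simple; a sketch assuming fp16 weights and ignoring activations, gradients, and optimizer state (which make the squeeze far worse in training):

```python
GiB = 2**30
BYTES_PER_PARAM = 2                 # fp16/bf16 weights

def max_params(capacity_gib):
    return capacity_gib * GiB / BYTES_PER_PARAM

for name, cap in [("Dojo tile SRAM (~10 GiB)", 10), ("A100 HBM (80 GiB)", 80)]:
    print(f"{name}: ~{max_params(cap) / 1e9:.0f}B parameters max")

# Training needs far more than weights: with Adam, fp16 weights + fp32 master
# weights + two fp32 moments is ~16 bytes/param, so divide those numbers by ~8.
```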
> If their claims are true, Tesla has 1 upped everyone in the AI hardware and software field. I’m skeptical, but this is also a hardware geek’s wet dream.
How is the author declaring that Tesla has one-upped everyone in AI hardware and software while the article has exactly zero references to TPUs?
Also, these chips are as yet only in Tesla's labs, not in production. What's cooking in secret in NVIDIA's labs right now? NVIDIA doesn't have to hype their stuff, so they don't tip their hand to competitors until it's ready for sale.
Creating a lab beast is one thing, making it useful and economical in production is another.
In their presentation yesterday they made it clear that they have a full software stack for programming it. I don't know what is marketing hype and what is real, but they suggested it is very easy for them to program for it.
There’s no third party here. As far as I can tell Tesla designed the chips and wrote the software stack to work together. They don’t want to rely on third parties for their critical infrastructure and AI chip design is vital to their success. At least this is what I can grok as an outsider.
Right, I understand the claims, but as you can see with AMD, there's a ton of work needed to actually be competitive, and AMD has a large team working on it. I'm just skeptical of the claim that the software is of any use to anyone but a small group.
I think the difference is that AMD needs to cater to a wide variety of customers, software stacks, and hardware platforms. Tesla has basically one platform and one customer (themselves).
Their software only needs to be useful to a small group - the autonomy team. I did not get the impression that Tesla plans to sell supercomputers, but that they are building these supercomputers for themselves to train and deploy AI networks.
You said "Even if the hardware existed, if you can't program it, it won't succeed." But it seems like the key customer - Tesla's internal Autopilot team - can already program it. So I just don't see any problem. They may choose to sell these systems (note that they have been shipping their Gen1 custom chip for years now and it is not for sale to the public), but the real plan for revenue is to succeed at AI and profit there. For that they need incredible computing power internally and in a portable platform for edge deployment, but they do not need to sell their chips as a general purpose compute platform.
My problem with articles like these is the sensational headlines that Tesla made an AMD/Nvidia killer. AMD and Nvidia could both make this chip if they wanted to, but dedicating all the die to matrix multiply cores is a waste for most end users. The media makes it seem like Tesla did something revolutionary here, but all they did was make a very, very targeted asic.
That makes sense. Certainly lots of journalism is bad. I haven’t read the article actually. I’ve just watched the two hour presentation by Tesla as well as their past presentations. I am a robotics engineer and I’ve been trying to understand how best to make an “animal like” brain system for an autonomous robot in the real world. I have been pleased with how much Tesla shared about their system and I think their extremely powerful hardware and their neural network approach is ideal for solving this problem. So I’m very happy with what they’ve come up with and I’m happy that it will push competitors to do the same as I think very large neural networks might be needed to solve general purpose robotics.
So all in all I think the chip is very good and I think they are on a path to success. Whatever the article says to hype it up doesn’t change the value of what Tesla has done in my eyes.
The "how" is: ML hype. Very few people can actually swim through the hype to the actual island, and in this case, the author isn't even capable of judging.
I find it surprising that Google, Amazon, and now Tesla can enter the "build your own processor" market that easily. Isn't it more likely that they're in fact only defining the general spec, and then using the services of other companies that specialize in actually designing CPUs?
Tesla hired people who designed chips for Apple and Intel to design their chips. They are designed from the ground up for this application. It seems like they're doing all the chip design in house.
Designing your own IC has become at once extremely complex and very approachable, if you have the money for it.
A multi-billion-transistor CPU is an insanely complex product; multiple layers come together to make it possible. The base layer is the foundry, here TSMC, which does the manufacturing and provides process design kits. These contain, for example, the blueprints for transistors and other electrical components. For digital chips like processors, they are usually used as they come from the foundry.
Next is a layer of software which enables the chip design, provided by the big EDA companies. This is a rather huge layer, enabling the basic designs on one side, but also including all kinds of simulation and verification tools. And quite a bit of engineering knowledge comes along with it.
So if you want to start designing your own CPU, you still need good engineers who know what they are doing, but large parts of the whole "stack" can, and have to, be bought from the vendors described above. This enables quick market entry for companies that were not traditional chip design houses.
Wow. All that packaging and power delivery must be incredibly expensive. Yet they claim massive TCO advantage over Nvidia. Amazing what you can do when not paying for Nvidia's 60% margins ;)
If this is as good as they say, they should spin this off and sell these. Probably won't.
I can't stop thinking about the Tesla Bot. Was he outright lying? Is it just a ploy to recruit robotics people? Is it even plausible?
I think the most challenging aspect of the idea is interacting with the world: picking up and handling various objects.
It's obvious from the presentation that FSD is very good at placing itself in space and mapping out its environment, as well as devising routes, even when accounting for things like other moving objects. Boston Dynamics has built machines that take the input of a joystick and translate it into four-legged locomotion. So if you just took FSD and bolted it on top of a Spot or Atlas, you would have a machine that gets you a very long way towards Tesla Bot: something that can walk and navigate by itself. So the question is whether Tesla will be able to recreate what Boston Dynamics has done, and whether they could do it on the timescale Musk insinuated. The last time I checked, Boston Dynamics does not use any NNs in their robots.
And then there's the question of interacting with the world. I think it's a fair assumption, based on AI Day, that FSD would be able to create a very rich and accurate map of its surroundings, so that half of the question is taken care of. But how could they get it to grasp a drill and manipulate it in an intelligent way? The implication is that they would train a NN to provide that signal. But how are you going to train that? They have thousands of cars out in the world generating tons of driving data; where are they going to get data for performing these tasks? Are you going to train it on every task that might be asked of it? It's hypothetically possible, but that's hard data to generate and label. It would take a really long time at best, and even then it would be an experiment, just as likely to fail as to succeed.
It seems to me that this must be a stunt where the intention is to build something that can walk around but not manipulate things or do anything useful. I would change my mind if Musk came out with an explanation of how he's going to train interaction.
The last commercial anthropomorphic robot was the Willow Garage PR2 back in 2010. It weighed 600 pounds and had a wheeled base. Each arm had a max payload of 4 pounds. It cost $250,000. The company went bankrupt because there wasn't anything you could do with it.
The Tesla Bot is supposed to be bipedal, weigh only 125 pounds, and have an "arm extend lift" of 10 lbs. Is that per arm, or both together? Even BD's Atlas weighs 196 pounds, and it doesn't have hands!
Like, any one of their numbers would be exceptional. All together, and you are definitely sacrificing something. Either onboard compute is minimal, or it has a battery life measured in dozens of minutes. Something.
They claim a deadlift of 150 lbs. I don't want to say "impossible!", but... difficult? Just holding 150 lbs with any of the robotic hands you can buy on the market today would be hard or impossible.
None of the five-finger hands available today are as compact as shown in the concept renders. If it does ship with a five-finger hand (stupid, pointlessly expensive), the forearms would be far bulkier. An easy bet is that the first gen will ship with a three-finger gripper.
That is, if it even ships at all. BD's Handle is a much better design for a humanoid-ish industrial robot: it's going to be simpler, faster, and lighter for the same payload as this hypothetical Tesla Bot.
I think this comment illuminates the issue. It seems like an industry veteran (unconfirmed) thinks having the thing walk around is plausible but is confused about interaction.
Even assuming the robot existed, which it doesn't, (Maaaaybe it was tethered, so it didn't have to carry around batteries and a hydraulic pump) I'm quailing at the task of programming the thing. To match human performance would be decades of work for thousands of programmers.
(The robot picks up a screwdriver, and a screw. Then the screw slips out of its fingers. Now what? It lines up the screw and the screwdriver. It applies torque. The head of the screw strips out. Now what? The pain of robotics is that you need to cover each and every little error case, because if you don't, the damn thing doesn't work, because it has no brain! This is why every industrial robot is massively overbuilt, and its environment and fixturing are carefully simplified and fenced off: error handling is such a pain in the real world, where a dropped item bounces away and hides under a bench, or lands in an orientation where your gripper can't pick it up. 1 in 1,000 is too high an error rate. 1 in 10,000 is too much. It has to function perfectly, every time, every grasp.)
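A toy sketch of that point, with hypothetical failure modes and made-up error rates, just to show how per-step failures compound across attempts:

```python
import random

# Hypothetical failure modes; every one is a case an engineer had to foresee.
class GraspSlipped(Exception): pass
class HeadStripped(Exception): pass

def grasp(part):
    if random.random() < 0.02:       # 1-in-50 slip: far too high for a factory
        raise GraspSlipped(part)

def apply_torque(part):
    if random.random() < 0.01:       # 1-in-100 stripped head
        raise HeadStripped(part)

def drive_screw(max_retries=3):
    for attempt in range(max_retries):
        try:
            grasp("screw")
            apply_torque("screw")
            return True              # the success path: the only one demos show
        except GraspSlipped:
            continue                 # re-locate and retry... if it's findable
        except HeadStripped:
            return False             # unrecoverable: needs a fresh screw
    return False

if __name__ == "__main__":
    runs = 100_000
    ok = sum(drive_screw() for _ in range(runs))
    print(f"{ok}/{runs} succeeded")  # ~1% still fail: hopeless at factory scale
```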
And the maintenance costs! Mechanical humanoid hands are terrible end effectors. All little moving parts and lousy tolerances. They would need constant repair and replacement. It couldn't possibly be cheaper than a human in 2021 or 2030.
> Either onboard compute is minimal, or it has a battery life measured in dozens of minutes.
For industrial applications you could probably have some sort of novel power system, like a tether from the ceiling or special floor that delivers power through the feet.
Haha, don't know where the downvotes are coming from, this is the absolute truth right here.
He knows the world idolizes him as a slightly eccentric genius engineer with a heart of gold, and I think there's probably some truth in that, even. Nevertheless, if you are smart enough to be a good engineer, it doesn't take much to realize that you can greatly expand your aura by occasionally trolling people with something stupid just to see if you can get away with it. The fans will adore him more, the critics will pounce, but the net result is more exposure, and more exposure gets investment money. Rinse, repeat.
Personally, I think one of the smarter touches is to release/leak information that looks bad shortly before releasing information saying that the problem was overcome. This constantly reinforces the impression that anything Musk-related is always overcoming impossible odds. "Andrej didn't think we could do it, what do you think now, Andrej?" "The Starlink terminal costs $2400, but just a few months later, it costs $1000!"
Yes, I'm aware. That thread is a day old and probably quite dead. I post here because I am interested in reading people's responses, and it's certainly related, since Tesla Bot will use FSD and be trained on the hardware discussed in the article.
To be honest, FSD seems to be coming along, albeit behind schedule. I've watched videos of the FSD beta, and it is the closest thing I've ever seen to a car driving itself in an uncontrolled environment. I wonder what you make of Dirty Tesla's latest driving video on YouTube?
I find it somewhat odd that you don't consider Waymo's taxi service to be in an uncontrolled environment. Yes, it is an incredibly well-mapped environment, but so are the highways Teslas drive on.
Listen, I'm trying to be objective about this, but I can't help but point out that FSD is navigating crowded intersections, crowded downtown areas, and roads without lane markers. I've seen videos of it.
I don't know why people downvote the shit out of me whatever I post. It seems like HN has become a cesspit of idiots... Anyway, I was very interested to see my comment validated by this guy.
I don't know about cesspits & idiots, but a little smell can develop that does tend to bring the intellectual level toward the below-average rather than above.
For those that want to push downward, the present system highly leverages their ability to do so simply through the process or habit of frequent downvoting.
If there is any perception that there was a time when there was very little detectable smell by comparison, it would be good to assess the average downvote-per-user rate then, and compare it to today's figure.
Then measure each user along this scale, perhaps including a time component, or relative to activity in some way.
Allowing for a reasonable standard deviation, it might be better if frequent downvoters past a certain range had the weight of each downvote normalized and see what happens.
This could possibly also be tuned to achieve a target level of discourse relative to a previously-considered-desirable data point in time.
Alternatively, users alone appear theoretically able to overcome the issue if there were a widespread concerted (or random) effort to frequently upvote comments or postings seen descending, whether fully deserved or not, keeping them at least neutral without having a negative effect on the commenter's rating.
Mathematically a small uptick in "compensatory upvoting" habits among average users could bring the target way up as long as the overly-frequent downvoters are in the vast minority.
Then when there is true downward consensus it will still always drop through, but those who participate mainly to downvote will have less negative impact.
The only thing worse than the "nattering nabobs of negativity" are the non-nattering nabobs of even worse negativity.
Now assume that Moore's law stays healthy and in a few decades we're carrying around Dojo-equivalent smartphones. Or implants. What do our apps do with all that compute? Detailed, convincing AR; accurate real-time translation; identifying almost any object in sight; a Turing-test-passing answering machine. We'll be carrying around neural-net training hardware that is different from the stuff between our ears, often worse but sometimes much better. A cyber brain. Scary af.
We will have 7 more layers of UI abstraction to solve the problem of not adequately warming the user's hands as they use it during the extreme winter weather caused by climate change.
Moore’s law is quite dead. Transistor density isn’t doubling. We’re still getting speed increases due to architectural improvements, but transistor density is capping out because thermals are a problem. That’s why you see these massive wafer designs and they’re not shrinking.
Sure, but if you look at the raw transistor count of a series of similarly sized dies (take Apple's A series, for example), it holds up. It only doesn't hold up if you look at Intel over the last 5 years. If you look at series made on TSMC or Samsung processes, it holds up.
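That's easy to sanity-check with approximate published transistor counts (ballpark figures from press coverage, so treat them as illustrative):

```python
import math

# Approximate published transistor counts for two similarly sized Apple dies:
a7_2013 = 1.0e9      # Apple A7, ~1B transistors
a14_2020 = 11.8e9    # Apple A14, ~11.8B transistors
years = 2020 - 2013

# Solve 2^(years/T) = ratio for the doubling time T
doubling_time = years * math.log(2) / math.log(a14_2020 / a7_2013)
print(f"Doubling time: {doubling_time:.1f} years")   # ~2 years: Moore-ish
```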
If you look at CPU performance relative to density as it has progressed over the last few decades, there's a clear decline in speed improvement. Denser only means faster to a point, regardless of Intel's process update failures.
> What do our apps do with all that compute? Detailed, convincing AR; accurate real-time translation; identifying almost any object in sight; a Turing-test-passing answering machine. We'll be carrying around neural-net training hardware
Or more likely they'll spend the compute power on the latest JavaScript framework....
This isn’t far from the truth. But embrace it rather than feel upset by it. The world of programmers can be unlocked by a good API. And the underlying components can be C{,++}, as V8 is.
Predict the weather and issue an AR warning about where to seek shelter from a nearby tornado or a passing heat dome, all sponsored by your telecom and facilitated by Facebook, of course.
If you have a recent iPhone, you are already carrying some special-purpose neural net hardware around in your pocket. I think this is a trend that is going to continue. The first obvious application is that speech recognition moves onto the phone itself, no network required anymore.
Looks a bit like the PEZY-SC[1] massive multicore MIMD architecture. That one was basically shot down by grant fraud involving the CEO, but I wonder if this is going to be a spiritual successor.
I thought it was strange that throughput is the same for CFP8 and BF16. Why support a fancy new 8-bit format if it's not faster than 16 bits? And there weren't any details about CFP8, so who knows what it really is.
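For comparison, since Tesla published no CFP8 details, this is what a generic 8-bit float looks like; a sketch assuming an E5M2-style layout (1 sign, 5 exponent, 2 mantissa bits), with "configurable" presumably meaning those widths can be traded around:

```python
import math

def decode_e5m2(byte):
    """Decode one 8-bit float, assuming an E5M2-style layout (1 sign bit,
    5 exponent bits with bias 15, 2 mantissa bits). A sketch only:
    Tesla's actual CFP8 format is unpublished."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 2) & 0x1F
    man = byte & 0x03
    if exp == 0:                              # zero and subnormals
        return sign * (man / 4) * 2.0 ** -14
    if exp == 0x1F:                           # inf/NaN, mirroring IEEE 754
        return sign * float("inf") if man == 0 else float("nan")
    return sign * (1 + man / 4) * 2.0 ** (exp - 15)

# Only 256 encodings exist, so the whole number system fits on one screen:
finite = sorted({v for v in map(decode_e5m2, range(256)) if math.isfinite(v)})
print(f"{len(finite)} distinct finite values, max = {max(finite)}")  # 57344.0
```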
Replying to myself: according to the transcript [1], it's a custom ISA, and they mentioned some things about their custom compiler, including a diagram [2].
As long as the ISA can be compiled to from TensorFlow, Torch, or whatever deep learning framework is used, the user won't notice, unless he is optimizing his model for the hardware, in which case he would also have to care about the equivalent implementation details.
Not sure that's relevant here: the reason they can cool this is that they're custom-designing the housing, server racks, etc. for these chips based on the amount of power they draw. You could cool pretty much anything if you were able to give it this much love and care.
Also, the reason most laptops run hot is that most modern high performance processors are thermally limited. It means the cooling is never "sufficient" because if you make the cooling better then you get a faster processor instead of a cooler laptop.