> Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
which is trivial to do, and pretty much a must when bringing up a cluster to make sure it's working properly; so much so that most clusters do this on every maintenance, along with a bunch of other benchmarks;
and that they say this:
> cost equivalent versus Nvidia GPU, Tesla claims they can achieve 4x the performance, 1.3x higher performance per watt, and 5x smaller footprint.
but have no MLPerf results, tells you everything you need to know about it.
The list of long-term hype-only AI-hardware companies with billions of dollars of VC investment and literally nothing to show is incredible and keeps growing.
Every MLPerf round, the list of companies that want to submit is "huge", and one week before the deadline, 99.999% of them announce "we'll submit next round", as they have for years.
It's as if people spent billions creating an F1 team, then noticed during pre-season testing that the car can't even finish a lap. And then failed to even start a lap at every race of the season. And then did this again, year after year, for a decade. Burning billions and billions...
What's often overlooked is that just because you have a shit-ton of compute nodes doesn't mean you could make it into the TOP500. You might have the compute power, but the system most likely doesn't have the connectivity. E.g. the first slide says this is distributed over more than three locations, which essentially guarantees that the system doesn't have supercomputer-like connectivity as a whole. Worth pointing out that the networking in a supercomputer is a very significant chunk of the cost and power.
And if it is tailored for AI, it might not even do 32-bit float, or only at a fraction of the "AI FLOPS".
The more 3D renders in a presentation, the more skepticism I develop. I've noticed that the quality of the 3D renders in a company's presentations/PR events is a direct leading indicator of its stock price. This is of course only anecdotal evidence, but I'm pretty sure the hypothesis can hold its ground against 50% of "ML papers" today.
> Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
That line is referring to their current Nvidia A100-powered supercomputer, which they set up earlier this year with 5,760 A100 GPUs [0].
Read the line preceding the one you've quoted from the post:
> Tesla has been expanding the size of their GPU clusters for years. Their current training cluster would be the 5th largest supercomputer if Tesla stopped all real workloads, ran Linpack, and submitted it to the Top500 list.
Had to look into MLPerf as I don't follow supercomputing.
But I don't see them not running that test as an issue, as they made it quite clear that their entire system is tailor-made, like an ASIC, to focus on neural nets and the specific compute pipelines relevant to what they care about. I would imagine a generic, broad-based ML benchmark would not only perform suboptimally but also not be representative of what they're trying to do. It's not meant to be a general-purpose ML supercomputer; it's supposed to be a supercomputer that solves a narrowish niche of problems.
I only work on a lowly private cluster, but running standard benchmarks is utterly routine here (it is in fact automated). As others with HPC experience pointed out, running benchmarks is pretty much mandatory when bringing a new system up, not just to ensure it actually performs as promised, but also to weed out bad and marginal components. You do one or two weeks of intense benchmarking and testing, and you can be sure numerous nodes will fail and need parts replaced. When you apply patches, you benchmark again. Why? Because the supplier's fix for "code XYZ crashes nodes" is probably "let's, uh, just reduce power limits by a few percent". Or because of unintentional issues limiting performance. When a node crashes, you benchmark it again. Why? Because Linpack and friends are good at making marginal hardware fail.
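To give a flavor of what "automated" means here, a minimal sketch, assuming a Slurm cluster and a site-specific `run_hpl.sh` wrapper that prints achieved GFLOPS (both hypothetical names); the real tooling is of course far more involved:

```python
#!/usr/bin/env python3
"""Run a single-node HPL pass on every node and flag the stragglers.

Assumes Slurm ('sinfo', 'srun') and a hypothetical site-specific
'run_hpl.sh' wrapper that prints the achieved GFLOPS on stdout.
"""
import subprocess

BASELINE_GFLOPS = 4500.0   # per-node figure from acceptance testing
TOLERANCE = 0.95           # flag anything below 95% of baseline

def node_list():
    # 'sinfo -N -h -o %N' prints one node name per line, no header
    out = subprocess.run(["sinfo", "-N", "-h", "-o", "%N"],
                         capture_output=True, text=True, check=True)
    return sorted(set(out.stdout.split()))

def run_hpl(node):
    # Pin the benchmark job to exactly this node
    out = subprocess.run(["srun", "-w", node, "-N", "1", "./run_hpl.sh"],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

if __name__ == "__main__":
    for node in node_list():
        gflops = run_hpl(node)
        status = "OK" if gflops >= BASELINE_GFLOPS * TOLERANCE else "FLAG"
        print(f"{node}: {gflops:.0f} GFLOPS [{status}]")
```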
So it is literally unbelievable that Tesla not only stood up a cluster, but created their own hardware to do so, and yet didn't run any quotable benchmark and has only theoretical FLOPS numbers for marketing.
> So it is literally unbelievable that Tesla not only stood up a cluster, but created their own hardware to do so, and yet didn't run any quotable benchmark and has only theoretical FLOPS numbers for marketing.
They haven't gotten to this point yet.
They have a single tile (a node). It wouldn't surprise me if the tile they have is just a prototype as well.
You have to have a cluster before you can start running cluster benchmarks.
I don't get it - they're using it for actual work, rather than burning power to run useless benchmarks for bragging rights - and you think that makes it hype?! Surely it's the opposite - running benchmarks rather than doing something useful is hype.
You have to connect thousands of cables and hundreds of nodes, with thousands of components, everything interconnected, and if you connect one wrong, the computer outputs incorrect results. You have to routinely update the software, and if a software upgrade introduces a 20% perf regression (which happens), then your 10 MW cluster starts burning 2 MW for nothing. Or maybe your cooling system sucks, and after a minute of running at full capacity, you need to throttle your cluster to 0.1% of peak to keep it cool enough that it runs "something".
That's why all the Top500 systems I've been involved with (15 or so) run these benchmarks as integration tests on every single cluster maintenance (node updates, servicing, OS updates, etc.).
Submitting these results to the Top500 costs you nothing... if your cluster actually works. When you submit to the Top500, they ask for access so that they can re-run them themselves, which happens typically during / after the next maintenance to avoid impacting any users.
If they haven't submitted, I'm 100% sure their cluster does not deliver what they say it should deliver on paper. Maybe it delivers 1% of it, or 0.01% of it (I've seen both cases in real life). If they haven't fixed it, then maybe it can't be fixed.
HPL, MLPerf, Spec, Stream, OSU.... these are not "benchmarks for bragging rights", these are tests that show that your system works.
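For example, here's a toy Python/numpy version of the STREAM "triad"; the real STREAM is carefully written C with OpenMP, so treat this as an illustration of what the test measures, not a substitute:

```python
import time
import numpy as np

# Toy STREAM "triad" (a = b + scalar*c), the classic memory bandwidth test.
# numpy's two-pass evaluation makes the traffic accounting approximate.
N = 50_000_000                       # ~400 MB per array, far beyond any cache
b, c = np.random.rand(N), np.random.rand(N)
a = np.empty_like(b)

best = float("inf")
for _ in range(5):                   # best-of-5 runs, as STREAM reports
    t0 = time.perf_counter()
    np.multiply(c, 3.0, out=a)       # a = scalar * c
    np.add(a, b, out=a)              # a += b
    best = min(best, time.perf_counter() - t0)

bytes_moved = 3 * N * 8              # STREAM convention: read b, read c, write a
print(f"Triad: {bytes_moved / best / 1e9:.1f} GB/s")
```

If a node's number comes out far below what its memory subsystem should sustain, something is wrong with the hardware or its configuration; that, not bragging, is what these tests are for.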
> run these benchmarks as integration tests on every single cluster maintenance
Why run someone else's benchmark and not your own application to test performance? And what's the point of submitting to Top500? Why do you care how your system ranks? What's the business or technical purpose in that?
For the same reason that we don't test engines during a race. Only what's under test should change, with the rest being fixed.
It would be much more difficult to adapt one of your own applications as a test. If the results are bad, how would you even know whether it's the application's fault or a problem with the cluster? LINPACK, on the other hand, is well understood, and there has been tons of work to make sure it uses all the power your cluster can deliver.
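For intuition, a single-node toy version of what LINPACK measures, using numpy; HPL does the same dense solve, distributed across the whole machine and tuned to saturate it:

```python
import time
import numpy as np

n = 8000                              # problem size; HPL sizes this to fill RAM
A = np.random.rand(n, n)
b = np.random.rand(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)             # LU factorization + triangular solves
elapsed = time.perf_counter() - t0

flops = (2 / 3) * n**3 + 2 * n**2     # standard LINPACK operation count
print(f"{flops / elapsed / 1e9:.1f} GFLOPS")

# Sanity-check the answer, much as HPL does with a scaled residual:
residual = np.linalg.norm(A @ x - b) / (np.linalg.norm(A) * np.linalg.norm(x))
print(f"residual: {residual:.2e}")
```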
When a company claims that their new vaccine cures cancer, but is against applying for FDA approval because "no reasons given", you seem to be the kind of person that camps in front of their offices overnight to get the first 10 shots, all in the neck.
Tesla is saying their hardware is 10x better than the competition.
I don't believe it. They don't publish any numbers, and everybody else does.
You believe them. Good for you.
I don't. I think not doing that is extremely suspicious, because if their cluster can be turned on, they have the results. So I'm all but certain that their results just suck, and they don't publish them to save face.
You believe their shit is so good, they didn't even test it while installing it. Good for you.
You think there is a masterplan to keep the results secret. Good for you.
I think their results are just horrible, and that's why they don't show them.
With any serious company, this wouldn't even be a discussion, because they would just show the verified facts about what their cluster can do, or shut up about it if it's really so secret.
You don't see the NSA or the DOD bragging around about being 100x better than "the competition".
To me, they look like they're forced to sell the idea that they're doing something with these clusters to... drive hype and get the stock back up, and since their numbers are actually horrible, this is what we get.
But they've got no obligation to prove anything to random people on the internet. They aren't asking for peer review. They probably don't want to waste time and energy running a benchmark to compete in someone else's top N list. What's in it for them?
> But they've got no obligation to prove anything to random people on the internet.
They are a publicly traded company.
I'm a Tesla investor and deserve to know.
If they are right and their hardware is 10x better than the competitor's, then I'm happy. If they are wrong and they could have gotten 10x better outcomes by spending 100x less money, I'll be very pissed off.
> They probably don't want to waste time and energy running a benchmark to compete in someone else's top N list.
It costs no time. It's the first thing anybody does when building any cluster. It's part of accepting the materials from your suppliers to check that they didn't sell you shit.
I'm not an expert in securities law, but I don't believe being an investor entitles you to demand arbitrary proprietary information you want from a company. Otherwise people would buy one Apple share and find out what specs the new iPhone will have.
HPL is a stress test with a notionally useful output measurement - just about the most effective way of pushing CPU load to the limit, and it tells you what fraction of the theoretical maximum FLOPS you can actually achieve given the other system constraints like memory and network throughput.
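A worked example of that arithmetic, using the 5,760-A100 cluster mentioned earlier in the thread for the peak, and a made-up HPL result for Rmax:

```python
# Rpeak falls straight out of the spec sheet; efficiency is measured/peak.
gpus = 5760
fp64_tflops_per_gpu = 9.7        # A100 peak FP64 (non-tensor-core)

rpeak = gpus * fp64_tflops_per_gpu / 1000.0   # in PFLOPS
rmax = 38.0                      # hypothetical HPL measurement (made up)

print(f"Rpeak: {rpeak:.1f} PFLOPS")
print(f"Efficiency: {rmax / rpeak:.0%}")      # good systems land around 60-80%
```

If the measured fraction comes out at 5% instead of 70%, the spec-sheet FLOPS are marketing, not capability.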
Running benchmarks is not only useful but necessary. It's a point where planning meets the real world. Private clusters are also routinely benchmarked, often as part of validation.
In the end if someone just tells you their supercomputer "would" be in the TOP500, they just tell you that it is definitely not in the TOP500. There might be good reasons for that, but still, it's like claiming that you would win Olympic gold if only you'd bother to participate.
There might be some level of showmanship involved, but Tesla isn't selling those things, it is using them. Quite a different situation in comparison to a random startup which tries to convince investors to finance their products.
Taking everything at face value: Dojo is overall a very impressive project!
- Communication speed is one of the biggest bottlenecks for large models, so their 4 TB/s bandwidth is a very smart focus.
- They claim a 1.3x perf/watt improvement, which is not really that great for ASICs compared to GPUs. Perf/watt is probably the most important number in datacenters.
- They only use SRAM, no DRAM. This is a huge mistake, which limits their model size: you can only fit a ~10GB model inside a single tile, versus 80GB models on a single A100 GPU (see the sizing sketch below).
- The software/compiler stack is as important as, or more important than, the hardware itself, because it dictates how much real performance you can squeeze out of the chips. I think Tesla will need to focus heavily on this area before getting anywhere close to real-world GPU performance.
Overall, I imagine the project will have similar pitfalls to Cerebras.
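For the SRAM point above, the back-of-the-envelope is simple; a sketch assuming fp16 weights and ignoring activations, gradients, and optimizer state (which make the squeeze far worse in training):

```python
GiB = 2**30
BYTES_PER_PARAM = 2                 # fp16/bf16 weights

def max_params(capacity_gib):
    return capacity_gib * GiB / BYTES_PER_PARAM

for name, cap in [("Dojo tile SRAM (~10 GiB)", 10), ("A100 HBM (80 GiB)", 80)]:
    print(f"{name}: ~{max_params(cap) / 1e9:.0f}B parameters max")

# Training needs far more than weights: with Adam, fp16 weights + fp32 master
# weights + two fp32 moments is ~16 bytes/param, so divide those numbers by ~8.
```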
> If their claims are true, Tesla has 1 upped everyone in the AI hardware and software field. I’m skeptical, but this is also a hardware geek’s wet dream.
How is the author declaring that Tesla has one-upped everyone in AI hardware and software while the article has exactly zero references to TPUs?
Also, these chips are as yet only in Tesla's labs, not in production. What's cooking in secret in NVIDIA's labs right now? NVIDIA doesn't have to hype their stuff, so they don't tip their hand to competitors until it's ready for sale.
Creating a lab beast is one thing, making it useful and economical in production is another.
In their presentation yesterday they made it clear that they have a full software stack for programming it. I don't know what is marketing hype and what is real, but they suggested it is very easy for them to program for it.
There’s no third party here. As far as I can tell Tesla designed the chips and wrote the software stack to work together. They don’t want to rely on third parties for their critical infrastructure and AI chip design is vital to their success. At least this is what I can grok as an outsider.
Right, I understand the claims, but as you can see with AMD, there's a ton of work needed to actually be competitive, and AMD has a large team working on it. I'm just skeptical of the claim that the software is of any use to anyone but a small group.
I think the difference is that AMD needs to cater to a wide variety of customers, software stacks, and hardware platforms. Tesla has basically one platform and one customer (themselves).
Their software only needs to be useful to a small group - the autonomy team. I did not get the impression that Tesla plans to sell supercomputers, but that they are building these supercomputers for themselves to train and deploy AI networks.
You said "Even if the hardware existed, if you can't program it, it won't succeed." But it seems like the key customer - Tesla's internal Autopilot team - can already program it. So I just don't see any problem. They may choose to sell these systems (note that they have been shipping their Gen1 custom chip for years now and it is not for sale to the public), but the real plan for revenue is to succeed at AI and profit there. For that they need incredible computing power internally and in a portable platform for edge deployment, but they do not need to sell their chips as a general purpose compute platform.
My problem with articles like these is the sensational headlines that Tesla made an AMD/Nvidia killer. AMD and Nvidia could both make this chip if they wanted to, but dedicating all the die to matrix multiply cores is a waste for most end users. The media makes it seem like Tesla did something revolutionary here, but all they did was make a very, very targeted asic.
That makes sense. Certainly lots of journalism is bad. I haven’t read the article actually. I’ve just watched the two hour presentation by Tesla as well as their past presentations. I am a robotics engineer and I’ve been trying to understand how best to make an “animal like” brain system for an autonomous robot in the real world. I have been pleased with how much Tesla shared about their system and I think their extremely powerful hardware and their neural network approach is ideal for solving this problem. So I’m very happy with what they’ve come up with and I’m happy that it will push competitors to do the same as I think very large neural networks might be needed to solve general purpose robotics.
So all in all I think the chip is very good and I think they are on a path to success. Whatever the article says to hype it up doesn’t change the value of what Tesla has done in my eyes.
The "how" is: ML hype. Very few people can actually swim through the hype to the actual island, and in this case, the author isn't even capable of judging.
I find it surprising that Google, Amazon, and now Tesla can enter the "build your own processor" market that easily. Isn't it more likely that they're in fact only defining the general spec, and then using the services of other companies that specialize in actually designing CPUs?
Tesla hired people who designed chips for Apple and Intel to design their chips. They are designed from the ground up for this application. It seems like they're doing all the chip design in house.
Designing your own IC has become at once extremely complex and very approachable, if you have the money for it.
A multi-billion-transistor CPU is an insanely complex product; multiple layers come together to make it possible. The base layer is the foundry, here TSMC, which does the manufacturing and provides process design kits. These contain, for example, the blueprints for transistors and other electrical components. For digital chips like processors, they are usually used as they come from the foundry.
Next is a layer of software which enables the chip design, provided by the big EDA companies. This is a rather huge layer, enabling the basic designs on one side, but also including all kinds of simulation and verification tools. And quite a bit of engineering knowledge comes along with it.
So if you want to start designing your own CPU, you still need good engineers who know what they are doing, but large parts of the whole "stack" can, and have to, be bought from the vendors described above. This enables quick market entry for companies that were not traditional chip design houses.
Wow. All that packaging and power delivery must be incredibly expensive. Yet they claim massive TCO advantage over Nvidia. Amazing what you can do when not paying for Nvidia's 60% margins ;)
If this is as good as they say, they should spin this off and sell these. Probably won't.
I can't stop thinking about the Tesla Bot. Was he outright lying? Is it just a ploy to recruit robotics people? Is it even plausible?
I think the most challenging aspect of the idea is interacting with the world: picking up and handling various objects.
It's obvious from the presentation that FSD is very good at placing itself in space and mapping out its environment, as well as devising routes, even when accounting for things like other moving objects. Boston Dynamics has built machines that take the input of a joystick and translate it into four-legged locomotion. So if you just took FSD and bolted it on top of a Spot or Atlas, you would have a machine that gets you a very long way towards Tesla Bot: something that can walk and navigate by itself. So the question is whether Tesla will be able to recreate what Boston Dynamics has done, and whether they could do it on the timescale Musk insinuated. The last time I checked, Boston Dynamics does not use any NNs in their robots.
And then there's the question of interacting with the world. I think it's a fair assumption, based on AI Day, that FSD would be able to create a very rich and accurate map of its surroundings, so that half of the question is taken care of. But how could they get it to grasp a drill and manipulate it in an intelligent way? The implication is that they would train a NN to provide that signal. But how are you going to train that? They have thousands of cars out in the world generating tons of driving data; where are they going to get data for performing these tasks? Are you going to train it on every task that might be asked of it? It's hypothetically possible, but that's hard data to generate and label. It would take a really long time at best, and even then it would be an experiment, just as likely to fail as to succeed.
It seems to me that this must be a stunt where the intention is to build something that can walk around but not manipulate things or do anything useful. I would change my mind if Musk came out with an explanation of how he's going to train interaction.
The last commercial anthropomorphic robot was the Willow Garage PR2 back in 2010. It weighed 600 pounds and had a wheeled base. Each arm had a max payload of 4 pounds. It cost $250,000. The company went bankrupt because there wasn't anything you could do with it.
The Tesla Bot is supposed to be bipedal, weigh only 125 pounds, and have an "arm extend lift" of 10 lbs. Is that per arm, or both together? Even BD's Atlas weighs 196 pounds, and it doesn't have hands!
Like, any one of their numbers would be exceptional. All together, and you are definitely sacrificing something. Either onboard compute is minimal, or it has a battery life measured in dozens of minutes. Something.
They claim a deadlift of 150 lbs. I don't want to say "impossible!", but... difficult? Just holding 150 lbs with any of the robotic hands you can buy on the market today would be hard or impossible.
None of the five-finger hands available today are as compact as shown in the concept renders. If it does ship with a five-finger hand (stupid, pointlessly expensive), the forearms would be far bulkier. An easy bet is that the first gen will ship with a three-finger gripper.
That is, if it even ships at all. BD's Handle is a much better design for a humanoid-ish industrial robot: it's going to be simpler, faster, and lighter for the same payload as this hypothetical Tesla Bot.
I think this comment illuminates the issue. It seems like an industry veteran (unconfirmed) thinks having the thing walk around is plausible but is confused about interaction.
Even assuming the robot existed, which it doesn't, (Maaaaybe it was tethered, so it didn't have to carry around batteries and a hydraulic pump) I'm quailing at the task of programming the thing. To match human performance would be decades of work for thousands of programmers.
(The robot picks up a screwdriver, and a screw. Then the screw slips out of its fingers. Now what? It lines up the screw and the screwdriver. It applies torque. The head of the screw strips out. Now what? The pain of robotics is that you need to cover each and every little error case, because if you don't, the damn thing doesn't work, because it has no brain! This is why every industrial robot is massively overbuilt, and its environment and fixturing are carefully simplified and fenced off: error handling is such a pain in the real world, where a dropped item bounces away and hides under a bench, or lands in an orientation where your gripper can't pick it up. 1 in 1,000 is too high an error rate. 1 in 10,000 is too much. It has to function perfectly, every time, every grasp.)
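A toy sketch of that point, with hypothetical failure modes and made-up error rates, just to show how per-step failures compound across attempts:

```python
import random

# Hypothetical failure modes; every one is a case an engineer had to foresee.
class GraspSlipped(Exception): pass
class HeadStripped(Exception): pass

def grasp(part):
    if random.random() < 0.02:       # 1-in-50 slip: far too high for a factory
        raise GraspSlipped(part)

def apply_torque(part):
    if random.random() < 0.01:       # 1-in-100 stripped head
        raise HeadStripped(part)

def drive_screw(max_retries=3):
    for attempt in range(max_retries):
        try:
            grasp("screw")
            apply_torque("screw")
            return True              # the success path: the only one demos show
        except GraspSlipped:
            continue                 # re-locate and retry... if it's findable
        except HeadStripped:
            return False             # unrecoverable: needs a fresh screw
    return False

if __name__ == "__main__":
    runs = 100_000
    ok = sum(drive_screw() for _ in range(runs))
    print(f"{ok}/{runs} succeeded")  # ~1% still fail: hopeless at factory scale
```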
And the maintenance costs! Mechanical humanoid hands are terrible end effectors. All little moving parts and lousy tolerances. They would need constant repair and replacement. It couldn't possibly be cheaper than a human in 2021 or 2030.
> Either onboard compute is minimal, or it has a battery life measured in dozens of minutes.
For industrial applications you could probably have some sort of novel power system, like a tether from the ceiling or special floor that delivers power through the feet.
Haha, don't know where the downvotes are coming from, this is the absolute truth right here.
He knows the world idolizes him as a slightly eccentric genius engineer with a heart of gold, and I think there's probably some truth in that, even. Nevertheless, if you are smart enough to be a good engineer, it doesn't take much to realize that you can greatly expand your aura by occasionally trolling people with something stupid just to see if you can get away with it. The fans will adore him more, the critics will pounce, but the net result is more exposure, and more exposure gets investment money. Rinse, repeat.
Personally, I think one of the smarter touches is to release/leak information that looks bad shortly before releasing information saying that the problem was overcome. This constantly reinforces the impression that anything Musk-related is always overcoming impossible odds. "Andrej didn't think we could do it, what do you think now, Andrej?" "The Starlink terminal costs $2400, but just a few months later, it costs $1000!"
Yes, I'm aware. That thread is a day old and probably quite dead. I post here because I am interested in reading people's responses, and it's certainly related, since Tesla Bot will use FSD and be trained on the hardware discussed in the article.
To be honest, FSD seems to be coming along, albeit behind schedule. I've watched videos of the FSD beta, and it is the closest thing I've ever seen to a car driving itself in an uncontrolled environment. I wonder what you make of Dirty Tesla's latest driving video on YouTube?
I find it somewhat odd that you don't consider Waymo's taxi service to be in an uncontrolled environment. Yes, it is an incredibly well-mapped environment, but so are the highways Teslas drive on.
Listen, I'm trying to be objective about this, but I can't help but point out that FSD is navigating crowded intersections, crowded downtown areas, and roads without lane markers. I've seen videos of it.
I don't know why people downvote the shit out of me whatever I post. It seems like HN has become a cesspit of idiots... Anyway, I was very interested to see my comment validated by this guy.
I don't know about cesspits & idiots, but a little smell can develop that does tend to bring the intellectual level toward the below-average rather than above.
For those that want to push downward, the present system highly leverages their ability to do so simply through the process or habit of frequent downvoting.
If there is any perception that there was a time when there was very little detectable smell by comparison, it would be good to assess the average downvote-per-user rate then, and compare it to today's figure.
Then measure each user along this scale, perhaps including a time component, or relative to activity in some way.
Allowing for a reasonable standard deviation, it might be better if frequent downvoters past a certain range had the weight of each downvote normalized and see what happens.
This could possibly also be tuned to achieve a target level of discourse relative to a previously-considered-desirable data point in time.
Alternatively, users alone appear theoretically able to overcome the issue if there were a widespread concerted (or random) effort to frequently upvote comments or postings seen descending, whether fully deserved or not, keeping them at least neutral without having a negative effect on the commenter's rating.
Mathematically a small uptick in "compensatory upvoting" habits among average users could bring the target way up as long as the overly-frequent downvoters are in the vast minority.
Then when there is true downward consensus it will still always drop through, but those who participate mainly to downvote will have less negative impact.
The only thing worse than the "nattering nabobs of negativity" are the non-nattering nabobs of even worse negativity.
Now assume that Moore's law stays healthy and in a few decades we're carrying around Dojo-equivalent smartphones. Or implants. What do our apps do with all that compute? Detailed, convincing AR; accurate real-time translation; identifying almost any object in sight; a Turing-test-passing answering machine. We'll be carrying around neural-net training hardware that is different from the stuff between our ears, often worse but sometimes much better. A cyber brain. Scary af.
We will have 7 more layers of UI abstraction to solve the problem of not adequately warming the user's hands as they use it during the extreme winter weather caused by climate change.
Moore’s law is quite dead. Transistor density isn’t doubling. We’re still getting speed increases due to architectural improvements, but transistor density is capping out because thermals are a problem. That’s why you see these massive wafer designs and they’re not shrinking.
Sure, but if you look at the raw transistor count of a series of similarly sized dies (take Apple's A series, for example), it holds up. It only doesn't hold up if you look at Intel over the last 5 years. If you look at series made on TSMC or Samsung processes, it holds up.
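That's easy to sanity-check with approximate published transistor counts (ballpark figures from press coverage, so treat them as illustrative):

```python
import math

# Approximate published transistor counts for two similarly sized Apple dies:
a7_2013 = 1.0e9      # Apple A7, ~1B transistors
a14_2020 = 11.8e9    # Apple A14, ~11.8B transistors
years = 2020 - 2013

# Solve 2^(years/T) = ratio for the doubling time T
doubling_time = years * math.log(2) / math.log(a14_2020 / a7_2013)
print(f"Doubling time: {doubling_time:.1f} years")   # ~2 years: Moore-ish
```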
If you look at CPU performance relative to density as it has progressed over the last few decades, there's a clear decline in speed improvement. Denser only means faster to a point, regardless of Intel's process update failures.
> What do our apps do with all that compute? Detailed, convincing AR; accurate real-time translation; identifying almost any object in sight; a Turing-test-passing answering machine. We'll be carrying around neural-net training hardware
Or more likely they'll spend the compute power on the latest JavaScript framework....
This isn’t far from the truth. But embrace it rather than feel upset by it. The world of programmers can be unlocked by a good API. And the underlying components can be C{,++}, as V8 is.
Predict the weather and issue an AR warning about where to seek shelter from a nearby tornado or a passing heat dome, all sponsored by your telecom and facilitated by Facebook, of course.
If you have a recent iPhone, you are already carrying some special-purpose neural net hardware around in your pocket. I think this is a trend that is going to continue. The first obvious application is that speech recognition moves onto the phone itself, no network required anymore.
Looks a bit like the PEZY-SC[1] massive multicore MIMD architecture. That one was basically shot down by grant fraud involving the CEO, but I wonder if this is going to be a spiritual successor.
I thought it was strange that throughput is the same for CFP8 and BF16. Why support a fancy new 8-bit format if it's not faster than 16 bits? And there weren't any details about CFP8, so who knows what it really is.
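For comparison, since Tesla published no CFP8 details, this is what a generic 8-bit float looks like; a sketch assuming an E5M2-style layout (1 sign, 5 exponent, 2 mantissa bits), with "configurable" presumably meaning those widths can be traded around:

```python
import math

def decode_e5m2(byte):
    """Decode one 8-bit float, assuming an E5M2-style layout (1 sign bit,
    5 exponent bits with bias 15, 2 mantissa bits). A sketch only:
    Tesla's actual CFP8 format is unpublished."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 2) & 0x1F
    man = byte & 0x03
    if exp == 0:                              # zero and subnormals
        return sign * (man / 4) * 2.0 ** -14
    if exp == 0x1F:                           # inf/NaN, mirroring IEEE 754
        return sign * float("inf") if man == 0 else float("nan")
    return sign * (1 + man / 4) * 2.0 ** (exp - 15)

# Only 256 encodings exist, so the whole number system fits on one screen:
finite = sorted({v for v in map(decode_e5m2, range(256)) if math.isfinite(v)})
print(f"{len(finite)} distinct finite values, max = {max(finite)}")  # 57344.0
```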
Replying to myself: according to the transcript [1], it's a custom ISA, and they mentioned some things about their custom compiler, including a diagram [2].
As long as the ISA can be compiled to from TensorFlow, Torch, or whatever deep learning framework is used, the user won't notice, unless he is optimizing his model for the hardware, in which case he would also have to care about the equivalent implementation details.
Not sure that's relevant here: the reason they can cool this is that they're custom-designing the housing, server racks, etc. for these chips based on the amount of power they draw. You could cool pretty much anything if you were able to give it this much love and care.
Also, the reason most laptops run hot is that most modern high performance processors are thermally limited. It means the cooling is never "sufficient" because if you make the cooling better then you get a faster processor instead of a cooler laptop.