I found the news of Intel releasing this chip quite encouraging. If they have enough capacity on their 10nm node to put it into production then they have tamed many of the problems that were holding them back. My hope is that Gelsinger's renewed attention to engineering excellence will allow the folks who know how to iron out a process to work more freely than they did under the previous leadership.
That said, fixing Intel is a three-step process, right? First, they have to get their process issues under control (it seems like they are making progress there). Second, they need to figure out third-party use of that process so that they can bank some of the revenue that is out there from the chip shortage. And finally, they need to answer the "jelly bean" market: "jelly bean" type processors have become powerful enough to be the only processor in a system, so Intel needs to play there or it will lose that whole segment to Nvidia/ARM.
Forget all the technical problems, fixing Intel is mostly about figuring out how they let a bunch of MBAs mismanage the company to the point that they squandered a two-decade long global monopoly on computing, in an era where computing absolutely dominated global economic development.
I mean, when you put it that way, it's genuinely astounding that they managed to find themselves technically so far behind, and so quickly. This was not a minor mismanagement fuck up, this is complete incompetence. Things clearly need to be burnt to the ground, they need a lot of churn.
If they can get back to being engineering-driven rather than financially driven, it will all fall into place.
Once you invite the vampire into the house should you really be surprised that you find it sucking your blood? The answer is that like most large companies they decided technologist leadership no longer mattered. It’s a pattern that repeats again and again and that is because the people who need to learn the lesson most are counter-incentivized to do so.
Several conversations on this topic have brought up that pay rates at Intel are now less than the median of the industry. This would be fallout from actions by the MBAs in charge, but I don't think it's just an MBA issue. Something is clearly rotten in middle management in Intel. Either the people investigating the problems with their process weren't given the freedom to experiment to narrow down the scope of the problem, or they lost all their competent process engineers and are left with insufficient clue to fix the problem. If Intel has to rebuild their process group from scratch, they could be in for additional years of pain as new people come up to speed. Given that the United States is no longer a leader in bleeding edge process technology, this could take far longer than expected. How long will it take Intel to poach enough talent from TSMC to make a difference? And will they be able to integrate into Intel's culture?
The Israel group saved their behind in the days of the Pentium M. It seems like skunk works projects have been crucial to Intel, but the last decade has been a lot of tick tock strategy and not a lot of hard engineering...
> bunch of MBAs mismanage the company to the point that they squandered a two-decade long global monopoly on computing
I'm generally happy to blame MBAs for quite a bit. But isn't this also part of the nature of monopolies? A lack of competition means that in the short term, strategy doesn't matter at all. There was a period where Intel execs could have run the company via astrological chart or coin flip and done just as well for themselves. Managerialism and other MBA ideologies make that worse for sure.
MBAs teach you a load of useful skills, in particular exposure to a great breadth of business problems. HR. Accounting. Strategic planning. International expansion. Supply chain considerations. Micro and macroeconomics. Negotiation. The list goes on.
It doesn't make you experienced, or a leader. But it does give you the essential skills to run a business, skills so diverse that they are hard to acquire organically.
A good MBA is perhaps like a good coding boot camp. It will basically teach you enough that you can survive 3 months of a junior dev job, and if you're smart, bootstrap yourself up from that. But if you staff a company with no one but people from such boot camps with no other experience, they will just resonate against each other, and that won't go anywhere.
They teach you about those topics. But it's not clear how much they teach actual skills beyond finance and presentation, in that there's no actual business to run. One can read about negotiation, for example, but actual negotiation skills come from negotiating deals.
A friend of mine went back for a high-end MBA after a few years working for large companies as a (non-software) engineer. His basic take was that the curriculum only had one thing that he couldn't have figured out just by thinking about the topic for a bit. (It was the principle of comparative advantage, which is a bit of a mind-bender.) He says what he really got out of it was strong relationships with a network of people that were soon highly placed in a bunch of major companies.
As per my other comment, my wife is currently doing an MBA.
Again, skills are not a replacement for experience, or even good ideas. They won't make you a pro in those things either.
It's a bit like saying a CS degree is useless, because it's impractical, and with programming, you learn by doing anyway. And you can't build a business out of a CS degree alone. Undoubtedly so, and yet a CS degree shows you how large the computer world is, so to speak.
And again, I don't think the MBA program teaches skills, the main exception being presenting confidently when you don't in fact know much.
And I'm not saying an MBA is useless. As I said above, it's very useful for establishing a network. It's also an entry ticket into a bunch of high-caste jobs and a thorough indoctrination into the ideology of managerialism. The starting salaries MBAs command are proof of the degree's usefulness to its holders.
I agree there's some analogy with a CS degree. The main difference being that nobody puts a fresh CS grad in charge of anything important, because we understand they don't know much yet. An MBA, on the other hand, puts people in positions of significant influence despite them knowing little or nothing about the actual work. It's as if somebody went through the book-learning part of a medical degree and then was turned loose to practice on real patients.
> And again, I don't think the MBA program teaches skills, the main exception being presenting confidently when you don't in fact know much.
What can I say, this is not consistent with my experience. Perhaps it says more about the motivations and attitude of the people attending.
Perhaps the core confusion is this: running a technology-based business requires, well, both technological and business acumen. If you try to substitute one for the other, things go wrong, regardless of the direction. When I hear Toyota is run by engineers, I'm sure the engineers don't negotiate finance deals, manage HR, career development, accounting, etc. These boring things are important. Equally, when I hear about Boeing or IBM becoming corporate quagmires whose core product is developed without proper engineering input, I'm not suggesting that can work. My personal experience of companies run by technical folk was, as said, awful.
The frustration about MBAs I hear in this thread seems to be when they encroach on technical territories they don't know about, and I agree this cannot work.
I think where MBAs work best is where you have mid-career people, already with experience, using this to boost their skills in diverse areas (alongside all other benefits like networking etc.), and I maintain it is great at that. If you take someone who doesn't know anything about anything, give them an MBA, then a random start-up to run, and it all goes wrong, I wouldn't blame the MBA.
I certainly agree the motivations of the people attending are a problem. But again, it's systemic. Locke and Spender cover it well.
You keep being sure about Toyota without any knowledge. That's very MBA of you, but I don't think it advances the discussion. The executives who turned it into a global power were engineers, and that engineering orientation was vital to the creation of the Toyota Way, which was deeply influenced by W. Edwards Deming, an engineer and statistician.
Nobody is denying that "boring" things like finance are important. Indeed, I'm saying the opposite. My point is that giving somebody a quick education that covers many "boring" things in a shallow fashion is not a great way to create executives. It is a great way, though, to create a glib executive caste that is good at extracting cash. Which is why executive comp has risen massively over the decades even though few would argue that executives are massively more competent.
You might be right that the MBA is better for people who have already developed actual expertise. But the average age of Harvard MBA applicants is 27, and only 12% are over 30. So if you're correct, then we can at least agree that most MBA grads shouldn't be in charge of anything.
About as much detail as I can learn from one week on Coursera.
More often than not it teaches them the arrogance to believe they can do everything on their own, and that they know the law because they have an MBA. I'm currently closely observing a Sequoia-backed startup run by a bunch of people from top-3 MBA programs.
Questionable legal decisions, terrible HR, abusive bullying management. It's all a big circle jerk.
Let's not kid ourselves. The real thing MBAs learn in school is how to network and find the next company they will destroy.
I think you're spot on with the arrogance bit. I've had friends who have done MBAs and some of them came out of it with an enormous performative confidence despite not actually knowing much about whatever topic was under discussion. It was exhausting to be around until they learned to moderate it.
It reminds me of a friend's complaint about having done debate in high school. He says it took him years to unlearn the habits it gave him. Including the notion that whatever take first occurred to him in a discussion is one that must be defended at all costs. Or treating all discussions as gladiatorial to-the-death combat.
Learning that sort of faux certitude is certainly helpful in persuading people. So I get why people invest so much in learning it. But it seems to me to be double-edged.
Is there any experience to corroborate it, or is this your impression?
My wife is doing an MBA right now, at a top-100 uni globally; so not Harvard, but good. I am genuinely impressed at the breadth and detail they go into.
As I say, these don't turn you into an expert or an experienced business person, but from my perspective they are very valuable.
It's my personal observation. In my experience there is a strong difference between people who study an MBA for the sake of studying an MBA and people who are engineers or have other experience, want to create something, but feel like they're lacking from a business perspective. The latter seem to do comparatively better, and seem to have better-functioning companies if they choose to run one after the MBA.
Yeah, I'm not questioning this. If you want to start a company, you need a good product, and you won't magic one out from an MBA.
Equally, running a large corporation without people who understand so many complex business dynamics (and crucially, you'll need some who appreciate how they interact) seems crazy-suboptimal to me.
It's a mistake to think the only way to understand something complex is with an MBA.
A good example is Toyota versus the US Big 3 automakers. GM basically invented the modern American management the MBA came out of. Toyota was run by engineers. Toyota came out of the post WWII decimation of Japan to become the world's largest automaker, kicking the asses of US automakers in terms of cost and quality.
Toyota was happy to teach how they did it, and even set up a joint venture with GM, the NUMMI plant. The plant worked fine, but GM was unable to adopt the lessons. This American Life tells the tale well: https://www.thisamericanlife.org/561/nummi-2015
I think this is directly related to the MBA ideology, an important part of which is that you can take privileged people, give them a short general education covering a number of disciplines, and then make them part of the managerial caste that controls everybody else. A core notion of the MBA is that management is a universal skill. This is both historically novel and distinct from elements of the Toyota Way, which strongly honor the people doing the actual work and their accumulated knowledge.
An MBA is not the only way to understand something complex.
An MBA is a good starting place for understanding complex business problems. It won't make you a good engineer. And yet even engineering companies have supply chains to manage, contracts to negotiate, a labour force to grow and keep motivated, finance deals to secure, multi-year strategic planning, etc. I know for a fact that engineers are, by and large, not good at that.
Doing an MBA won't magically make you good at that, but at least gives you a head start thinking about those problems and standard industry solutions (if you take more than arrogance and networking out of it). Similar to how an engineering/science degree doesn't tell you how to build a Tesla, but gives you the basics to bootstrap yourself from that point. Even if many a Stanford engineering grad thinks they'll build the next Decacorn because they are so great.
Lots of things give people a head start in thinking about those problems. But an MBA is a ticket to power.
If you really can't see what's controversial about giving power to a bunch of young people with a smattering of training and a very specific ideology, you haven't been looking at the harm caused. Maybe start there.
"MBA" has become a sort of shorthand for everything that is wrong with modern American business culture.
The pattern is typically something like:
1) Management awards themselves a bunch of stock and sets annual cash bonus targets tied to short term financial metrics.
2) Management then focuses entirely on optimizing for short term metric attainment (e.g., quarterly EPS) and current stock price, by minimizing investment in R&D and new products, reducing worker headcount, engaging in financial engineering like stock buybacks, etc.
3) Management extracts as much value for themselves as possible through the above and then leaves before it all falls apart, to repeat the process somewhere else.
MBAs might be over-represented within this group, but this isn't something taught in MBA curricula, not all MBAs are engaged in this abhorrent behavior, and non-MBAs do it as well.
You might enjoy reading "Confronting Managerialism", which looks at the history of the ideology and how it has become basically a caste system. Or read some of the Lean literature on how non-MBA-driven companies approach things. E.g., "Toyota Kata".
As an occasional manager, there's some value to the work that some managers do. But managerialism itself is often extractive (that is, part of its function is to justify siphoning a lot of cash into the pockets of managers and execs) and value-destroying (both through optimizing for value-negative things and empowering bad decisions).
I'm management, albeit not at Intel, and I don't have (and have no interest in) an MBA, so read in whatever biases you'd like. Or decide I'm full of shit; that's possible too. There is value in management, but management has to realize that it doesn't exist for its own sake. That's a tough one when "management is the company" is effectively what everyone's been taught for decades now. The problem, I think, is that it is an oversimplification. Nominally management makes the decisions, but we aren't where the product gets built or sold. The company is as successful as its products, so maybe the product is the company, and we're all doing our part to make it the best thing that people want.
I'd like to think I'm a good leader. Probably not the best one - there are plenty of folks I look up to. Probably not the worst one, but that's not for me to decide. In any case, as middle management, [0] I see three major components to my role:
1. Hire and retain really talented people who are good at what we do;
2. Decide what we'll work on now, what we'll do later, and what we won't do; [1] and
3. Sponsor my team and provide air cover for #2
I generally have a pretty good idea of _what_ to do, but not always exactly. I used to know _how_ to do it, but I'm probably not the best at that today. My goal is to hire or develop a team that can put a fine point on the _what_, or challenge me when I'm wrong, and to own the _how_.
In terms of sponsorship and air cover, it's taking 1 and 2 above, and keeping it on the rails with minimal disruption. I see good middle management as needing to be very team-focused, caring for the health and composition of the team above all. Sometimes that means absorbing a lot coming from above, and sometimes it means being really pushy upwards to get what we need to be successful. But the team should be free to execute if we're doing our jobs well, and their end of the bargain is to deliver what we agreed to do.
This is not the philosophy of all middle managers, and I'm sure there are other models that work. This is the one I can speak to.
--
[0]: "Middle management" defined as someone who manages other managers but is not C-level.
[1]: This doesn't mean I cook this shit up. I'm listening to the team I manage, as well as the priorities and directives coming from above, and applying my filters. Gotta figure they hired me because I know generally what to do in my space, and the team agreed to work with me because I have the ability to point them toward the critical path and push the debris out of the way before they run into it. Without management, there are either too many ideas to harmonize, or you get lucky and the team converges on the right one. I think good middle management helps us converge on the right ideas faster.
While I largely agree, it is also true that running a business requires very different skills than building its product.
MBA or not, technologist or not, you can still be good or bad at what you do. Smart or stupid. I think simply looking at a mismanaged tech company and saying “ah too many MBAs in charge” is missing the point.
My guess is, if you look at the well-run large companies, these will also have an MBA-rich management core.
I worked at a mid-sized company completely driven by technological people, with no one having a solid business background. It wasn’t good.
> fixing Intel is mostly about figuring out how they let a bunch of MBAs mismanage the company to the point that they squandered a two-decade long global monopoly on computing, in an era where computing absolutely dominated global economic development.
Isn’t the situation you described precisely where we would expect stagnation to be very likely?
> I mean, when you put it that way, it's genuinely astounding that they managed to find themselves technically so far behind, and so quickly. This was not a minor mismanagement fuck up, this is complete incompetence.
Is it though?
Or is it just the hardware equivalent of the risky move of throwing it all away and rewriting from scratch for version 2? Intel have just incrementally improved their design, whereas AMD threw theirs away and almost disappeared in the process. In a parallel dimension, Intel could be exactly where they are now, with no AMD.
But... I agree that there was/is clearly a huge amount of incompetence letting non-engineers run an engineering company.
Large corps do gather intelligence on the competition. Even small ones do.
For example, when a new pizza place opened in town, the one I worked for in college immediately ordered a pizza to see what they were like: pricing, taste, etc.
Large corps go way beyond this, and information does leak. Bribes, promised promotions and roles at the new company, theft of code, processes, trade secrets.
You do not have to use trade secrets, IP, or processes in house, and you can silo that info from engineers / the main company. Yet having an awareness of these things gives you an idea of how much to spend on research.
I wonder, did Intel not do any of this? Did they do so, but AMD security/counter espionage worked to provide a fake picture?
I wonder how large corps operate in this realm, these days.
>how they let a bunch of MBAs mismanage the company to the point that they squandered a two-decade long global monopoly on computing
An MBA Story.
Once upon a time, an MBA joined a company and worked his way up from Finance Director to the C-suite. He could probably be described as a purebred MBA. He made friends with the other rare MBAs in the company and improved their career paths along the way. Why? Because that is what you do: MBAs always help other MBAs. Somewhere along the line, the company's first non-technical, non-engineer MBA CEO was born.
As the company grew there were more MBAs, but not all MBAs are bad; you just need engineers and product people around them. But some of these engineers and product geniuses [1] are a pain in the ass to manage, and since many of them don't have MBAs, they don't speak MBA, so they get driven out of the company's decision-making forums. So what happened was that these product engineers got rooted out over the years, even when they were C-suite, and went out very publicly.
With a leading-edge tech company there is a very long lead time in the product R&D roadmap, so they were fine for the first few years and kept posting record-breaking results. Then the MBA moved his way up to the board and became chairman, just in time before his MBA CEO friend retired. Which meant the MBA Chairman got to pick a new CEO again. Guess which CEO an MBA would pick? Another MBA! Since MBAs always help other MBAs, the new MBA CEO promoted more MBAs to various roles within the company. And the MBA cult now officially dictated the company's decision making.
Everything was great for a few more years: the market was growing, they had a monopoly, and most important of all, they were doing great because they were MBAs. At least that is what those MBAs thought. Until one day cracks started appearing. The MBA CEO thought it was all fine, but with time external pressure piled up and the whole industry moved on. The market looked very different from what the company had known 10 years earlier. And somehow the MBA CEO already had an ongoing internal investigation showing he had had an inappropriate relationship, and was let go. Which is MBA speak for the CEO being fired. A recently arrived CFO was named interim CEO while the MBA Chairman and Board continued their CEO search. They could have hired another MBA from outside the company, but since a leading-edge tech company is a gigantic company with so much specific domain knowledge and culture, the suitable choices outside the company were virtually nil. The MBA Chairman looked at the CFO: not his first choice, and fairly new to the tech company, but he has an MBA! And the third MBA CEO was born.
The market was now moving at a much faster pace than before. To give credit to the new MBA CEO, he was trying very hard to steer the ship onto a new path and direction. But the tech company was already behind. The new direction and requirements were completely new; MBA middle management had no idea what they were doing and couldn't turn the ship fast enough. And while they were still turning, the industry moved further along. I guess this is an example of MBAs being killed by MBAs.
Finally the shareholders were so fed up that some of them asked the same question:
>how they mismanage the company to the point that they squandered a two-decade long global monopoly on computing, in an era where computing absolutely dominated global economic development.
There was very little the shareholders could do. They needed those product engineers back. And a new CEO. But the perfect candidate had had a falling-out with the MBA Chairman a decade ago, so the only option was to rally enough support to get rid of him. In the end, the MBA Chairman retired, a new Board Chairman was named, and those product engineers are back to save the company.
As far as the MBA story goes, this is the end. We don't know what happens to the leading-edge company; I guess that part is to be continued.
The Amlogic S905X3/4 are "jelly bean" chips under this definition - you can buy full computers with those including a motherboard of sorts, RAM and standard USB/HDMI/Ethernet/eMMC/Wifi for under $30 bulk in China. They're branded as "TV boxes". The S905X3 is also the SoC of the $55 Odroid C4 SBC.
Yet those chips are apparently made on either TSMC's or GloFo's 12nm node (which one, I can't find). While Intel's 14nm is still superior to either TSMC's or GloFo's 12nm by all accounts, they're definitely in the same ballpark. Competition even looks rough for Intel in cheaper high-volume chips. You'd also need to consider that Intel still needs some time to adapt to being a third-party foundry.
Why do you think Intel will succeed in selling leading edge fab services to fabless companies when GlobalFoundries (the former chip-making business of AMD) was unable to compete?
To me, it seems like the same story is playing out again.
One could easily flip that question around, right? "Why would Intel try to sell fab services when that didn't work out well for GlobalFoundries?"
But more importantly, Intel actually has way more foundry expertise than GlobalFoundries did or does. In the case of AMD spinning off GlobalFoundries, it was more like cutting loose an underperforming part of the company rather than empowering a significant asset.
Production for a datacenter CPU is not the same as production for datacenter + enthusiast-grade consumer CPUs like Zen 3 currently achieves, unfortunately. Rocket lake being backported to 14nm is still not a good sign for actual production volume, although it probably means next generation will be 10nm all the way.
Datacenter CPUs are much larger than consumer parts, and yield goes down with the square of the die area. They start with these because the margins go up faster than the square of the die area.
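To make that concrete, here's a minimal sketch of the classic Poisson defect-density yield model (yield ≈ e^(-D·A)); the defect density and die areas below are made-up illustrative numbers, not real process data:

```python
import math

# Minimal sketch: Poisson defect-limited yield, Y = exp(-D * A).
# The defect density and die areas are illustrative assumptions,
# not real process data.
defect_density = 0.1  # assumed defects per cm^2

for name, area_cm2 in [("small consumer die", 1.5), ("large server die", 6.0)]:
    y = math.exp(-defect_density * area_cm2)
    print(f"{name}: {area_cm2} cm^2 -> ~{y:.0%} of dies defect-free")
```

The exact functional form matters less than the shape: larger dies collect proportionally more defects, so the fraction of fully clean dies drops off quickly with area.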
But modern techniques exist to deal with problems in a large die (ie testing and then segmenting off cores with mistakes on them), so the fact they’re starting with large chip die sizes doesn’t really tell you much, no?
The top end most profitable SKUs are fully enabled dies. That they are now able to ship dies this large is a good sign. The 10nm laptop chips they have produced so far were rumored to have atrocious yield.
"The top end most profitable SKUs are fully enabled dies." If you take a part that would be discarded because it didn't make "top end" binning, and sell it as a lesser(core count or clock speed) chip. Isn't that profit from what would otherwise be scrap?
My impression is the vast, vast majority of chips have at least part of the chip disabled to help solve yield problems. So it seems to be the status quo?
Die harvesting is not perfect. If there is an error in a logic transistor, they can turn off that cpu core and use the rest of the chip. If there is a short between ground and power, the whole chip is useless because it will melt when you plug it in.
Yields on larger chips are still substantially worse than smaller chips, even with harvesting.
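Here's a toy model of that trade-off, treating each core as independently hit by a defect; the per-core defect probability and core counts are assumptions for illustration, and it ignores fatal whole-die defects like the power/ground shorts mentioned above:

```python
from math import comb

def sellable_fraction(cores: int, p_core_bad: float, max_bad: int) -> float:
    """Probability that at most max_bad cores are defective (binomial model)."""
    return sum(
        comb(cores, k) * p_core_bad**k * (1 - p_core_bad)**(cores - k)
        for k in range(max_bad + 1)
    )

p = 0.03  # assumed chance that any single core is hit by a defect
for cores in (8, 40):
    perfect = sellable_fraction(cores, p, 0)
    harvested = sellable_fraction(cores, p, 2)  # allow up to 2 cores fused off
    print(f"{cores}-core die: {perfect:.0%} fully good, "
          f"{harvested:.0%} sellable with harvesting")
```

Harvesting recovers a lot of the big dies, but the fraction of fully enabled (top-SKU) dies still falls sharply as core count grows.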
And here I always thought it was because, should your large die have an issue, you can likely sell it as a lower-binned SKU with the failed area gated off. This way you end up with more inventory of a chip you intend to sell, or you have a high margin chip - win-win!
> Rocket lake being backported to 14nm is still not a good sign for actual production volume
I'm genuinely having trouble understanding what you mean by this.
Rocket Lake being backported to 14nm means that 10nm can be allocated in greater proportion toward higher-priced chips like Alder Lake and Ice Lake SP. Seems like it would be good for production volume.
I was under the impression the issue with 10nm is frequency, which led to the Rocket Lake backport. Unfortunately, it seems that the 10nm node's efficiency point is lower on the frequency curve. The reviewed Ice Lake processor was 300 MHz lower than the previous generation (though with a much higher core count), despite higher power draw and the process node shrink.
In laptop processors they can easily show efficiency gains from the 10nm process and the IPC improvement; most laptop processors end up running at a power envelope below the ideal performance-per-watt point. Server processors with higher core counts mean you can run more workloads per server, again providing efficiency gains. However, desktop/gaming tends to be smaller core counts plus higher frequency, with little concern for efficiency outside of quality-of-life factors (i.e. don't make me use a 1 kW chiller). Intel has been pushing 5 GHz processor frequencies for years, and Rocket Lake continues that push (5.3 GHz boost); when they drop frequency to move to 10nm, it's hard to see an IPC improvement able to paper over that.
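Back-of-envelope, the problem looks like this (performance ≈ IPC × frequency; the clocks and IPC uplift below are assumed round numbers, not measurements):

```python
# Toy single-thread comparison: performance ~ IPC * frequency.
# The clocks and IPC gain are assumed round numbers, not benchmark data.
old_freq_ghz = 5.3   # assumed 14nm boost clock
new_freq_ghz = 5.0   # assumed lower 10nm boost clock
ipc_gain = 0.10      # assumed IPC improvement of the newer core

relative_perf = (1 + ipc_gain) * (new_freq_ghz / old_freq_ghz)
breakeven_ipc = old_freq_ghz / new_freq_ghz - 1

print(f"Relative single-thread performance: {relative_perf:.2f}x")
print(f"IPC uplift needed just to break even: {breakeven_ipc:.1%}")
```

In other words, a fair chunk of any IPC gain is eaten just offsetting the lost clock speed.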
However, Alder Lake CPUs will have a thread-count advantage, so at least with 24 threads they should be able to show a generational improvement over the current 8c/16t Rocket Lake parts. That will allow them to at least argue their value with select benchmarks and Intel-only features. Those 8 efficiency cores will likely be a BIG win on laptops, but on desktop I doubt they will compare favourably to the full-fat cores on a current Ryzen 5900X (i.e. a currently available 24-thread processor).
Intel is going to have at least one more BAD mainstream desktop generation before they can truly compete on the mainstream high end, though there is a chance they have something like a HEDT part that would allow them to at least save face. That being said, given a choice, Intel will give up desktop market share for the faster-growing laptop and server markets.
I think they mean that the fact that they needed to backport Rocket Lake, rather than having all their production on 10nm, implies a much more limited production capacity than there otherwise would be.
Production volume here refers to the production volume of the node as a whole still being low, not to any risk of getting volume of these particular SKUs on the node.
> Rocket lake being backported to 14nm is still not a good sign for actual production volume,
I'm not seeing a good reason for thinking this is the case. Server CPUs are harder to fab (much larger die area) and they need to fab more of them (desktop CPUs are relatively niche compared to mobile and server CPUs).
If anything this is a sign that 10nm is fully ready.
Intel has momentum and some cult following, but they're no match for AMD in certain aspects like PCIe lanes, memory channels, and some types of computation that favor AMD's architecture.
Day by day, more data centers get AMD systems by choice or by requirement. (Oh, you want 8x Nvidia A100 modules with maximum performance? You need an AMD CPU, since it has more PCIe lanes, for example.)
You don't see many AMD server CPUs around because the first generation and most of the second generation were completely bought up by FAANG, Dropbox, et al.
As production ramps up with the newer generations, we can buy the overflow parts after most of the production is gobbled up by these buyers.
In terms of PCIe lane efficiency, there is no competition between Intel and AMD. Intel is way ahead of AMD. Don't be impressed by the number of lanes available on the board.
> Don't be impressed by the number of lanes available on the board.
When you configure the system full-out with GPUs & HBAs, the number of lanes becomes a matter of necessity rather than a spec which you drool over.
A PCIe lane is a PCIe lane. Its capacity, latency, and speed are fixed, and you need these with a minimum number of PCIe switches to saturate the devices and servers you have, at least in our scenario.
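To give a sense of why lane count becomes a hard requirement rather than a spec-sheet number, here's a rough budget; the device mix is hypothetical, but the per-lane figure uses the standard PCIe 4.0 numbers (16 GT/s with 128b/130b encoding, roughly 2 GB/s per lane per direction):

```python
# Back-of-envelope PCIe 4.0 lane budget.
# 16 GT/s per lane with 128b/130b encoding ~= 1.97 GB/s per lane, one direction.
# The device mix below is hypothetical, just to show how fast lanes add up.
gb_per_s_per_lane = 16 * (128 / 130) / 8

devices = {
    "8x GPU (x16 each)": 8 * 16,
    "4x NVMe SSD (x4 each)": 4 * 4,
    "2x 100GbE NIC (x16 each)": 2 * 16,
}

total_lanes = sum(devices.values())
print(f"Lanes needed without switches: {total_lanes}")
print(f"Aggregate bandwidth: ~{total_lanes * gb_per_s_per_lane:.0f} GB/s per direction")
```

With a build like that you blow past what any single socket offers natively, which is why minimizing PCIe switches pushes you toward the CPU with the most native lanes.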
I can only assume you're referring to Intel's Rocket Lake storage demonstration they tweeted out. This was using PCMark 10's Quick Storage Benchmark which is more CPU bound than anything else.
All of the other benchmarks in the PCMark test suite push the bottleneck down to the storage device.
One would think Intel might want to build a storage array that could stress the PCIe lanes but then that might show an entirely different picture than the one Intel is portraying.
> Jellybean is a common term for components that you keep in your parts inventory for when your project just needs “a transistor” or “a diode” or “a mosfet”
-----------
For many hobbyists, a Raspberry Pi or Arduino is a good example of a Jellybean. You buy 10x Raspberry Pis and stuff your drawer full of them, because they're cheap enough to do most tasks. You don't really know what you're going to use all 10x Rasp. Pi for, but you know you'll find a use of it a few weeks from now.
---------
At least, in my Comp. Engineering brain, I think of 2N2222 or 2N3904 transistors, or the 741 op-amp. There are better op-amps and better transistors for any particular job. But I chose these parts because they're familiar, comfortable, cheap, and well understood by a wide variety of engineers.
Well, not the 741 OpAmp anymore anyway. 741 was a jellybean back in the 12V days. Today I think 5V compatibility has become the standard voltage (because of USB). So 5V op-amps are a more important "jellybean".
I don't know how old you are, but the 741 was obsolete in the 1980s. It sticks around in EE textbooks because it's such an easy way to demonstrate problems with op-amps... high input current, low gain-bandwidth product, low slew rate, etc.
I think your jellybean op-amps would more likely be TL072, LM358, or NE5532.
The old, beaten-up textbooks in the corner of my neighborhood library were still talking about the 741 back in the 2000s, when I was in high school and started dabbling in electronics more seriously.
Maybe it was fully obsolete by that point, but high school + neighborhood libraries aren't exactly filled with up-to-date textbooks or the latest and greatest.
I remember that Radio Shack was still selling kits with 741 in them, as well as breadboards and common components... 12V wall-warts and the like. Online shopping was beginning to get popular, but I was still a mallrat who picked up components and dug through old Radio Shack manuals into 2005 or 2006.
It was the ability to walk around and see those component shelves sitting there in Radio Shack that got me curious about the hobby and started me researching it. I do wonder how modern children are supposed to get interested in hobbies now that malls are less popular (and electronics shops like Radio Shack have basically disappeared).
------------
I don't remember what we used in college. I knew that I was more selective and understood the kinds of problems various OpAmps had back then. Also you're not really rich enough to invest into a private stockpile of chips, and instead just use whatever the labs are stocked with in college.
LM358 is the jellybean that I keep in my drawer today. If you're curious. Old habits die hard though, I still think 741 as the jellybean even though it really is obsolete today.
I'm not sure if this is the same as what you mention with 12V vs 5V, but Intel is pushing a 12V standard PSU design that is also used by the new Nvidia Founders Edition GPUs:
So when you work with analog electronics, you learn that voltage is your friend. Playing with +/- 12V (a 24V range) for "analog logic" is innately cleaner than playing with 0V to 5V (5V range).
Let's say you make a design that has a +/-1V error. On the 5V circuit, that's 20% error. But on the +/-12V circuit, it is only about 4.2% error. (Going much better than 5% or 1% error is nonsensical: most resistors you'll buy are 5% tolerance, and capacitors are maybe +/-20%.) So you can see, a +/-1V error is acceptable on +/-12V circuits, but maybe unacceptable on 5V circuits.
If precision is important, there are 1% or 0.1% resistors available, as well as matched-resistors (which have say 1% error, but all the resistors have the same degree of error, so your "proportions" remain consistent. You can manually-match resistors together with an accurate ohm-meter as well)
As such, +/- 12V is simply easier to make "precise" electronics on. Of course, the downside is that you've got leakage all over the place (more voltage means more power-draw)
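A quick sketch of the error arithmetic above (the +/-1V figure is the hypothetical from the example):

```python
# The hypothetical from above: a fixed +/-1 V error on different signal ranges.
error_v = 1.0

for name, full_range_v in [("0-5 V logic supply", 5.0), ("+/-12 V analog rails", 24.0)]:
    relative = error_v / full_range_v
    print(f"{name}: +/-{error_v:g} V error is {relative:.1%} of the range")
```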
---------
But now it's 2021. Most parts, even extremely cheap "jellybean" parts like the LM358, have low errors and low bias. And as USB's popularity as a 5V delivery mechanism has grown, everyone has a USB plug somewhere to use as the basis of electricity experiments.
So while electronic engineers way back played with +/-12V, today the assumption is that you play with 0V to 5V.
My interpretation: Small buck converters are cheap and efficient, and so many parts need different voltages anyway (1.8V, 1.5V, 1.2V, etc.), that what you want to do is distribute the 12V from the main PSU and then locally convert it to the voltage you need.
If you are already putting buck converters everywhere, it makes sense to raise the voltage coming from the PSU, because you can reduce the current supplied by the PSU. Only the power supply will be running at 12V, almost everything else will be running at a lower voltage. (Would you rather supply 120W at 12V/10A or 5V/24A?)
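A tiny sketch of that trade-off, reusing the 120W figure from the parenthetical above:

```python
# Same 120 W load delivered at different distribution voltages.
power_w = 120.0

for volts in (5.0, 12.0):
    amps = power_w / volts
    print(f"{power_w:.0f} W at {volts:g} V needs {amps:.0f} A "
          "(and I^2*R losses in cables/connectors scale with the square of that current)")
```

Lower current means thinner traces, smaller connectors, and less heat lost in distribution, which is the whole argument for the 12V-only standard.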
Sometimes referred to as "applications specific processor" (ASP) or "System on chip" (SoC). These are the bulk of semiconductor sales these days as they have replaced all of the miscellaneous gate logic on devices with a single programmable block that has a bunch of built in peripherals.
Think Atmel ATmega parts; there are trillions of these in various roles. When you think of something like a 555 timer [1] that is now more cost-effectively and capably replaced with an 8-pin microcontroller, you can get an idea of the shift.
While these are rarely built on the "leading edge" process node, when a new process node takes over for high-margin chips, the previous node gets used for lower-margin chips, which effectively does a shrink on their die, decreasing their cost (most of these chips seem to keep their performance specs fairly constant, preferring cost reduction over performance improvement).
Anyway, the zillions of these chips in lots of different "flavors" are colloquially referred to as "jelly bean" chips.
What I don't understand is: ASML is building these machines for making ICs. Why can TSMC use them for 7nm but Intel can only use them for 10nm right now? Doesn't ASML make the lenses as well, so that you're "only" stuck making the etching thingy (forgot what it's called, but the reflective template of a CPU)?
It seems like nobody is talking about this, could anyone shine some light?
Consider that the wavelength of red light is 700 nm, and the wavelength of UV-C is 100nm to 280nm.
And immediately, we see the problem about dropping to 10nm: that's literally smaller than the distance that photons vibrate on their way to the final target.
And yeah, 10nm and 7nm is a marketing term, but that doesn't change the fact that these processes are all smaller than the wavelength of light.
-------
So there are two ways to get around this problem.
1. Use smaller light: "Extreme UV" is even smaller than normal UV at 13.5nm. Kind of the obvious solution, but higher energy and changes the chemistry slightly, since the light is a different color. Things are getting mighty close to literal "X-Ray Lasers" as they are, so the power requirements are getting quite substantial.
2. Multipatterning -- Instead of developing the entire thing in one shot, do it in multiple shots, and "carefully line up" the chips between the different shots. As difficult as it sounds, it's been done before at 40nm and other processes. (https://en.wikipedia.org/wiki/Multiple_patterning#EUV_Multip...)
3. Do both at the same time to reach 5nm, 4nm, or 3nm. Either way, 10nm and 7nm is the point where the various companies had to decide whether to do #1 first or #2 first, and either way your company needs to learn to do both in the long term. TSMC and Samsung went with #1 EUV, and I think Intel thought that #2 multipatterning would be easier.
And the rest is history. Seems like EUV was easier after all, and TSMC / Samsung's bets paid off.
Mind you, I barely know any of the stuff I'm talking about. I'm not a physicist or chemist. But the above is my general understanding of the issues. I'm sure Intel had their reasons to believe why multipatterning would be easier. Maybe it was easier, but other company issues drove away engineers and something unrelated caused Intel to fall behind.
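To put rough numbers on the wavelength argument: the printable half-pitch is often estimated with the Rayleigh criterion, CD ≈ k1·λ/NA. The NA and k1 values below are typical published ballpark figures used only as an illustration, not any particular fab's numbers:

```python
# Rayleigh criterion sketch: minimum half-pitch ~ k1 * wavelength / NA.
# NA and k1 are typical ballpark values, not any specific fab's numbers.
def min_half_pitch_nm(wavelength_nm: float, na: float, k1: float) -> float:
    return k1 * wavelength_nm / na

duv = min_half_pitch_nm(193.0, na=1.35, k1=0.30)   # ArF immersion DUV
euv = min_half_pitch_nm(13.5, na=0.33, k1=0.40)    # EUV

print(f"DUV (193 nm, immersion): ~{duv:.0f} nm half-pitch per exposure")
print(f"EUV (13.5 nm):           ~{euv:.0f} nm half-pitch per exposure")
```

Anything much tighter than the single-exposure DUV limit has to come from multipatterning, which is exactly the fork in the road described above.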
Could you not simply take the 2D Fourier transform of the desired on-chip pattern and use that as the mask?
It seems like the diffraction pattern of the "long" wave laser would give you exactly what you want on the chip... and if you are putting hundreds of chips on a single wafer, it seems like you might not even need to worry too much about ringing at the edges.
You might be confusing the somewhat similar shapes arising from the Fourier transform of step functions etc. or from diffraction from an individual hole or obstacle. Unlike the Fourier transform, which is a purely mathematical operation, diffraction has no "inverse": if you add more diffraction you just add more waves to the projected pattern.
The only ways to defeat diffraction are reducing it with shorter wavelengths and compensating it with multiple exposures with different patterns in which light and dark fringes compensate each other: exactly the two general approaches (EUV and multipatterning) taken by the semiconductor industry.
In the far field with a narrow bandwidth coherent light source (i.e., a laser), the projected image should be the FT of the aperture. That limiting condition is sometimes known as Fraunhofer diffraction, and generalizes to arbitrary apertures (not just a single hole).
Consider a narrow-band laser incident upon a diffraction grating for example. It produces a single point (well, two or three points, mirrored across the grating), not the uniform smudge that you'd expect by naively adding up the diffraction patterns of a bunch of slits. You should actually try this experiment for yourself!
The only trick is that you need a collimated laser and you need it to illuminate the entire inverse-FT-chip-grating aperture at once.
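Here's a tiny numerical version of that claim: in the Fraunhofer (far-field) limit, the projected intensity is proportional to |FT(aperture)|². This is just a 1D toy with arbitrary dimensions, not a lithography simulation:

```python
import numpy as np

# Toy 1D Fraunhofer diffraction: far-field intensity ~ |FFT(aperture)|^2.
# The aperture geometry is arbitrary; this is not a lithography model.
n = 4096
aperture = np.zeros(n)
aperture[2000:2010] = 1.0   # one narrow slit
aperture[2100:2110] = 1.0   # a second slit -> double-slit fringes

intensity = np.abs(np.fft.fftshift(np.fft.fft(aperture))) ** 2
intensity /= intensity.max()

center = n // 2
print("central fringes:", np.round(intensity[center - 5:center + 6], 3))
```

Whether the inverse trick works for chip-scale patterns is a different question (partial coherence, finite NA, needing a physical mask that realizes a complex-valued transform, etc.), but the forward far-field/FT relationship itself is standard optics.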
For one, "7nm" and "10nm" are just marketing names at this point, and don't really correspond to the physical dimensions of anything on the chips produced. Intel 10nm and TSMC 7nm are considered to have very comparable density. TSMC 5nm and is quite ahead though.
As to why it's not just a matter of buying a bunch of ASML lithography machines and plugging them in: In addition to what the other replies have noted, there is so much complexity and precision required in a fab. Consider all of the challenges that would be involved with starting with a bunch of industrial robots, and trying to build a fully automated assembly line that manufactures cars. Then scale precision requirements up by many orders of magnitude.
I'll take a crack at it, though I'm only in undergrad (took a course on VLSI this semester).
Making a device at a specific technology node (e.g. 14nm, 10nm, 7nm) isn't just about the lithography, although litho is crucial too. In effect, lithography is what allows you to "draw" patterns onto a wafer, but then you still need to do various things to that patterned wafer (deposition, etching, polishing, cleaning, etc.). Going from "we have litho machines capable of X nm spacing" to "we can manufacture a CPU on this node at scale with good yield" requires a huge amount of low-level design to figure out transistor sizings, spacings, and then how to actually manufacture the designed transistors and gates using the steps listed above.
Thanks, that clarifies it as far as possible for someone like me who doesn't really know how the full manufacturing process works.
Could we simplify this roughly into "ASML makes the machines to shine light at the right nm, foundries make the silicon and package it into a useful device, architecture designers give you the layout to etch" if my mom were to ask?
A few possible corrections, but one interesting one: the output of a foundry is generally known-good dice -- sliced up and tested bits of a silicon wafer. Packaging is done by IC packaging companies, and is really quite an interesting field in its own right. (Many of the foundries do some packaging in house, especially for more exotic approaches like Intel's Foveros, but this is the exception.)
I would've guessed that they packaged everything in-house. I'm fascinated by this industry; its products are so incredibly omnipresent, yet "we" know "nothing" about the process of going from "sand" to gigaflops in your machine.
We talk to the piece of sand with lightning (USB Keyboard 5V interface), and then we hear its response back with lightning (HDMI Port interface). And somewhere in the middle, there's a source of lightning that's making it think on its own (Power supply and VRMs).
I think I kept working the joke in my brain, and it was too brutally simple at first. And then I worked it over, and now it reads too complicated. Ah well.
I'm not sure that is a worthwhile segment for them to try to compete in though. It's high mix, has a ton of competition, and doesn't necessarily leverage their fab strength.
If they price it right, it could be amazing. Computing is mostly about economics. The new node sizes greatly increase the production capacity. Half the dimension in x and y gets you 4x the transistors on the same wafer. It is like making 4x the number of fabs.
It also has speed and power advantages.
I think this release is excellent news on many levels.
Presumably because people aren't buying chips based on the node size used, and they would get bad press from calling it 2nm. My guess is it simply relates to node refreshes following the old pattern of old node / ~1.414 = new node size, with different companies picking arbitrarily different node names; e.g. Intel's 14nm++ was better than its initial 10nm process.
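A quick sketch of the old cadence being referred to, where each full node divides the nominal dimension by about √2 (halving the area per transistor); modern marketing names only loosely track these numbers:

```python
# Traditional full-node cadence: divide the nominal dimension by sqrt(2),
# which roughly halves the area per transistor. Marketing names only
# loosely follow this progression nowadays.
node_nm = 14.0
for _ in range(4):
    next_node = node_nm / 2 ** 0.5
    print(f"{node_nm:.1f} nm -> {next_node:.1f} nm (area x{(next_node / node_nm) ** 2:.2f})")
    node_nm = next_node
```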
TLDR from Anandtech is that while this is a good improvement over previous gen, it still falls behind AMD (Epyc) and ARM (Altra) counterparts. What's somewhat alarming is that on a per-core comparison (28-core 205W designs), the performance increase can be a wash. Doesn't bode well for Intel as both their competitors are due for refreshes that will re-widen the gap.
Key question will be how quickly Intel will shift to the next architecture, Sapphire Rapids. Will this release be like the consumer/desktop Rocket Lake? E.g. just a placeholder to essentially volume test the 10nm fabrication for datacenter. Probably at least a year out at this point since Ice Lake SP was supposed to be originally released in 2H2020.
> Key question will be how quickly Intel will shift to the next architecture, Sapphire Rapids. Will this release be like the consumer/desktop Rocket Lake? E.g. just a placeholder to essentially volume test the 10nm fabrication for datacenter. Probably at least a year out at this point since Ice Lake SP was supposed to be originally released in 2H2020.
Alder Lake is meant to be a consumer part contemporary with Sapphire Rapids, which is server only. They're likely based on the same (performance) core, with Alder Lake additionally having low-power cores.
Last I heard the expectation was still that these new parts would enter the market at the end of this year.
Lately Intel seems to be getting a lot of flack here. As a layperson in the space who's pretty out of the loop (I built a home PC about a decade ago), could someone explain to me why that is? Is Intel really falling behind or dressing up metrics to mislead or something like that? I also partly ask because I feel that I only really superficially understand why Apple ditched/is ditching Intel, although I understand if that is a bit off-topic for the current article.
Intel's processes (i.e. turning files on a computer into chips) have been a complete disaster in recent years, to the point of basically missing one of their key die shrinks entirely as far as I can tell.
They are, in a certain sense, suffering from their own success, in that their competitors were basically nonexistent until Zen came about (and even then, only with Zen 3 has Intel truly been knocked off their single-thread perch). This has led to them getting cagey, and a bit ridiculous in the sense that they are not only backporting new designs to old processes but also pumping them up to genuinely ridiculous power budgets. With Apple, AMD, and TSMC they have basically been caught with their trousers down by younger and leaner companies.
Ultimately this is where Intel needs good leadership. The MBA solution is to just give up and do something else (e.g. spin off the fabs), but I think they should have the confidence (as far as I can tell, this is what they are doing) to rise to the technical challenge. They will probably never have a run like they did from Nehalem until shortly before now, but throwing in the towel means the probability is zero.
Intel have been in situations like this before. E.g. when Itanium was clearly doomed and AMD were doing well (amd64), they came back with new processors and basically ran away to the bank for years; AMD's server market share is still pitiful compared to Intel's (10% at most), for example.
Yeah, it feels like Intel is sort of in the Pentium 4 days again, but without the Core microarchitecture sitting in the wings, and way more well funded competition - not just AMD.
It's hard to see what they can do. AMD is winning on every measure I can see on x86, and M1/Ampere feel like a huge curveball thrown at them. I don't even think they could switch to using TSMC fabs to help them shrink the process size as all the capacity is booked up for years (especially by Apple).
I think the dynamics of the server market have also changed in the last 10 years or so. You now have a much more condensed market, with big cloud operators buying millions of CPUs and having much more leverage than before.
I would be surprised if, by the end of the 2020s, ARM wasn't standard in nearly all server workloads. Most software can run on it unmodified now: databases all work great on ARM, and most applications are interpreted or run on a VM, so it is pretty easy to move over.
I don't want to counsel despair, but I'm not as sanguine as you either. Intel has had disastrous microarchitectures before: Itanium, the P4, and earlier ones. But it's never had to worry about recovering from a process disaster before. It might very well be able to, but I worry.
I'm not exactly optimistic either, I just think that the doomsaying is overblown (and sometimes looks like a tribal thing from Apple and AMD fans if I'm being honest - i.e. companies aren't your friends)
> Intel's processes (i.e. turning files on a computer into chips) have been a complete disaster in recent years, to the point of basically missing one of their key die shrinks entirely as far as I can tell.
Which one? I don't believe they missed a die shrink; it just took a long time. Intel 14nm came out in 2014 with their Broadwell processors, and the next node, 10nm, came out in 2019 (technically 2018, but very few units shipped that year).
Intel killed the longstanding "tick tock" model in 2016 because of failures with 10nm yield and the higher-than-expected costs of 14nm. Intel got too aggressive with the timeline of the die shrink, which led to them trying to do 10nm on DUV rather than waiting for EUV technology where the light is about an order of magnitude shorter in wavelength than that of DUV (and thus able to resolve today's nano-scale features without all the RETs needed for DUV).
From the 2015 10-K [0]:
"We expect to lengthen the amount of time we will utilize our 14nm and our next-generation 10nm process technologies, further optimizing our products and process technologies while meeting the yearly market cadence for product introductions."
Spoiler alert: In the five years after shelving the tick-tock model, Intel also missed the yearly market cadence for product introductions.
"Released" is a very generous term for a product that had one SKU anyone ever saw outside of 3DMark. (And two if you include the m3-8114Y, which appears to only exist in some uploaded benchmarks as far as anyone can tell.)
Yes, quite simply they have fallen behind while also promising things they have failed to deliver. As an example, their most recent flagship release is the 11900K, which has 2 fewer cores (now 8) than its predecessor (the 10-core 10900K), and almost no improvement to speak of otherwise (in some games it's ~1% faster). On the other hand, AMD's flagship, which to be fair is $150 more expensive, has 16 cores, very similar clock speeds, and is much more energy efficient (Intel and AMD calculate TDP differently). Overall, AMD is the better choice by a large margin, and Intel is getting flak because it sat on its laurels for the last decade(?) and hasn't done anything to improve itself.
Naturally the 11900K performs quite a bit worse than the 10900K in anything which uses all cores, but the remarkable thing about the 11900K is that it even performs worse in a bunch of game benchmarks, so as a product it genuinely doesn't make any sense.
They can't get their next-gen fabs (chip factories) into production. It's been a problem long enough that they're not even next-gen anymore: it's current-gen, about to be previous-gen.
So what you're seeing isn't really anti-Intel, it's probably often more like bitter disappointment that they haven't done better. Though I'm sure there's a tiny bit of fanboy-ism for & against Intel.
There's definitely some of that pro-AMD fanboy sentiment in the gaming community where people build their own rigs: AMD chips are massively cheaper than a comparable Intel chip.
Just a minor nitpick regarding your last paragraph, this is no longer the case. Intel is now significantly cheaper after they heavily cut prices across the board.
For instance, you can now get an i7-10700K (which is roughly equivalent in single thread and better in multi thread) for cheaper than a R5 5600X.
Edit: picking individual processors to compare (especially low volume ones) is often not useful when talking about how well a company is competing in the market.
The comment you're replying to is "cherry-picking" the current-gen AMD processor which offers the best value for most users. You're cherry-picking an Intel processor which almost no-one has any reason to buy over other Intel options (the i9-11900K is much more expensive than the 11700K or 10700k for little extra performance; AMD had a few chips like this last gen, and they actually downplayed how much of a price increase this gen was by only comparing to those poor-value chips). One of these comparisons is a lot more useful than the other.
I still see a significant discount on AMD chips for the same level of performance. An i9-11900K will cost me about $620 for roughly the same performance I'd get from a $450 Ryzen 7 5800X.
Yes, which is why I was practically shouting inside my head when Intel recently announced it would finally open its fabs for outside work the way TSMC operates. Finally! Considering the way Intel is structured around its own designs it might never be a significant source of profits for them, but it should at least help to cover their fixed costs during dry spells and production problems on their own designs.
A perfect storm. Intel had trouble with its 10nm/7nm process engineering, which TSMC has managed at equivalent nodes. AMD had a resurgence with the Zen architecture, and ARM/Apple/TSMC/Samsung put hundreds of billions into catching up with x86 performance.
Intel is still the biggest player in the game, because even though they are stuck at 14nm, AMD isn't able to manufacture enough to take a bigger chunk of the market. Apple won't sell into the PC/datacenter space, and the rest are still niche.
I think this isn't quite fair: their laptop 10nm chips have been shipping in volume since last year, and their server chips were released today, with 200k+ units already shipped (according to Anandtech). The only line left on 14nm is socketed desktop processors, which is a relatively small market compared to laptops and servers.
Hacker News users generally aren't very interested in laptop processors. Sure business wise they're incredibly important, but as far as getting flack on hacker news, laptop chips won't stop it. People here have been waiting for intel 10nm on server and especially desktop for 6 years now.
Lots of shade because they first missed the whole mobile market, then got beat by AMD Zen by missing the chiplet concept and a successful current-gen process size, then finally also overshadowed by Apple's M1. The M1 thing is interesting, because it likely means the next set of ARM Neoverse CPUs for servers, from Amazon and others, will be really impressive. Intel is behind on many fronts.
> Intel is already behind AMD -- they have no product segment where they are absolutely superior.
There are some who would argue this claim, but I think it's at least a defensible one.
Still, availability is an important factor that isn't captured by benchmarking. AMD has had CPU inventory trouble in the low-end laptop segment and high-end desktop segment alike.
> The consensus seems to be that Intel who have their own fabs -- never really nailed anything under 14nm and are now being outcompeted.
Intel has done well with 10nm laptop CPUs. They were just very late to the party. Desktop and server timelines have been quite a bit worse. I agree Intel did not nail 10nm, but they're definitely hanging in there. It's one process node at the cusp of transition to EUV, so some of the defeatism around Intel may be overzealous if we keep in mind that 7nm process development has been somewhat parallel to 10nm because of the difference in the lithographic technology.
Absolutely. Intel has been stuck on the 14nm node for a very, very long time. 10nm CPUs were supposed to ship in 2015; they only really did in late 2019/2020. Meanwhile AMD caught up, and Intel has been doing the silliest shenanigans to appear competitive, like in 2018 when they demonstrated a 28-core 5GHz CPU and kinda forgot to mention the behind-the-scenes one-horsepower (~745W) industrial chiller keeping that beast running.
Also, the first 10nm "Ice Lake" mobile CPUs were not really an improvement over the by-then many-times-refined 14nm "Comet Lake" chips. It's been a faecal pageant.
Some of the other comments above have touched on this, but I think there is also a bit of latent anti-Intel sentiment in many people's minds. Intel extracted a non-trivial price premium out of consumers for many, many years (both for chips and by forcing people to upgrade motherboards by changing CPU sockets) while AMD could only catch up to them for brief periods of time. People paid that price premium for one reason or another, but it doesn't mean they were thrilled about it.
Many people, I'd say especially enthusiasts, were quite happy when AMD was able to compete on a performance/$ basis and then outright beat Intel.
Of course, now the tables have turned and AMD is able to extract that price premium while Intel cut prices. Who knows how long this will last, but Intel is still the 800 lb gorilla in terms of capacity, engineering talent, and revenue. I don't think we've heard the last from them.
Intel was unable to improve their fabrication process year after year, while promising to do so repeatedly. Now, they have been practically lapped twice. Apple has a somewhat specific use case, but their cpus have significantly better performance per watt.
"As impressive as the new Xeon 8380 is from a generational and technical stand-point, what really matters at the end of the day is how it fares up to the competition. I’ll be blunt here; nobody really expected the new ICL-SP parts to beat AMD or the new Arm competition – and it didn’t. The competitive gap had been so gigantic, with silly scenarios such as where a competing 1-socket systems would outperform Intel’s 2-socket solutions. Ice Lake SP gets rid of those more embarrassing situations, and narrows the performance gap significantly, however the gap still remains, and is still undeniable."
This sounds about right for a company fraught with so many process problems lately: Play catch up for a while and hope you experience fewer in the future to continue to narrow the gap.
"Narrow the gap significantly" sounds like good technical progress for Intel. But the business message isn't wonderful.
I don't know that it's all so bad. The final takeaway is that a 660mm2 Intel die at 270W got about 70-80% of the performance that AMD's 1000mm2 MCM gets at 250W. So performance per transistor is similar, but per watt Intel lags. But then the idle draw was significantly better (AMD's idle power remains a problem across the Zen designs), so for many use cases it's probably a draw.
That sounds "competitive enough" to me in the datacenter world, given the existing market lead Intel has.
I would argue that for high-end servers, idle draw is a bit of a non-issue as presumably either you have only one of these machines and it’s sitting idle (so no matter how inefficient it doesn’t matter) or you have hundreds/thousands of them and they’ll be as far from idle as it’s possible to be.
AMD’s idle power consumption is a bigger issue for desktop, laptop, and HEDT.
If you have enough servers for idle draw to be more than a rounding error in your opex breakdown, then you have a strategy to keep idle time to zero. It doesn’t make any financial sense (no matter how low idle draw is) to have a server sit idle (or even powered off, but that’s a capex problem).
Can confirm, built out a datacenter space in a past life. Power costs were of limited concern - cooling was the limited resource. Even then, literally no one went down a spec sheet and compared "hmm, this one has a tiny less amount of watts idle". We just kept servers dark regardless so that we can save on cooling. Nitpicking idle draw for server processors just isn't realistic for a lot of cases.
Yeah, I think it's hard to keep your computers at 100% utilization for the entire day. You host services close to your users, and your users go to bed at some point, many of them at around the same time every day. Then your computers have very little work to do.
Some bigger companies have a lot of batch jobs that can run overnight and steal idle cycles, but you have to be gigantic before that's realistic. (My experience with writing gigantic batch jobs is that I just requisitioned the compute at "production quality" so I could work on them during the day, rather than waiting for them to run overnight. Not sure what other people did, and therefore not really sure how much runs overnight at big companies.)
Cloud providers have spot instances that could take up some of this slack, but I bet there is plenty of idle capacity precisely because the cost can't go to $0 because of electricity use. Or I could be completely wrong about workloads, maybe everyone has their web servers and CI systems running at 100% CPU all night. I've never seen it, though.
The target market for this part is not that kind of datacenter.
Based on the article they're targeting high performance compute, i.e. "application codes used in earth system modeling, financial services, manufacturing, as well as life and material science."
The opposite is true... a major advantage of running cloud infrastructure is that you can run your CPUs near 100% all the time. CPUs which are not running full bore can have jobs moved to them.
Shouldn't datacenters attempt to minimize idle time though? A server sitting at idle is a depreciating asset that could likely be put to more productive use if tasks were rescheduled to take advantage of idle time (this would also reduce the total number of servers needed).
Utilization is a very difficult problem to solve. The difference between peak and off peak utilization can be as much as 70% or more depending on the application.
That is definitely the objective but the reality is that load is not* uniform over the day. So you are paying to keep some number of servers hot (I don't know about spinup/spindown practices in modern datacenters).
I doubt this applies to HPC (the target market for this part) as they either schedule jobs closely or could, I imagine, shut them down. But I'm not in that space either so this is merely conjecture.
* I am sure there are corner cases where the load is uniform, but they are by definition few.
large datacenters have hardware orchestration systems that let them turn off unused machines. There really is no reason to have lots of machines on but unused. At least, that is not a significant enough event to be a determining factor in hardware purchasing.
If you are the one paying the electricity bills for that datacenter, then yes, it probably matters to you a lot. If you are just renting a server from AWS or GCP, it probably matters less. Although, I assume costs borne from idle inefficiency will probably be passed on to the customer...
Most servers are doing things for human beings, and we have irregular schedules. Standard rule of thumb is that you plan for a peak capacity of 10x average. A datacenter that doesn't have significant idle capacity is one that's some kind of weird special purpose thing like a mining facility.
Servers are very expensive. It is not cost efficient to buy a server and let it sit idle or turn it off.
Even purchased on the used market, a decent server could cost $4000; if you spread that cost across 4 years, that's $80ish per month just to pay for the server, which is probably around your power / rent costs.
The costs are more stark if you're buying new, where it is easy to spend $10-30k for a decent spec server.
Don't buy a server to leave it turned off, you're wasting money.
Certainly for bit players and corporate datacenters with utilization < 1% you'd expect the median server to just sit there. For larger (amazon, google, etc) players the economic incentives against idleness are just too great.
> For larger (amazon, google, etc) players the economic incentives against idleness are just too great.
Not all workloads are CPU-bound. Cloud providers have many servers for which the CPUs are idle most of the time, because they're disk-bound, network-bound, other-server-bound, bursty, or similar. They're going to aim to minimize the idle time, but they can't eliminate it entirely given that they have customer-defined workloads.
The workloads are determined by their customers, and customers don't always pick the exact size system they need (or there isn't always an option for the exact size system they need). The major clouds are going to upgrade and offer faster CPUs as an option, people are going to use that option, and some of their workloads will end up idling the CPU. Major cloud vendors almost certainly have statistics for "here's how much idle time we have, so here's approximately how much we'd save with lower power consumption on idle".
This sounds like a dangerous assumption to make. I would expect that needing 25% more machines for the same performance would be a non-starter for many potential customers.
A bit off topic from the server CPU discussion, but I was curious how well AMD is advancing idle power consumption.
For example, the Ryzen 3000 desktop chips seemed to have the issue[0], but the same Zen 2 cores seem to have found some improvements in the Ryzen 4000 mobile chips[1].
I didn't want to just rely on Reddit forum comments, so I found this measure of the Ryzen 3600[2].
> When one thread is active, it sits at 12.8 W, but as we ramp up the cores, we get to 11.2 W per core. The non-core part of the processor, such as the IO chip, the DRAM channels and the PCIe lanes, even at idle still consume around 12-18 W in the system.
My interpretation was expect ~12 W or more idle consumption (just from the CPU package), but I'm not sure I understand it correctly.
I couldn't find the same information for Ryzen 4000 laptops, but the same APU is tested in a NUC, where the total system draw (at the wall) at idle was about 10-11 W, still nearly double that of a Core i7 U-series NUC[3], but certainly lower than that of just the CPU package in the Ryzen 3600.
Anecdotally, my 45W Ryzen 7 4800H laptop with 15.6" 1080p screen lasts about 4 hours on 80% of the 60Wh battery with 95% brightness, doing various non-intensive tasks. Though I don't know how well the battery holds up on complete non-use standby.
Ryzen 3000 desktop processors use a chiplet design, with the IO die built on an older process than the processor dies. Ryzen 4000 mobile processors are monolithic dies, so they don't have the extra power of the inter-chiplet connections and they're entirely 7nm parts instead of a mix of 7nm and 14nm.
> I couldn't find the same information for Ryzen 4000 laptops
I measured an Asus Mini PC PN50 with a Ryzen 4500U. The idle power usage was 8.5 Watt for the system. This with 32GB of memory and a SATA SSD installed. It would be nice if it was lower than this, but it isn't too bad. Interestingly, the machine drew 1.2 Watt while off when first connected to power, and 0.5 Watt while off after being started up and shut down once.
Recently noticed some people focussing on low power but powerful 24/7 home "servers". Systems that are on 24/7, but often idle. One system used around 4.5 Watt in idle. The "brick" / power adapter often uses too much power, even when everything is off.
It's worth noting the difference in tone from the bulk of the discussion in this thread.
Objectively, Ice Lake "isn't too bad" in comparison with Zen 3 either. "It would be nice if it" drew 20% less power at load, but in practice you could use either product without trouble.
In order to satisfy their wafer supply agreement with Global Foundries, AMD ships a GF fabbed I/O die in all their volume processors. I'm pretty sure that once they have met the terms required to exit that wafer supply agreement, we will see AMD shift that I/O die to a newer process with lower power draw.
You can't really compare die sizes of an MCM and a single die and expect to get transistor counts out of that. So much of the area of the MCM is taken up by all the separate PHYs that communicate between the chiplets and the I/O die, and the I/O die itself is on GF 14nm (about equivalent to Intel 22nm) last time I checked, not a new competitive logic node.
There's probably a few more gates still on the AMD side, but it's not the half again larger that you'd expect looking at area alone.
I'm not sure that's a fair area comparison? AMD only has around 600 mm2 of expensive leading edge 7nm silicon and uses chiplets to up their yields. The rest is the connecting bits from an older and cheaper process. Intel's full size is a single monolithic die on a leading edge process.
An interesting observation, which is even more damning for Intel: moderately multicore CPUs from AMD are made in an effective and efficient way, while Intel, despite boasting about their advanced packaging technology, is slow to adopt heterogeneous chiplets and is paying the price of this technological rigidity. This is in addition to miserable failure with new processes.
Do you see getting the most of limited 7nm production capacity as "cheating"? The only fair comparison is in the resulting performance and TDP, and Intel remains seriously behind.
As others have said, in isolation and perfect manufacturing capability, it's absolutely true a monolithic die is better.
However, because each die is smaller, you can be more aggressive with your binning, potentially having a larger number of chips which you can bin into a higher frequency bucket.
Ultimately, it's a matter of trade-offs, and what's right in one place might be wrong in another (see, e.g., AMD manufacturing mobile APUs on monolithic dies versus all their desktop and server parts using chiplets).
We are seeing more of the industry moving to chiplets and similar solutions, notably Intel with their tiles.
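To make the yield side of that trade-off concrete, here is a toy calculation using the standard Poisson yield model (die yield ≈ e^(-D·A)); the defect density and die areas below are made-up illustrative numbers, not figures from Intel or AMD.

    #include <cmath>
    #include <cstdio>

    int main() {
        // Toy Poisson yield model: P(die has no killer defect) = exp(-D * A).
        // All numbers are illustrative assumptions, not vendor data.
        const double defects_per_cm2 = 0.1;  // assumed defect density D
        const double monolithic_cm2  = 6.0;  // one large monolithic die
        const double chiplet_cm2     = 0.8;  // one small compute chiplet

        const double y_mono    = std::exp(-defects_per_cm2 * monolithic_cm2);
        const double y_chiplet = std::exp(-defects_per_cm2 * chiplet_cm2);

        std::printf("monolithic die yield: %.0f%%\n", 100.0 * y_mono);     // ~55%
        std::printf("single chiplet yield: %.0f%%\n", 100.0 * y_chiplet);  // ~92%
        // A package assembled from several small chiplets can mix and match
        // known-good dies (and bin them by frequency), whereas one defect
        // anywhere in the monolithic die degrades or kills the whole part.
        return 0;
    }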
All things being equal a chiplet design will underperform a monolithic die. But we've already seen the benchmarks on the performance of Milan so looking at chiplets versus monolithic is mostly about considering AMD's strategy and constraints rather than how the chips perform.
"At the end of the day, Ice Lake SP is a success. Performance is up, and performance per watt is up. I'm sure if we were able to test Intel's acceleration enhancements more thoroughly, we would be able to corroborate some of the results and hype that Intel wants to generate around its product. But even as a success, it’s not a traditional competitive success. The generational improvements are there and they are large, and as long as Intel is the market share leader, this should translate into upgraded systems and deployments throughout the enterprise industry. Intel is still in a tough competitive situation overall with the high quality the rest of the market is enabling."
I found it a little weird that the conclusions section didn't mention the AMD or ARM competition at all, given that the Intel chip seemed to be behind them in most of the tests.
The conclusions section was quoted in my post and they explicitly mention it.
"As impressive as the new Xeon 8380 is from a generational and technical stand-point, what really matters at the end of the day is how it fares up to the competition. I’ll be blunt here; nobody really expected the new ICL-SP parts to beat AMD or the new Arm competition – and it didn’t. "
>This sounds about right for a company fraught with so many process problems lately
Publicly, the problems have only surfaced lately, but the things that caused them happened much further back.
I'm cautiously bullish on Intel. From what I gather, Intel is in a much better place internally. They have much better focus, there is less infighting, it's more engineering-led than sales-led, they have some very good people, and they are no longer complacent. It will, however, take years before this becomes visible from the outside.
Given the demand for CPUs and the competition's inability to deliver, I think Intel will do OK even if they are no one's first choice of CPU vendor while they try to catch up.
It is certainly good enough to compete, prioritising fab capacity for the server unit and locking in those important (swing) deals from clients. Sales and marketing can work their connections, along with the software tools the HPC market needs, where AFAIK Intel is still far ahead of AMD.
And I can bet those prices have lots of room for special discounts to clients. Since RAM and NAND storage dominate the cost of a server, the difference between Intel and AMD shrinks rapidly in the grand scheme of things, giving Intel a chance to fight. And there is something not mentioned enough: the importance of PCIe 4.0 support.
I wanted to rant about AMD, but I guess there is not much point. ARM is coming.
Nice to see that AVX512 hasn't died with Xeon Phi. I see it coming out in a number of high end but lightweight notebooks too (Surface Pro with i7 10XXG7, MacBookPro 13" idem). This is a nice way to avoid needing GPU for heavily vectorizable compute tasks, assuming you don't need the CUDA ecosystem.
GPGPU will never really be able to take over CPU-based SIMD.
GPUs have far more bandwidth, but CPUs beat them in latency. Being able to AVX512 your L1 cached data for a memcpy will always be superior to passing data to the GPU.
With Ice Lake's 1MB L2 cache, pretty much all tasks smaller than 1MB will be superior in AVX512 rather than sending it to a GPU. Sorting 250,000 Float32 elements? Better to SIMD Bitonic sort / SIMD Mergepath (https://web.cs.ucdavis.edu/~amenta/f15/GPUmp.pdf) on your AVX512 rather than spend a 5us PCIe 4.0 traversal to the GPU.
It is better to keep the data hot in your L2 / L3 cache, rather than pipe it to a remote computer (even if the 16x PCIe 4.0 pipe is 32GB/s and the HBM2 RAM is high bandwidth once it gets there).
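As a rough illustration of the kind of cache-resident work being described, here is a minimal AVX-512 sketch (assuming a CPU with AVX-512F and compilation with -mavx512f; the function and example are mine, not from the comment): summing a buffer small enough to stay hot in L1/L2, where the PCIe round trip alone would dwarf the computation.

    #include <immintrin.h>
    #include <cstddef>

    // Sum a float buffer that fits comfortably in L1/L2 cache.
    // For data this small, shipping it over PCIe to a GPU costs far more
    // than just crunching it in place with 512-bit vectors.
    float sum_avx512(const float* data, std::size_t n) {
        __m512 acc = _mm512_setzero_ps();
        std::size_t i = 0;
        for (; i + 16 <= n; i += 16)                      // 16 floats per 512-bit register
            acc = _mm512_add_ps(acc, _mm512_loadu_ps(data + i));
        float total = _mm512_reduce_add_ps(acc);          // horizontal reduction
        for (; i < n; ++i)                                // scalar tail
            total += data[i];
        return total;
    }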
--------
But similarly: CPU SIMD can never compete against GPGPUs at what they do. GPUs have access to 8GBs @500GB/s VRAM on the low-end and 40GBs @1000GB/s on the high end (NVidia's A100). EDIT: Some responses have reminded me about the 80GB @ 2000GB/s models NVidia recently released.
CPUs barely scratch 200GB/s on the high end, since DDR4 is just slower than GPU-RAM. For any problem where data-bandwidth and parallelism is the bottleneck, that fits inside of GPU-VRAM (such as many-many sequences of large scale matrix multiplications), it will pretty much always be better to compute that sort of thing on a GPU.
> Being able to AVX512 your L1 cached data for a memcpy will always be superior to passing data to the GPU.
The two last apps I worked on have been GPU-only. The CPU process starts running and launches GPU work, and that's it, the GPU does all the work until the process exits.
There is no need to "pass data to the GPU" because data is never on CPU memory, so there is nothing to pass from there. All network and file I/O goes directly to the GPU.
Once all your software runs on the GPU, passing data to the CPU for some small task doesn't make much sense either.
So we know that GPUs are really good at raytracing and matrix multiplication, two things that are needed for graphics programming.
However, the famous "Moana" scene for Disney-level productions is a 93GB (!!!!) scene statically, with another 131GBs (!!!) of animation data (trees blowing in the winds, waves moving on the shore, etc. etc.).
That's simply never going to fit on a 8GB, 40GB, or even 80GB high-end GPU. The only way to work with that kind of data is to think about how to split it up, and have the CPU store lots of the data, while the GPU processes pieces of the data in parallel.
Which has been done before, mind you. But it should be noted that the discussion point for GPU-scale compute runs into practical RAM-capacity constraints today, even on movie-scale problems from 5 years ago (Moana was released in 2016, and had to be rendered on hardware years older than 2016).
But yes, if your data fits within the 8GBs GPU (or you can afford a 40GB or 80GB VRAM GPU and your data fits in that), doing everything on the GPU is absolutely an option.
>That's simply never going to fit on a 8GB, 40GB, or even 80GB high-end GPU. The only way to work with that kind of data is to think about how to split it up, and have the CPU store lots of the data, while the GPU processes pieces of the data in parallel.
There is always a problem size that does not fit into memory.
Whether that memory is the GPU memory, or the CPU memory, doesn't really matter.
We have been solving this problem for 60 years already. It isn't rocket science.
---
The CPU doesn't have to do anything.
The GPU can map a file stored on hard disk to VRAM memory, do random access into it, process chunks of it, write the results into network sockets and send them over the network, etc.
The only thing the CPU has to do is launch a kernel:
    int main(int argc, char** argv) {
        // Launch one long-running kernel; the GPU does all the work from here on.
        main_kernel<<<...>>>(args...);
        cudaDeviceSynchronize();
        return 0;
    }
and this is a relatively accurate depiction of what the "main function" of the two latest apps I've worked on looks like: the GPU does everything.
---
> However, the famous "Moana" scene for Disney-level productions is a 93GB (!!!!) scene statically, with another 131GBs [...] That's simply never going to fit on a high-end GPU.
LOL.
A V100 with 32GB and 8x per rack gave you 256 GB of VRAM addressable from any GPU in the rack.
An A100 with 80GB and 16x per rack gives you 1.3 TB of VRAM addressable from any GPU in the rack.
You can fit Moana in GPU VRAM in a now old DGX-2.
If you are willing to bet cash on Moana never fitting on a single GPU, I'd take you on that bet. Sounds like free money to me.
This person rendered the Moana scene on just 8GB of GPU VRAM. They did it by rendering 6.7GB chunks at a time on the GPU, with the CPU keeping the RAM-heavy "big picture" in mind. (EDITED paragraph. First wording of this paragraph was poor.)
------
It's not that these problems "cannot be solved", it's that they "become grossly more complicated" under RAM / VRAM constraints. They're still solvable, but now you have to use strange techniques.
------
With regards to a ray tracer, tracing the ray of light that's bouncing around could theoretically touch ANY of the 93GBs of static object data (which could have been shifted by any of the 131GBs of animation data). That is to say: a ray that bounces off of any leaf on any tree could bounce in any direction, hitting potentially any other geometry in the scene.
That pretty much forces you to keep the geometry in high-speed RAM, and not do an I/O cycle between each ray-bounce.
As a rough reminder of the target performance: Raytracers aim at ~30 million to 30-billion ray-bounces per second, depending on movie-grade vs video-game optimized. Either way, that level of performance is really only ever going to be solved by keeping all of the geometry data in RAM.
> A100 with 80GB and 16x per rack give you 1.3 TB of VRAM addressable from any GPU in the rack.
That doesn't mean it makes sense to traverse a BVH-tree across a relatively high-latency NVLink connection off-chip. I know GPUs have decent latency hiding but... that's a lot of latency to hide.
Again: your CPU-renderers can hit 10s of millions of rays per second. I'm not sure if you're gonna get something pragmatic by just dropping the entire geometry into distributed NVSwitch'd memory and hoping for the best.
Honestly, that's where the 8GB CPU+GPU team becomes interesting to me. A methodology for clearly separating the geometry and splitting up which local compute-devices are responsible for handling which rays is going to scale better than a naive dump reliant on remote-connections pretending to be RAM.
Video games hit Billions of rays/second. The promise of GPU-compute is on that order, and I just doubt that remote RAM accesses over NVLink will get you there.
> If you are willing to bet cash on Moana never fitting on a single GPU, I'd take you on that bet. Sounds like free money to me.
The issue is not Moana (or other movies from 2016), the issue are the movies that will be made in 2022 and into the future. Especially if they're near photorealistic like Marvel-movies or Star Wars.
----------
The other problem is: what's cheaper? A DGX-system could very well be faster than one CPU system. But would it be faster than a cluster of Ice-Lake Xeons with AVX512 each with the precise amount of RAM needed for the problem? (Ex: 512GBs in some hypothetical future movie?)
A CPU+GPU team would probably be better: CPUs have expandable RAM, that's their biggest advantage, while GPUs have fixed RAM. Slicing the problem up so that the raytracing pieces fit on GPUs, while the other, "bulkier" bits sit in CPU DDR4 (or DDR5), would probably be the most cost-efficient way of solving the raytracing problem.
The GPU-Moana experiment showed that "collecting rays that bounce outside of RAM" is an efficient methodology: slice the scene into 8GB chunks, process the rays that are within each chunk, and then collate the rays together to find where they go.
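A rough sketch of that slice-and-collate scheduling idea, under heavy assumptions: Chunk, Ray, and trace_chunk_on_gpu() are hypothetical placeholders of my own, and a real renderer would overlap uploads with tracing. The point is only the shape of the loop: rays queue up per chunk, each chunk is resident in VRAM once per pass, and escaping rays are re-queued for whichever chunk they bounce into.

    #include <cstddef>
    #include <vector>

    struct Ray   { int next_chunk = -1; /* origin, direction, accumulated color ... */ };
    struct Chunk { /* a ~6-7 GB slice of scene geometry that fits in VRAM */ };

    // Hypothetical helper: upload one chunk, trace every ray in the batch against it,
    // and tag each ray with the chunk its next bounce lands in (-1 when finished).
    void trace_chunk_on_gpu(const Chunk& /*chunk*/, std::vector<Ray>& rays) {
        for (Ray& r : rays) r.next_chunk = -1;  // stubbed out: the real work happens on the GPU
    }

    void render_out_of_core(const std::vector<Chunk>& chunks,
                            std::vector<std::vector<Ray>>& queue /* one ray queue per chunk */) {
        bool work_left = true;
        while (work_left) {
            work_left = false;
            for (std::size_t c = 0; c < chunks.size(); ++c) {
                if (queue[c].empty()) continue;
                work_left = true;
                std::vector<Ray> batch;
                batch.swap(queue[c]);                  // take every ray waiting on this chunk
                trace_chunk_on_gpu(chunks[c], batch);  // one chunk upload, many rays traced
                for (const Ray& r : batch)             // collate escapees into their next queue
                    if (r.next_chunk >= 0) queue[static_cast<std::size_t>(r.next_chunk)].push_back(r);
            }
        }
    }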
These convos are super interesting to me as a guy building his own poor-man's render farm, and validate some of what I'd heard elsewhere. You do need a huge amount of memory for some scenes, and while the raytracing GPU in my notebook is great for modeling, it doesn't take many 4k/8k textures to run out/crash. I'm not savvy enough to work only in GPU and, like you say, we don't know where a light bounce is headed anyway, right? Even with sufficiently small scenes, GPUs and systems to run them are expensive when you can buy 10-year-old rackmount servers with 12 cores @ 3GHz, loaded with cheap old memory, for less than $200 a pop total. At this point the shipping might cost more than the shipped. I just bought a lot of old servers from a local university for a better deal than that, half junkyard bound they are so ancient.
We know that GPUs are really good at far more than ray tracing and matrix multiplication. Oversimplifying a bit, they’re great at basically any massively parallel operation that has minimal branching and can fit in memory. Using a GPU to just add two images together probably isn’t worth it, but many real world workflows allow you to operate solely on the GPU.
If you’re Disney, you can afford boxes with 10+ A100s with NVLink sharing the memory in a single 400+ GB pool. Unknown if that ends up being more economical than the equivalent CPU version, but it’s important to understand in order to evaluate the future of GPUs.
> There is no need to "pass data to the GPU" because data is never on CPU memory, so there is nothing to pass from there. All network and file I/O goes directly to the GPU.
This is very interesting - do you have a link that explains how it works / is implemented?
The GPU itself can send PCIe 4.0 messages out. So why not have the GPU make I/O requests on behalf of itself? It's a bit obscure, but this feature has been around for a number of years now. The idea is to remove the CPU and DDR4 from the loop entirely, because those just bottleneck / slow down the GPU.
--------
From an absolute performance perspective, it seems good. But CPUs are really good and standardized at accessing I/O in very efficient ways. I'm personally of the opinion that blocking and/or event driven I/O from the CPU (with the full benefit of threads / OS-level concepts) would be easier to think about than high-performance GPU-code.
But still, it's a neat concept, and it seems like there's a big demand for it (see PS5 / Xbox Series X).
The CPU is still acting as the PCIe controller though (right?), which kind of makes the CPU act like a network switch. PCIe is a point-to-point protocol kind of like ethernet too. Old-school PCI was a shared bus so devices might be able to directly talk to each other, but I don't think that was ever actually used.
As you can see, the GPU is attached to the x16 slot, and the 4x NVMe SSDs are attached to the GPU. When the CPU wants to store data on the SSD, it communicates first to the GPU, which then pass-throughs the data to the four SSDs.
NVidia's GPUs would command the PCIe switch to grab data, without having the PCIe switch send data to the CPU (which would most likely be dropped in DDR4, or maybe L3 in an optimized situation).
My understanding matches yours, but it's worth noting that (IIUC) memory and PCIe are (last time I checked?) a separate I/O subsystem that just happens to reside within the same package as the CPU on modern chips. So P2PDMA avoids burning CPU cycles and RAM bandwidth shuffling data around that you never wanted to use on the CPU anyway. (Also see: https://lwn.net/Articles/767281/)
It should be noted that most CPUs do use DDR4 for good reason: they have an expectation to use more than the capacity of HBM2. GPUs (and a supercomputer CPU, like the A64FX) can rely upon the fact that their workloads are designed for distributed-compute and/or otherwise fit inside of 32GBs or 40GBs.
CPUs on the other hand, could very well be reading/writing to a 2TB in-memory database these days, even on a single-socket single-node system like AMD EPYC. That flexibility to have as much (or as little) RAM as the customer demands is a major advantage of traditional CPUs like EPYC or Xeon, which the A64FX cannot partake in.
If you need more than 32GBs of RAM, the A64FX is a no-go. That HBM2 is soldered directly onto the package and cannot be expanded.
Yes, and 32 GB seems rather limited for a 48 core chip.
While not a "general purpose" CPU, this HPC CPU does show IMO that it's at least possible to make a CPU that competes in some aspects with what GPUs do.
Although an A100 still has twice the bandwidth and over 3x the GFLOPS without tensor cores and more than 6x with.
In my experience, the most important aspect missing from most CPU vs GPU discussions is that CPUs have a massive cache compared to GPUs, and that cache has pretty good bandwidth (~30 GB/s per core?), even if main memory doesn't. So even if your task's hot data doesn't fit in L2 but does fit in L3 per core, AVX-whatever per-core processing is a good bet regardless of what a GPU can do.
Another aspect that seems like a hidden assumption in CPU-GPU discussions is that you have the time-energy-expertise budget to (re)build your application to fit GPUs.
On the memory perspective, I basically see problems in roughly the following grouping of categories:
40TBs+ -- Storage-only solutions. "External Tape Merge sort algorithm", "Sequential Table Scan", etc. etc. (SSDs or even Hard drives if you go big enough)
4TB to 40TBs -- Multi-socket DDR4 RAM is king (8-way Ice Lake Xeon Scalable Platinum will probably reach 40TBs). Single-node distributed memory with NUMA / UPI to scale.
1TB to 4TB -- Single Socket DDR4 RAM (EPYC, even if at 4x NUMA. Or Single-node Ice Lake).
80GB to 1TB -- DGX / NVlink distributed memory A100 ganging up HBM2 together. GPU-distributed RAM is king.
256MBs to 80GBs -- HBM2 / GDDR6 Graphics RAM is king (80GB A100 2TB/s).
1.5MBs to 256MBs -- L3 cache is king (8x32MBs EPYC L3 cache, or POWER9 110MB+ L3 cache unified)
128kB to 1.5MBs -- L2 cache is king (1.25MB Ice Lake Xeons L2, this article)
1kB to 128kB -- L1 cache is king. (128kB L1 cache on Apple M1). Note: "GPU __Shared__" is a close analog to L1 and competes against it, but is shared between 32 to 256 GPU threads, so its not an apples-to-apples comparison.
1kB and below -- The realm of register-space solutions. (See 64-bit chess engine bitboards and the like). Almost fully CPU-constrained / GPU-constrained programming. 256x 32-bit GPU registers per GPU-thread / SIMD thread. CPUs have fewer nominal registers, but many "out of order" buffers or "reorder buffers" that practically count as register storage in a practical / pragmatic sense. CPUs just use their "real registers" as a mechanism to automatically discover parallelism in otherwise single-thread written code.
------------
As you can see: GPUs win in some categories, but CPUs win in others. And these numbers change every few months as a new CPU and/or GPU comes out. And at the lowest levels: CPUs and GPUs cannot be compared due to fundamental differences in architecture.
For example: GPU __shared__ memory has gather/scatter capabilities (the NVidia PTX instructions / AMD GCN instructions permute vs bpermute), while CPUs traditionally only accelerate gather capabilities (pshufb), and leave vgather/vscatter instructions to the L1 cache instead. GPUs have 32x ports to __shared__, so every one of the 32 threads in a wave-front can read/write every single clock-tick (as long as all 32 are on different ports/alignments, or you have a special one-to-all broadcast). CPUs only have 2 or 4 ports, so vscatter and vgather operate slowly, as if a single thread were reading/writing each of the memory locations.
But CPU L1 cache has store-forwarding, MESI + cache coherence, and other acceleration features that GPUs don't have.
GPUs are therefore more efficient at sharing data within workgroups of ~256 threads, but CPUs are more efficient at sharing data between cores, or even among out-of-die NUMA solutions, thanks to robust MESI messaging.
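For what it's worth, here's what a CPU-side gather looks like with AVX-512F intrinsics (a standalone sketch of mine, not from the comment); the single instruction still funnels through the handful of L1 load ports, which is the port-count limitation being described.

    #include <immintrin.h>

    // Gather 16 floats from arbitrary indices with one AVX-512F instruction.
    // The hardware still services the loads through a few L1 ports, so this is
    // much slower per element than a GPU workgroup hitting 32 __shared__ banks at once.
    __m512 gather16(const float* table, const int* idx) {
        __m512i vindex = _mm512_loadu_si512(idx);      // 16 x int32 indices
        return _mm512_i32gather_ps(vindex, table, 4);  // scale = sizeof(float)
    }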
FWIW: your DRAM numbers are quoting clock speeds and not bandwidth. They aren't linear at all. In fact with enough cores you can easily saturate memory that wide, and CPUs are getting wider just as fast as GPUs are. The giant Epyc AMD pushed out last fall has 8 (!) 64 bit DRAM channels, where IIRC the biggest NVIDIA part is still at 6.
Yeah. And at 3200 MT/s, that comes out to about 200GB/s. (3200 MT/s x 8 bytes (i.e. a 64-bit channel) == ~25GB/s; x8 channels == ~200GB/s.)
> where IIRC the biggest NVIDIA part is still at 6.
That's 6x *1024-bit* HBM2 channels. Total bandwidth is 2000GBps, or over 10x the speed of the "8x channel EPYC". Yeah, HBM2 is fat, extremely fat.
----------
*ONE* HBM2 channel offers over 300GBps bandwidth. And the A100 has *SIX* of them. Literally ONE HBM2 channel beats the speed of all 8x DDR4 EPYC memory channels working in parallel.
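A back-of-the-envelope check of the two figures being compared, with the per-channel transfer rates and bus widths written out as explicit assumptions (the HBM2 rate here is chosen to roughly match the ~2TB/s quoted above):

    #include <cstdio>

    // Theoretical channel bandwidth = transfers/sec * bus width in bytes.
    constexpr double channel_gb_per_s(double transfers_per_sec, double bus_bits) {
        return transfers_per_sec * (bus_bits / 8.0) / 1e9;
    }

    int main() {
        // DDR4-3200: 3.2 GT/s on a 64-bit channel, 8 channels on a big EPYC/Xeon.
        const double ddr4 = 8 * channel_gb_per_s(3.2e9, 64);    // ~205 GB/s
        // HBM2(e): 1024-bit channels; ~2.6 GT/s assumed, 6 channels on an A100.
        const double hbm2 = 6 * channel_gb_per_s(2.6e9, 1024);  // ~2000 GB/s
        std::printf("DDR4 x8: %.0f GB/s   HBM2 x6: %.0f GB/s\n", ddr4, hbm2);
        return 0;
    }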
25600 is the channel rate in (EDIT) MB/sec of the stick of RAM. That's 25GB/s for a 3200 MHz DDR4 stick. x8 (for 8-channels working in parallel) is 200GB/s.
As you can see, Netflix's FreeBSD optimizations have allowed EPYC to reach 194GB/s measured performance (or just under the 200GB/s theoretical). And only with VERY careful NUMA-tuning and extreme optimizations were they able to get there.
That presentation shows 194 gigabits/s which is only ~24 gigabytes/s at the NIC; that requires ~96 gigabytes/s of memory bandwidth. Usable memory bandwidth on Milan is only <120 gigabytes/s which is about 60% of the theoretical max. DRAM never gets more than ~80% of theoretical max bandwidth because of command overhead (which is what I think ajross keeps alluding to). https://www.anandtech.com/show/16594/intel-3rd-gen-xeon-scal...
I appreciate the correction. It seems like I made the mistake of Gbit vs GByte confusion (little-b vs big-B).
> (which is what I think ajross keeps alluding to)
It seems like ajross is accusing me of underestimating CPU-bandwidth. At least, that's my interpretation of the discussion so far. As you've pointed out however, I'm overestimating it.
EDIT: But I'm overestimating it on both sides. The A100's 2000 GB/s is the "channel bandwidth" as well, as the CAS and RAS commands still need to go through the channel and get interpreted.
Look, if CPUs were better at memory latency, the BVH-traversal of raytracing would still be done on CPUs.
BVH-tree traversals are done on the GPU now for a reason. GPUs are better at latency hiding and taking advantage of larger sets of bandwidth than CPUs. Yes, even on things like pointer-chasing through a BVH-tree for AABB bounds checking.
GPUs have pushed latency down and latency hiding up to unimaginable figures. In terms of absolute latency, you're right: GPUs are still higher latency than CPUs. But in terms of "practical" effects (once you account for latency-hiding tricks on the GPU, such as 8-way occupancy, similar to hyperthreading, and some dedicated data structures / programming tricks that largely take advantage of the millions of rays processed in parallel per frame), it turns out that you can convert many latency-bound problems into bandwidth-constrained problems.
-----------
That's the funny thing about computer science. It turns out that with enough RAM and enough parallelism, you can convert ANY latency-bound problem into a bandwidth-bound problem. You just need enough cache to hold the results in the meantime, while you process other stuff in parallel.
Raytracing is an excellent example of this form of latency hiding. Bouncing a ray off of your global data structure of objects involves traversing pointers down the BVH tree: a ton of linked-list-like current_node = current_node->child operations, depending on which child the ray hit.
From the perspective of any single ray, it looks like it's latency-bound. But from the perspective of processing 2.073 million rays across a 1920 x 1080 video game scene with realtime raytracing enabled, it's bandwidth-bound.
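A minimal CPU-side illustration of that latency-hiding idea, under the assumption of made-up Node/Ray types and a placeholder descend step (a real GPU hides latency with warp scheduling rather than explicit prefetches, so this is only the shape of the trick): instead of chasing one ray's pointers and stalling on every node, keep a whole batch of rays in flight and start the next node fetch for each before touching any of them.

    #include <immintrin.h>  // _mm_prefetch
    #include <vector>

    struct Node { Node* child[2] = {nullptr, nullptr}; bool leaf = true; /* AABB bounds ... */ };
    struct Ray  { Node* current = nullptr; /* origin, direction ... */ };

    // Placeholder for the real intersection test: pick the child the ray descends into.
    Node* descend(Ray& r) { return r.current->leaf ? nullptr : r.current->child[0]; }

    void trace_batch(std::vector<Ray>& rays) {
        bool active = true;
        while (active) {
            active = false;
            // Phase 1: start pulling every ray's current node toward the cache.
            for (const Ray& r : rays)
                if (r.current) _mm_prefetch(reinterpret_cast<const char*>(r.current), _MM_HINT_T0);
            // Phase 2: by the time we revisit each ray, its node is (hopefully) resident,
            // so each pointer-chase step overlaps with the other rays' memory latency.
            for (Ray& r : rays) {
                if (!r.current) continue;
                r.current = descend(r);
                active = active || (r.current != nullptr);
            }
        }
    }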
Skylake-X has already had its die shots examined and the AVX-512 register file, the dominant part of the layout, is something like .5 of a single core, so it's not even going to buy you much area for anything if Intel deleted it, the whining about how it's better spent on extra cores by Linus is totally overblown. Ice Lake has also dramatically improved the per-core frequency tuning for client SKUs to the point AVX-512 is quite viable on my laptop with no serious problems; a single thread doing something isn't going to tank anything. Ice Lake-X almost certainly has 2FMAs instead of the 1FMA of client SKUs however, so it'll be interesting to see what the new power licensing situation is, but this is clearly something they've had in the books to improve.
The problem is that for the workloads that need specialization, you sometimes really need it. You could also delete the vectorized AES units in your Intel machines too and the general purpose performance wouldn't be affected much, but cryptographic performance specifically would tank, and it turns out, that matters a lot in aggregate for many people.
Ultimately there are literally dozens of specialized inactive units on any CPU at any given time that could be "better spent on general purpose units" (which also isn't necessarily true if other architectural choices prevent those units from being utilized effectively). People just like complaining about AVX-512 because it's easily digestible water cooler chat they read about on a blog.
AVX-512 is problematic in multiple ways your comment fails to even touch upon.
It impedes latency-sensitive applications due to forced down-clocking taking place for AVX-intensive instruction streams. The other side effect is that it quickly produces extreme heat that not only forces further down-clocking of the core but also taints neighbouring cores with dissipated heat and prevents them from going into turbo.
The 2020 Intel MacBook Air and 13" Pro have 10nm Ice Lake with AVX512. The Ice Lake MacBook Air performs pretty well and very close to the Ice Lake Pro, though of course the M1 destroys it.
Actually I don't know... I suspect Intel still wins in wide SIMD. The M1 totally destroys Intel in general purpose code performance, especially when you consider power consumption.
"As impressive as the new Xeon 8380 is from a generational and technical stand-point, what really matters at the end of the day is how it fares up to the competition. I’ll be blunt here; nobody really expected the new ICL-SP parts to beat AMD or the new Arm competition – and it didn’t. The competitive gap had been so gigantic, with silly scenarios such as where a competing 1-socket systems would outperform Intel’s 2-socket solutions. Ice Lake SP gets rid of those more embarrassing situations, and narrows the performance gap significantly, however the gap still remains, and is still undeniable."
This might be a very dumb question but it always bothered me - silicon wafers are always shown as great circles, but processor dies are obviously square. But it looks like the etching etc goes right to the circular edges - wouldn't it be better to leave the dead space untouched?
Most semiconductor production processes like etching, doping, polish etc are done on the full wafer, not on individual images/fields. So there is nothing to be gained there in terms of production efficiency.
The litho step could in theory be optimized by skipping incomplete fields at the edges, but the reduction in exposure time would be relatively small, especially for smaller designs that fit multiple chips within a single image field. I imagine it would also introduce yield risk because of things like uneven wafer stress & temperature, higher variability in stage move time when stepping edge fields vs center fields, etc.
I think these are just press/PR wafers and real production ones don't pattern the edge. (First of all it takes time, and in the case of EUV it means the machine amortizes even faster, because every shot damages the "optical elements" a bit.)
Edit: it also depends on how many dies the mask (reticle) has on it. Intel uses one-die reticles, so in theory their real wafers have no situation in which they have partial dies at the edge.
Real wafers have the chip patterning going to the edge, and this can, and often does, result in partial dies.
There is no reason not to do this, and the edges are also often used for calibration and (potentially) destructive testing.
The only area of the wafer that will not be exposed is the notch; it's always on one side of the circle and is used for orienting and moving the wafer around. This is why you often see wafers with one of the sides cut off, giving them the flat-tire shape.
As disappointing as the perf is for server workloads, what I'm really interested in is SLI gaming performance. I can imagine that this would be a boon for high end gaming with multiple x16 PCIe 4.0 slots and 8 DDR4 channels.
SLI really shines on HEDT platforms, and this is probably the last non-multi-chip quasi-HEDT CPU for a while with this kind of IO.
(Yes, I know SLI is 'dead' with the latest generation of GPUs)
These would be absolute trash for SLI performance vs top end standard consumer desktop parts. The best SKU has a peak boost clock of 3.7 GHz, the core to core latencies are about twice as high as the desktop parts, and the memory+PCIe bandwidth mean little to nothing for gaming performance (remember SLI bandwidth goes over a dedicate bridge as well) which is highly sensitive to latencies instead.
As gets repeated ad nauseam, industry numbering has gone wonky. Intel still hews more or less to the ITRS labelling for its nodes, which means that its 10nm process has pitches and density values along the same lines as TSMC's or Samsung's 7nm processes.
This is, indeed, no longer an industry leading density and it lags what you see on "5nm" parts from Apple and Qualcomm. But it's the same density that AMD is using for the Zen 2/3 devices against which this is competing in the datacenter.
Maybe the density is the same, but the 10-nm process variant that Intel is forced to use for Ice Lake Server is much worse than the 7-nm TSMC process.
It is worse in the sense that at the same number of active cores and the same power consumption, the 10-nm Ice Lake Server can reach only a much lower clock frequency than the 7-nm Epyc, which results in a much lower performance for anything that does not use AVX-512.
It is also worse in the sense that the maximum clock frequency when the power limits are not reached is also much worse for the 10-nm process used for Ice Lake Server.
Ice Lake Server does not use the improved 10-nm process (SuperFin) that is used for Tiger Lake and it is strongly handicapped because of that.
While I'd agree with you that TSMC's current 7nm seems to be better than Intel's current 10nm, comparing Epyc to Ice Lake SP isn't quite an apples-to-apples comparison. Intel is putting (up to) 40 cores on the same die; AMD only puts 8. It looks like AMD has the better method for overall performance, and Intel will likely follow them - in addition to being able to get more cores into a socket, I suspect Intel could also crank frequency higher with fewer cores per die.
For the user it does not matter how many cores are on a die.
For the user it matters what is included in a package.
The new Ice Lake Server package (77.5 mm x 56.5 mm) has finally reached about the same size as the Epyc package (75.4 mm x 58.5 mm), because now Intel offers for the first time 8 memory channels, like its competitors have offered for many years.
So in packages of the same size, Intel has 40 cores, while AMD offers 64 cores. Moreover Intel requires an extra package for the I/O controller, while AMD includes it in the CPU package.
So for general-purpose users, AMD offers much more in the same space.
On the other hand, Ice Lake Server has twice the number of FMA units, so it has as many floating-point multipliers as 80 AMD cores. This advantage is diminished by the fact that the clock frequency for heavy AVX-512 instructions is only 80% of the nominal frequency, but it can still give an advantage to Ice Lake Server for the programs that can use AVX-512.
From a yield perspective, if core failures are independent events, binning is probably easier with the big chiplet approach.
The Epyc 3 approach does have some drawbacks. Looking at the Epyc 3 TDP numbers, there's probably a nontrivial thermal cost to breaking out the dies as AMD has. Not to mention the I/O for Epyc 3 is not on TSMC 7nm.
If Intel has learned anything, it must be that whatever they call their next process, it has to sound one step smaller than the competition's, regardless of how they come up with that number.
Intel's processes have been a disaster; however, considering that for the most part they aren't that far behind (especially financially), I don't think they have to catch up much on process, at least, to be right back in the fight. I will believe the pecking order has truly changed when AMD's documentation and software are as good as Intel's.
Why does Intel even bother releasing products that don't bring anything new and worthwhile to the table? It's such a massive waste of time and resources, and a needless environmental cost.