Am I naive thinking this infra is overblown for a read-only content website? As ...

Joe8Bit · 2025-08-19T11:00:17 1755601217

Reading the article, I got the impression the big challenge is doing "personalization" of the content at scale.

If it were "just" static pages, served the same to everyone, then it's pretty straightforward to implement even at the >300m users scale Netflix operates at. If you need to serve >300m _different_ pages, each built in real-time with a high-bar p95 SLO then I can see it getting complicated pretty quickly in a way that could conceivably justify this level of "over engineering".

To be honest though, I know very little about this problem beyond my intuition, so someone could tell me the above is super easy!

ericmcer · 2025-08-19T14:54:52 1755615292

What's the point of building & caching static pages if every single user gets their own static page... The number of users who will request each page is 1?

nchmy · 2025-08-19T15:35:51 1755617751

I don't think that's what they're doing. They seemingly cache some common templates and then fill in the dynamic placeholders per-request. So, their new architecture, which has some sort of distributed in-memory cache, allows for doing so much more efficiently than, presumably, doing the same queries over the network. The way I understand it, it's essentially just a fancy SSR.
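
A minimal sketch of that reading of the article, not Netflix's actual code: a shared template is cached once per process, and the per-request values come from a plain dict standing in for the in-memory store. The names `TEMPLATE_CACHE`, `LOCAL_STORE`, and `render_page` are made up for illustration.

```python
from string import Template

# Hypothetical cache of shared page templates, built once per process.
TEMPLATE_CACHE = {
    "article": Template("<h1>$title</h1><p>Recommended for $user: $pick</p>"),
}

# Hypothetical in-memory store; stands in for data held locally on each instance.
LOCAL_STORE = {
    ("article", "a1"): {"title": "Season 2 announced"},
    ("user", "u1"): {"user": "Ada", "pick": "Show A"},
    ("user", "u2"): {"user": "Bob", "pick": "Show B"},
}


def render_page(template_name: str, article_id: str, user_id: str) -> str:
    """Fill a cached template with per-request values read from local memory."""
    template = TEMPLATE_CACHE[template_name]
    values = {
        **LOCAL_STORE[("article", article_id)],
        **LOCAL_STORE[("user", user_id)],
    }
    return template.substitute(values)
```

Two users asking for the same article share the template and the article data; only the small per-user lookup differs, and none of it crosses the network.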

JamesSwift · 2025-08-19T19:35:46 1755632146

> cache some common templates and then fill in the dynamic placeholders per-request

Otherwise known as server-side includes (SSI)

nchmy · 2025-08-20T22:05:56 1755727556

Not necessarily - SSI/ESI do http requests for each placeholder and fill it with an HTML fragment. They may just be doing some templating of whatever sort, from the kv store.
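
For contrast, a rough sketch of the ESI-style approach, where every placeholder is resolved by its own fetch. The `fetch` callable, the fragment paths, and `fake_fetch` are assumptions for illustration; in a real deployment each fetch would typically be an HTTP sub-request made by the edge or web server.

```python
import re

# Matches ESI-style include tags such as <esi:include src="/fragments/hero"/>.
INCLUDE_TAG = re.compile(r'<esi:include src="([^"]+)"\s*/>')


def resolve_includes(page: str, fetch) -> str:
    """Replace each include tag with the fragment returned by fetch(src)."""
    return INCLUDE_TAG.sub(lambda match: fetch(match.group(1)), page)


# Hypothetical stand-in for the fragment origin; records calls to show the cost.
calls = []


def fake_fetch(src: str) -> str:
    calls.append(src)
    fragments = {
        "/fragments/hero": "<b>Hero</b>",
        "/fragments/picks": "<i>Picks</i>",
    }
    return fragments[src]


page = '<esi:include src="/fragments/hero"/> | <esi:include src="/fragments/picks"/>'
html = resolve_includes(page, fake_fetch)
```

Here `calls` ends up with one entry per placeholder, which is the per-fragment round trip that templating straight from a local kv store would avoid.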

miyuru · 2025-08-19T13:13:12 1755609192

> >300m users scale Netflix operates at

They are not talking about the Netflix streaming service.

https://www.netflix.com/tudum

This is the site they are talking about, which is very similar to a WordPress-powered PR blog.

giancarlostoro · 2025-08-19T14:54:56 1755615296

So, is this a read-only candidate? Or is there something very nuanced about it? I've never even used this site.

chatmasta · 2025-08-19T18:23:09 1755627789

It’s basically a company blog.

giancarlostoro · 2025-08-19T21:37:17 1755639437

So it could have just been a SSG'd website...

TYPE_FASTER · 2025-08-19T12:10:17 1755605417

Yup. Content that may be read-only from a user's perspective might get updated by internal services. When those updates don't need to be reflected immediately, a CDN or similar works fine for delivering the content to a user.

When changes to data are important, a library like Hollow can be pretty magical. Your internal data model is always up to date across all your compute instances, and scaling up horizontally doesn't require additional data/infrastructure work.

We were processing a lot of data with different requirements:

- big data processed by a pipeline: NoSQL

- financial/audit/transactional: relational

- changes every 24 hours or so but has to be delivered to the browser fast: CDN

- low latency: Redis

- no latency: Hollow

Of course there are tradeoffs between keeping a centralized database in memory (Redis) and distributing the data in memory on each instance (Hollow). There could be cases where Hollow hasn't sync'd yet, so the data could be different across compute instances. In our case, it didn't matter for the data we kept in Hollow.
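
A toy model of that tradeoff, with the caveat that this is not Hollow's API: `Publisher` and `Instance` are hypothetical classes showing only the general pattern of each instance holding its own in-memory copy of a versioned snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class Publisher:
    """Hypothetical producer that publishes whole, versioned snapshots."""
    version: int = 0
    snapshot: dict = field(default_factory=dict)

    def publish(self, data: dict) -> None:
        self.version += 1
        self.snapshot = dict(data)


@dataclass
class Instance:
    """Hypothetical compute instance holding its own in-memory copy."""
    version: int = 0
    local: dict = field(default_factory=dict)

    def refresh(self, publisher: Publisher) -> None:
        # Pull the latest snapshot; until this runs, reads are served stale.
        self.version = publisher.version
        self.local = dict(publisher.snapshot)

    def read(self, key: str):
        # No network hop: the read is a local dictionary lookup.
        return self.local.get(key)


publisher = Publisher()
a, b = Instance(), Instance()

publisher.publish({"headline": "v1"})
a.refresh(publisher)
b.refresh(publisher)

publisher.publish({"headline": "v2"})
a.refresh(publisher)  # a has synced, b has not yet
```

At this point `a` and `b` disagree about `headline` until `b` refreshes. A centralized store avoids that window, at the cost of a network round trip on every read.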

motorest · 2025-08-19T12:35:39 1755606939

> When those updates don't need to be reflected immediately, a CDN or similar works fine for delivering the content to a user.

What leads you to believe a CDN is a solution to a problem of delivering personalized content to a specific user?

TYPE_FASTER · 2025-08-19T13:32:40 1755610360

Personalization could include showing the same cached content to many people. The personalization here could refer to which content is served from the CDN to which users.

motorest · 2025-08-19T14:12:35 1755612755

> Personalization could include showing the same cached content to many people.

That's not the problem.

The main requirement is to allow editors to preview their changes without having to wait multiple seconds due to caching refresh cycles.

yandie · 2025-08-19T16:18:34 1755620314

I can’t imagine the number of editors is that large? Are we talking about hundreds or thousands? Taking a look at the site there can’t be that many users that need this low latency preview capability.

shermantanktop · 2025-08-19T12:46:37 1755607597

When new people join your team and learn your infrastructure, I bet they often ask “why is this so complicated? It’s just a <insert simple thing here>.”

And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”

jiggawatts · 2025-08-19T13:25:25 1755609925

> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”

It's 500 MB of text. A phone could serve that at the scale we're talking about here, which is a PR blog, not netflix.com.

They're struggling with "reading" from a database that is also used for "writes", a terribly difficult situation that no commercial database engine has ever solved in the past.

Meanwhile they have a complex job engine to perform complex tasks such as joining two strings together to form a complete URL instead of just a relative path.

This is pure, unadulterated architecture astronaut arrogance.

PS: This forum, right here, supports editing at a much higher level of concurrency. It also has personalised content that is visible to users with low latency. Etc, etc... All implemented without CQRS, Kafka, and so forth.

foobarian · 2025-08-19T14:36:52 1755614212

Yes but HN is not Yahoo scale

yandie · 2025-08-19T16:20:27 1755620427

This Netflix blog isn’t yahoo scale either

ericmcer · 2025-08-19T14:58:56 1755615536

This is a static blog site with lists about popular Netflix content.

Yes, many times there are very valid reasons for complexity, but my suspicion is that this is the result of 20+ very talented engineers+designers+pms being put in charge of building a fairly basic static site. Of course you are going to get something overengineered.

RandallBrown · 2025-08-19T16:20:58 1755620458

Tudum has some personalization so I'm not sure it's just a static blog site.

I can't say whether or not it's complicated enough to justify this level of engineering. It could just be some engineers building something cool because they can, like you said.

KronisLV · 2025-08-19T16:30:51 1755621051

> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”

Sometimes it’s “It was made many months/years ago in circumstances and with views that no longer seem to apply (possibly by people no longer there). Sadly, the cost of switching to something else would be significant and nobody wants to take on the risk of rewriting it because it works.”

It depends.

shermantanktop · 2025-08-19T20:02:12 1755633732

Engineers and product managers fall in love with their dreams a little too easily. A working system is a wonderful thing.

Then again, sometimes it only “works” by waking people up with pages multiple times a night.

motorest · 2025-08-19T14:14:27 1755612867

> And your response is surely “Well of course, that would be nice, but it’s not as simple as that. Here are constraints X Y and Z that make a trivial solution infeasible.”

All problems are trivial once you ignore all real-world constraints.

__alexs · 2025-08-19T09:54:47 1755597287

Doing weird pointlessly complicated stuff on a niche area of your website is a not entirely ridiculous way to try out new things and build new skills I guess.

motorest · 2025-08-19T12:34:20 1755606860

> As much as this website could be very trafficked I have the feeling they are overcomplicating their infra, for little gains.

This sort of personal opinion reads like a cliche in software development circles: some rando casually does a drive-by system analysis, cares nothing about requirements or constraints, and proceeds to apply simplistic judgement in broad strokes.

And this is then used as a foundation to go on a rant regarding complexity.

This adds nothing of value to any conceivable discussion.

gcr · 2025-08-19T12:37:30 1755607050

characterizing netflix as a "read-only" website is incredibly shortsighted. you have:

- a constantly changing library across constantly changing licensing regions available in constantly changing languages

- collaborative filtering with highly personalized recommendation lists, some of which you just know has gotta be hand-tuned by interns for hyper-demographic-specific region splits

- the incredible amounts of logistics and layers upon layers of caching to minimize centralized bandwidth to serve that content across wildly different network profiles

i think that even the single-user case has mind boggling complexity, even if most of it boils down to personalization and infra logistics.

rpsw · 2025-08-19T12:54:30 1755608070

This blog about an architecture change is about the Tudum website specifically, not the whole of Netflix.

jiggawatts · 2025-08-19T13:13:14 1755609194

> constantly changing languages

Wouldn't that be nice!

Netflix still only supports 4 or 5 subtitle languages.

Their billions of dollars of fancy-pants infrastructure just doesn't scale to more than half a dozen or so text files.

motorest · 2025-08-19T12:42:22 1755607342

> characterizing netflix as a "read-only" website is incredibly shortsighted considering:

The people talking about "read-only" didn't even bother to read the overview of the system they are criticizing. They are literally talking out of wilful ignorance.

But here we are.

dakiol · 2025-08-19T10:08:27 1755598107

Most of the tech infrastructure out there is over engineered. At least based on my experience.

busterarm · 2025-08-19T11:16:56 1755602216

I remember being interested in their architecture when I attended re:Invent in 2018. I went to four separate Netflix talks given by four separate people with wildly different titles, teams and responsibilities. The talks had different titles indicating a variety of topics covered. Two of these talks weren't even obviously/outwardly Netflix-focused from the description -- they were just talks supposedly covering something I was curious about.

All four speakers ran the exact same slide deck with a different intro slide. All four speakers claimed the same responsibility for the same part of the architecture.

I was livid. I also stopped attending talks in person entirely because of this, outside of smaller more focused events.

geodel · 2025-08-19T14:41:53 1755614513

I don't know, because I have not been to AWS re:Invent. But from what I have seen at my workplace, a trip to this event is the equivalent of a corporate junket for mid-level developers who happen to be the manager's favorites.

geodel · 2025-08-19T14:35:47 1755614147

Not naive, but perhaps missing that the army of enterprise Java developers Netflix employs needs to justify its large salaries by creating complex architecture to handle future needs.

rokkamokka · 2025-08-19T08:54:29 1755593669

From an outsider's perspective Tudum does seem to be an extremely simple site... But maybe they have complicated use cases for it? I'm also not convinced it merits this level of complexity.

pram · 2025-08-19T09:16:59 1755595019

I’m gonna take a wild guess: the actual problem they’re engineering around is the “cloud” part of the diagram (that the “Page Construction Service” talks to)

There is probably some hilariously convoluted requirement to get traffic routed/internal API access. So this thing has to run in K8s or whatever, and they needed a shim to distribute the WordPress page content to it.

piva00 · 2025-08-19T09:28:44 1755595724

Having to run in k8s doesn't change that much, the description of a whole Cassandra + Kafka stack to deliver the ingestion of articles already says there's a lot more architecture astronaut-ing going on than simply deployment.

I cannot imagine why you'd need a reactive pipeline built on top of Kafka and Cassandra to deliver some fanservice articles through a CMS, perhaps some requirement about international teams needing to deliver tailored content to each market but even with that it seems quite overblown.

In the end it will be a technical solution to an organisational issue, some parts of their infrastructure might be quite rigid and there are teams working around that instead of with that...

pram · 2025-08-19T09:32:45 1755595965

Probably because they were available, and now they’ve done the equivalent of throwing redis on their containers. My main point is, the blog content is probably the “easy part” and whatever is consuming it is the “hard part”

sunrunner · 2025-08-19T12:11:43 1755605503

Alternative idea: the actual problem they’re engineering around is their developers’ CVs

boxed · 2025-08-19T08:32:46 1755592366

I mean, this is the company that invented microservices so....

moomoo11 · 2025-08-19T11:37:05 1755603425

Microservices for _organizational_ challenges.

Lots of people think microservices = performance gains only.

It’s not. It’s mainly for organizational efficiency. You can’t be blocked from deploying fixes or features. Always be shipping. Without breaking someone else’s code.

boxed · 2025-08-19T12:04:38 1755605078

In fact, it's opposed to performance in many cases. And dev velocity for small teams.

williamdclt · 2025-08-19T14:48:50 1755614930

> Lots of people think microservices = performance gains only.

I don't think I ever heard that. Who claims that microservice architectures are for performance gains?

moomoo11 · 2025-08-19T14:51:32 1755615092

You’d be surprised, a lot during interviews I conducted. I think it’s mostly people that have not worked with them.

thinkindie · 2025-08-19T09:39:03 1755596343

very fair point - but there are valid use cases for microservices.

rvz · 2025-08-19T10:26:55 1755599215

> As much as this website could be very trafficked I have the feeling they are overcomplicating their infra,

That is because they are and it seems that since they're making billions and are already profitable, they're less likely to change / optimize anything.*

Netflix is stuck with many Java technologies with all their fundamental issues and limitations. Whenever they need to 'optimize' any bottlenecks, their solution somehow is to continue over-engineering their architecture over the tiniest offerings (other than their flagship website).

There is little reason for any startup to copy this architectural monstrosity just for attention on their engineering blog post for little to no advantage whatsoever.

* Unless you are not profitable, infra costs continue to increase, the quality of service is sluggish, or it is urgent to do so.

gf000 · 2025-08-19T12:41:25 1755607285

> There is little reason for any startup to copy this architectural monstrosity

This is the only reasonable take in your rant, but the reasoning is off for even this. They have little reason, because they will never hit the scale Netflix operates at. In the very very odd chance they do, they will have ample money to care about it.

rvz · 2025-08-19T16:11:10 1755619870

We've already seen a decade of cargo-culting around the micro-services hype which was popularized by Netflix and we don't need yet another over-engineered solution which given the results of this one, delivers little value or gains to solving the problem.

But many startups will die trying it anyway.