Fast 2D rendering on GPU has represented the last few months of concentrated wor...

danielvaughn · on June 14, 2020

Hi! This looks interesting, although I confess I'm very new to this area. I'm writing a programming language specifically for UI designers, and currently I'm building a naive implementation in html5 canvas. I know very little about low-level rendering - just really know how to use drawing API's like canvas and Quartz 2D.

With that being said, I'm looking to delve deeper into this subject. Do you have any recommendations on where to start? Whenever I look beyond simple drawing API's, the focus seems to be entirely on 3D rendering (which is interesting but not my main focus right now).

raphlinus · on June 14, 2020

It's a good question, as honestly the knowledge for 2D graphics is pretty arcane, as opposed to 3D being so widely taught. I actually started a github repo for a book on 2d graphics but have no idea whether I'll actually finish it.

In the meantime, antigrain.com is one good (if old) source. The original PostScript "red book" was extremely influential in its time (it's where I learned a lot of this stuff) but is quite dated now. Best of luck, and I'm also happy to field requests for more specific areas. For example, for color theory (an important aspect of 2D graphics!), handprint.com is quite a remarkable resource.

blondin · on June 14, 2020

> knowledge for 2D graphics is pretty arcane

such an unfortunate state of affairs!

i am currently learning how to render graphics using the GPU on my mac using apple metal. what i am getting is that the GPU has been optimized for 3D rendering?! GPUs make no provision or easy way for rendering 2D graphics?

it makes no sense to me... that's where you start...

danielvaughn · on June 14, 2020

I remember asking a similar question on HN a while back. The response was that 2D graphics, UIs in particular, are mostly computed on the CPU. I have no idea why this is the case, though.

raphlinus · on June 14, 2020

See this blog, it explains it pretty well imho: https://blog.mecheye.net/2019/05/why-is-2d-graphics-is-harde...

And: historically they've been computed mostly on CPU, but I think it's time for that to change.

nitrogen · on June 14, 2020

During the late 1990s and early 2000s it was a lot more common for GPUs to provide 2D acceleration, and GUIs were drawn using those primitives. I remember the switch to CPU rendering happening, and the subsequent removal of 2D acceleration from GPUs, but I don't remember why.

At any rate, the 2D graphics we expect now are a lot more complex than the unantialiased lines, blits, and fills of old.

jcelerier · on June 14, 2020

> And: historically they've been computed mostly on CPU, but I think it's time for that to change.

It would be great to wait a bit for OS & GPU power management to evolve before biting the bullet on that. My laptop goes from 6 to 2.something hours of battery as soon as I have a GL context opening somewhere, likely because it seems to power on its discrete GPU automatically in that case.

raphlinus · on June 14, 2020

This is changing. I've been doing power measurements as well (just didn't make the cut of this blog post), and the 1060 is surprisingly power-efficient in its low frequency modes. It's also generally the case that the GPU is always active in its role running the compositor.

jcelerier · on June 15, 2020

> and the 1060 is surprisingly power-efficient in its low frequency modes.

maybe ? the computer on which this happens is a 1070. But please be aware that series 10 are a very small percentage of people. The average laptop of non-tech people around me is easily 8 years old, often on their 2nd or 3rd battery... and these people won't be able to complain easily to anyone when their new battery's life suddenly is halved because of $SOFTWARE.

zozbot234 · on June 14, 2020

With most dual-GPU machines you do get a choice whether to power on the discrete GPU or not. It's even supported on GNOME/Wayland as of late.

ben-schaaf · on June 14, 2020

Both macOS and Windows have ways for applications to specify whether they want to prefer a discrete gpu or integrated.

Kuinox · on June 14, 2020

Don't fool yourself, they won't do anything until most of the browser/apps do it. Then they will fix it to sell "longer" battery life.

gct · on June 14, 2020

I've done a fair bit of 2d graphics work (written a rasterizer, etc). Honestly it's because it's

  1) tricky to shoehorn 2D graphics onto the APIs that GPUs provide and 
  2) really not needed.  I can easily render eg: a world map with hundreds of thousands of lines at hundreds of frames/second with one core.

chrismorgan · on June 14, 2020

Please don’t use preformatted text to write lists. It’s a pain. Just leave a blank line between the items so that each is a paragraph.

_y4o5 · on June 14, 2020

If you need portable results pixel to pixel on various platforms then CPU based rendering is more straightforward than using the GPU.

The various libraries such as freetype for font rastrization only works on CPU.

Plenty of work should be done to research and implementation is left to be done in order to use the GPU more widely.

raphlinus · on June 14, 2020

This is an interesting and subtle point about doing "software on GPU compute." You are in complete control over what gets computed, and are not at the mercy of the hardware's fixed function pipeline for stuff like rasterization rules and sampling patterns for antialiasing. So I think portable results pixel to pixel are in fact viable.

Of course CPU rendering is always more straightforward than GPU, the higher performance comes at a significant cost in complexity.

dungdang · on June 14, 2020

now this may be a dimb question, but why would you start there?

as far as i understand, 2d and 3d have literally zero to do with each other in how they are rendered. one is a bunch of triangles. the other is lines, curves, thickness, gradients, and fonts (which are essentially little programs)

chrisseaton · on June 14, 2020

> the other is lines, curves, thickness, gradients, and fonts

You can reduce all these to drawing triangles.

raphlinus · on June 14, 2020

> You can reduce all these to drawing triangles.

You can, people have tried this, and it sucks. The main problem is that the conversion of Bézier paths to triangles is a hard problem with lots of conditional branching. Even when you do it, there is the other problem of rendering triangles with really good antialiasing, MSAA forces a compromise between performance and quality. By contrast, piet-gpu does an exact-area calculation for antialiasing.

So it's not a question of whether you can do it, but whether it works well, and approaches like piet-gpu absolutely stomp triangles.

Const-me · on June 15, 2020

> there is the other problem of rendering triangles with really good antialiasing

Easier than you think. Here's couple lines of pixel shader that does that, with really good antialiasing and without MSAA:

https://github.com/Const-me/Vrmac/blob/master/Vrmac/Draw/Sha...

Jasper_ · on June 14, 2020

Doing that reduction is surprisingly difficult, and usually a serial algorithm that runs on the CPU, and a naive approach is a resulting triangle set is not efficient to a GPU, but that's what toolkits like cairo, Direct2D, nanovg do.

Raph is describing an architecture where path evaluation happens on the GPU, without being baked to triangles.

dungdang · on June 16, 2020

yes, you can draw 2d in 3d space. this thread however is about 3d being built on top of 2d. not 2d being built on top of 3d.

blondin · on June 14, 2020

that's not a dumb question! it's my own fault. i only have knowledge of 2D graphics.

smallstepforman · on June 14, 2020

Haiku OS AppServer (the screen rendering component, similar to Unix X11) is a full GUI system implemented with AntiGrain Geometry as the renderer.

gct · on June 14, 2020

I think that blend2d (https://blend2d.com/) is a worthy successor to AGG, and it's under active development.

longstation · on June 14, 2020

Thank you for the link. It was a great reading. But I just want to point out that Blender2D is a software renderer.

gct · on June 15, 2020

So is/was AGG

danielvaughn · on June 14, 2020

Thanks!

syspec · on June 14, 2020

I recommend the blog of the OP, so many gems

structural · on June 14, 2020

What's been your overall experience with using Vulkan shaders for compute? Are there basic primitives that are missing from shading languages and/or have you found any impedence mismatches between writing shaders vs. how you might describe the same algorithms in other languages?

raphlinus · on June 14, 2020

That's a big topic. I've been able to work around the missing primitives (for example, I autogenerate code for Rust-style structs and enums), but have had much bigger struggles around two issues: tools, which are still quite primitive, and understanding performance, which is extremely difficult. These two problems intersect because I can imagine a lot better tools for digging into performance issues. One that I would have paid good money for is an instruction-level simulator that would highlight the source code to tell me where the stalls, bank conflicts, divergence problems, etc. are in the source code. Such a thing is possible (there are academic papers like [1]), but not as far as I know usable in daily development.

The "impedance mismatch" is that you (generally) have to write in a style to extract lots of parallelism. This tends to be very different than the way you'd write scalar CPU code, but not completely alien to me as it has a lot of similarity with the way you'd write SIMD. I've pretty much gotten the hang of it now. I'm thinking of a blog post of redoing path_coarse.comp from its current basically scalar style to a more parallel version, as that would I think illuminate the issues.

[1] http://comparch.gatech.edu/hparch/papers/gera_ispass18.pdf

NotCamelCase · on June 14, 2020

I pondered on the same subject recently as I was implementing the same algorithm (i.e. Mandelbrot set) on the CPU (scalar vs SIMD) and GPU compute using fixed-point and floating-point for comparisons (if interested: https://tayfunkayhan.wordpress.com/2020/06/03/mandelbrot-in-...).

It bothers me how little progress has been done on "shading" languages front compared to overall many-core computation models and capabilities over the years. And, that is despite the fact that shaders are very often where the most time is spent in modern workloads.

Compute with Vulkan is another story. It offers some nice abstractions, but it shows that it's mostly intended for async-compute/work-offloading for rendering, IMO. Too much fruction.

RivieraKid · on June 14, 2020

- Shouldn't 2D rendering be a solved problem given that it's basically a subset of 3D rendering?

- Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.

pcwalton · on June 14, 2020

> - Shouldn't 2D rendering be a solved problem given that it's basically a subset of 3D rendering?

The problem is that primitives artists use are different. 3D rendering tends to all consist of polygon meshes, which are relatively easy to render. 2D rendering (basically) consists of Bezier paths, which are harder. The equivalent in 3D, which is adaptive subdivision, is not really a solved problem in real-time either.

Additionally, 2D rendering quality tends to be more important than 3D rendering quality. Whereas you can get away with 4xMSAA or hacks like FXAA in 3D, true 16xAA (without hacks) is the absolute minimum for 2D rendering quality nowadays, and even it isn't considered great for some tasks like font rendering (Pathfinder and piet-gpu both use analytic AA which is effectively 256xAA).

> - Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.

There's a difference between renderers with GPU support and renderers that are oriented around using the GPU efficiently. In many cases this results in an order-of-magnitude speedup. On the GPU, state changes are expensive, and many such renderers that have GPU support don't really go out of their way to avoid them. There are also occlusion culling optimizations that most renderers don't do, but piet-gpu and Pathfinder do.

raphlinus · on June 14, 2020

Others have spoken to this, but as a general introduction I highly recommend Jasper's post "Why are 2D vector graphics so much harder than 3D?" In short, no, it's not a solved problem.

Also, I see a lot of variations of this question, but I should state this more clearly. There's been accelerated graphics in one form or another for a long time, but what I'm doing is a completely different type of thing. In my world, on the CPU you just encode the scene into a binary representation that's optimized for GPU but in many ways is like flatbuffers, and then the GPU runs a highly parallel program to render the whole thing. In previous approaches, the CPU is deeply involved in taking the scene apart and putting it back together in a form that's well suited to relatively dumb pixel pipes. Now that GPUs are really fast, that approach runs into limitations.

It also depends what you're trying to do. I'm focusing here on dynamic paths (and thus font rendering), while most of the libraries optimized for UI put text into texture atlases and then use the GPU to composite quads to the final surface, something they can do well.

https://blog.mecheye.net/2019/05/why-is-2d-graphics-is-harde...

raks435 · on June 14, 2020

Can you expound the principle of tiling mentioned in your algorithm a bit more ? The conventional mechanism is to use de-Casteljau to divide a bezier curve into triangles and then rasterize these triangles using GPU. If the curve is required to be scaled, the triangularization/tesselation is done again How is the algorithm presented in the link different ? Somehow the concept of tiling seems to imply that rasterization of the curve is done in the CPU itself. What am I missing ?

raphlinus · on June 14, 2020

I recommend reading the blog post series, I'm not sure I can usefully summarize the concepts in a comment reply. But very briefly, there's a flattening step (evaluated on GPU, based on de Casteljau) that converts the Bezier into a polyline (not triangles), then a tiling step that records for each tile a "command list" that contains the complete description of how to render the pixels in a tile, finally "fine rasterization" so that each workgroup reads that command list and renders 256 pixels in parallel from it. From your question, it sounds like your mental model is pretty different from how this pipeline works.

raks435 · on June 14, 2020

Yes, I am trying to align my mental model with yours. I read your blog post "A sort-middle architecture for 2D graphics" & "2D Graphics on Modern GPU", but still unable to grasp the fundamental guiding principles. It's not clear what are the commands that constitute each tile and whatever they are, what is the fundamental reasion which leads to the performance being better than joining the polylines of a curve to get a triangle list and having those triangle rasterized by the GPU. Any blog/article on the fundamental principles, that you would recomend

pritovido · on June 14, 2020

No, it is not a subset, unless you are talking about drawing textures.

Those libraries use GPU rendering using textures, like distance fields(Qt) or simply calculate fonts in 2D and draw it on textures using the GPU, like Apple or cairo usually do.

Distance fields are blurry when you have small fonts on display.

Things like calculating the exact area order the 2D curve is something you could easily do in CPU, but it is extremely difficult to do on the GPU. You need decoupled data in order to parallelize it.

jcelerier · on June 14, 2020

> Don't libraries like Skia, Qt, Cairo use GPU rendering? I've always assumed so. I mean, this is 2020, GPUs have been around for decades.

As a Qt user / developer - you can use the GPU for your app, e.g. with Qt Quick or QGraphicsView, but there are sometimes good reasons to stick to CPU & software rendering, e.g. it is somewhat common to want to have intertwined "native" OS widgets (which are all CPU-rendered raster things) and custom GPU-drawn scene - this is a case where things fall a bit apart.

Another thing is that pretty, freetype-like font rendering is super expensive when you have a lot of text to show, and can't really be done (at least I have definitely not seen infinality-level beauty from the state of the art) on the GPU yet... next to that filling some rects (read: 95% of UI) with SSE/AVX/AVX2 as Qt does is stupidly fast.

chrismorgan · on June 14, 2020

> pretty, freetype-like font rendering is super expensive when you have a lot of text to show, and can't really be done (at least I have definitely not seen infinality-level beauty from the state of the art) on the GPU yet

This is exactly what Pathfinder does. pcwalton is on this thread and is the main author of that.

https://github.com/servo/pathfinder#features says:

> Advanced font rendering. Pathfinder can render fonts with slight hinting and can perform subpixel antialiasing on LCD screens. It can do stem darkening/font dilation like macOS and FreeType in order to make text easier to read at small sizes. The library also has support for gamma correction.

jcelerier · on June 14, 2020

> This is exactly what Pathfinder does.

from the screenshots I saw so far, pretty much not.

> It can do stem darkening/font dilation like macOS and FreeType in order to make text easier to read at small sizes.

there are tons of different ways to do that. Even freetype has a few different algorithms to do it, some not even merged if I'm not mistaken, which give wildly different results

chrismorgan · on June 14, 2020

To give you an idea that pcwalton knows what he’s been doing and has indeed been seeking to match platform rendering exactly, here are a couple of tweets about the macOS font dilation: https://twitter.com/pcwalton/status/918593367914803201, https://twitter.com/pcwalton/status/918991457532354560.

I rather like the demonstration of rendering including subpixel rendering at https://twitter.com/pcwalton/status/971475785616797698, as well.

jcelerier · on June 14, 2020

> To give you an idea that pcwalton knows what he’s been doing and has indeed been seeking to match platform rendering exactly,

I don't intend at all to cast doubt on pcwalton's abilities - the work is brilliant without any hesitation.

But I wonder how that is possible given that "platform rendering" pretty much has changed every other macOS version and every Windows version ("ClearType" from WinXP is definitely not "ClearType" from Win10) ; and let's not talk about the customization abilities of freetype which makes rendering on any two linux boxes also entirely distinct.

stanfordkid · on June 14, 2020

Things like multi-resolution font-rendering are actually more complicated than one might imagine.

The easy way is to just tesselate the font into polygons -- but this tesselation often depends on the zoom level. The same thing can be said about implicit curves etc.

Most libraries do use GPU's for basic draw operations (e.g rendering a gradient), but to build something like Photoshop -- you need much more complexity.

awfulaxolotl · on June 14, 2020

How well does this approach work for something like 2D data visualization where most of the visual elements are the same -- i.e. can be instanced in OpenGL/etc?

Thanks for publishing this, it's awesome work! I'm looking forward to progression to wgpu hinted in the Github README.

raphlinus · on June 14, 2020

I absolutely have data visualization in mind for this, as I think it can benefit greatly from the scale. But the pipeline I've built is very agile, it will easily handle a diverse mix of items. It's not like OpenGL etc where there's a certain amount of overhead for a draw call so there are significant gains to be had from instancing and batching.

It is likely that CPU-side encoding can be made more efficient, though, by just filling in quantities to a template, rather than encoding from scratch.

leeoniya · on June 14, 2020

any relation to the work currently being done here by pcwalton?

https://github.com/servo/pathfinder/pull/350

hopefully you guys aren't doing identical work in parallel (no pun intended) :)

pcwalton · on June 14, 2020

Yes, there's been a lot of influence in both directions. The approaches have a lot in common, but also have significant differences. I plan to write up a description of the Pathfinder compute work soonish.

paufernandez · on June 14, 2020

I would love that!

raphlinus · on June 14, 2020

As Patrick indicated, there's a lot of cross-fertilization of ideas, and one of the best outcomes of this work would be for high performance compute rendering to ship in Pathfinder.

moonchild · on June 14, 2020

It looks like the main fruit of this is piet-gpu[1]. How close would you say it is to stable? And how complete a realisation of the concepts put forth in the OP? I'm currently working on a ui library, and evaluating alternatives for hw acceleration. The whole library wants to be in c, so I would want to rewrite the library by release-time to use it, but don't want to waste time doing that now if it's still really volatile.

1. https://github.com/linebender/piet-gpu

raphlinus · on June 14, 2020

Not very, and not very. This is research, it depends on compute capabilities that have really only gone mainstream in GPU lately, and the current codebase is about trying out the ideas. Best of luck in your project!

MrBuddyCasino · on June 14, 2020

Wow, those numbers look very impressive on an absolute basis! Do you have an idea on how piet-gpu compares to Skia? I'm not a graphics guy, my impression was that it is currently the dominant vector graphics renderer.