I'm not sure how either of those situations would trip up the system they're using. For a system trained on the background image, what difference does it make if the subject is holding a cup? The cup is not the background image, and it would be obvious in the same way that it's obvious the subject isn't the background image.
He might be referring to transparent/translucent/refractive objects, like a glass cup. Supposedly this technique can manage the transparency, but not refraction (and maybe the refraction could trip the transparency into failing).
I'm not associated with the paper, but I don't think this will have the same kinds of effects. It's effectively using a photo without the user to discriminate the background from any subjects in the scene.
The neural network seems to be mostly for handling variations in lighting and dealing with the fuzzy effects you'll usually get around the subject. Depth of the subject doesn't appear to be relevant here.
(wouldn't it be nice if, every time a research topic pops up here, there were a small list of essential keywords for finding more background information?)
XSplit's VCam can do background removal without chroma-key. It's reasonably good, and emits a virtual webcam that VC clients or OBS can use as an input. Think it's 40 USD for a lifetime license. Has a bit of ghosting when you move quickly.
The Zoom guesswork-based imitation is pretty good, and it seems to be optimized for a single person's movement. It gets confused when there are more actors like a child or dog entering from stage left.
Very cool. Next step would be to emulate lighting in the target scene, but that probably requires pose detection and facial landmarks for accurate shading.
Very good idea. And change in real-time if the background is dynamic. And allow the user to set styles such as warming the FG subject, shimmering as if there is a fire or candlelight in the room, etc.
Anybody know of any work done to improve greenscreen keying? The current old-school techniques work quite poorly and require so much manual work. I would imagine with the new work coming out with neural nets etc. there would be possibilities for improvement. This is very cool work and good for certain applications, but it seems to produce problems similar to greenscreen on some edges.
Modern green screen plugins/filters are miles better than they used to be, to the point that if the keying is hard, the footage probably wasn't shot well. By that I mean an evenly lit background (no light fall-off producing gradients), proper lighting of the subject, and proper distance from the background (which helps reduce edge fringing and color tinting).
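To make the "evenly lit" point concrete, here's a rough numpy sketch of a naive chroma key (nothing like the commercial plugins, and all the thresholds are made up): alpha is just distance from the key color, so a well-lit backdrop stays inside the tolerance and keys out cleanly, while light fall-off pushes backdrop pixels toward opaque.

```python
import numpy as np

def chroma_key(frame, key_color=(0, 255, 0), tol=80.0, soft=40.0):
    """Naive chroma key: alpha grows with RGB distance from the key color.

    Pixels within `tol` of the key color are fully transparent, pixels
    beyond `tol + soft` are fully opaque, with a linear ramp in between.
    """
    frame = frame.astype(np.float32)
    dist = np.linalg.norm(frame - np.array(key_color, np.float32), axis=-1)
    return np.clip((dist - tol) / soft, 0.0, 1.0)

# A 2x2 toy image: well-lit green screen pixels, one slightly off-green
# backdrop pixel, and one red "subject" pixel.
img = np.array([[[0, 255, 0], [10, 250, 5]],
                [[255, 0, 0], [0, 255, 0]]], dtype=np.uint8)
alpha = chroma_key(img)
```

The slightly off-green pixel still keys out here, but a strong gradient on the backdrop would push its distance past `tol` and leave it opaque, which is exactly the garbage you then fix by hand.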
I work quite a bit with green screen keying. I see the same Keylights, Ultimattes and Primattes still being used, even in the big productions I have worked on. Fixing the key can take weeks. Maybe the industry is a bit conservative and I haven't seen the cool new stuff bubbling under, but I would love to have new tools in the toolset for approaching difficult shots.
If you have a background picture, you have all the info you need to identify your subject - just plain subtraction. I think this is what the Photo Booth app on my circa-2012 MacBook does, quite effectively.
This is a question we’ve gotten quite a bit (second author here).
A good intuition is that if it were easy to do it already with any background, professional studios wouldn’t be spending so much money on green screens. Background subtraction is pretty poor in general without very constrained setups. Our goal is really to provide professional quality without any of the equipment.
And can your solution do what studios want, namely process a 4K video artifact-free when played back on a cinema screen? It doesn't look like that, tbh, if I watch the second video ("Ours real" is your work?).
And yeah, it requires a constrained setup and a lot of additional work, because even before you "subtract" the background you have to think about lighting. Your demo video might have very nice background matting, but the lighting is off, so it's relatively useless except for toy applications (of which there are a lot).
Also: did you compare somewhere with the very basic fixed-exposure method? Because with fixed exposure, background and camera placement, I suppose this should work just as well... Still, I think this is a really cool project; I wasn't disappointed like with the last link of this sort, where someone tried the same thing with horrible artifacting.
Green screens are crap with hair: because it's translucent, the green/blue bleeds through, which means it has to be cleaned up by hand.
Then there are the situations where there isn't a green screen. Again, manual cleanup is required: each frame needs to be cut out by hand, 24 times a second.
The same with a difference matte. Cameras are noisy, so there is constant noise in the alpha channel. This makes the effect look wobbly and cheap.
What this method does is pull a key from a difference matte, and makes it look good.
The project page has a video comparison against the previous state of the art. You can't just subtract the background if it's not 100% static and stable. Further, the novelty seems to be fewer artifacts, especially around hair and eyeglasses.
> If you have a background picture, you have all the info you need to identify your subject - just plain subtraction
It's not really "just plain subtraction", it's keying. Which AIUI basically means setting the alpha according to the difference between the image and the reference.
Green screen works well for this because, excepting Zoe Saldana, people tend to hang out around the opposite side of the colour wheel, so there tends to be a good distance between foreground colour and background colour. If you're trying to do this against arbitrary backgrounds, you seemingly need to augment keying with additional techniques like image segmentation to get good results.
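A minimal numpy sketch of that colour-overlap failure mode (hypothetical helper, made-up thresholds): difference keying alone can't separate a grey jacket from a grey wall, which is exactly where segmentation or a learned matte has to step in.

```python
import numpy as np

def difference_key(frame, clean_plate, tol=30.0, soft=30.0):
    """Set alpha from per-pixel RGB distance to a clean background plate."""
    d = np.linalg.norm(frame.astype(np.float32)
                       - clean_plate.astype(np.float32), axis=-1)
    return np.clip((d - tol) / soft, 0.0, 1.0)

clean_plate = np.full((1, 2, 3), (80, 80, 80), dtype=np.uint8)  # grey wall
frame = np.array([[(200, 40, 40),    # red shirt: far from the wall colour
                   (90, 85, 80)]],   # grey jacket: nearly the wall colour
                 dtype=np.uint8)

alpha = difference_key(frame, clean_plate)
```

The red shirt keys perfectly, but the grey jacket ends up fully transparent, punching a hole in the subject.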
This new method works well for partially transparent regions (hair) and allows slightly larger background movement and color overlap between foreground and background.
I think they baited you with their "look, movement" replacement videos. As far as I can tell, their inputs have a fixed background and are of constant exposure and camera position.
No, the camera is indeed allowed to change a tiny bit. For example, you do not need a tripod. Taking photos with a handheld camera works fine (although a tripod works even better). They explain it in greater detail in their paper: https://arxiv.org/pdf/2004.00626.pdf
Background subtraction methods on the other hand usually fail if the camera moves even a tiny bit or the lighting changes slightly. More advanced methods can recover eventually, but you still get a few frames with improperly removed background.
In the first example (the one with the girl), you can see that there are small camera movements. You can also see the effect this has when applying straightforward background subtraction in the second video.
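You can see why with a tiny numpy experiment (a made-up gradient plate, not footage from the paper): shift a textured clean plate by a single pixel and naive subtraction flags essentially the whole frame as foreground.

```python
import numpy as np

# A textured "clean plate": a horizontal gradient, 64x64 pixels.
background = np.tile(np.arange(0, 256, 4, dtype=np.float32), (64, 1))

# The next frame is the same empty scene, but the camera drifted 1 px sideways.
drifted = np.roll(background, 1, axis=1)

# Naive subtraction with a small threshold now "detects" foreground everywhere,
# because every pixel lands on a slightly different gradient value.
false_foreground = (np.abs(drifted - background) > 2.0).mean()
```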
https://arxiv.org/pdf/2004.00626.pdf, which is inlined at the originally submitted URL. I'm not sure what's going on here, but on HN the convention is probably to link to the project home page first, and after that maybe the Github page and if neither of those exist, to the arxiv.org homepage (but not the pdf since those change with each revision). So I've changed to the project home page for now.
Hi, I'm one of the folks building CatalyzeX (https://www.catalyzex.com). It's intended primarily as a free resource for machine learning practitioners (research engineers, developers, students, and generally anyone interested in R&D) to discover interesting ML projects and papers, easily access the code and datasets, and communicate with the authors or other experts.
The link share here was likely with keeping the relevance of this project to HN in mind, and that easy access to the code and authors would be valuable for anyone here looking to take it further.
Thanks for clarifying the convention here on HN, being transparent, and for updating accordingly. Much appreciated.
Always open to feedback if you have any as well! :)
I wasn't suggesting anything, but having just looked at the submission history it seems clear that it's promotional. The HN community doesn't favor that. It's fine to submit your own site or work occasionally, but not to use HN primarily for promotion.
Also, the submitted title ('Zoom’s virtual background swap but better. DL+GANs for background replacement') was too promotey.
Peripheral: what is the benefit of having these artificial backgrounds? Apart from "it's fun", which wears off after about a minute? In my experience (Zoom meetings), there's blurring/artefacts around the edge of the head, and the image quality seems to suffer as well.
I had a meeting where one participant uses an actual green screen, and the difference was remarkable, with none of the issues above.
There are two parts to background matting. The first is removing the existing background and the second is replacing it with something else. Removing the background improves the focus on the foreground - people watching can see you better and they'll listen more closely because they're not distracted by what's behind you. The second part, replacing the background with something else, might be done because you don't want people to see where you are, or because you want to overlay your foreground video on a presentation. Being able to pretend you're on a holodeck or a desert island is a trivial use of the tech.
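Both steps hang off the standard compositing equation C = αF + (1 − α)B. A small numpy sketch with toy values (estimating α and F is the hard part, which is what the paper is about; this is just the "replace it with something else" half):

```python
import numpy as np

def composite(foreground, alpha, new_background):
    """Alpha compositing: C = alpha * F + (1 - alpha) * B, per pixel."""
    a = alpha[..., None]  # broadcast the matte over the colour channels
    return a * foreground + (1.0 - a) * new_background

fg = np.full((2, 2, 3), 200.0)     # the matted-out subject (toy values)
bg = np.zeros((2, 2, 3))           # the replacement background
alpha = np.array([[1.0, 0.5],      # an opaque pixel, a half-transparent
                  [0.0, 0.0]])     # hair pixel, and pure background
out = composite(fg, alpha, bg)
```

The half-transparent pixel comes out as a 50/50 blend, which is why a good matte (fractional alpha around hair) looks so much better than a hard cut-out.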
Some people feel the need to hide their shitty apartment.
I've seen someone advised that the background on their webcam makes them "look poor", where the concern was that looking poor is a (perverse) impediment to getting paid work, but they can't exactly move, especially under lockdown. It may be better to use a calm artificial background in that case.
See also people doing online-conference presentations and YouTube videos. I've seen quite a few of those using virtual backgrounds.
Perhaps for the same reason - thousands of people may see the video, and some people, having made the effort to put on a nice suit/makeup/etc., get a haircut, and look their best, don't want thousands of people to see their not-so-nice home behind them.
- people who occupy a wider z-axis (for example, leaning forward toward the camera, or with their arms in front of them)
- people holding objects like cups
How well does your method handle those kinds of situations?