Background Matting: The World is Your Green Screen (washington.edu)
217 points by hr7161 on April 25, 2020 | hide | past | favorite | 54 comments


This does look like an interesting step forward; however, I’ve found the biggest limitations of previous techniques to be

- people occupying a wider z-axis (for example, leaning forward toward the camera or with their arms in front of them)

- people holding objects like cups

How well does your method handle those kinds of situations?


I'm not sure how either of those situations would trip up the system they're using. For a system trained on the background image, what difference does it make if the subject is holding a cup? The cup is not the background image, and it would be obvious in the same way that it's obvious the subject isn't the background image.


He might be referring to transparent/translucent/refractive objects, like a glass cup. Supposedly this technique can manage the transparency, but not refraction (and maybe the refraction could trip the transparency into failing).


I'm not associated with the paper, but I don't think this will have the same kinds of effects. It's effectively using a photo without the user to discriminate the background from any subjects in the scene.

The neural network used seems mostly for allowing for variations in lighting and dealing with the fuzzy effects you'll usually get around the subject. Depth of the subject doesn't appear to be relevant here.


This is broadly called natural image matting if anyone is curious enough to look into the last 20 years of research.


Thank you!

(wouldn't it be nice if every time a research topic pops up here that there would be a small list of the essential keywords to look for more background information?)


Is there an easy way to stream it to Zoom/Slack etc...? Will be nice to use it as a virtual camera source


Is it fast enough to operate on a live video feed from, say, a webcam?


XSplit's VCam can do background removal without chroma-key. It's reasonably good, and emits a virtual webcam that VC clients or OBS can use as an input. Think it's 40 USD for a lifetime license. Has a bit of ghosting when you move quickly.


Yes. OBS offers it. Teams and Zoom have guesswork-based imitations.


The Zoom guesswork-based imitation is pretty good, and it seems to be optimized for a single person's movement. It gets confused when there are more actors like a child or dog entering from stage left.


And where can I find the OBS implementation of this work?



I found the chroma key implementation, but this seems to assume you have a green screen or similar: https://github.com/obsproject/obs-studio/blob/master/plugins...

I thought GP was asking for an implementation of background removal without a green screen.


Indeed. I'm fairly sure it's not in OBS. Which is why I asked for a link to where it was implemented. Surprisingly I didn't receive one...


That's definitely my impression, and to my knowledge, OBS doesn't offer it. I'd be thrilled to learn otherwise.


OBS only offers a green screen plugin, and the teams/zoom versions are nowhere even close to this good.


And Apple Photo Booth deprecated this functionality too, due to 32-bit debt.


Looking at the demo released, this seems to work much better than Apple’s implementation.


Very cool. Next step would be to emulate lighting in the target scene, but that probably requires pose detection and facial landmarks for accurate shading.


I have been thinking that a killer Zoom product would be a USB controller for an RGB LED strip that helps match color with your virtual background.


Very good idea. And change in real-time if the background is dynamic. And allow the user to set styles such as warming the FG subject, shimmering as if there is a fire or candlelight in the room, etc.


Anybody know of any work done to improve green-screen keying? The current old-school techniques work quite poorly and require so much manual work. I would imagine that with the new work coming out with neural nets etc. there would be possibilities for improvement. This is very cool work and good for certain applications, but it seems to produce green-screen-like problems at some edges.


Nvidia have had automatic "greenscreen keying" without a greenscreen in some sort of beta for a long time, but it still hasn't moved beyond that: https://blogs.nvidia.com/blog/2019/09/26/nvidia-rtx-broadcas....


Modern green screen plugins/filters are miles better than they used to be, to the point that if the keying is hard, the image must not have been produced well. By that, I mean an evenly lit background (no light fall-off producing gradients), proper lighting of the subject, and proper distance from the background (which helps reduce edge artifacts and color tinting).


I work quite a bit with green screen keying. I see the same Keylights, Ultimattes, and Primattes still used even in the big productions I have worked on. Fixing the key can take weeks. Maybe the industry is a bit conservative and I haven't seen cool new stuff bubbling under, but I would love to have new tools in the toolset for approaching difficult shots.


I recently tried working on this with OpenCV but it didn't quite work like I expected. I had problems with artifacts caused by my own shadows.


Does this work with reflective objects in the foreground, such as a car?


What's novel about this?

If you have a background picture, you have all the info you need to identify your subject - just plain subtraction. I think this is what the Photo Booth app on my circa-2012 MacBook does, quite effectively.
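To make the "just plain subtraction" idea concrete, here's a minimal sketch in NumPy (toy arrays standing in for a captured background and a frame; the function name and threshold are illustrative, not from the paper):

```python
import numpy as np

def subtraction_mask(frame, background, threshold=30):
    """Naive background subtraction: mark a pixel as foreground when its
    color differs from the reference background by more than a fixed
    threshold. Real footage breaks this easily (sensor noise, shadows,
    slight camera motion), which is why it needs such constrained setups."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    # Sum the per-channel differences and compare against the threshold.
    return diff.sum(axis=-1) > threshold

# Toy example: a 2x2 "background" and a frame where one pixel changed.
bg = np.full((2, 2, 3), 100, dtype=np.uint8)
frame = bg.copy()
frame[0, 0] = [200, 50, 50]          # a "subject" pixel
mask = subtraction_mask(frame, bg)
print(mask)  # only the changed pixel is marked foreground
```

The binary mask is exactly what makes this look cheap in practice: any noise or shadow flips pixels across the threshold, and there's no soft alpha for hair or motion blur.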


This is a question we’ve gotten quite a bit (second author here).

A good intuition is that if it were easy to do it already with any background, professional studios wouldn’t be spending so much money on green screens. Background subtraction is pretty poor in general without very constrained setups. Our goal is really to provide professional quality without any of the equipment.


And can your solution do what studios want, namely process 4K video artifact-free when played back on a cinema screen? It doesn't look like it, tbh, if I watch the second video ("Ours real" is your work?).

And yeah, it requires a constrained setup and a lot of additional work, because even before you "subtract" the background you have to think about lighting. Your demo video might have very nice background matting, but the lighting is off, so it's relatively useless except for toy applications (of which there are a lot).

Also: did you compare somewhere with the very basic fixed-exposure method? Because for fixed exposure, background, and camera placement, I suppose this should work just as well... Still, I think this is a really cool project; I wasn't disappointed like with the last link of that sort, where someone tried the same thing with horrible artifacting.


Ex-VFX person here.

Green screens are crap with hair: because it's translucent, the green/blue bleeds through, which means it has to be cleaned up by hand.

Then there are the situations where there isn't a green screen. Again manual cleanup is required. Each frame needs to be cut out by hand. 24 times a second.

The same with a difference matte. Cameras are noisy, so there is constant noise in the alpha channel. This makes the effect look wobbly and cheap.

What this method does is pull a key from a difference matte, and makes it look good.
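The "wobbly and cheap" effect from camera noise can be simulated in a few lines. This is a toy illustration (made-up noise level and threshold), and the temporal smoothing shown is just one crude fix, not what the paper does:

```python
import numpy as np

rng = np.random.default_rng(0)
bg_value, threshold, frames = 100.0, 12.0, 50

# A single static background pixel observed over many frames,
# with Gaussian sensor noise added to each observation.
observations = bg_value + rng.normal(0.0, 6.0, size=frames)

# Hard threshold on the per-frame difference: the matte flickers,
# classifying the same static pixel differently frame to frame.
flicker = np.abs(observations - bg_value) > threshold

# Averaging a few frames before differencing tames the noise
# (a crude fix; the paper instead learns to pull a clean matte).
smoothed = np.convolve(observations, np.ones(5) / 5, mode="valid")
stable = np.abs(smoothed - bg_value) > threshold
print(flicker.sum(), "raw misclassifications vs", stable.sum(), "after smoothing")
```

Temporal smoothing trades flicker for lag on moving subjects, which is part of why naive difference mattes never looked good in production.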


Ex-VFX part-time software team member here.

I can assure you they read these papers. If it is good, it will be part of a future version of the software.


100%. It's how we managed to justify flying out to SIGGRAPH each year.

The Foundry have been trying to get this into Nuke for years. The problem is that normally you get flickering, as you'll have seen.


Are there no tools that automate the cleanup of green-screen artifacts?


The green screen software itself is that tool. The parent is saying it has limitations.


Not really. The green screen gets you 85% of the way there, the rest needs a human to make an artistic decision as to how much hair to cut round.


The project page has a video comparison against the previous state of the art. You can't just subtract the background if it's not 100% static and stable. Further, the novelty seems to be fewer artifacts, especially around hair and eyeglasses.


> If you have a background picture, you have all the info you need to identify your subject - just plain subtraction

It's not really "just plain subtraction", it's keying. Which AIUI basically means setting the alpha according to the difference between the image and the reference.

Green screen works well for this because, excepting Zoe Saldana, people tend to hang out around the opposite side of the colour wheel, so there tends to be a good distance between foreground colour and background colour. If you're trying to do this against arbitrary backgrounds, you seemingly need to augment keying with additional techniques like image segmentation to get good results.
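A rough sketch of that keying idea — alpha set from the per-pixel colour distance to a reference — might look like this (illustrative only; the ramp bounds are made up, and real keyers add despill, edge cleanup, and much more):

```python
import numpy as np

def difference_key(frame, reference, low=10.0, high=60.0):
    """Soft difference key: alpha ramps from 0 to 1 as the per-pixel
    color distance to the reference grows from `low` to `high`."""
    dist = np.linalg.norm(frame.astype(np.float32)
                          - reference.astype(np.float32), axis=-1)
    return np.clip((dist - low) / (high - low), 0.0, 1.0)

# Toy 1x3 image against a uniform grey reference.
ref = np.full((1, 3, 3), 120, dtype=np.uint8)
frame = ref.copy()
frame[0, 1] = [140, 120, 120]   # slightly different: partial alpha
frame[0, 2] = [250, 30, 30]     # very different: fully opaque
alpha = difference_key(frame, ref)
```

The soft ramp is what distinguishes keying from hard-threshold subtraction: pixels near the reference colour get partial alpha rather than a binary in/out decision, which is why foreground/background colour overlap is the failure mode.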


This new method works well for partially transparent regions (hair) and allows slightly larger background movement and color overlap between foreground and background.


I think they baited you with their "look, movement" replacement videos. As far as I can tell, their inputs have a fixed background and are of constant exposure and camera position.


No, the camera is indeed allowed to change a tiny bit. For example, you do not need a tripod. Taking photos with a handheld camera works fine (although a tripod works even better). They explain it in greater detail in their paper: https://arxiv.org/pdf/2004.00626.pdf

Background subtraction methods on the other hand usually fail if the camera moves even a tiny bit or the lighting changes slightly. More advanced methods can recover eventually, but you still get a few frames with improperly removed background.


In the first example (the one with the girl), you can see that there are small camera movements. You can also see the effect this has when applying straightforward background subtraction in the second video.


I thought the same thing. I remember vividly playing with Photo Booth on my aunt's MacBook; I really enjoyed the rollercoaster!


That would be https://www.youtube.com/watch?v=588kZu1JeFw ?

I mean, it's trying to do the same kind of thing, but this looks to be a lot better at it.


That's @!#!@$ magical!


The originally submitted URL (https://www.catalyzex.com/paper/arxiv:2004.00626) points to

https://github.com/senguptaumd/Background-Matting, which points to

https://grail.cs.washington.edu/projects/background-matting, which points to

https://arxiv.org/abs/2004.00626, which points to

https://arxiv.org/pdf/2004.00626.pdf, which is inlined at the originally submitted URL. I'm not sure what's going on here, but on HN the convention is probably to link to the project home page first, and after that maybe the Github page and if neither of those exist, to the arxiv.org homepage (but not the pdf since those change with each revision). So I've changed to the project home page for now.


Hi, I'm one of the folks building CatalyzeX (https://www.catalyzex.com). It's intended primarily as a free resource for machine learning practitioners (research engineers, developers, students, and generally anyone interested in R&D) to discover interesting ML projects and papers, easily access the code and datasets, and communicate with the authors or other experts.

The link was likely shared here with the relevance of this project to HN in mind, and with the thought that easy access to the code and authors would be valuable for anyone here looking to take it further.

Thanks for clarifying the convention here on HN, being transparent, and for updating accordingly. Much appreciated.

Always open to feedback if you have any as well! :)


Are you suggesting it is an advertisement for catalyzex.com?


I wasn't suggesting anything, but having just looked at the submission history it seems clear that it's promotional. The HN community doesn't favor that. It's fine to submit your own site or work occasionally, but not to use HN primarily for promotion.

Also, the submitted title ('Zoom’s virtual background swap but better. DL+GANs for background replacement') was too promotey.


I agree with you.

FYI your first sentence effectively says > I wasn't suggesting anything, but I was suggesting exactly that.


Peripheral: what is the benefit of having these artificial backgrounds, apart from "it's fun", which wears off after about a minute? In my experience (Zoom meetings), there's blurring/artefacts around the edge of the head, and the image quality seems to suffer as well.

I had a meeting where one participant used an actual green screen, and the difference was remarkable, with none of the issues above.


There are two parts to background matting. The first is removing the existing background and the second is replacing it with something else. Removing the background improves the focus on the foreground - people watching can see you better and they'll listen more closely because they're not distracted by what's behind you. The second part, replacing the background with something else, might be done because you don't want people to see where you are, or because you want to overlay your foreground video on a presentation. Being able to pretend you're on a holodeck or a desert island is a trivial use of the tech.
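The "replacing it with something else" step is standard alpha compositing. A minimal sketch, assuming a float matte is already available from whatever matting method you use (toy arrays here, not the paper's pipeline):

```python
import numpy as np

def composite(foreground, alpha, new_background):
    """Standard alpha compositing: C = alpha * F + (1 - alpha) * B.
    `alpha` is a float matte in [0, 1] with shape (H, W)."""
    a = alpha[..., None]  # broadcast the matte over the color channels
    return (a * foreground.astype(np.float32)
            + (1.0 - a) * new_background.astype(np.float32)).astype(np.uint8)

# Toy 1x2 example: left pixel fully foreground, right pixel fully background.
fg = np.array([[[255, 0, 0], [255, 0, 0]]], dtype=np.uint8)
bg = np.array([[[0, 0, 255], [0, 0, 255]]], dtype=np.uint8)
alpha = np.array([[1.0, 0.0]])
out = composite(fg, alpha, bg)
```

Fractional alpha is what makes hair and motion blur blend naturally into the new background, which is why matting methods are judged on the quality of the alpha rather than a binary mask.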


Some people feel the need to hide their shitty apartment.

I've seen someone advised that the background on their webcam makes them "look poor", where the concern was that looking poor is a (perverse) impediment to getting paid work, but they can't exactly move, especially under lockdown. It may be better to use a calm artificial background in that case.

See also people doing online-conference presentations and YouTube videos. I've seen quite a few of those using virtual backgrounds.

Perhaps for the same reason - thousands of people may see the video, and some people, having made the effort to put on a nice suit/makeup/etc, get a haircut, and look their best, don't want thousands of people to see what their not so nice home looks like behind it.



