Few-Shot Video-to-Video Synthesis (nvlabs.github.io)
178 points by ArtWomb on Nov 5, 2019 | 40 comments


If someone wants a business idea: take this tech and use it to offer a service to fix up old videos. At least for a few years (10-20) there should be some market. I recently paid to have some family VHS tapes converted to .avi files. Those VHS tapes are themselves conversions of 8mm film. This is what most of the video looks like:

https://i.stack.imgur.com/tqPDO.gif

Actually, that's probably better than average. Enough to bring back memories, but pretty awful and hard to look at for more than a few moments, especially because of the flicker.

Seems like a perfect job for computational video processing


Too bad about the quality of those conversions. I'm assuming you don't have the original 8mm film anymore. Depending on the grain size (more or less directly related to the film "speed" or ISO value) and whether it's black-and-white or color film, you should be able to get anywhere from a crisp 720p to 1080p resolution when scanning it, and it would avoid a lot of the loss of midtones you see in the video you linked.

If all you have are the VHS tapes, you could try getting them converted by a different company, or doing it yourself. The flickering looks a lot to me like a VHS player not doing brightness correction correctly. The original film wouldn't have flicker, and the film scanner would be unlikely to introduce it. A VHS player needs to do brightness correction on the signal it reads from the tape in any case, since the signal strength may vary with tape age or wear, or even with the signal strength at the time of recording. A VHS player would normally use the sync pulse to detect how much it needs to amplify the signal; if that mechanism gets out of whack, you can get wild brightness fluctuations. Some of the color could probably also be rescued if the VHS conversion were done more carefully.

If you don't have the VHS tapes anymore either (which is perfectly reasonable after having them converted), you could still probably get a lot of the brightness fluctuations out by editing the .avi files. I don't know off the top of my head of a tool that will do brightness equalisation like that automatically, but I'm sure one exists; it's computationally not a hard problem. You probably won't be able to recover much color at this stage, though: the signal is thoroughly quantized once it hits the digital realm, and no amount of fiddling will get you more color resolution, though you might be able to improve it slightly with some color correction.
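
For illustration, here's a minimal sketch of that kind of brightness equalisation, assuming OpenCV and placeholder filenames; it scales each frame's luminance toward a slowly-moving average so the flicker gets damped:

  # Minimal sketch: per-frame brightness equalisation to reduce flicker.
  # Assumes OpenCV; "family.avi" / "equalised.avi" are placeholder names.
  import cv2
  import numpy as np

  cap = cv2.VideoCapture("family.avi")
  fps = cap.get(cv2.CAP_PROP_FPS)
  w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
  h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
  out = cv2.VideoWriter("equalised.avi",
                        cv2.VideoWriter_fourcc(*"MJPG"), fps, (w, h))

  target = None  # exponential moving average of frame brightness
  alpha = 0.05   # smoothing factor: lower = steadier target

  while True:
      ok, frame = cap.read()
      if not ok:
          break
      # Work on luminance only, so colours are left untouched.
      yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV).astype(np.float32)
      mean_y = yuv[:, :, 0].mean()
      target = mean_y if target is None else (1 - alpha) * target + alpha * mean_y
      # Scale this frame's luminance toward the slowly-moving target.
      yuv[:, :, 0] = np.clip(yuv[:, :, 0] * (target / max(mean_y, 1e-6)), 0, 255)
      out.write(cv2.cvtColor(yuv.astype(np.uint8), cv2.COLOR_YUV2BGR))

  cap.release()
  out.release()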

In all cases you're best off getting as close to the source material as possible, but I'm sure that even if the digital copies are all you have you can get it to a place where it's at least watchable.


The flicker would be a challenge, but you could use something like DeOldify [1] or super-resolution [2] on the frame sequence to improve the colour and resolution... although you may get some weird artifacts between frames using image models rather than models built for video.

Generally, video models are quite far behind image models because they're so much higher-dimensional (much more computationally expensive, and more expensive to annotate), but I'm sure it's a matter of time before someone releases a DeOldify-type model for video (if one doesn't exist already).

[1] https://github.com/jantic/DeOldify
[2] https://github.com/idealo/image-super-resolution
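
For what it's worth, the frame-by-frame approach is mechanically simple. A minimal sketch, assuming ffmpeg is on the PATH; enhance_frame() is a placeholder for whichever image model you plug in, not a real DeOldify/ISR API:

  # Minimal sketch: run an image model frame-by-frame over a video.
  import subprocess, pathlib

  def enhance_frame(path):
      """Placeholder: colourise / super-resolve the frame in place."""
      pass

  SRC, WORK = "family.avi", pathlib.Path("frames")
  WORK.mkdir(exist_ok=True)

  # 1. Explode the video into numbered PNG frames.
  subprocess.run(["ffmpeg", "-i", SRC, str(WORK / "%06d.png")], check=True)

  # 2. Run the image model on each frame independently.
  for png in sorted(WORK.glob("*.png")):
      enhance_frame(png)

  # 3. Reassemble; adjust -framerate to match the source.
  subprocess.run(["ffmpeg", "-framerate", "24", "-i", str(WORK / "%06d.png"),
                  "-pix_fmt", "yuv420p", "enhanced.mp4"], check=True)

The inter-frame artifacts mentioned above come from step 2 treating every frame independently, which is exactly what models built for video try to avoid.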


For super-resolution of video, the best thing I've come across is TecoGAN, which uses adjacent frames for extra information, creating a consistent flow between frames.

https://github.com/thunil/TecoGAN
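
The underlying idea is easy to sketch even without the GAN: warp the previous output frame into the current one with optical flow and blend, so per-frame enhancement stops flickering. A toy illustration with OpenCV (not TecoGAN itself):

  # Toy illustration of flow-based temporal consistency, not TecoGAN itself.
  import cv2
  import numpy as np

  def stabilise(prev_gray, cur_gray, prev_out, cur_out, blend=0.5):
      # Dense flow from the current frame back to the previous one.
      flow = cv2.calcOpticalFlowFarneback(cur_gray, prev_gray, None,
                                          0.5, 3, 15, 3, 5, 1.2, 0)
      h, w = cur_gray.shape
      grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
      map_x = (grid_x + flow[:, :, 0]).astype(np.float32)
      map_y = (grid_y + flow[:, :, 1]).astype(np.float32)
      # Pull the previous enhanced frame into alignment with this one...
      warped = cv2.remap(prev_out, map_x, map_y, cv2.INTER_LINEAR)
      # ...and average it with this frame's enhancement to damp flicker.
      return cv2.addWeighted(warped, blend, cur_out, 1 - blend, 0)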


Alternate option for high-quality VHS digitization, with a few Amazon purchases:

https://www.youtube.com/watch?v=ZC5Zr3NC2PY


Here's an old link I saved related to restoring old movies, one day I'll get around to testing these tricks: https://thebattles.net/video/8mm_restoration.html


Deepfakes were just the very thinnest end of the wedge. We're only a few years away from video not being a reliable proof of anything at all.


Video is already barely reliable proof of anything these days. We've been building a library of believable-looking fakes for decades: movies and TV shows. Sometimes someone samples something from that library and tries to sell it as a real event, and people get tricked.


That costs a lot of money though. Hard for a Russian troll farm (or what have you) to whip out tons of fully fake clips the way Disney can in a film.

As the tech gets cheaper, this calculus changes.


It's hard for them to whip out fully fake clips, but it's easy for them to sample Disney, or SyFy, or news coverage, or comedy sketches that look very much like real interviews, etc. and cut them into a short clip to publish on social media.


Images and videos have lied to us since the beginning. Depending on the shot, the perspective, the focus, the staging, and the editing, you can make an image mean many things. Even before deepfakes, we cropped images out of context and cut videos into more coherent narratives. It's just that these problems have now been highlighted, and people have become wary of the danger of images.


And audio is only pedantically different.


It's been 3 years since Adobe first showed off their voice impression and editing feature (unreleased): https://www.youtube.com/watch?v=I3l4XLZ59iw

Though this is very impressive, it seems to take longer and longer to make those tiny improvements that make all the difference wrt believability.

N.B. The most convincing TTS I've ever heard (predating Lyrebird by quite a bit) generated things like this: http://web.archive.org/web/20190803012012/https://instaud.io...


I have real-time voice-to-voice conversion that runs on low-end CPUs. You can impersonate celebrities and cartoon characters.

https://drive.google.com/file/d/1zRvJEGJjTpKvvzel-J0agh3fKBn...

I'm integrating it into a "Snapchat filter" type app with lightweight social features just as a means to bring it to market and hopefully attract Facebook or Snapchat or Tencent into buying it. I'm building it to sell, essentially.

I need capital so I can fund my real ambitious start-up of end to end computational filmmaking. Graph-based story language, light field camera optics, tracking and localization in prerendered environments, content-aware shaders, real time storyboard population and automated editing, posture estimation and mistake correction...

With patent protection, I think it could unseat Disney and make more money than they do with Marvel and Star Wars.

I need a lot of capital to build my lab. Optics (good sensors and glass), a modest studio with rigging and tracking set up for experiments, and a handful of engineers.


Hmm. I'd like to test that out, I did film audio for a decade so I feel like I could provide you with useful feedback. NDA is OK, same name at gmail if you want to get in touch.



Oh wow, I didn't do due diligence.

I checked the Android app store and all the apps used text-to-speech before vocoding. I had no idea this existed on iPhone.

Thanks.


I wonder how this affects the gaming industry.


I hope it helps bring down world-building costs. Currently a lot of projects are impossible for indie developers simply because of the amount of manual labour involved in modeling and texturing.


One of the effects of deepfakes is that the inherent value of simulating reality in the utmost detail vanishes over time. Our whole mass entertainment industry is built on worshipping the quality of simulation (for example, flashy VFX, realistic landscapes, and Rapunzel's long hair). Right now, those details are achieved by hundreds of animators, motion-capture actors, and technicians, and by huge render/simulation farms. When virtual realism becomes easily replicable (by regurgitating previous data through a deep learning pipeline), that value erodes, and indie artists will be able to compete on a fairer level with the entertainment monoliths.

However, one of the problems with deep learning is that you need a good dataset first, and how are you going to build one? The big game companies won't reveal their models and textures that easily. For indie artists to utilize this technology, there needs to be a centralized community project built around gathering and preprocessing data, rather than waiting for someone like Adobe or Autodesk to do it.


Good points. The same holds for any other industry where data is important. Data is power, because it's legal to thwart competition by not sharing data. It's plain to see how, without regulation, this leads to uncooperative data behemoths.


Also the film industry.


It depends how you see it.


"The code is ready for release, but we're still waiting for lawyers to resolve some legal issues. Once it's approved we can release it." -- issue #1

Kudos to NVIDIA Corp. for the planned release of this code. It seems to have a lot of commercial potential. Maybe they see it increasing demand for their hardware.


Actually I suspect it's more because of the push in the research community for improved reproducibility.


What are you basing that on? Having source tied to a paper is very common in computer graphics and has been for a long time.


Really? I'm doing a Ph.D. in sci-vis and my experience has been that people don't release code. One time I emailed a group asking if they were going to open-source their code (and asking questions about some parameters that were unexplained in their paper), and a week later they responded by telling me to go to the tutorial page on PyTorch (seriously... just ghost me if you're going to pull that kind of crap). It seems to me that a lot of groups keep code as a "secret sauce", so to speak. Personally I feel that's anti-scientific; reproducibility is a fundamental element of doing science.


Big and famous groups/companies usually do release their code nowadays.

I'm automatically suspicious when someone doesn't, even though I guess most of the time it's nothing nefarious. It's just extra work and effort to bring the code into presentable shape, and that effort could be spent on the next paper. Once the paper is published, the material benefits have been reaped and the publication count incremented. Of course this is a short-sighted view, because in the long term it's not only paper counts that matter, but also one's general reputation within the community.


I'd like to see these videos in higher resolution. I have to squint to resolve any detail here.


My guess would be that the videos processed in their work are quite small to limit computation time.


I imagine this would be much less convincing in higher resolution.


It's already not very convincing if you look closer. For instance, look at the way hair behaves (or rather, how it doesn't). Or notice the artifacts around shoes.


The paper includes some higher-resolution examples which are quite convincing.


The paper is a PDF file; is there another video component I'm missing?


You can't, it's a hard rule of image/video manipulation papers: results must be 64x64 pixels or smaller.


I mean, that's just a time/GPU-power issue.


It seems to do motion transfer alright, but the legs in the dancing video get warped instead of moving, so there are some frames where they have triangle legs vanishing at the knee. https://youtu.be/8AZBuyEuDqc?t=34

Maybe if you scale up the training data the model will learn better warping, but I think that to really get better photorealism there has to be a 3D component, some kind of distance/movement estimation like in https://arxiv.org/pdf/1704.07804.pdf, and a shape inference/transfer step.
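
As a rough idea of what that 3D component could look like: a per-frame monocular depth map stacked onto the pose/segmentation inputs as an extra conditioning channel. A minimal sketch assuming the MiDaS models published on torch.hub (what the synthesis model does with the depth map is left open):

  # Sketch: per-frame monocular depth as an extra conditioning signal.
  # Assumes the MiDaS models published on torch.hub.
  import cv2
  import torch

  midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
  midas.eval()
  transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

  img = cv2.cvtColor(cv2.imread("frame.png"), cv2.COLOR_BGR2RGB)
  with torch.no_grad():
      depth = midas(transform(img))             # (1, H', W') relative depth
      depth = torch.nn.functional.interpolate(  # resize back to frame size
          depth.unsqueeze(1), size=img.shape[:2],
          mode="bicubic", align_corners=False).squeeze()
  # `depth` could now be stacked with the pose maps as extra conditioning.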


Why are researchers pushing deepfake?


They're not pushing deepfakes; they're pushing video-to-video synthesis. Deepfakes are one of the many things you can do with it, and not nearly the worst or most interesting (despite the media attention they got).


In five years they will have adversarial networks produce hyperrealistic videos of anything.

Can’t wait to see FOX News anchors deepfaked to have the exact opposite views.



