I've tried a number of ML projects like this too, and sometimes they do work – Bodypix is one example (though not precisely the same thing).
That being said, it's usually pretty difficult to get code from papers to build and run successfully because of dependencies (e.g. depending on a specific version of Python, OpenCV 2, and CUDA support).
Could anyone explain why version numbers make ML stuff so brittle? Why does the CUDA version have to match? Why would a newer version of Python (3.7 vs. 3.6) ever break anything?
TensorFlow is a crapshoot as far as I understand: it changes constantly, making newer versions incompatible. But why do other libraries break backwards compatibility without a major version bump?
Because dependency management for anything C/C++/Python related has often been a massive cluster****. Rust's npm-style package management is probably the greatest innovation to hit low-level programming in a long time. Not to mention that anything Nvidia-related is often closed source, so you are programming against a black-box API from a vendor infamous for bad driver quality on non-Windows systems.
You might be relying on a function that only exists in 3.7 and not in 3.6; code written for 3.6 would still work, but new code using 3.7 features won't be backwards compatible. With compiled code, the errors are usually very hard for people more used to scripting to decode. You get things like missing symbols in the linker phase.
ML projects usually pull in a lot of libraries, so you also get transitive dependencies breaking quite often...
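A quick way to see what you're actually running against is to fingerprint the installed versions; the library names below are just illustrative, but mismatches here are the usual culprit when a repo that "worked for the authors" breaks for you:

```python
import importlib

# Print the exact versions installed in the current environment.
# ML repos often only work with one specific combination, and a
# transitive dependency bumping a version underneath you is a
# common failure mode.
for name in ("numpy", "cv2", "tensorflow"):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(name, "not installed")
```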
Yeah, managing dependencies is tricky. I've been super excited about Nix and Guix lately for that reason; if you have a single Nix/Guix revision and a list of packages, you have all the information you need to build exactly the same package tree. (With bit-for-bit reproducibility where possible, no less!)
Some language-specific package managers can do similar things, but you really only get reproducibility for the whole system with a general-purpose package manager. Poetry gets you pretty far within the Python ecosystem, but if you need a specific version of Python/specific native libraries/etc... it doesn't get you all the way there.
I tried to see what would happen when I put in some stock photos of people into the Colab notebook linked from the repo's README. Some worked better than others. Some totally didn't resolve well at all. Overall I think it's interesting, but there are definitely a lot of edge cases.
The next thing might be mapping from that static 3D model to some kind of rigged model that can be animated. Then you would not only need to place the bones and joints but also separate the clothing from the body. Extremely hard but DL has been able to pull off some incredible stuff.
Wow. It's such "high resolution" that the distorted triangular polygons of the untextured models are clearly distinguishable and painfully obvious on my phone.
I guess my idea of "high resolution" differs from what the rest of the world describes as such.
It's interesting how the legs are often of unequal length in the reconstruction when the person is walking, due to some incorrect interpretation of the camera perspective.
I would have guessed this wouldn't be a problem given that there are multiple photos of the same view, such that you can resolve the depth ambiguity of a single camera. I.e., use feature detection to identify the feet in more than one view, then use the two images to resolve the depth along the epipolar lines and reconstruct the shoe in 3D space.
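As a sketch of that idea: given known projection matrices for two views, a matched image point can be triangulated with linear (DLT) triangulation. The camera matrices and the 3D point below are toy values I made up, not anything from the paper:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two views.

    P1, P2: 3x4 camera projection matrices
    x1, x2: matched 2D image points in each view
    Returns the estimated 3D point.
    """
    # Each view contributes two linear constraints on the homogeneous
    # 3D point X, derived from x cross (P @ X) = 0.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy cameras: identity pose, and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Project a known 3D point into both views, then recover it.
X_true = np.array([0.5, 0.2, 4.0])
p1 = P1 @ np.append(X_true, 1.0)
p2 = P2 @ np.append(X_true, 1.0)
x1, x2 = p1[:2] / p1[2], p2[:2] / p2[2]
print(triangulate(P1, P2, x1, x2))  # recovers roughly [0.5, 0.2, 4.0]
```

The catch, of course, is that this needs calibrated cameras and reliable correspondences, and in-the-wild photos of a walking person give you neither for free.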
I am interested in high-resolution 3D mapping of skin surfaces, e.g. for facial scar revision. What is the best technology available for this, either research or commercial?
Thanks, I think this deserves an HN post of its own. Some of the things that were done to entice Sadeghi to join send shivers down my spine.
Edit: I've been looking at the details on the rear view of some of the "Single-View Reconstruction" examples, and I'm starting to worry that this may actually not be reproducible.
I have followed this case since the beginning. I am surprised that the academic community (which is fully aware of this case) continues reviewing his work without questioning his ethics, as though nothing happened.
TLDR:
Dr. Iman Sadeghi is the man behind hair rendering tech for Disney and Dreamworks. He left his job at Google to join Hao Li's company Pinscreen (which by the way is funded by big names like Softbank).
When Sadeghi saw red flags inside the company, he raised the issue from within, and finally wanted out. As he was leaving one day, Li and his colleagues literally assaulted Sadeghi to make him give up his company laptop. This, by the way, was recorded on CCTV cameras and can be viewed online.
The fraud case is about falsified results in their SIGGRAPH 2017 Technical Papers submission. They claimed to generate avatar hair shapes in their paper, and when a reviewer asked them for results on many faces, the company paid artists as much as $100 each to generate them manually. Of course, they later claimed to have made it fully automatic AFTER publishing the paper, but that doesn't change the fact that they published false results at one of the biggest computer graphics conferences. Hell, even at that public demo, they showed pre-cached avatars and claimed they were generated in real time.
Terrible grammar, but probably means something like the following:
"These guys were desperate to find hot white girlfriends, but nobody would date them so they resorted to writing this lewd software. I'd advise them to work on something useful instead."
I've found that the general rule is: if you see a Japanese-language comment on an English-language website, it's most likely 1) not written by a Japanese person and 2) offensive to varying degrees.
My guess is that it’s an inside joke of some kind, targeted at English speakers with some knowledge of Japanese.
When I was a kid, my classmates would often ask me to teach them dirty English slang, which they'd then say out loud in front of teachers. They got a kick out of being able to say naughty things without the adults noticing.
So yeah I think it’s grade school humor, fairly innocuous stuff but kind of weird to see on HN.
So you say it wasn’t a joke, you wrote the original comment in Japanese because it was your sincere opinion and you wanted the researchers to understand.
Not sure I follow.
If that were the case you should have written in English, obviously. The paper is written in English, some of the authors have Japanese names but we don’t know if they’re native Japanese speakers, the last author (team leader) has a Chinese name, and everyone who does CS research at this level will understand English just fine.
I'd guess other non-Japanese-speakers? Throwing the comment at Google Translate/Papago is easy enough, and might register as "there's a Japanese troll/spam problem". Maybe generating this impression is the objective? It'd be a form of nerd-sniping - if you feel satisfied with yourself over going the extra step of using a tool, you might stop thinking critically at that point and be susceptible to being duped.
I'm fairly confident that calling the grammar "terrible" is an exaggeration.
Although the main gist of it is right, the translation is embellished, and I suspect deliberately so.
The tone is deliberately harsh. "Lewd" is usually (and certainly in this context) not a good translation of エッチ; "lewd" is a negative word that better corresponds to concepts like いやらしい or maybe 淫らな. Nothing like "nobody would date them" is written or implied, just that they can't find that type of woman (そういう女性) to date them, for whatever reason. It doesn't imply they don't have girlfriends, or even that a woman such as the one they modeled in the software wouldn't want them; no speculation is offered about why that woman isn't there. 付き合ってくれる人がいない does not mean "nobody would date someone." I also didn't use any phrase that corresponds to "desperate," like 窮余 or 絶望 or what have you.
Suppose you tried this with your own camera, or maybe in different lighting: it would probably break entirely.
The paper definitely seems like an improvement, but I wouldn't get excited about using this in experimentation or production yet.