This blows my mind. It's probably a naive thought, but this technique looks like it could be combined with robotics to help a robot navigate its environment.
I'd also like to see what it does when you give it multiple views of a scene in a video game, some from direct captures and some from pictures of the monitor.
They've only shown it working with static content; they'll need to do it with video (multiple synchronised cameras) and in real time for any robotics application.
It'd be interesting to see what would happen if they encoded an additional time parameter with each 'view' (input image pixel). Surely someone is already trying to extend the technique that way.
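A minimal sketch of what that extension might look like, assuming a NeRF-style MLP with positional encoding (the class name and layer sizes here are made up for illustration, not from the paper): the network simply takes (x, y, z, t) instead of (x, y, z), so the field it learns can vary with the timestamp of each source view.

```python
# Hypothetical sketch: a NeRF-style field conditioned on a time coordinate.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=10):
    # Map each coordinate to [sin(2^k * x), cos(2^k * x)], as in NeRF.
    freqs = 2.0 ** torch.arange(num_freqs, dtype=torch.float32)
    angles = x[..., None] * freqs                       # (..., dims, num_freqs)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
    return enc.flatten(start_dim=-2)                    # (..., dims * 2 * num_freqs)

class TimeConditionedField(nn.Module):
    def __init__(self, num_freqs=10, hidden=256):
        super().__init__()
        in_dim = 4 * 2 * num_freqs                      # (x, y, z, t) after encoding
        self.num_freqs = num_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),                       # RGB + density
        )

    def forward(self, xyz, t):
        # xyz: (N, 3) sample positions; t: (N, 1) timestamp of the source view.
        inputs = torch.cat([xyz, t], dim=-1)
        return self.mlp(positional_encoding(inputs, self.num_freqs))

# field = TimeConditionedField()
# rgb_sigma = field(torch.rand(1024, 3), torch.rand(1024, 1))
```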
Currently, view coordinates relative to the volume are required, so you first have to solve the SLAM problem before you can optimize a network representation of a given volume.
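To make the dependency concrete, here's a rough sketch (assuming a simple pinhole camera model, not the paper's actual code): every training pixel only becomes a usable ray once some SLAM or structure-from-motion system has produced a camera-to-world pose, which is why pose estimation has to happen before the field can be optimized.

```python
# Sketch: turning pixels into world-space rays requires a camera pose up front.
import numpy as np

def pixels_to_rays(cam_to_world, focal, width, height):
    """Return per-pixel ray origins and directions in world coordinates."""
    i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    # Ray directions in the camera frame (pinhole camera looking down -z).
    dirs = np.stack([(i - width / 2) / focal,
                     -(j - height / 2) / focal,
                     -np.ones_like(i, dtype=np.float64)], axis=-1)
    # Rotate into the world frame using the pose from SLAM / structure-from-motion.
    rays_d = dirs @ cam_to_world[:3, :3].T
    rays_o = np.broadcast_to(cam_to_world[:3, 3], rays_d.shape)
    return rays_o, rays_d

# pose = np.eye(4)   # in practice this comes from COLMAP or a SLAM system
# origins, directions = pixels_to_rays(pose, focal=500.0, width=640, height=480)
```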
No: optimizing the high-dimensional field takes 12 hours, so the time to render the field to an image isn't going to matter for robotics, where computer vision needs to be done in real time.