Imagine you could capture a 3D scene and later revisit it from different viewpoints, perhaps seeing the action as it unfolded at capture time. We are accustomed to snapping 2D photographs or videos, which are then compactly stored on our phones or in the cloud. In contrast, the corresponding process for 3D capture is quite cumbersome. Traditionally, it involves taking many images of the scene, applying photogrammetry techniques to produce a dense surface reconstruction, and then cleaning up the result manually. However, the results can be spectacular and have been used to convey a sense of place not otherwise possible with 2D photography, for example in recent interactive features from the New York Times.
Recently, many researchers have investigated whether the revolution in deep neural networks can put these same capabilities within reach of everyone and make 3D capture as easy as snapping a 2D picture. One technique in particular—neural volume rendering—exploded onto the scene in 2020, triggered by an impressive paper on Neural Radiance Fields, or NeRF. This novel method takes multiple images as input and produces a compact representation of the 3D scene in the form of a deep, fully connected neural network, the weights of which can be stored in a file not much bigger than a typical compressed image. This representation can then be used to render arbitrary views of the scene with surprising accuracy and detail.
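To make the idea of "a scene stored as network weights" concrete, here is a minimal sketch, assuming PyTorch, of the kind of fully connected network involved: it maps a 3D position and viewing direction to an emitted color and a volume density. The class name `TinyNeRF`, the layer widths, and the positional-encoding depth are illustrative choices for this sketch, not the exact architecture from the NeRF paper.

```python
import torch
import torch.nn as nn


def positional_encoding(x: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """Map each coordinate to sin/cos features at increasing frequencies."""
    feats = [x]
    for i in range(num_freqs):
        feats.append(torch.sin((2.0 ** i) * x))
        feats.append(torch.cos((2.0 ** i) * x))
    return torch.cat(feats, dim=-1)


class TinyNeRF(nn.Module):
    """Fully connected network: (position, view direction) -> (RGB color, density)."""

    def __init__(self, num_freqs: int = 6, hidden: int = 128):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 * (1 + 2 * num_freqs)           # positionally encoded 3D point
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)   # scalar volume density
        self.color_head = nn.Sequential(           # view-dependent RGB in [0, 1]
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.trunk(positional_encoding(xyz, self.num_freqs))
        sigma = torch.relu(self.density_head(h))             # density is non-negative
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return rgb, sigma


# Query the network at a batch of sample points along camera rays.
model = TinyNeRF()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3))
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```

In the full method, the colors and densities predicted along each camera ray are composited by volume rendering into a single pixel, which is what allows the same small set of weights to produce images from arbitrary viewpoints.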