Graphics rendering has always revolved around a basic premise: faster performance equals a better experience. Of course, graphics processing units (GPUs) that render the complex three-dimensional (3D) images used in video games, augmented reality, and virtual reality can push visual performance only so far before reaching a hardware ceiling. Moreover, as Moore's Law fades into history, the possibility of squeezing out further improvements declines.
All this has led researchers down the path of artificial intelligence—including the use of neural nets—to unlock speed and quality improvements in 3D graphics. In 2022, for example, Nvidia introduced DLSS 3 (Deep Learning Super Sampling), a neural graphics engine that boosts rendering speed by as much as 530%.a The technology uses machine learning to predict which pixels can be generated on the fly rather than fully rendered by the GPU.
These best guesses—or hallucinations—radically change 3D rendering. "For decades, we have invested in algorithms that can more accurately model objects and light and the way they interact in real time," says Bryan Catanzaro, vice president of applied deep learning research for Nvidia. "AI creates the opportunity to identify correlations in signals from the graphics rendering process," making it possible to minimize compute-intensive work undermining speed and consuming resources.
Relying on AI to predict pixel creation fundamentally reshapes computer graphics. In addition to Nvidia, Intel and AMD have introduced 3D modeling frameworks that use similar shortcuts to render graphics faster, usually without any noticeable degradation in image quality. However, all of this may only be the start. Soon, the burgeoning field could also spawn new forms of graphics by combining generative AI tools like OpenAI's DALL-E 2 and Google's mip-NeRF framework with the likes of DLSS.
"AI is just better at guessing the missing pixels than the handcrafted models that we used years before," says Anton van den Hengel, director of applied science at Amazon and director of the Centre for Augmented Reasoning at Australia's University of Adelaide. "We're entering a far more advanced era of 3D modeling."
Photorealism has always been the Holy Grail of 3D modeling. In the 1990s, researchers began to unlock the secrets of 3D graphics, and over subsequent decades, particularly after GPUs arrived, video games and other graphics-intensive applications have evolved remarkably. Yet these systems continue to run up against a basic physics problem: generating real-time graphics, largely an exercise in geometry, is GPU-intensive, and throwing brute force at the problem speeds things up only incrementally.
The challenge grows exponentially with complex models that involve dozens or hundreds of possible objects and angles—or when calculations take place in the cloud. For example, it's no simple task to display a swarm of butterflies or human hair; things become even more difficult when synthetic objects appear on a constantly changing background. "Realistic images require a deep understanding of the physics of light transport and the way image creation works in regard to math," says Jon Barron, senior staff researcher at Google. "There are only so many hardware-based techniques you can use."
Things become even more complex when augmented reality, virtual reality, and the emerging metaverse enter the picture. "For all the talk about augmented reality and virtual reality, we have very little to show," says van den Hengel. "For years, we've been hearing these technologies are going to change the world and they are just around the corner, but they haven't quite arrived. In order to get to ultrarealistic and useful 3D modeling, there's a need to step beyond hardware and incorporate AI."
Hardware advances in GPUs cannot solve the problem on their own, mostly because engineers are running out of ways to squeeze more transistors onto chips. Rather than accept that 3D graphics has reached its logical limit, researchers are turning to software-based approaches, such as DLSS, to unlock further speed gains while reducing the energy consumed by each computing cycle. "AI has the intrinsic power to fill the information gap" and enhance the quality of computer-generated images, says Shigeru Kuriyama, a professor in the Visual AI Lab at Toyohashi University of Technology in Japan.
Around 2010, when researchers discovered they could repurpose GPUs to train deep learning models, the 3D modeling and rendering scene began to change dramatically. Nvidia introduced the first version of DLSS in 2018 and it has evolved through three iterations to become a dominant force in 3D graphics. Without the likes of DLSS, fast rendering and photorealistic depictions simply are not possible. "Even the most powerful GPU wouldn't be able to generate high-quality ray-traced 3D models in real time. The games and applications running on them would not be enjoyable," Catanzaro says.
DLSS 3 succeeds by predicting which actual pixels can be swapped out on the fly for AI-generated pixels. A hardware technology called Optical Flow Accelerator compares frames and identifies opportunities to make changes.b DLSS 3 was trained on billions of samples and the resulting training set was compressed by a factor of approximately 1,000, Catanzaro says. A GPU on the user's device determines which pixels it can substitute using the machine learning model and it renders the desired images accurately. It is a bit like the television game show "Wheel of Fortune" or an old-fashioned crossword puzzle: a person can view some letters and figure out the right word. In 3D modeling, the goal is for the AI model to find as many potential replacements for actual pixels as possible and automate the pixel-swapping.
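Nvidia has not published DLSS 3's internals, but the underlying frame-generation idea, warping known pixels along a motion (optical flow) field to synthesize pixels in between, can be sketched in a few lines. The `warp_frame` helper and the toy flow field below are illustrative assumptions, not Nvidia's implementation:

```python
import numpy as np

def warp_frame(frame, flow, t=0.5):
    """Backward-warp a frame along a dense optical-flow field.

    flow[y, x] holds the (dx, dy) motion from this frame to the next;
    sampling at t=0.5 approximates the frame halfway in between.
    Nearest-neighbor sampling keeps the sketch short; real systems
    interpolate sub-pixel values and handle occlusions.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs - t * flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - t * flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# Toy scene: a bright column that moves two pixels to the right per frame.
frame0 = np.zeros((4, 4))
frame0[:, 1] = 1.0
flow = np.zeros((4, 4, 2))
flow[..., 0] = 2.0  # two pixels of horizontal motion over the frame interval

mid = warp_frame(frame0, flow, t=0.5)
# The bright column lands one pixel to the right in the synthesized frame.
```

Every pixel of the intermediate frame here is predicted from motion rather than rendered, which is the source of the speedup; a production system must also detect where the prediction fails and fall back to real rendering.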
When Catanzaro and a team at Nvidia put a microscope to DLSS 3 performance, they found the machine learning algorithm automatically rendered as many as seven out of eight pixels in a game like Portal. Remarkably, DLSS 3 makes it possible for a system running a 3D model to jump from about 20 frames per second to around 100. Such speed and performance gains are significant. "The technology breaks through conventional bottlenecks," Catanzaro says.
In fact, the mathematics that surround DLSS 3 and similar AI models are somewhat mind-boggling. A frame in a typical graphics video stream contains somewhere in the neighborhood of four million pixels, Catanzaro notes. If the stream is running 100 frames per second, the GPU is processing approximately 400 million samples per second. The secret to success lies in the fact that humans only need to see one million or fewer samples per second to be convinced the scene is real. A trained neural net can figure out which pixels are essential and render them the right way. "This makes it possible for the model to function in a range that avoids uncorrelated random noise that would result in an untenable model," he says.
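As a sanity check, the arithmetic behind those figures is straightforward; the numbers below simply restate the article's estimates and are illustrative, not measured:

```python
pixels_per_frame = 4_000_000  # roughly four million pixels per frame
fps = 100                     # frames per second in the stream
total_samples = pixels_per_frame * fps  # samples per second the GPU would otherwise process

ai_fraction = 7 / 8           # share of pixels DLSS 3 generated in a game like Portal
rendered = total_samples * (1 - ai_fraction)  # samples still rendered conventionally

print(f"{total_samples:,} samples/s, {int(rendered):,} rendered conventionally")
# 400,000,000 samples/s, 50,000,000 rendered conventionally
```

Note that the observed frame-rate gain (roughly 20 fps to 100 fps, a factor of five) is smaller than the eightfold cut in conventionally rendered pixels, which is consistent with the AI-generated pixels not being free to produce.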
What makes neural network models like DLSS so attractive is that they introduce a smart integration between hardware and software, Kuriyama says. By introducing AI-based, data-driven solutions for interpolation, extrapolation, super-resolution, upscaling, and hole-filling, the technology is shifting the industry's focus away from chip manufacturing and toward AI-embedded systems, he adds. "That's why Nvidia, Intel, and AMD are taking the matter so seriously," he says.
Nvidia made the biggest visual splash with DLSS 3, but Intel and AMD also are pushing performance boundaries with neural modeling technologies of their own. Intel's XeSS (Xe Super Sampling) framework serves as an AI-powered accelerator that reconstructs subpixel data from neighboring pixels, producing about a 2x performance boost.c AMD's RDNA 3 graphics architecture packs a pair of AI accelerators into each compute unit (CU); AMD claims the architecture delivers acceleration approaching a factor of 2.7x, with 50% more ray tracing capability per CU.d
Nevertheless, accelerated rendering through deep learning remains in the early stages. One problem is that DLSS 3 and other AI models fall short when displaying certain types of effects, which may lead to jitter, shimmering, and other artifacts. Distortion also can result, particularly for complex animated images with a high level of detail, or when a scene changes rapidly. "These systems aren't able to render these images in a high-quality way for the specific scenes where learning is insufficient," Kuriyama notes.
Augmented reality, the metaverse, and more realistic virtual reality push demands further. AI's ability to generate a higher level of object detail is only part of the challenge. There also will be a need to step beyond imaginary worlds and match computer-generated 3D graphics with actual physical landmarks such as stores, coffee shops, and historic sites. In addition, Barron points out, better 3D modeling is needed to advance robotics and autonomous vehicles. "These devices send and receive 3D data so anything that can reduce the data required for calculations is valuable."
3D neural modeling also could revolutionize generative AI. For example, Google has developed a framework called mip-NeRF 360 that uses AI to generate 360-degree photorealistic representations of objects.e Barron and others are experimenting with diffusion models that generate 3D images using text and 2D diffusion techniques.f Combining an engine like OpenAI's DALL-E 2 or Google's DreamFusion with tools such as DLSS makes it possible to extend 3D modeling capabilities, Catanzaro says. "It's likely the next frontier in 3D modeling."
No one questions the value of neural 3D rendering techniques. What's more, additional training data almost certainly will fuel future gains across a wide array of tools and technologies. "Just when Moore's Law is expiring and graphics as usual has run into a roadblock, AI has appeared as a valuable tool," concludes Catanzaro. "It provides us with new and powerful methods to push graphics forward, by being smarter about the rendering process.
"We are at the cusp of enormous innovation in the 3D rendering space."
Poole, B., Jain, A., Barron, J.T., and Mildenhall, B. DreamFusion: Text-to-3D using 2D Diffusion. Sept. 29, 2022.

Mildenhall, B., Hedman, P., Martin-Brualla, R., Srinivasan, P., and Barron, J.T. NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images. Nov. 26, 2021.

Tewari, A., Thies, J., et al. Advances in Neural Rendering. Computer Graphics Forum (May 2022), 703–735.

Dundar, A., Gao, J., Tao, A., and Catanzaro, B. Fine Detailed Texture Learning for 3D Meshes with Generative Models. March 17, 2022. https://doi.org/10.48550/arXiv.2203.09362
©2023 ACM 0001-0782/23/8