
Using deep learning to “inflate” a flat image

Nvidia Research has shared details of GANverse3D, a new deep learning engine for its Omniverse platform that generates 3D objects from ordinary 2D images. Using GANverse3D, professionals like architects, developers, and designers can “easily add new objects to their mockups,” or to any other virtual world on the Omniverse platform. And they can do it without “expertise in 3D modeling, or a large budget to spend on renderings.”

The PR-savvy company shared the news by showcasing a 3D model derived from “the beloved crime-fighting car KITT,” which you might know from the ’80s David Hasselhoff TV show Knight Rider.


Why it matters

3D technologies are breaking through in a big way, but they’re still plagued by a few deep problems.

In a recent interview with Spatial Reality, spatial-computing pioneer (and co-founder of early drone imaging company Redbird) Emmanuel de Maistre explained one of AR’s biggest barriers to adoption: the expertise required to create 3D experiences.

“It’s not an easy process to create an entire AR experience, and that’s true from gaming assets to BIM models. If you look into the enterprise construction side, creating and configuring 3D assets usually requires specialized and complex software. You would need to research tools like Maya, Unity, Unreal Engine, or Blender—you have a whole list. The problem is, to use these tools, you would need to be a specialist or a graphic engineer. So yes, there’s a big challenge there.”

Emmanuel de Maistre

In other words, Nvidia’s technology is a big deal because it shows that big companies are actively working to reduce (or even eliminate) the learning curve for creating 3D content. And they’re not doing it by making specialist software easier to use. They’re teaching AI to do the work, so that someone like you or me can produce 3D content with as little friction as possible.

Of course, it remains to be seen how effectively the tool works with different types of objects. For now, it seems to be trained mostly to work with images of cars. And the quality of the 3D models isn’t particularly high so far (they look like they belong in an older-generation video game).

But it’s important to remember that these models were created by a computer. And they seem good enough for a lot of applications. Furthermore, given the history of computing, it’s very likely the models will get better over time.

How it works

According to Nvidia, the tool is technologically significant because it uses real-world 2D images as training data. Previous models required 3D shapes as training data.

To generate the data set, Nvidia researchers used a generative adversarial network (or GAN) to synthesize images of single objects taken from a variety of angles. Next, they plugged these “multi-view images” into a framework that infers the 3D mesh models from 2D images.
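
To make “multi-view images” concrete, here is a toy sketch in Python. It is not Nvidia’s method: Nvidia uses a GAN to synthesize the views, whereas this example simply rotates a hand-written cube and flattens it with an orthographic projection. The names `CUBE`, `rotate_y`, `project`, and `multi_view_dataset` are made up for illustration. The point is only to show the shape of the data: one object, several viewpoints, one 2D picture per viewpoint.

```python
import math

# Eight corners of a unit cube centred on the origin: a stand-in "object".
# (Assumption for illustration; the real data set consists of GAN-generated photos.)
CUBE = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]


def rotate_y(point, azimuth_deg):
    """Rotate a 3D point about the vertical (y) axis by azimuth_deg degrees."""
    x, y, z = point
    a = math.radians(azimuth_deg)
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def project(point):
    """Orthographic projection: drop the depth and keep image-plane coordinates."""
    x, y, _ = point
    return (round(x, 3), round(y, 3))


def multi_view_dataset(points, azimuths):
    """Return one (azimuth, list of 2D points) pair per viewpoint."""
    return [(az, [project(rotate_y(p, az)) for p in points]) for az in azimuths]


# The same object seen from three angles.
views = multi_view_dataset(CUBE, azimuths=[0, 45, 90])
for azimuth, points_2d in views:
    print(azimuth, points_2d[:2])
```

Each entry in `views` pairs a viewpoint with a flat picture of the same object. A framework that infers 3D shape from 2D images can learn from sets like this because the views have to agree on a single underlying shape.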

“Once trained on multi-view images,” Nvidia explains, “GANverse3D needs only a single 2D image to predict a 3D mesh model. This model can be used with a 3D neural renderer that gives developers control to customize objects and swap out backgrounds.”
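
If “3D mesh model” sounds abstract, the sketch below shows what a triangle mesh boils down to: a list of points in space plus a list of triangles that connect them. Nvidia’s post doesn’t say which file format GANverse3D produces, so the plain-text Wavefront OBJ format used here is an assumption picked for readability, and the `Mesh` class and `to_obj` function are hypothetical helpers, not part of any Nvidia API.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list  # (x, y, z) tuples
    faces: list     # triangles as triples of 0-based vertex indices


def to_obj(mesh):
    """Serialize a mesh to Wavefront OBJ text (OBJ face indices are 1-based)."""
    lines = [f"v {x} {y} {z}" for x, y, z in mesh.vertices]
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in mesh.faces]
    return "\n".join(lines) + "\n"


# A tetrahedron: four vertices, four triangular faces.
tetra = Mesh(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)],
    faces=[(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)],
)
obj_text = to_obj(tetra)
print(obj_text)
```

A car-shaped mesh works the same way, just with thousands of vertices and faces instead of four. Tools like Blender can import this kind of file.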

For more on this, check out Nvidia’s blog.