Ro Gupta is CEO and founder of CARMERA, a spatial AI company that detects road changes and maintains maps for automated driving, human navigation, and a host of other applications. This means he has hands-on experience with cutting-edge technologies like crowd-sourced capture, cloud computing, and machine learning. It also means he has a unique perspective on the issues and potential benefits of putting those technologies to work in a business context.
When I caught up with Gupta earlier this month, we talked about the value of maps that track change at high frequency, how his business survived the autonomous driving hype cycle, the ethics of constant data capture, the problem with “walled gardens” of data, why he’s looking at fast and cheap technology to move the industry forward, and a whole lot more.
Dig in below.
Sean Higgins: I like to start all of these interviews with the same question. What does spatial computing mean to you?
Ro Gupta: I think the important thing is to ask, what’s the difference between spatial computing and other forms of computing? My last company concentrated on more basic computer science for web applications. Moving to spatial computing for robotics reminds you of the really significant differences between the two.
One of the fundamental ones is that you have to think about atoms, not just bits, and it’s four dimensions—x, y, z, and time—versus just one, two, or three. These factors really do add a new level of complexity that you wouldn’t have to account for in more generic computing problems. The spatial computing we do straddles both computer science and robotics. And that’s what I think sometimes trips people up.
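To make the four-dimensions point concrete, here is a minimal sketch (hypothetical names and shapes, not CARMERA’s actual data model) of what a spatiotemporal observation and a temporal filter might look like—the time axis is what separates a change-tracking map from a static 3D one:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """A single street-level observation: 3D position plus capture time."""
    x: float  # meters east of a local origin
    y: float  # meters north of a local origin
    z: float  # meters of elevation
    t: float  # capture time as a Unix timestamp (seconds)

def newer_than(observations, cutoff_t):
    """Keep only observations captured after a cutoff time --
    the kind of temporal filter a change-tracking map needs."""
    return [o for o in observations if o.t > cutoff_t]
```

The `t` field is the fourth dimension Gupta describes: two observations at the same (x, y, z) but different times are different facts about the world.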
I interviewed you about your company, CARMERA, about five years ago. At that point you were working on gathering maps and tracking change for applications like AEC, but now you’ve focused on automotive. Can you tell me what changed?
When we spoke, we were using fleet vehicles as a way to generate 3D maps, and then processing that data to detect change—that fourth dimension—at street level. And a lot of that core technology hasn’t changed.
Back then, we were seeing lots of new applications that needed much better temporal density in street-level maps. And I think that happens in the nascent stage of any technology: you see so many potential applications, so many things that could be improved by your new solution.
After probably a year or two, however, all that general potential began to focus on a few really viable and attractive commercial opportunities—one of which was autonomous navigation. And even though the autonomous vehicle market had really kicked into gear around 2017 or 2018, there was still a dearth of next-generation mapping and spatial IQ companies serving it. So we figured, let’s lean into this.
Automotive has been a really nice way to build our product. It forces you to design for both the highest levels of quality and scale, not just one or the other like in other industries. That said, we believe that our technology can generate a much more horizontal spatial index. And that this index can solve the temporal density problem for other applications.
Given all the technological advances we’ve seen in the last five years, why was change detection still an unsolved problem when CARMERA got into the game?
It’s because digital map creation is really expensive. That’s why only a few enormous companies like Google have been willing to make that investment. And they’re not even doing high-frequency maintenance yet, because you can’t just use the same techniques you did for map creation. Even Google can’t afford to just keep driving 3D-capture vehicles around non-stop, right?
The other thing is that the old map data was fine when it was used for applications like a human getting directions for driving to a location, or researching where a restaurant is. If something had changed in the map, it wasn’t the end of the world.
You mentioned that CARMERA’s temporally dense maps could be used for applications outside of automotive. What other applications are you thinking of?
If you think about the maps we’re doing for autonomous vehicles, those maps are helping the machine act more like a human would, at least in terms of driving. That’s basically the point of autonomy.
But these evolved, next-generation maps can do another thing—they allow humans to be more machine-like, or almost like superhuman. Now even Apple Maps has a much more natural-language approach. It offers directions like “turn right at this McDonald’s” instead of saying “in 1250 feet turn right.”
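The landmark-based directions Gupta describes can be sketched as a simple preference rule: use a nearby, recognizable landmark when the map knows of one near the maneuver point, and fall back to distance-based phrasing otherwise. This is an illustrative toy (the tuple shapes and threshold are assumptions, not any real navigation API):

```python
def landmark_instruction(turn, landmarks, max_dist_m=50.0):
    """Prefer 'turn right at the McDonald's' over 'in 1250 feet turn right'
    when a well-known landmark sits close to the maneuver point.

    turn:      (direction, distance_ft, position) -- position is (x, y) in meters
    landmarks: list of (name, position) pairs from a landmark-aware map
    """
    direction, distance_ft, pos = turn
    for name, lpos in landmarks:
        dx, dy = lpos[0] - pos[0], lpos[1] - pos[1]
        if (dx * dx + dy * dy) ** 0.5 <= max_dist_m:
            # A landmark is within range of the turn: use natural phrasing.
            return f"Turn {direction} at the {name}"
    # No landmark nearby: fall back to the traditional distance callout.
    return f"In {int(distance_ft)} feet, turn {direction}"
```

The richer the map’s landmark layer, the more often the natural-language branch fires—which is the “superhuman for humans” benefit Gupta is pointing at.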
There are lots of other examples. A next-generation map that was built for autonomous vehicles would have very detailed information about curb usage. That has important implications for ride-sharing and delivery: where can a driver pull over for pick-up and drop-off, where can they stop safely, where is there likely to be vacancy, and so on.
I’m going beyond just street-level a little bit, but temporal data is also crucial for climate change and resiliency. It’s very helpful for companies that, for instance, do big infrastructure projects for cities. It can give them up-to-date information on what roads are liable to get taken out in the next Category 4 storm.
And that’s the kind of detail you can get from a next-generation map, but you couldn’t get it from a traditional consumer map.
When you say next-generation map, what do you mean? I get the feeling you don’t just mean “temporally dense.”
We actually wrote a post about this recently on our blog. We’re seeing a much less binary approach to maps than before. When you and I first met around 2015 or so, this whole concept of high definition (HD) maps was just starting to get more popular compared to standard definition (SD) maps, like MapQuest, Google, Waze, etc.
The thought with HD maps was that it was a very distinct type of map that is 100 times more accurate and 100 times more dense. And it’s only for machines—so it’s a totally separate thing from the SD maps humans can use. But what we’ve been seeing, especially in the last 18 months or so, is much more of a convergence. Now there’s more of a continuum between HD and SD.
Internally, we have been using a term for a while: medium definition (MD). We use it to indicate a middle ground for fidelity. By rationalizing the fidelity requirements, you get a map standard that’s much more scalable than HD equivalents today.
For certain tasks this is ideal. For example, think about change detection. There you’re just trying to spot change, so you want a lot of coverage or scale, but you are willing to sacrifice on other things. In contrast, when you’re actually interpreting the change and reflecting it back in the map, you want the reverse: high granularity and spatial accuracy but just for a few pin-point locations.
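The detect-cheaply-everywhere, interpret-expensively-somewhere split Gupta describes can be sketched as a two-stage pipeline. This is a hypothetical illustration of the idea, assuming each coarse scan yields a per-location change score and that a separate high-fidelity survey can be dispatched on demand:

```python
def detect_candidates(coarse_scans, threshold=0.5):
    """Stage 1: cheap, wide-coverage change detection.
    Each scan is a (location_id, change_score) pair; we only need to
    decide *where* something changed, so low fidelity is acceptable."""
    return {loc for loc, score in coarse_scans if score >= threshold}

def update_map(base_map, candidates, high_fidelity_survey):
    """Stage 2: expensive, high-accuracy interpretation, run only
    at the few flagged locations rather than everywhere."""
    updated = dict(base_map)
    for loc in candidates:
        updated[loc] = high_fidelity_survey(loc)  # e.g. a detailed re-capture
    return updated
```

The economic point is in the loop bounds: stage 1 touches every location at low cost, while stage 2’s per-location cost is high but runs over only the handful of flagged candidates.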
It sounds like the industry has matured a bit. Like it’s starting to look past the excitement of ultra-HD maps, and get down to the problem of how to make autonomy practical.
People are now realizing that boiling the ocean isn’t necessary—or viable. They are getting much more pragmatic about their sources, and asking what output fidelity they need for different uses. It’s a good thing. Because to truly commercialize autonomy—to realize it at scale—that’s ultimately a bean counting problem. It has to work economically.
This is a good time to ask, where would you say autonomous vehicles are sitting on the hype cycle? Right this moment?
We think about that a lot. In 2019, it felt like the initial hype was dying down, and it was starting to feel like the trough was definitely coming. And then certainly this year, it seems like everyone has sobered up. Now, I actually think that we are heading into the so-called slope of enlightenment. I think we are at a really good phase in the industry.
Like any frontier industry, there is always that initial hype, and that S curve. The problem is that it’s really disorienting. You don’t know when the trough is coming, so it’s really hard to throttle up or throttle down—to stay with the competition for talent and capital but not overinvest—because you know that there’s going to be some kind of sobering up.
How do you track this hype, or keep an eye on how things are developing so you know how to approach your business?
I’ve always used CES as a barometer. The other barometer we use is China. China can move faster, in large part because of the regulatory environment. So they’re doing real, true L4 deployments—multiple ones, not just one Waymo deployment in Phoenix, Arizona.
But I also think Level 2+ driver automation is really progressing. The SOPs, or start-of-production timelines, were definitely pushed back. But they’re still happening.
I think, with the effects of the pandemic and the hype cycle dynamics, there was sort of a head fake in the industry in terms of what’s really real, and what’s not. Now, I think people know what is really real, and what timelines they can confidently build a supply chain against.
So what is real and not real? What has lived up to the hype, and what still hasn’t materialized, or worse, failed?
The stuff that has truly advanced like bonkers is sensors, in terms of miniaturization and higher resolution for lower cost. I’d also say we’ve seen huge advancements in autonomous things that fly, that roll on wheels in contained areas, that orbit. That has been like the Cambrian explosion.
But we’re still a bit dismayed about the lack of next-generation GNSS and localization. That’s still a pretty thorny problem—and has been since we spoke in 2015. There are a bunch of narrow localization tools and techniques for specific remote sensing scenarios, but there isn’t an all-in-one solution. Therefore, you still need to use a whole portfolio of approaches to solve that problem.
I’m in touch with a lot of VCs who have been investing in this area, and they’ve all been disappointed about how far we are from seeing turnkey solutions. There’s no one company where you can say, hey, let us run our spatial data through your SaaS service or SDK/API, and spit back an output with improved location accuracy that meets the quality, consistency, and ubiquity needs of automotive. Right now, everyone still needs to do it themselves in a bit of a hodgepodge way.
I want to ask you about another important issue in spatial computing: walled gardens, where one company collects data, but heavily restricts its usage. Where does CARMERA stand on this issue? Do you think walled gardens are good or bad? Do you wall off your data?
We’re seeing a big change in automotive mapping. For the past few years, Mobileye has been the big company collecting street level data. They’re owned by Intel, and they have a black-box, walled-garden approach. You have to use their whole hardware system, and you can’t tinker with it. It’s very restrictive and limits an automaker’s ability to customize to their specific preferences.
I would say that, in general, high quality mapping data has actually been very walled off for some time. No one would argue that Google hasn’t had the best data for a long time. And though their services are accessible, the data itself is kept very closed off. That means people can’t use it for a variety of applications like we’ve talked about.
But lately there’s been a big trend against this in the automotive space. People are working to offer alternatives and reduce the dependency on just one proprietary walled-garden source of road intelligence. And that’s how we’ve aligned ourselves in the industry.
And indeed things are changing. Reducing that dependency is not a pipe dream anymore, because sensors are proliferating, and because basic layers like map tiles are essentially commoditizing. People take these primitives as a given now, rather than the final product. They’re using them to solve higher-level problems.
At the end of the day, a car company, a delivery company, or insurance company—whoever it is—doesn’t care where the data or intelligence came from. They want it to be good and cheap and fast.
Companies like CARMERA make it seem like we’re headed for a future of constant 3D capture, from sensors placed all over the built environment. What are the ethical implications that CARMERA is considering as we move in this direction?
The biggest thing is treating these problems, as best as you can, with forethought instead of afterthought. With intentionality.
I’ve been through this kind of thing before. My previous startup, Disqus, was in the whole social web hype cycle. So I had a front row seat to the development of web comments, social media, and technology that gave anyone a microphone for saying anything. Back then, the people at Twitter, Facebook—and many of us in that industry—were treating the consequences as an afterthought. Some more than others.
I think the big learning from that experience is that we need to try to proactively examine any unintended consequences of our technology, as opposed to just putting fires out afterward. At CARMERA, for instance, we are excited about the good uses of this data, but we also think a lot about what nefarious uses could be. And we make sure to work with distribution partners that we know and trust, so we’ll be able to control the uses of the data downstream.
Looking at your company’s blog, and your communications, it seems like you’re paying close attention to how that data is collected.
We treat opt-in data differently from passively collected data. We scope and design around the major differences between working with data from audio-enabled interior sensors, which are in someone’s personal space, versus visual-only sensors on the outside of the vehicle. The unintended consequences can be quite different.
Those are examples of some of the ethical issues that we have to think about ahead of time, in part because they reflect our core values. But even if we didn’t believe ethics were important as a company, if you want to play in automotive, you don’t even have a choice. A lot of those checks and balances are non-negotiable, for example GDPR, functional safety, and cybersecurity standards.
Here’s a softball to close things out. What developments in spatial computing are you keeping an eye on? Is it wide-area mapping products? Is it consumer tech?
It’s really easy to point to the flashy, shiny stuff, because there’s just a lot of amazing new hardware and devices out there. And it would be hard to pick one out. But the problem is that I’ve become a little desensitized to those. Even at conferences like SPAR and CES, I’ve walked the floor and seen these amazing demos of advanced tech. But what excites me more is to find something that can be in a billion devices, or on millions of miles of road.
If you think about good, fast, and cheap, I’m much more excited about the “fast” and “cheap” parts. What are the products and technologies that are moving the needle on “fast” and “cheap” everywhere? When you’re looking at the small scale, frankly, there are many “good” solutions out there, but these solutions often don’t stand up to the challenges of real-world, large-scale deployments. I think we’ve all been convinced by the small-scale “good” stuff, whether that’s AR glasses or a new drone imaging technique, or whatever.
And at street level, “fast” and “cheap” everywhere is mobile. What’s possible with phones that are even two or three generations old is just amazing. The optical quality is incredible. I am also keeping an eye on these other new sensors that are making their way into phones, like time-of-flight (ToF) depth sensors. You can see a path for them, from being a cool novelty to expanding to a scale of millions or billions. I think that’s amazing. Aerial and satellite are also looking really promising right now—they’re proliferating and enabling “fast” and “cheap” everywhere.
So my answer is not the flashy one you may have wanted. But ultimately good, fast, and cheap everywhere—that’s what gets me excited.