Trainers, it has come to our attention that we finally have the answer to how Niantic is using our data. In a recent post published on Niantic’s official company blog, Niantic has shared a detailed description of what they are building using your PokéStop scan data: a massive geospatial AI model.
The model has no name at this moment, but Niantic is calling it the world’s first Large Geospatial Model (LGM), similar to how ChatGPT is a Large Language Model (LLM). The model does not exist yet; the blog post is an announcement of Niantic’s intent to train and build such a model.
Here is what Niantic is building, how they are using and planning to use our data, and what the purpose of this new artificial intelligence model is.
What is a Large Geospatial Model?
A Large Geospatial Model is Niantic’s term for an AI model that helps computers understand and navigate the physical world. It is built and trained using large amounts of data:
- billions of images of the world around us
- billions of hours of scanned locations of the world around us
All of these data points are anchored to actual physical locations, which gives the model a sense of location and, through 3D vision, an understanding of what it is looking at. Sounds a bit scary, doesn’t it? Well, it doesn’t get better from here on.
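To make "anchored to actual physical locations" a bit more concrete, here is a minimal sketch of what one geo-anchored training record could look like. The structure and every field name are purely our assumption for illustration, not Niantic's actual schema:

```python
# Purely illustrative: one way to imagine a geo-anchored training record,
# where imagery and 3D scan data are tied to a real-world location.
# All field names here are hypothetical, not Niantic's actual schema.
from dataclasses import dataclass

@dataclass
class GeoAnchoredScan:
    latitude: float        # WGS84 coordinates anchoring the scan
    longitude: float
    heading_deg: float     # which way the camera was facing
    image_path: str        # raw imagery of the location
    point_cloud_path: str  # 3D geometry reconstructed from the scan

# Example record anchored at the Eiffel Tower's coordinates.
scan = GeoAnchoredScan(48.8584, 2.2945, 90.0,
                       "scans/img_001.jpg", "scans/cloud_001.ply")
print(scan.latitude, scan.longitude)
```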
One to many – an amalgamation of local models
Niantic has also shared that their vision for this model is an amalgamation of many local models, where some local models have seen the front of a building while others have seen the back. The Large Geospatial Model they are proposing would be able to tap into both of these local models and create a complete 3D image of that building, distilling and creating new information by interpolating local knowledge.
Here’s what they shared:
Imagine yourself standing behind a church. Let us assume the closest local model has seen only the front entrance of that church, and thus, it will not be able to tell you where you are. The model has never seen the back of that building. But on a global scale, we have seen a lot of churches, thousands of them, all captured by their respective local models at other places worldwide. No church is the same, but many share common characteristics. An LGM is a way to access that distributed knowledge.
An LGM distills common information in a global large-scale model that enables communication and data sharing across local models. An LGM would be able to internalize the concept of a church, and, furthermore, how these buildings are commonly structured. Even if, for a specific location, we have only mapped the entrance of a church, an LGM would be able to make an intelligent guess about what the back of the building looks like, based on thousands of churches it has seen before.
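To illustrate the distillation idea in that quote, here is a toy sketch of how knowledge pooled across many local models could fill in a viewpoint that one local model has never seen. The feature vectors and the simple averaging are our simplification for illustration only, not Niantic's actual method:

```python
# A toy sketch (not Niantic's method) of the "distilled prior" idea:
# many local models have each seen only some viewpoints of some church;
# a global model pools that distributed knowledge into a prior and
# uses it to guess a viewpoint no single local model has observed.
import numpy as np

rng = np.random.default_rng(0)
VIEWS = ["front", "back", "left", "right"]

# Simulated "local models": each knows feature vectors for only the
# viewpoints it has scanned of its own church somewhere in the world.
def make_local_model(seen_views):
    return {v: rng.normal(size=8) for v in seen_views}

local_models = [make_local_model(rng.choice(VIEWS, size=2, replace=False))
                for _ in range(1000)]

# Global "distillation": pool every observed viewpoint across all local
# models into a per-view prior, the shared concept of a church.
prior = {v: np.mean([m[v] for m in local_models if v in m], axis=0)
         for v in VIEWS}

# A new local model has only seen the front entrance of its church...
new_church = make_local_model(["front"])

# ...but the pooled prior lets us make an informed guess about the back.
guess_for_back = prior["back"]
print("guessed features for the unseen back:", guess_for_back[:3])
```

The point is only the flow of information: each local model contributes what it has seen, and the pooled global knowledge answers for what it has not.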
Niantic’s existing models are what we would describe as local models, in the sense that every small or medium-sized neural network trained from our player scans (through PokéStop scanning) is a small or medium-sized local model.
Niantic is planning to use them as contributions to a global large model, “implementing a shared understanding of geographic locations, and comprehending places yet to be fully scanned.”
The model should think like a human being
Niantic has also shared that this new model should think about and understand space like a human being: it should recognize streets, understand common architectural patterns, and draw navigational conclusions even for a street it has never walked.
In a very concrete example, the model should be able to navigate a European old town not only because it knows how the streets are laid out, but because it also understands the cultural background of how European old towns were built, and it can draw conclusions from that.
That is a particularly interesting, and scary, aspect of this entire story. One can comprehend that a 3D vision model understands it is looking at the Eiffel Tower, but to understand the whole geometry and architecture around it… sounds dangerous.
What Niantic has done so far
To date, Niantic has made large strides towards creating the LGM, but alas, they are still very far off. The biggest thing standing between them and this model is the sheer amount of data needed to train it.
Remember, 3D scans are not readily available on the internet; they are created by Pokémon GO players using the Scan a PokéStop feature. Unlike ChatGPT, which could use the entire internet as a training ground, an LGM needs our data and our input.
Here is the work so far, as shared by Niantic (a simplified sketch of the pose estimation behind VPS follows the list):
- Over the past five years, Niantic has focused on building their Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in their games and Scaniverse.
- With VPS, users can position themselves in the world with centimeter-level accuracy. That means they can see digital content placed against the physical environment precisely and realistically. This content is persistent in that it stays in a location after they’ve left, and it’s then shareable with others.
- For example, Niantic recently started rolling out an experimental feature in Pokémon GO, called Pokémon Playgrounds, where the user can place Pokémon at a specific location, and they will remain there for others to see and interact with.
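For the technically curious, the core step behind a VPS lookup, recovering a camera’s position and orientation from a single image matched against a 3D map, is a classic computer vision problem known as Perspective-n-Point (PnP). Below is a minimal sketch using OpenCV’s generic solver on synthetic data; this is not Niantic’s implementation, and a real system adds feature matching, outlier rejection, and much more:

```python
# Hypothetical sketch of the core math behind a visual positioning
# lookup: given 2D-3D correspondences between features in a phone photo
# and points in a prebuilt 3D map, recover the camera's position and
# orientation. Not Niantic's code; it uses OpenCV's generic PnP solver.
import numpy as np
import cv2

# Toy 3D map points (meters, world frame), stand-ins for map features.
map_points = np.array([
    [0.0, 0.0, 5.0], [1.0, 0.0, 6.0], [-1.0, 0.5, 5.5],
    [0.5, -0.5, 7.0], [-0.5, 1.0, 6.5], [1.5, 1.5, 8.0],
], dtype=np.float64)

# Simple pinhole intrinsics for a hypothetical phone camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Simulate a "true" camera pose to generate matching 2D detections.
true_rvec = np.array([[0.05], [0.1], [0.0]])
true_tvec = np.array([[0.2], [-0.1], [0.3]])
image_points, _ = cv2.projectPoints(map_points, true_rvec, true_tvec, K, None)

# Solve for the pose from the correspondences alone (what a VPS must do).
ok, rvec, tvec = cv2.solvePnP(map_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)
camera_position = (-R.T @ tvec).ravel()  # camera location in world frame
print("estimated camera position:", camera_position)
```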
VPS coverage is built from user scans, and today they have 10 million scanned locations around the world, 1 million of which are processed and usable with their VPS system. The coverage map in the post clearly shows where players are scanning the most.
In addition to this, Niantic has trained more than 50 million neural networks to date, where multiple networks can contribute to a single location. A video shared in the post shows one of those neural networks in action.
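Niantic’s post does not say how multiple networks covering the same location are combined, but purely as a hypothetical illustration, one simple approach is confidence-weighted fusion of their independent estimates:

```python
# A toy illustration (our assumption, not Niantic's pipeline) of why
# multiple neural networks per location could be useful: independent
# position estimates for the same spot can be fused, weighted by
# how much we trust each network.
import numpy as np

# Position estimates (x, y, z in meters) from three hypothetical
# networks, each trained on different scans of the same location.
estimates = np.array([
    [12.02, 4.98, 1.51],
    [11.97, 5.03, 1.48],
    [12.10, 4.90, 1.55],
])
confidences = np.array([0.9, 0.8, 0.4])  # hypothetical reliability scores

# Confidence-weighted average: better-trusted networks pull harder.
fused = (confidences[:, None] * estimates).sum(axis=0) / confidences.sum()
print("fused position estimate:", fused)
```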