(Satellite imagery) embeddings for the rest of us
Earlier this week, I attended a workshop on Geo Foundation Models put together by Weiming Huang and Nick Malleson, both at Leeds. It was a small-group, high-interaction day with plenty of coffee breaks to follow up on comments, and discussions as close to blue sky as academia permits these days. As the cool kids say, an extremely high signal-to-noise ratio.
My contribution was a five minute talk on why embeddings (and, in particular, embeddings from satellite imagery) are a very cool technology we should be paying more attention to in social research and policy. I titled it, in an attempt to sound clickbait’y, “(Imagery) embeddings for the rest of us”. The point I wanted to make was that, in my view, embeddings are one of the coolest (new) ways we have to democratise access to much of the value that satellite imagery has to offer. This is particularly so for communities who have not been able to engage much with satellites but stand to benefit from them. But I’m getting ahead of myself. In this post, I wanted to give a quick overview of what those five minutes covered. Here you go.
I started by framing embeddings from Imago’s perspective. At Imago, we work to make satellite imagery more useful, useable, and used across social research and policy. A big part of this is about developing data products that translate pixels into data that meet our users “where they are”. That is, we take relevant information from pixels and provide it in familiar formats (e.g., Census geographies) and in transformed/aggregated ways (e.g., tabular) that resonate more with how socially-minded researchers think.
Then I moved on to embeddings. It was a bit silly to include a slide on what embeddings are for a room full of experts in this area. Nevertheless, I did it because I thought it’d be useful to frame how I see embeddings in this context. As such, I defined embeddings as the internal representation a neural net builds from an image. This ends up being a vector of values that provides a dense but compressed representation of the statistical information encoded in an image. In more human-friendly terms, this is a bit like “an image, as seen by a computer”.
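To make the “image in, vector out” idea concrete, here is a minimal sketch in Python. The `toy_encoder` below is a hypothetical stand-in (a fixed random projection, not a real model): actual foundation models learn this mapping, but the interface is the same.

```python
import numpy as np

def toy_encoder(image: np.ndarray, dim: int = 16) -> np.ndarray:
    """Map an (H, W, bands) image to a flat, fixed-length embedding vector.

    A stand-in for a pre-trained encoder: a real model would learn this
    mapping, but the interface (image in, vector out) is the same.
    """
    rng = np.random.default_rng(0)               # fixed random projection
    flat = image.reshape(-1).astype(float)       # unroll the pixel tensor
    proj = rng.standard_normal((flat.size, dim))
    vec = flat @ proj                            # dense, compressed summary
    return vec / np.linalg.norm(vec)             # unit length, a common convention

# A tiny 4-band "scene" standing in for a satellite image chip
image = np.random.default_rng(42).random((8, 8, 4))
embedding = toy_encoder(image)
print(embedding.shape)  # (16,) -- the image, "as seen by a computer"
```

Whatever the encoder, the output is the same kind of object: a flat vector of numbers summarising the image.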
Now, here’s where the talk starts getting more interesting (hopefully). Why are embeddings, such an obscure property of modern neural architectures, so important for socially-minded folks? In my view, for at least three reasons. First, they’re a very direct translation of what is essentially a multi-dimensional tensor (an image) into tabular format. The embedding is a flat, (mostly) one-dimensional representation of an image1, a very complex data structure. It’s a “buffer” that lets the non-initiated user get most of the benefits of working with a raw image without having to touch one, with all of its challenges. Think of it as having your cake and eating it too. Second, embeddings are not limited to a single type of image. Modern models are multi-sensor (i.e., they can incorporate different satellite feeds) and even multi-modal (e.g., combining satellite with other data types such as traditional geographic features). To me, this is an opportunity to make many data feeds, traditionally left aside for being “too hard to work with”, first-class citizens in “tabular-land”, and maybe even a way to seamlessly integrate different communities that’d usually not speak to each other. And third, it’s 2025: we’re talking about embeddings because they’re tightly linked to foundation models, pre-trained general-purpose encoders that, once available, can generate such embeddings easily. This is a really exciting field to observe these days. In the talk, I included a slide with examples of such models released (mostly) openly by Google, NASA, ESA, IBM and Cambridge. All of those examples were, at most, three months old.
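The “tabular-land” point can be shown in a few lines. Below, the embeddings are random stand-ins for what a real encoder would produce, and the area codes are hypothetical LSOA identifiers; the point is the shape of the result: one row per geography, one column per embedding dimension, ready to join with Census tables.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical LSOA codes; the embeddings are random stand-ins for
# what a foundation model would produce from each area's imagery.
areas = ["E01000001", "E01000002", "E01000003"]
embeddings = rng.standard_normal((len(areas), 16))

# Embeddings slot straight into tabular-land: a familiar DataFrame
# indexed by geography, joinable with any other area-level dataset.
table = pd.DataFrame(
    embeddings,
    index=pd.Index(areas, name="lsoa_code"),
    columns=[f"emb_{i}" for i in range(16)],
)
print(table.shape)  # (3, 16)
```

From here on, any standard tabular workflow (merges, regressions, clustering) applies unchanged.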
As an illustration, I gave a few examples of what one can easily do with these embeddings. They’re ideal for “semantic search”, where you are interested in finding locations that look, in fundamental ways, similar to another one (e.g., what is another area in the UK that looks like my neighbourhood in Liverpool?). They’re natural inputs for unsupervised classifications built with K-means or more modern algorithms (e.g., what are the key types of areas in this region?). And they have a lot of potential to help easily spot change (e.g., has this area changed between the two periods for which we have images?). Of course, these are just the “standard” uses. One of the things I’m really excited about in taking embeddings to a much broader audience of domain experts is seeing what they can do with them to help solve their specific challenges.
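Two of those uses fit in a short sketch. Assuming a small bank of per-area embeddings (random stand-ins here), semantic search is just ranking by cosine similarity, and change detection is comparing the same area’s embeddings at two dates.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
# Random stand-ins for the embeddings of five areas; in practice these
# would come from a foundation-model data product.
bank = {f"area_{i}": rng.standard_normal(16) for i in range(5)}

# Semantic search: which area looks most like "my neighbourhood"?
# The query is area_0's embedding plus a little noise.
query = bank["area_0"] + 0.1 * rng.standard_normal(16)
ranked = sorted(bank, key=lambda k: cosine(query, bank[k]), reverse=True)
print(ranked[0])  # area_0 -- the most similar area in the bank

# Change detection: the same area at two dates; a near-identical
# embedding (high similarity) suggests little change on the ground.
t1 = bank["area_1"]
t2 = bank["area_1"] + 0.05 * rng.standard_normal(16)
print(cosine(t1, t2) > 0.9)  # True: high similarity, little change
```

K-means over the same vectors (e.g., `sklearn.cluster.KMeans` on the stacked embeddings) would give the unsupervised area typology mentioned above.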
I closed the talk reflecting a bit on why this is not commonplace yet. In particular, I brought up three thoughts. The first one is that what I had just said likely sounds obvious to the group I was speaking to (experts at the intersection of AI and Geo), but to pretty much no one else. We were about 30 people, and I’m not sure many more in the UK (in relative terms) appreciate the power of these ideas. We need to change this, and I think this group is ideally placed to do so. Then I moved on to the two key challenges I see for widespread adoption of embeddings among social folks. One is definitely technical: it is still tricky and cumbersome to work with embeddings. This I’m less worried about, because I can see ways in which we could lower the access barrier and, more importantly, we already have vehicles to do so (Imago and the rest of the SDR-UK data services are a great example). So, things seem in motion on this front. The other one is more philosophical: embeddings are tremendously useful, but they’re not the most transparent way to work with imagery, precisely because the compression that makes them very useable also makes them obscure. For good reasons, social scientists and adjacent folks tend to be very sceptical of obscure measurements. But this doesn’t mean there’s no value at all in engaging with the technology. We need a lot of evangelism and a bit of research to bring better understanding of how to use embeddings productively in these domains.
And that was that! In classic professorial style, I totally overran my five minutes, something I’m not proud of. Weiming very politely brought everything back in line and we moved on. Again, thank you so much for putting together such a group and for thinking of me as part of it! If any of the above resonates with you, please do get in touch with us at Imago! We’d love to hear from you and start a conversation.
-
Here I focus on images because this is where my interests lie, but the idea of embeddings extends to pretty much any data type a neural net can deal with. Which is to say, to pretty much any data type you can think of. ↩︎