This was a much more insightful read than I anticipated. The first part is a fantastic introduction to today's state-of-the-art foundation models, particularly in the geospatial domain. The second part is more prospective, and thus a little more speculative. Either way, it is very good food for thought.

Metadata

  • Authors: Krzysztof Janowicz, Gengchen Mai, Weiming Huang, Rui Zhu, Ni Lao & Ling Cai
  • Category: article
  • Document Tags: paper
  • URL: www.tandfonline.com/doi/full/…

Highlights

Given these three motivating factors, GeoFM can be defined as follows: Geo-foundational models are foundation models specifically trained on heterogeneous spatiotemporal data, capable of reliably performing advanced spatiotemporal reasoning, and designed to incorporate spatial, temporal, and other contextual factors into their output to support a wide range of (geo)spatial downstream tasks in geography and neighboring disciplines that benefit from a spatial or geographic perspective.

Just as LLMs encode the syntax, semantics, and pragmatics of human language, GeoFM could encode the language of space, i.e., the place-agnostic properties that define geography – spatial dependence and heterogeneity (Anselin 1988) and its related concepts such as scale, adjacency, spatial and temporal scopes, and so on.
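To make "spatial dependence" a bit more concrete, here is a small illustration that is not from the paper: global Moran's I, a standard spatial-statistics measure of spatial autocorrelation, computed for values on a hypothetical 1-D transect where each cell's neighbours are the cells directly before and after it. The function names `morans_i` and `line_adjacency` and the toy values are made up for this sketch.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a dense spatial-weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations between neighbouring cells.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def line_adjacency(n):
    """Hypothetical binary weights: cells on a line, neighbours are i-1 and i+1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = line_adjacency(4)
print(morans_i([1, 2, 3, 4], w))  # ≈ 0.333: similar values sit next to each other
print(morans_i([1, 4, 1, 4], w))  # ≈ -1.0: neighbours are systematically dissimilar
```

Positive values indicate that nearby places are more alike than expected under spatial randomness, which is the kind of regularity a GeoFM would ideally pick up.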

Why do we need geo-foundation models (GeoFM) at all, and what exactly are they or will they be? First, foundation models can only generalize within the scope of their training data.

Second, many geospatial tasks are highly specific.

Third, geography is inherently local/regional or contextual.

GeoAI advances along two major dimensions: (1) it applies novel methods and technologies from the broader AI and machine learning community to geographic and geospatial research questions and (2) it feeds its own, novel theoretical and methodological contributions back to the broader AI community.

Location embeddings can be trained separately and concatenated with the embeddings representing learned building footprints, land classes, and so on (Mac Aodha et al. 2019; Yan et al. 2019; Mai et al. 2020).

Note: This is an idea I’ve had for a while; it would be good to check some of these references to see how they approach it.

Although modern foundation models were not yet on the horizon in the early 2010s, it was already clear that the era of custom, single-purpose models was slowly giving way to workflows developed around reuse and transferability. This shift raises a key question for GeoAI research: how can we distinguish progress driven by GeoAI-specific innovation from improvements mostly gained through the application of transfer learning (and related methods) from general-purpose models?
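A minimal sketch of the location-embedding idea from the highlight above (train location embeddings separately, then concatenate them with other learned embeddings). Assumptions, none of which come from the paper: the location encoder here is a fixed multi-scale sin/cos encoding of longitude and latitude (a common choice for location encoders, but whether it matches any of the cited works should be checked against them), the learnable network that would normally sit on top of it is omitted, and `encode_location`, `fuse`, the wavelength parameters, and the footprint embedding values are all hypothetical.

```python
import math


def encode_location(lon, lat, num_scales=4, min_wavelength_deg=1.0, max_wavelength_deg=360.0):
    """Hypothetical fixed location encoder: sin/cos of lon and lat at
    geometrically spaced wavelengths (in degrees)."""
    features = []
    ratio = (max_wavelength_deg / min_wavelength_deg) ** (1 / max(num_scales - 1, 1))
    for s in range(num_scales):
        wavelength = min_wavelength_deg * ratio ** s
        for coord in (lon, lat):
            angle = 2 * math.pi * coord / wavelength
            features.extend([math.sin(angle), math.cos(angle)])
    return features  # length = num_scales * 2 coordinates * 2 functions


def fuse(location_embedding, object_embedding):
    """Late fusion by concatenation: both embeddings are produced independently."""
    return list(location_embedding) + list(object_embedding)


# Hypothetical 3-d embedding of a building footprint from some separately trained model.
footprint_embedding = [0.12, -0.40, 0.88]
loc = encode_location(-119.85, 34.41)
fused = fuse(loc, footprint_embedding)
print(len(loc), len(fused))  # 16 19
```

Concatenation keeps the two encoders decoupled, so either one can be retrained or swapped without touching the other; the trade-off is that any interaction between location and object features has to be learned downstream.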

The successful combination of few-shot, prompt engineering, and transfer-learning methods on top of powerful general-purpose models raises the old question again: is spatial really special?

We can roughly classify the existing GeoFM-related research into the following categories: 1) adapting existing FMs on geospatial tasks via prompt engineering and task-specific fine-tuning; 2) developing advanced LLM agent frameworks for geospatial tasks; and 3) developing novel geo-foundation models via geo-aware model training and fine-tuning.

We further classify the current GeoFMs in four categories based on the data modalities they support and their application scenarios: geospatial language foundation models, geospatial vision foundation models, geospatial graph foundation models, and geospatial multimodal foundation models.

Three major ways of realizing GeoFM or using generalist FM.

For now, it is unclear whether one of the paths is preferred to approach the vision of generally capable GeoFM so that the research community could consolidate our efforts, or if this is task-dependent, and, hence, varying paths should be taken for different types of tasks.

Designing architectures that can jointly process such heterogeneous data, scale to large datasets, and accomplish effective cross-modality alignment remains a major open challenge.

A fundamental question is whether those subjective and complex human experiences should become part of GeoFM.

This raises concerns about GeoFM misrepresenting geography, be it by introducing bias or by learning representations that do not align with those of groups or societies.

Spatial priors should ideally be incorporated into the pre-training of GeoFM […] those priors change across scale, resolution, modality, and so forth, and it is presently not clear how to best handle those. For instance, should they be explicitly engineered or implicitly learned?

Without co-evolving our data and benchmarks, the true potential of GeoFMs will remain constrained.

Most present work on AI alignment does not account for regional, e.g., cultural, differences. However, as geographers, we know that the aforementioned societal goals, values, and norms vary greatly across geographic space and time – without any being inherently superior to others.

Skills that help us better interact with such agents, critically think about their outputs, align AI with societal goals, and so on, will increase in importance.