
The Planetary Computer, from a Home Lab

Author

Cole

Date Published

I love data, so I was excited when I found I could pull a Sentinel-2 image of my own backyard, for free, from a Python notebook running on my home server. The Microsoft Planetary Computer is the catalog that makes that work. It hosts roughly fifty petabytes of Earth observation and environmental data in cloud storage, exposes it through a standard API, and asks for nothing more than a free account key.

I’ve used it for two projects on this blog: a species distribution model for spotted lanternfly in Great Smoky Mountains National Park and an eight-year look at NDVI change in the same park. This post serves two purposes: first, I needed an excuse to really read through all of it; second, I want to share it in the hope that more citizen scientists will use it. I want to explore what kinds of questions it can answer and how to use it without abusing it. The audience I have in mind is the person who runs a JupyterLab container at home, has a GPU around, reads ecology papers for fun, and wants to test their own ideas.


A Primer on the Language

If you’ve worked with satellite data before, skip this. If you haven’t, the following five concepts cover most of the vocabulary you’ll run into in the dataset descriptions.

Resolution is the size of one pixel on the ground. A 10-meter resolution image means each pixel covers a 10 m × 10 m square. A 30-meter pixel covers about a fifth of an acre. A 1-kilometer pixel covers about 250 acres. At 10 m you can pick out individual buildings and large trees. At 1 km you can see whole forest stands but not the trees in them. Higher resolution isn’t always better; it’s more data to move, more disk space, and often unnecessary.
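To make the pixel arithmetic concrete, here is a small sketch that converts pixel size to ground area, using the international acre of 4,046.86 square meters. The function name is illustrative, not from any library.

```python
SQ_METERS_PER_ACRE = 4046.8564224  # one international acre


def pixel_area_acres(resolution_m: float) -> float:
    """Ground area covered by one square pixel of side `resolution_m`, in acres."""
    return resolution_m ** 2 / SQ_METERS_PER_ACRE


for res in (10, 30, 1000):
    print(f"{res:>5} m pixel -> {pixel_area_acres(res):10.3f} acres")
```

A 30 m pixel works out to about 0.22 acres and a 1 km pixel to about 247 acres. Because area scales with the square of pixel size, going from 30 m to 10 m pixels means nine times as many pixels for the same patch of ground.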

Revisit interval is how often a satellite passes over the same point on Earth. Sentinel-2 has a five-day revisit, which means it photographs roughly the same spot every five days. Landsat is sixteen days. Daily revisit is a different class of instrument entirely. Clouds get in the way, so a five-day revisit usually means a usable cloud-free image every two to four weeks, depending on latitude and how often it rains where you're looking.

Spectral bands are the slices of light a satellite records. The human eye sees three: red, green, blue. Sentinel-2 records thirteen, including several in the near-infrared and shortwave-infrared parts of the spectrum that aren’t directly visible. Healthy vegetation reflects strongly in the near-infrared, which is why “vegetation index” math works at all. Different bands answer different questions: vegetation health, water content, mineral type, surface temperature.
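The vegetation-index math itself is short. NDVI, the most common index, is (NIR - Red) / (NIR + Red); on Sentinel-2 the usual bands are B08 (near-infrared) and B04 (red). A minimal NumPy sketch, assuming the inputs are surface-reflectance arrays of the same shape; the reflectance values in the example are illustrative, not measurements.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Returns values in [-1, 1]; pixels where both bands are zero
    (e.g. nodata fill) become NaN instead of raising a divide error.
    """
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (nir - red) / denom
    return np.where(denom == 0, np.nan, out)


# Illustrative reflectances: dense vegetation, bare soil, open water.
print(ndvi([0.45, 0.30, 0.02], [0.05, 0.25, 0.04]))
```

Healthy vegetation lands near the top of the range because it reflects far more near-infrared than red; bare soil sits near zero and water goes negative.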

Optical versus radar. Optical satellites are cameras. They need sunlight, and clouds block them. Radar satellites send out their own radio waves and listen to what bounces back, so they work at night and see through most cloud cover. That’s why radar matters for places like the tropics or the Pacific Northwest, where the sky is gray half the year.

STAC and Cloud-Optimized GeoTIFFs. This part is an unavoidable terminology dump. TIFF is a file format for images; GeoTIFF extends it with the information needed to place each pixel on the Earth. STAC stands for SpatioTemporal Asset Catalog. It’s a JSON-based standard for describing where and when satellite images were taken, so software tools can search across petabytes of data without knowing anything about the underlying storage. A Cloud-Optimized GeoTIFF (ironically, the “cloud” here means cloud storage, not the clouds that might appear in the image) is a regular GeoTIFF with its internal layout reorganized so programs can fetch just the pixels they need over HTTP rather than downloading the whole file.
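To show why a metadata catalog makes search cheap, here is a toy sketch: a few pared-down STAC Items (real Items also carry a GeoJSON geometry, links, and many more properties) filtered by bounding box and date using only their JSON metadata. The IDs, boxes, asset key, and example.com hrefs are invented for illustration. No pixels are touched; the matching hrefs are what a COG-aware reader would then fetch partially.

```python
from datetime import datetime

# Pared-down STAC Items; every ID, box, and href here is made up.
ITEMS = [
    {"id": "scene-a", "bbox": [-84.0, 35.4, -83.0, 36.0],
     "properties": {"datetime": "2024-07-12T16:05:00Z"},
     "assets": {"red": {"href": "https://example.com/scene-a/red.tif"}}},
    {"id": "scene-b", "bbox": [-84.0, 35.4, -83.0, 36.0],
     "properties": {"datetime": "2024-01-20T16:05:00Z"},
     "assets": {"red": {"href": "https://example.com/scene-b/red.tif"}}},
    {"id": "scene-c", "bbox": [10.0, 45.0, 11.0, 46.0],
     "properties": {"datetime": "2024-07-15T10:30:00Z"},
     "assets": {"red": {"href": "https://example.com/scene-c/red.tif"}}},
]


def parse_dt(s):
    # Older Pythons' fromisoformat() rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def bboxes_overlap(a, b):
    """Boxes are [west, south, east, north]; ignores antimeridian crossing."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Return asset hrefs of items that overlap `bbox` within [start, end]."""
    t0, t1 = parse_dt(start), parse_dt(end)
    return [
        it["assets"]["red"]["href"]
        for it in items
        if bboxes_overlap(it["bbox"], bbox)
        and t0 <= parse_dt(it["properties"]["datetime"]) <= t1
    ]


print(search(ITEMS, [-83.6, 35.5, -83.4, 35.7],
             "2024-07-01T00:00:00Z", "2024-07-31T23:59:59Z"))
```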


What’s in the Catalog

The Planetary Computer hosts well over a hundred datasets, and read straight through, the list is an alphabet soup of acronyms. To understand what was actually in there, I had to step back and group the datasets by the kind of data they contain.

Optical imagery: the cameras

Sentinel-2. Two European Space Agency satellites that image the entire land surface every five days across thirteen spectral bands, at 10-meter resolution for the visible and main near-infrared bands (20 or 60 meters for the rest). Running since 2015, free under European open-data policy. This is what I used for the NDVI change-detection project on the Smokies.

Landsat. The historical record. The current generation (Landsat 8 and 9) collects 30-meter imagery on a sixteen-day cycle per satellite, but the archive runs continuously back to 1972. If you need to see what a place looked like before Sentinel-2 existed, Landsat is the only continuous source.

Harmonized Landsat-Sentinel. A combined product that processes Landsat and Sentinel-2 imagery so the values come out comparable. The two satellites have slightly different bands and calibrations; this product smooths the differences out so you can stack them into one time series. The practical effect is an image every two to three days instead of every five or sixteen, which matters in cloudy regions.

MODIS. Sees the whole planet every day, at 250-meter to 1-kilometer resolution. Best for continent-scale questions where you don’t need fine detail but you do need frequent updates: drought, fire, snow cover, ocean color.

NAIP. High-resolution aerial imagery of the continental United States at 0.3 to 1 meter resolution, flown by the USDA every two to three years. At 1 meter you can count individual trees and see backyard pools. It only covers the US, but where it does, it’s the clearest imagery in the catalog.

Radar imagery: seeing through clouds and night

Sentinel-1. Two European radar satellites that image the planet every six to twelve days at 10-meter resolution, day or night, regardless of weather. Radar pixels measure surface roughness and moisture rather than color, which makes them good for detecting flooding, wetland dynamics, vessel traffic, and forest structure. The catalog hosts a terrain-corrected version that’s been adjusted for the way mountains distort radar returns; that’s the version most people want.
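Radar backscatter is usually analyzed in decibels, and smooth open water reflects the pulse away from the sensor, so it shows up very dark. A minimal sketch of a first-pass water mask, assuming the input is VV backscatter in linear power units. The function names are hypothetical, and the -18 dB cutoff is an illustrative assumption; in practice the threshold is tuned per scene, often from the image histogram.

```python
import numpy as np


def linear_to_db(backscatter):
    """Convert linear-power backscatter to decibels: 10 * log10(x).

    Non-positive values (nodata, fill) become NaN instead of -inf.
    """
    x = np.asarray(backscatter, dtype="float64")
    with np.errstate(divide="ignore", invalid="ignore"):
        db = 10.0 * np.log10(x)
    return np.where(x > 0, db, np.nan)


def open_water_mask(vv_linear, threshold_db=-18.0):
    """True where VV backscatter is darker than `threshold_db`.

    The -18 dB default is an illustrative assumption, not a calibrated value.
    NaN pixels compare as False, so nodata is never flagged as water.
    """
    return linear_to_db(vv_linear) < threshold_db


print(open_water_mask([0.1, 0.01, 0.0]))
```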

ALOS PALSAR. A Japanese radar that uses a longer wavelength than Sentinel-1. Longer wavelengths penetrate deeper into vegetation canopies, which is why ALOS shows up in forest biomass work. The catalog hosts annual global mosaics.

Elevation: the shape of the ground

Copernicus DEM. A 30-meter global digital elevation model. Default choice for slope and elevation predictors in most current workflows. A 90-meter version exists for coarser work.

NASADEM. A reprocessed version of the data from the original Space Shuttle radar mission, also at roughly 30 meters, covering most of the globe except the highest latitudes.

3DEP. The USGS elevation program for the United States. Higher resolution than the global products and includes lidar-derived rasters for much of the country. I used 3DEP for the lanternfly project to compute slope and elevation predictors.
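Turning an elevation grid into a slope predictor is a few lines of NumPy. This is a sketch, not the code from the lanternfly project, and it assumes the DEM is in a projected coordinate system with square cells measured in meters (reproject first if the grid is in degrees). `np.gradient` uses central differences in the interior; GIS tools such as GDAL's `gdaldem slope` default to Horn's 3×3 method, so values will differ slightly.

```python
import numpy as np


def slope_degrees(dem, cell_size_m):
    """Slope in degrees from a 2-D elevation grid with square cells.

    np.gradient returns the derivative along rows (y) then columns (x);
    slope is the arctangent of the gradient magnitude.
    """
    dz_dy, dz_dx = np.gradient(np.asarray(dem, dtype="float64"), cell_size_m)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


# A ramp rising 10 m per 10 m cell toward the east should read 45 degrees.
ramp = np.tile(np.arange(5) * 10.0, (4, 1))
print(slope_degrees(ramp, 10.0))
```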

Climate and weather

ERA5. The most-used global record of what the weather actually did. It’s a reanalysis dataset: the output of a weather model fitted to the historical observations available for each point in time. Hourly, global, back to 1940, on a 0.25-degree grid (roughly 25 to 30 kilometers).

Daymet. Daily, 1-kilometer gridded weather for North America going back to 1980. Minimum and maximum temperature, precipitation, vapor pressure, snow water equivalent.

TerraClimate. Monthly global climate and water-balance data at 4 kilometers, going back to 1958. Includes climate water deficit and actual evapotranspiration as pre-computed variables, which other climate datasets generally make you derive yourself.

gridMET. Daily weather and fire-weather indices for the continental US at 4 kilometers. Fuel moisture, energy release component, and the rest of the fire metrics are pre-computed.

Living things and land

GBIF. The Global Biodiversity Information Facility is the largest aggregator of species occurrence records in the world. iNaturalist, eBird, museum specimens, herbarium sheets, government biodiversity surveys all flow into GBIF. The full corpus is over three billion records. The Planetary Computer hosts the entire database as Parquet files, so you can run analytical queries across all of it without downloading the whole thing locally. For my lanternfly project I pulled 32,641 records and thinned them to 8,835. For the observer-effort surface I pulled 100,000 records of any organism across the park. Both queries took minutes.
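Raw occurrence records cluster wherever observers are concentrated, which is why they usually get thinned before modeling. This isn't necessarily how the lanternfly records were thinned; it's one common approach, grid-cell thinning, which keeps at most one record per grid cell. A minimal sketch, assuming records are (longitude, latitude) pairs in degrees; `grid_thin` and the cell size are illustrative choices you'd tune for your own study.

```python
import math


def grid_thin(records, cell_deg):
    """Keep at most one (lon, lat) record per cell_deg x cell_deg grid cell.

    The first record seen in a cell wins, so the result depends on input
    order. Degree cells get narrower east-west toward the poles.
    """
    seen = set()
    kept = []
    for lon, lat in records:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        if key not in seen:
            seen.add(key)
            kept.append((lon, lat))
    return kept


points = [(-83.501, 35.601), (-83.502, 35.602), (-83.40, 35.70)]
print(grid_thin(points, 0.01))
```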

MoBI. Map of Biodiversity Importance, a NatureServe product covering more than 2,200 imperiled species in the continental US. It includes pre-built rasters of species richness by taxonomic group, including a layer specifically for pollinator invertebrates.

ESA WorldCover. A global 10-meter land cover map with eleven classes (tree cover, cropland, built-up, water, and so on), derived from Sentinel-1 and Sentinel-2.

IO-LULC. Annual global land cover at 10 meters, nine classes, updated every year since 2017.

USDA Cropland Data Layer. A 30-meter annual map of US crop type going back to 2008. Useful any time agriculture is part of the question: pollinators, water quality, wildlife corridors.

MTBS. Monitoring Trends in Burn Severity. Fire perimeters and severity classifications for every significant US fire since 1984. For my Appalachian region the record is sparse, but it captures the Chimney Tops 2 fire that showed up in my NDVI work.


What you can actually do with this

The two projects I’ve written up illustrate two different workflows.

The first is species distribution modeling. You pull species occurrences from GBIF, pull climate, elevation, and land cover predictors from the catalog, train a model on the relationship between presences and environmental conditions, and project the result onto whatever region you’re interested in. The lanternfly project did exactly this: climate from PRISM (which I sourced outside the Planetary Computer), elevation from 3DEP, occurrences from GBIF, a Random Forest doing the fitting. The model identified low-elevation southern and western park boundaries as the most environmentally suitable areas, and combining suitability with an iNaturalist observer-effort surface revealed where the species could become established without being observed. The same template works for almost any taxon with enough records in GBIF.
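The last step of that workflow, combining suitability with observer effort, can be sketched as a simple overlay: flag cells that the model rates as suitable but that few people survey. This is one simple way to do it, not necessarily how the lanternfly project combined the two surfaces. It assumes both rasters are on the same grid, and the names and threshold defaults are hypothetical.

```python
import numpy as np


def rescale01(a):
    """Min-max rescale to [0, 1]; a constant surface becomes all zeros."""
    a = np.asarray(a, dtype=float)
    lo, hi = np.nanmin(a), np.nanmax(a)
    return (a - lo) / (hi - lo) if hi > lo else np.zeros_like(a)


def detection_gaps(suitability, effort, suit_min=0.7, effort_max=0.2):
    """True where suitability is high but observer effort is low.

    suitability: model output on a 0-1 scale.
    effort: observer-effort surface already rescaled to 0-1, e.g. from
    counts of records of any organism per cell.
    Thresholds are illustrative defaults, not tuned values.
    """
    s = np.asarray(suitability, dtype=float)
    e = np.asarray(effort, dtype=float)
    return (s >= suit_min) & (e <= effort_max)


suit = np.array([[0.9, 0.9], [0.2, 0.8]])
effort = rescale01([[5, 90], [0, 10]])  # hypothetical record counts per cell
print(detection_gaps(suit, effort))
```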

The second is time-series change detection. You pull every cloud-free image of a region over some span of years, compute an index at each pixel for each year (NDVI for vegetation, NDMI for canopy moisture, NDWI for water), and look at the differences. The Smokies NDVI project did this with Sentinel-2 from 2018 through 2025 and found that 6.1% of the park-plus-buffer area had measurably changed, with loss outpacing gain by about 50% and a single drought year (2021) driving 41% of the changed pixels. The same template gives you post-fire recovery curves, agricultural intensification trends, urban tree-canopy change, riparian-buffer monitoring, and wetland hydroperiod tracking.
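The core of the change-detection step is a per-pixel difference between two composites. A minimal sketch, not the code from the Smokies project: the 0.1 threshold is a hypothetical cutoff, cloud and nodata pixels are assumed to be NaN already, and the same function works on NDMI or NDWI composites.

```python
import numpy as np


def change_summary(before, after, threshold=0.1):
    """Classify per-pixel change between two index composites.

    A pixel counts as loss if the index dropped by at least `threshold`,
    gain if it rose by at least `threshold`. NaN pixels (cloud, nodata)
    are excluded from the denominator.
    """
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    valid = ~np.isnan(diff)
    loss = valid & (diff <= -threshold)
    gain = valid & (diff >= threshold)
    n_valid = int(valid.sum())
    n_changed = int(loss.sum() + gain.sum())
    return {
        "changed_fraction": n_changed / n_valid if n_valid else float("nan"),
        "loss_pixels": int(loss.sum()),
        "gain_pixels": int(gain.sum()),
    }


print(change_summary([0.8, 0.5, 0.6, np.nan], [0.5, 0.5, 0.9, 0.7]))
```

Running it once per consecutive pair of years is a simple way to see whether a single year, like a drought, accounts for most of the change.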

A few other questions the catalog supports cleanly:

  • Hemlock mortality mapping in the southern Appalachians, using a multi-year Sentinel-2 moisture index time series. Adelgid-killed hemlock stands have a distinctive drying signature.
  • Phenology shift detection. Fit the seasonal vegetation curve at each pixel year over year and ask whether spring is arriving earlier than it used to. Pair the satellite-derived greenup dates with Daymet spring temperature trends.
  • Detection bias mapping beyond invasive insects: running the same effort-surface approach as the lanternfly project for any taxon with citizen-science records, to find the gaps in our knowledge of common species.
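For the phenology idea, a green-up date can be estimated per pixel per year and then tracked over time. The bullet describes fitting a seasonal curve; this sketch swaps that for plain linear interpolation between observations and uses a 50%-of-amplitude threshold, which is one common green-up definition among several. The function name and defaults are illustrative.

```python
import numpy as np


def greenup_doy(doy, ndvi, fraction=0.5):
    """Estimate spring green-up as the day of year NDVI first rises past
    `fraction` of its seasonal amplitude (min to max).

    doy:  increasing day-of-year values of the cloud-free observations.
    ndvi: NDVI at those dates (NaN allowed for missing values).
    Returns NaN if the threshold is never crossed before the seasonal peak.
    """
    doy = np.asarray(doy, dtype=float)
    v = np.asarray(ndvi, dtype=float)
    lo, hi = np.nanmin(v), np.nanmax(v)
    threshold = lo + fraction * (hi - lo)
    peak = int(np.nanargmax(v))
    # Walk the spring rise (start of series up to the peak) looking for the
    # first pair of observations that brackets the threshold.
    for i in range(1, peak + 1):
        if v[i - 1] < threshold <= v[i]:
            # Linear interpolation between the two bracketing observations.
            t = (threshold - v[i - 1]) / (v[i] - v[i - 1])
            return float(doy[i - 1] + t * (doy[i] - doy[i - 1]))
    return float("nan")


print(greenup_doy([30, 60, 90, 120, 150, 180], [0.2, 0.2, 0.3, 0.6, 0.8, 0.8]))
```

Repeating this for each year gives a green-up series per pixel that can be compared against Daymet spring temperatures.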

Using the Catalog Politely

The data lives in someone else’s cloud storage, and downloading it costs Microsoft real money. A few habits keep your usage reasonable.

Subset before you read pixels. The API lets you filter by bounding box, date range, and cloud cover before you fetch anything. A query for “all Sentinel-2 scenes over my county in July 2024 with less than twenty percent cloud” returns a manifest of a few dozen scenes in under a second. Use those filters aggressively. The alternative is making both Microsoft’s bandwidth and your own pipeline do work you’ll throw away.
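With the `pystac-client` and `planetary-computer` packages, the filters from that example query go to the server as search parameters, so only matching scene metadata comes back. A sketch under those assumptions: `sentinel-2-l2a` is the catalog's Sentinel-2 collection, `planetary_computer.sign_inplace` signs asset URLs so they can be read, and the helper names and bounding box are hypothetical. The network imports live inside `find_scenes` so the parameter builder can be used and checked offline.

```python
from datetime import date

PC_STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def search_params(bbox, start: date, end: date, max_cloud=20):
    """Server-side filters: collection, area, date range, cloud cover."""
    return {
        "collections": ["sentinel-2-l2a"],
        "bbox": list(bbox),  # [west, south, east, north] in degrees
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
        "query": {"eo:cloud_cover": {"lt": max_cloud}},
    }


def find_scenes(bbox, start, end, max_cloud=20):
    """Run the search against the Planetary Computer STAC API.

    Needs network access plus the pystac-client and planetary-computer packages.
    """
    import planetary_computer
    import pystac_client

    catalog = pystac_client.Client.open(
        PC_STAC_URL, modifier=planetary_computer.sign_inplace
    )
    search = catalog.search(**search_params(bbox, start, end, max_cloud))
    return search.item_collection()


# Usage (needs network; the bounding box is a hypothetical county-sized area):
# items = find_scenes((-83.9, 35.5, -83.5, 35.8), date(2024, 7, 1), date(2024, 7, 31))
# print(len(items), "scenes")
```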

Cache aggressively. Once you’ve computed an annual median composite for an area, write it to local disk as a Cloud-Optimized GeoTIFF or a Zarr store and reuse it. Don’t recompute the same composite every time you iterate on the analysis downstream. My Smokies NDVI project stored eight annual composites locally, a few hundred megabytes total, and after that ran entirely offline.
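The compute-once, reuse-forever pattern is easy to wrap in a helper. This sketch uses NumPy's `.npy` format as a stand-in for the Cloud-Optimized GeoTIFF or Zarr store mentioned above, to keep dependencies minimal (a real cache should keep the georeferencing that `.npy` drops). It assumes cloudy pixels are already masked to NaN, and `cached_composite` is a hypothetical name.

```python
from pathlib import Path

import numpy as np


def median_composite(stack):
    """Per-pixel median over time for an array shaped (time, rows, cols).

    Cloud-masked pixels should be NaN so they are ignored by nanmedian.
    """
    return np.nanmedian(np.asarray(stack, dtype="float32"), axis=0)


def cached_composite(path, compute):
    """Load the array cached at `path`, or compute, save, and return it.

    The suffix is forced to .npy because np.save appends it otherwise,
    which would make the cache check miss on every call.
    """
    path = Path(path).with_suffix(".npy")
    if path.exists():
        return np.load(path)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    np.save(path, result)
    return result
```

In a notebook, wrapping each year's composite as `cached_composite("cache/ndvi_2021.npy", lambda: median_composite(stack_2021))` means the expensive read only happens on the first run.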

Mirror what you depend on. If you’re building something with longevity (a paper, a thesis, a long-term monitoring project), keep your own copy of the inputs with version stamps. Cloud catalogs change. Datasets get reprocessed. Versions get superseded.
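One lightweight way to version-stamp a local mirror is a manifest with a checksum for every file plus notes on where it came from, so you can later tell whether an input changed. A sketch using only the standard library; `write_manifest` and the `source_notes` contents are hypothetical, and the manifest should be written outside `data_dir` so it doesn't list itself on the next run.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large rasters don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(data_dir, manifest_path, source_notes):
    """Record size and checksum for every file under data_dir, plus provenance."""
    data_dir = Path(data_dir)
    entries = [
        {
            "path": p.relative_to(data_dir).as_posix(),
            "bytes": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    manifest = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "source_notes": source_notes,  # e.g. collection IDs, query parameters
        "files": entries,
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```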


The closing point

The Planetary Computer is some of the cleanest free Earth-data access in remote sensing right now, and it brings a lot of analyses that used to require institutional resources into reach of a personal JupyterLab. If you have a question about a place and some Python skill, the answer is closer than it used to be.