Logo

Watching the Smokies Change: Seven Years of Forest Cover Shifts from Space

Author

Cole

Date Published

It's another Smokies post! Did you know they are one of the most biodiverse temperate forests on the planet? It's survived logging, development pressure, and invasive species. But satellite data suggests it is not standing still, and over the last seven years, roughly 6% of the land area surrounding and including the park has seen meaningful changes in vegetation density.

This project started as a question: can I detect where the forest is winning and losing ground, using only free public data and a Python notebook? The answer turned out to be yes, and the results are specific enough to be interesting.

The Notebooks of course.


The data source: Sentinel-2 from Microsoft Planetary Computer

Sentinel-2 is a pair of European Space Agency satellites that image the Earth's entire land surface every five days at 10-meter resolution. They've been doing this since 2015. Every scene is freely available through Microsoft Planetary Computer, which serves the imagery as Cloud-Optimized GeoTIFFs, meaning you can fetch only the pixels you need over HTTP, without downloading full scenes.

https://upload.wikimedia.org/wikipedia/commons/1/1d/Sentinel-2A_satellite_-_Lowered_onto_launch_adapter.jpg

Sentinel-2A being lowered onto the launch adapter. Credits: ESA/M. Pédoussaut

For this project I pulled every July–August scene over GSMNP from 2018 through 2025, filtering to scenes with less than 20% cloud cover. That's the peak of summer greenness, which matters for vegetation analysis: you want leaves fully deployed and no snow, so July–August is the standard window.

A single Sentinel-2 scene over GSMNP (July 24, 2025), showing the true-color RGB composite and the Scene Classification Layer (SCL) used for cloud masking. Green pixels in the SCL are classified as vegetation and kept; cloud and shadow pixels are excluded.

The GSMNP area spans multiple MGRS tiles (Sentinel-2's grid system), and the park's steep terrain makes cloud shadows a persistent masking headache. Both issues are handled by compositing: stacking all valid scenes per year and taking the median at each pixel. Cloudy observations drop out; the median of 8–15 clear scenes per year is stable.


NDVI: the Currency of Vegetation Analysis

The metric used throughout this project is NDVI: the Normalized Difference Vegetation Index. It's a ratio of two spectral bands: near-infrared (NIR) and red.
NDVI = (NIR − Red) / (NIR + Red)

https://upload.wikimedia.org/wikipedia/commons/0/0c/Ndvi_example.jpg

Healthy green vegetation absorbs red light for photosynthesis and reflects near-infrared strongly. This makes the ratio high, typically 0.6 to 0.9 for dense forest. Bare soil, stressed vegetation, or water all score lower. NDVI doesn't measure biomass directly; it measures greenness, which correlates with canopy density, leaf area, and photosynthetic activity. It saturates in very dense canopy (values above ~0.85 compress together), which is a known limitation in old-growth forest like GSMNP.

Sentinel-2 bands B08 (NIR, 10 m) and B04 (Red, 10 m) go in; a single floating-point raster comes out. One per year, eight years total.

Eight annual NDVI composites, each the median of all cloud-free July–August Sentinel-2 scenes for that year. Deep green indicates high vegetation density (NDVI ~0.85–0.90). The grey patch visible in 2022 and 2023 is a coverage gap in the eastern AOI, fixed by extending those two years' search window to June–August. The park boundary runs across the middle of each panel.

The composites are visually stable year to year, the Smokies are consistently, densely forested. The signal of interest lives in the subtle differences between years, which the naked eye misses but differencing does not.


Change Detection: Differencing and z-scores

Two complementary approaches:

Pairwise differencing - subtract consecutive annual composites (2019 minus 2018, 2020 minus 2019, and so on through 2025 minus 2024), plus a total arc (2025 minus 2018). Pixels where |ΔNDVI| > 0.10 are flagged as significantly changed. That threshold is standard in the literature: it's large enough to filter out sensor noise and atmospheric residuals while small enough to catch real disturbance.

Z-score - compute the mean and standard deviation of NDVI at each pixel across all eight years, then express each year as standard deviations from that pixel's own baseline. Pixels with |z| > 2.0 are anomalous relative to their own history. This catches slow-building stress that pairwise differencing might average across.

The two approaches largely agree on where change is happening; the pairwise deltas are more interpretable for the visualization.

After thresholding, small isolated patches (under 0.5 ha) are removed with a morphological opening pass, like smoothing boundaries and filling pin holes. The remaining raster is vectorized into polygons. Each polygon carries its area, mean ΔNDVI, peak transition year, and direction (gain vs. loss).


What the data shows

Left: Total NDVI change, 2025 minus 2018. Green is net greening; red is net browning. Center: Which consecutive-year transition drove the most change at each pixel — the 2021→2022 transition dominates across most of the changed area. Right: Vectorized change polygons (≥0.5 ha) overlaid on the 2025 NDVI composite. Lime = gain, red = loss.

The numbers:

  • 47,630 ha of significant NDVI change detected across the study period
  • 586 km² changed area, roughly 6.1% of the full analysis extent (GSMNP plus a 10 km buffer)
  • 17,562 polygons after the 0.5 ha filter: 6,678 gain / 10,884 loss
  • Loss outpaces gain by area: 28,843 ha of net browning vs. 18,787 ha of net greening

The center panel of the figure above is particularly telling: the 2021→2022 transition drove 41.6% of the changed pixels, making it far and away the dominant event in the study period. That aligns with the severe drought that affected the southern Appalachians in summer 2021 and the subsequent stress signature carrying into 2022.

The Chimney Tops 2 fire scar

One of the verification checks in the project plan was whether the Chimney Tops 2 fire scar would be detectable. That fire burned in November 2016, before the study window starts, and affected the area around Chimney Tops trailhead (~35.68°N, 83.48°W). By 2018, recovery was already underway.

The data confirms it: the Chimney Tops area shows +0.072 NDVI gain from 2018 to 2025, a clear and steady recovery signal. That's the green clusters near Gatlinburg in the map.

The 17,562 change polygons (≥0.5 ha) overlaid on the 2025 NDVI composite. Gain polygons (lime) are scattered throughout, often corresponding to forest recovery and successional areas. Loss polygons (red) cluster at the park's edges and in lower-elevation zones. The boundary between the park interior and surrounding land is visible: inside the boundary, loss polygons are sparser and smaller.

The spatial pattern of loss is informative even without classifying causes. Loss polygons concentrate outside the park boundary, areas that are logged, developed, or experiencing more agricultural pressure. Inside the park, the signal is more mixed: gain clusters in known recovery zones, loss is patchier and likely reflects drought stress, ice storm damage, and pests.


Why loss is outpacing gain

The project is intentionally agnostic about why change happened, classifying disturbance type from NDVI alone is unreliable without additional data. But the aggregate picture is consistent with what's documented in the region:

Drought - the 2021 drought was severe across the southern Appalachians, and drought-stressed vegetation shows reduced NDVI. The 2021→2022 transition dominance is likely drought-driven.

Insect disturbance - hemlock woolly adelgid has been progressing through GSMNP since the early 2000s, killing eastern hemlocks in riparian corridors. A dead hemlock stand shows up as NDVI loss. Beech bark disease affects American beech at higher elevations.

Post-fire succession - the Chimney Tops 2 fire and smaller burns show as net gain as forest recovers, but early succession (shrubs and pioneer species replacing mature forest) may register as loss before canopy closes.

Development at the park edge - the analysis area extends 10 km beyond the park boundary, and the edge effect is clear in the polygon map. The southern and eastern edges show the highest density of loss polygons, tracking development pressure in surrounding communities.


The stack that made this possible

This is the part that's genuinely exciting from a tools perspective. A project that would have required institutional remote sensing resources ten years ago now runs in a personal JupyterLab container with a consumer GPU and an internet connection.

Microsoft Planetary Computer - a STAC catalog serving petabytes of satellite data as Cloud-Optimized GeoTIFFs. No account required for read access. The pystac-client library connects to it in four lines of Python.

stackstac - turns a list of STAC items into a lazy xarray DataArray. It handles reprojection, alignment, and chunking automatically. Combined with dask, it fetches only the pixel ranges each computation needs.

Dask Distributed - parallelizes the heavy lifting. Each annual composite ran as a dask graph: query → cloud mask → NDVI → median → write. Two workers, four cores, processing one year at a time to keep memory bounded.

rasterio + rioxarray - read/write GeoTIFFs with full CRS and transform metadata. rasterio's features.shapes() does the vectorization from raster threshold to polygon GeoDataFrame in a single call.

GeoPandas - polygon filtering, attribute joins, geometry simplification, and export to GeoPackage.

The imagery itself is 10-meter resolution global coverage, freely available, updated every five days. That's the part that still takes a moment to register.


Limitations and what comes next

NDVI saturation - in old-growth canopy with NDVI above 0.85, a 0.10 threshold misses subtle stress. NDMI (Normalized Difference Moisture Index, using SWIR instead of Red) is more sensitive to canopy water content and would catch early drought stress that NDVI flattens over.

No cause classification - knowing that a polygon lost NDVI doesn't say whether it's drought, insects, logging, or development. Adding ancillary data (land use maps, disturbance databases, species range data) would let you attribute change to categories.

2022 and 2023 coverage - those two years had fewer cloud-free scenes than average. The fix (extending the search window to June–August) worked, but June composite values are slightly lower than pure July–August values, introducing a small systematic bias in those years' z-scores.

The static maps tell the story well. The forest around Great Smoky Mountains is changing - measurably, specifically, and in ways that track with what ecologists already know about drought, insects, and land-use pressure at the park edge. Getting here from a Python notebook and free satellite data took a weekend of compute time and a few hours of debugging. That's the part worth writing down.


Analysis pipeline: pystac-client, stackstac, dask[distributed], rasterio, rioxarray, geopandas, matplotlib. Imagery: Sentinel-2 L2A via Microsoft Planetary Computer. Study area: GSMNP + 10 km buffer, EPSG:5070, 10 m resolution. Study period: July–August annual composites, 2018–2025.