Abstract: Climate change has already had a major impact across the United States. In the West, we are experiencing less winter snowpack accumulation, longer springs, and hotter summers. Coupled with a hundred years of fire suppression, this creates a perfect environment for wildfire to spread. Both operationally and to inform public policy, we need physical spread models, as well as impact assessment models built on them. However, conventional approaches are generally coarse in scale, out of date, and error-prone.
The most important physical variable is the accumulation of brush and forest debris beneath the forest canopy, but this is difficult to detect with 30-60m pixels. Fuel moisture is similarly important and highly variable in space and time, but it is typically measured at only a few dozen weather stations per region. The locations of people and infrastructure are clearly primary socioeconomic variables, yet conventional models treat all people as identical and don’t map anything much smaller than a census block. This session explores how modern data science techniques and GPU-accelerated analytics can be combined with better data to improve outcomes.
Taking California as a case study, we start with disaggregate data from a Microsoft ML model, which provides a footprint for each building in the state (10 million in total). We then consider forest structure data from the California Forest Observatory, also generated using ML models. At 10m/pixel, this amounts to 400 million pixels per time slice (4 slices) per variable (5 variables). To concentrate on wildfire risk for households, we use the fire science concept of “defensible space” and buffer buildings to extract surrounding forest conditions. Over the last four years, California has experienced some devastating fires, and their perimeters are available as open data. Lastly, we resample census demographic data at the household level to get estimates of total population and of subsets of that population with special needs.
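The buffering step can be illustrated with a minimal sketch. It is not the pipeline used in the session: it assumes a building is reduced to a single point, the forest variable is a small in-memory grid, and the buffer is a plain circle. The function name, the grid values, and the 12 m radius are all hypothetical and chosen only to keep the example small.

```python
import math


def defensible_space_mean(raster, cell_size_m, center_xy_m, radius_m):
    """Mean of the raster cells whose centers lie within radius_m of a point.

    raster: list of rows (row 0 starts at y = 0); each cell is a square
    cell_size_m on a side. Returns None if no cell center is in the buffer.
    """
    cx, cy = center_xy_m
    total, count = 0.0, 0
    for row_idx, row in enumerate(raster):
        y = (row_idx + 0.5) * cell_size_m  # y coordinate of the cell center
        for col_idx, value in enumerate(row):
            x = (col_idx + 0.5) * cell_size_m  # x coordinate of the cell center
            if math.hypot(x - cx, y - cy) <= radius_m:
                total += value
                count += 1
    return total / count if count else None


# Hypothetical 5 x 5 canopy-cover grid (percent) with 10 m cells.
canopy = [
    [10, 10, 10, 10, 10],
    [10, 50, 50, 50, 10],
    [10, 50, 90, 50, 10],
    [10, 50, 50, 50, 10],
    [10, 10, 10, 10, 10],
]
building = (25.0, 25.0)  # hypothetical building location at the grid center
mean_cover = defensible_space_mean(canopy, 10.0, building, 12.0)
print(mean_cover)
```

With real data, the buffer would more plausibly be drawn around the footprint polygon rather than a point, and the per-pixel loop would be replaced by a vectorized or GPU-accelerated zonal operation.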
What can we learn from this data? First, we deploy unsupervised learning techniques, in particular cluster analysis. When applied to the biophysical data, this supports the classification of the “defensible space” surrounding millions of buildings. When we add the social and fire history data, we can further characterize the relationships between people, landscape management, and outcomes. Second, when we apply supervised techniques such as Random Forests, we can explore the factors correlated with particular outcomes.
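As a rough illustration of the cluster analysis idea, the sketch below groups buildings by the forest conditions in their defensible space. The abstract does not name a specific algorithm, so k-means (Lloyd's algorithm) is used here purely as an assumption. The two features per building (canopy cover and canopy height, both pre-scaled to 0-1) and all values are hypothetical.

```python
import math


def kmeans(points, initial_centroids, n_iter=10):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical per-building features: (canopy cover, canopy height), scaled 0-1.
buildings = [
    (0.10, 0.05), (0.15, 0.10), (0.05, 0.08),  # sparse surroundings
    (0.80, 0.70), (0.90, 0.85), (0.85, 0.75),  # dense surroundings
]
labels, centroids = kmeans(buildings, [buildings[0], buildings[3]])
print(labels)
```

Features on different scales would need to be normalized before clustering, since k-means relies on Euclidean distance. At the scale of millions of buildings, a library or GPU implementation would replace this loop-based version.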
Bio: Abhishek Damera’s work as a Data Scientist at OmniSci involves using state-of-the-art machine learning algorithms to capture the underlying trends in geospatial data. Prior to this, he completed his Master’s in Transportation Engineering at UC Berkeley, where most of his work focused on classifying roads according to vehicular speed profiles.
Data Scientist | OmniSci