[note: I lightly edited the words in this article and updated the graphs with newer data after I became aware of some potential biases in the original data. This is the new version.]
I recently got inspired to study climate change, and as is my wont I decided to look for data from "unimpeachable" sources, download it, and do statistical and visual analysis on my own.
I don’t like accepting authority or the so-called scientific consensus without double-checking. I do like data visualization and data science and have publications in those areas. But I am also skeptical of any discipline that adds "science" to the end of its name to gain credibility.
My motivation is also personal. In very early 2020 I moved from San Francisco to Palm Springs, California. Palm Springs is very hot in the summer (occasionally close to 120ºF). I was worried this might have been a bad move: that it might become unlivable before too long because of climate change (in this case warming).
There’s also a property in northeastern California I’m involved in where fire is a concern, access to water is an issue, etc. It too gets quite hot in the summer. We are concerned about the climate and how it might impact this property (I may want to, for instance, retire there).
Lastly, after 32 years working at NASA, I went to work for a commercial satellite imaging data company. One of our company’s major messages is that we are providing data to help monitor and address climate change. So for many personal - as well as humanitarian - reasons, I am getting more interested in climate change.
So I did some data analysis of my own - on warming. Here's how I did it, and here's what (I think) I found:
I wanted to find long-term temperature records. From thermometers. Not from proxies (tree rings, ice cores, etc). It turns out thermometers are not that hard to make, and they have been around for a while.
The data source I focused on is the "Automated Surface Observing System" (ASOS). It is a worldwide database of (approximately) hourly temperature data from airports. Many hundreds of airports. Some of the airports' data series are only a few years old, but some have been recorded since the dawn of commercial aviation - since the 1920s.
(ASOS records hourly temperature, wind speed, and humidity - which is what pilots need to know to decide whether they can take off or not. I just looked at temperature.)
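To make the "hourly temperature" input concrete, here is a minimal Python sketch of reading such a file. It is not the original R code. The column names (`station`, `valid`, `tmpf`), the `M` marker for missing readings, the timestamp format, and the Fahrenheit-to-Celsius conversion are all assumptions about the downloaded CSV, and the three sample rows are invented.

```python
import csv
import io
from datetime import datetime

def parse_hourly_csv(text):
    """Return a list of (timestamp, temp_celsius) tuples, skipping missing readings."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        raw = rec["tmpf"].strip()
        if raw in ("", "M"):  # assumed missing-value marker
            continue
        when = datetime.strptime(rec["valid"], "%Y-%m-%d %H:%M")  # assumed format
        rows.append((when, (float(raw) - 32.0) * 5.0 / 9.0))  # Fahrenheit -> Celsius
    return rows

# Invented sample rows, only to show the expected shape of the input.
sample = """station,valid,tmpf
PSP,2020-07-01 00:53,95.0
PSP,2020-07-01 01:53,M
PSP,2020-07-01 02:53,89.6
"""
readings = parse_hourly_csv(sample)
for when, temp_c in readings:
    print(when.isoformat(), round(temp_c, 1))
```

Dropping missing readings at this stage (instead of filling them in) keeps any later summary honest about gaps in the record.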
I focused on the "longest duration" ASOS timeseries: 75, 85, or more years old.
First I looked at Palm Springs CA (where I live - see above). Then I looked in Hawaii (Molokai), again for personal reasons. Then I looked at Sacramento CA (the closest airport to the property in NE California - see above).
Then I looked at some arctic sites (Murmansk, Newfoundland...), then other deserts (in the near east and middle east), then sites in Europe (Munich, Stockholm, etc). I looked at other US sites and at sites in S. America, Australia, China, India, etc.
I did not cherry-pick. I just looked at each "interesting region" (interesting to me) and chose the very first airport in the list which had data going back to the early 1940s or before.
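The selection rule can be written down in a few lines. This is a Python sketch, not the original R code; the cutoff year of 1945 is an assumed reading of "early 1940s or before", and the station codes and start years are invented placeholders, not real airports.

```python
def first_long_record(stations, cutoff_year=1945):
    """Return the first station, in list order, whose record starts by cutoff_year.

    `stations` is a list of (code, first_year_of_data) pairs. The rule takes
    the first qualifying entry; it does not search for the longest or
    "best-looking" record.
    """
    for code, first_year in stations:
        if first_year <= cutoff_year:
            return code
    return None  # no station in this region has a long enough record

# Placeholder region listing (made-up codes and years).
region = [("AAA", 1973), ("BBB", 1941), ("CCC", 1936)]
print(first_long_record(region))
```

Note that "CCC" has the longer record, but the rule stops at "BBB" because it comes first in the list.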
Here is what I found:
Let's start with Palm Springs. I visualized each site (starting with Palm Springs) using four graphs of their respective ASOS data: The first graph is always a "sanity check": I just look at all the hourly data for the year 2020 (January 1st to December 31st - every hour) and plot it.
There are a number of things to notice here: 1) the high frequency oscillations are daily ("diurnal") oscillations in temperatures. 2) The very low-frequency cycle (low-high-low) that takes all year is the annual, seasonal temperature oscillation. And 3) the medium scale chaotic modulation is weather - perturbations that take a few days to a few weeks.
Now let's look at the entire data series for Palm Springs: From the 1940s to June 2022, about an 80-year span of hourly data:
I use a "smoothed kernel density scatterplot". The annual oscillations are obvious, but the finer-scale stuff is blurred out (due to the smoothing kernel). We notice a BIG gap of several decades in the mid-20th century (why?). We see something changed with the sensor system about halfway through (not too surprising. Technology upgrade?).
So Palm Springs data has flaws. (General principle: try to visualize all the data first).
But let's continue.
The next plotting technique for the same Palm Springs data is more quantitative: A sequence of violin plots with each individual violin plot summarizing all the hourly data in a four-year bin (about 35,000 hours per bin). The final bin always contains exactly four years of data ending in August 2022.
Bins with no data are not drawn (it skips the big gap from ~1950 to ~1974). This is a problem.
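The binning described above can be sketched as follows. This is Python, not the original R code. It assumes a bin is exactly 4 x 365.25 days long and that bins are counted backward from a chosen end date, so the final bin is always full; the function names, the end date, and the sample readings are all illustrative.

```python
from datetime import datetime
from statistics import quantiles

HOURS_PER_BIN = 4 * 365.25 * 24  # about 35,000 hours in four years

def four_year_bins(readings, end):
    """Group (timestamp, temp) pairs into 4-year bins counted back from `end`.

    Returns {bin_index: [temps]} where index 0 is the final bin. A bin with
    no readings never appears in the result, so gaps are skipped, not filled.
    """
    bins = {}
    for when, temp in readings:
        hours_back = (end - when).total_seconds() / 3600.0
        if hours_back < 0:
            continue  # ignore anything after the end date
        bins.setdefault(int(hours_back // HOURS_PER_BIN), []).append(temp)
    return bins

def summarize(temps):
    """Return (lower quartile, median, upper quartile) for one bin."""
    q1, q2, q3 = quantiles(temps, n=4, method="inclusive")
    return q1, q2, q3

# Invented readings: three recent ones and two from the 1940s, nothing between.
readings = [
    (datetime(2021, 1, 1), 20.0),
    (datetime(2020, 1, 1), 22.0),
    (datetime(2019, 6, 1), 24.0),
    (datetime(1948, 1, 1), 18.0),
    (datetime(1947, 1, 1), 19.0),
]
bins = four_year_bins(readings, end=datetime(2022, 8, 31))
for index in sorted(bins):
    print(index, summarize(bins[index]))
```

Only bins 0 and 18 come back here; the sixteen empty bins in between are simply absent, which is the same behavior as the skipped gap in the plot.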
There are several things I notice above: 1) The medians (as well as the maxima, minima, and quartiles) vary from box to box. I.e. the climate fluctuates. 2) The fluctuations of the medians from box to box are visible, but they are about 20x smaller than the total range spanned by each 4-year box.
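One way to put a number on the "about 20x" comparison is sketched below in Python (not the original R code). The definitions are assumptions: "fluctuation of the medians" is taken as the spread between the highest and lowest bin median, and "total range" as the typical max-minus-min within a bin. The three bins are invented to make the arithmetic easy to follow.

```python
from statistics import median

def range_ratio(bins):
    """Ratio of the typical within-bin range to the spread of the bin medians."""
    medians = [median(b) for b in bins]
    median_spread = max(medians) - min(medians)
    typical_range = median([max(b) - min(b) for b in bins])
    return typical_range / median_spread

# Invented bins: each spans 40 degrees, while the medians drift by only 2.
bins = [
    [0.0, 20.0, 40.0],
    [1.0, 21.0, 41.0],
    [2.0, 22.0, 42.0],
]
print(range_ratio(bins))
```

Other definitions (for example, the standard deviation of the medians against the interquartile range) would give a different number, so the ratio is only meaningful alongside the definition used.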
The blue line is an ordinary least squares (OLS) regression fit to the medians, and red lines are OLS fit to the quartiles.
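For readers who want the fit spelled out, here is a minimal Python sketch of an OLS line through a series of bin medians. It is not the original R code, and the years and medians are invented. The slope is the covariance of x and y divided by the variance of x, and the line passes through the point of means.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented bin midpoints (years) and bin medians (degrees C).
years = [1942, 1946, 1950, 1954]
medians = [20.0, 20.5, 21.0, 21.5]
slope, intercept = ols_fit(years, medians)
total_change = slope * (years[-1] - years[0])  # change implied across the span
print(slope, total_change)
```

The same function applied to the per-bin quartiles instead of the medians gives the quartile trend lines.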
In Palm Springs (as in all deserts) the summer daytime highs are very hot and the winter nighttime lows are much colder. There are a few outlier points (probably errors).
For the final plot for these data we zoom in on the 4-year medians of the boxes (~35,000 hourly data points per median). And we also plot a linear least squares fit to the medians (shifted to zero - see comments). The least squares fit (blue line) looks like it is going up about 3ºC across the ~80-year span. However the first point is quite low. The rest of the series looks somewhat random.
Again, Palm Springs has big gaps. That's disappointing. And the Palm Springs area has been extensively developed (paved and suburbanized) over the last 80 years. The blue line above is an OLS regression and the red line is a Theil-Sen estimator (less sensitive to outliers though in this case it adds nothing).
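To show why a Theil-Sen line is less sensitive to outliers than OLS, here is a small Python sketch (not the original R code). It uses the textbook definition: the slope is the median of the slopes between all pairs of points, and the intercept is the median of y minus slope times x. The numbers are invented, with a deliberately low first point; they are not the Palm Springs medians.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Return (slope, intercept) using the median of all pairwise slopes."""
    slopes = [
        (ys[j] - ys[i]) / (xs[j] - xs[i])
        for i, j in combinations(range(len(xs)), 2)
        if xs[j] != xs[i]
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

def ols_slope(xs, ys):
    """Least-squares slope, included only for comparison."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Invented series: a steady rise of 1 per step, except a very low first point.
xs = [0, 1, 2, 3, 4]
ys = [-10.0, 1.0, 2.0, 3.0, 4.0]
ts_slope, ts_intercept = theil_sen(xs, ys)
print(ts_slope, ols_slope(xs, ys))
```

In this made-up case the single low point triples the OLS slope, while the Theil-Sen slope stays at the underlying value. When a series has no strong outliers the two lines nearly coincide.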
Next I plotted another desert site with about 80 years of data - again with some gaps as you can see, though they are more spread out. It is Arak - a relatively small desert city in Iran. Here are the plots.
Next I looked at Molokai, in Hawaii:
Some things to note here: 1) tropical places have less daily and seasonal temperature variation compared to deserts. 2) There's another big gap in the data. 3) There are some extreme outliers. I suspect the outliers are when the station got damaged and a thermometer saw rain or direct sunlight (?).
A few outliers don’t affect medians. That’s one of the reasons we summarize with medians.
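A tiny Python illustration of that point, using invented readings (not station data): replacing one value with a wild outlier leaves the median where it was but drags the mean far away.

```python
from statistics import mean, median

clean = [24.0, 25.0, 26.0, 27.0, 28.0]
with_outlier = clean[:-1] + [95.0]  # one reading replaced by a bad value

print(median(clean), median(with_outlier))  # the median is unchanged
print(mean(clean), mean(with_outlier))      # the mean shifts noticeably
```

This holds only while outliers are a small fraction of the bin; if a large share of the readings in a bin are bad, the median moves too.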
Next I looked at Newfoundland (CYQX) - pretty much the most northeastern part of North America. It had a long dataset (about 80 years).
Data is presented without comment:
Then I looked at Stockholm, for the same set of reasons:
And I looked at Sacramento, California because it is the closest airport to the N. Cal. property I am involved in that also has a long time series. We are very concerned about wildfires at this property. Again, presented without comment:
I started to ask friends where they were born, in order to demonstrate the visualizations and get a "random" sample of locations. So next I looked at Kelly Air Force base, Texas. Here it is. Over 80 years of data.
Similarly I looked at Murmansk. I keep expecting to find warming in or near the Arctic:
Next I wanted to look at China. Longer ASOS datasets are few and far between in that country, but I found Kunming. Even I know (or thought I knew) that China is getting hotter.
Then I looked south. Australia and Tasmania. Both presented without comment.
Then I decided to look at India. "Everyone knows" that India is suffering from climate change. Here is the first place I looked at because it had long time series:
Is the sudden drop in the first plot the onset of the annual monsoon?
Here is the next place I looked at in India:
And here is the first place I looked at in South America: La Paz, Bolivia. Heavy urbanization over the time period in question:
Then I looked at Alaska, Munich, and Minneapolis. Again, presented without comment:
[note added afterwards: a friend started helping with this and we will soon make another post that analyzes all the stations with more than, say, 40 years of data]
[here are more I did after the original post. Again not cherry-picked. In the order I picked them and plotted them with no prior knowledge of what I would see]
[lots of sparsity because of wars Iran was involved in?]
[serious sparse outlier data in a few of the earlier 4-year bins. They pull the OLS line way off. Theil-Sen regression (red line) is not sensitive to this]
[Portland OR has presumably had a lot of urbanization near its international airport?]
[Pendleton airport in Oregon - not that far from Portland - has had less urbanization I suspect]
Nellis AFB, NV (on the outskirts of Las Vegas). Seems like it was very hot ~1940…
Another airport (CYQX) in Newfoundland.
These are not "global temperatures". How do you even define "global temperature"? These are a selection of airports with 75 to 100 years of temperatures measured by thermometers.
When I look I see two big things: 1) The extremes at each location in each 4-year bin span a much larger temperature range than any slow wandering of the median temperatures over ~80 years. And 2) the series of medians (and the minima and maxima) look pretty close to random.
Does downloading and plotting data (from NOAA and the National Weather Service) make me a science denier? I don't want to believe that, but I suppose I am (almost) as biased as the next human.
What do you see? I suspect this is a case where most people will see what they expect to see. How will that work out for us?
References:
https://www.weather.gov/asos/asostech
https://www.weather.gov/media/asos/aum-toc.pdf
http://sites.psu.edu/shanekearnsportfolio/wp-content/uploads/sites/20899/2014/12/Technical-Definition-and-Description-NOAA-ASOS.pdf
https://mashable.com/feature/hotornot-history-20-year-anniversary
Data sources:
https://mesonet.agron.iastate.edu/request/download.phtml
Code:
Written in the R language. Available upon request.
Some comments about this on hacker news:
https://news.ycombinator.com/item?id=31840139
(I appreciate the discussion though it appears largely to be, as they say, “more heat than light” :)
I use thermometers, not thermostats :-)
I use them because they directly measure temperature, pure and simple.
Proxies like tree-rings, while allowing for longer duration (since some trees have existed longer than the oldest thermometers) have confounding factors: the amount of rain, the availability of nutrients, other local ecological stresses, etc.