This document is an overview of the structure and contents of the EPA’s Air Quality System (AQS) database. It answers questions many first-time data users have about the availability of data, its geographic and temporal scope, and how it is organized and processed. Please note the caveat at the end of this document.

1. Introduction

The Clean Air Act requires that state, local, and tribal air pollution control agencies monitor the air for ambient levels of certain pollutants. This data is useful for health and policy research relating to air pollution and its control (Fann et al. 2015). The requirements for the monitoring program are codified in 40 CFR Part 58. In addition to the required monitoring, many agencies perform additional and/or voluntary monitoring of substances and meteorological parameters. The monitoring program is designed to meet three objectives (40 CFR Part 58 Appendix D.1):

  • Provide air pollution data to the public in a timely manner;

  • Support compliance with ambient air quality standards and emissions strategy development; and

  • Support air pollution research studies.

This data is reported to the United States Environmental Protection Agency (EPA). The monitoring agencies are required to report the measured data, along with metadata about the site and monitoring equipment and associated quality assurance data, to the US EPA’s Air Quality System (AQS). AQS and its predecessors have been accepting and storing this data for more than 40 years and currently contain more than 2 billion measurements. This document describes (1) the methods by which the data can be obtained, (2) the AQS data set, and (3) some background material about the monitoring program that may help users select and interpret data.

2. Accessing the Data

All public access to AQS data is via the AQS Data Mart, a copy of the AQS database that is made once per week (currently on Sunday nights). This section summarizes the different ways you can access the data.

2.1. AirData

EPA’s primary portal for public access to air quality data is the AirData website. This website has reports, graphics, and maps that can be customized by the user. Most AirData tools show only criteria pollutants. AirData is suitable for use by citizens interested in their local conditions or policymakers and researchers looking for the data they need.

2.2. The AQS API

Within AirData, the AQS API (application programming interface) is available for querying data from the Data Mart database. The API is a suite of REST web services allowing users to customize data queries to select only the data they need. It provides access to all data within AQS (and not just criteria pollutants). Registration is required and detailed instructions for use are available at the API. Currently, only sample (raw) data may be accessed via the API.

The version 2 API is also available for beta testing. Please note that data from this API should not be used for scientific or regulatory work at this point. It has many more services, including daily and annual summary data, site and monitor descriptions, quality assurance data, and other "helper" services such as lists of values and column definitions.

The AQS APIs are suitable for use by people who are familiar with the monitoring program and understand how the EPA collects and organizes data. The remainder of this paper is a primer on the topic.
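As an illustration of what a REST query of this kind looks like, the following minimal sketch assembles a request URL without sending it. The base URL, the `sampleData/byState` service path, and the query field names are assumptions modeled on the version 2 service and should be checked against the API's own instructions; the email address and key are placeholders.

```python
from urllib.parse import urlencode

# Assumed base URL and service path; verify both against the API's
# own documentation before relying on them.
BASE_URL = "https://aqs.epa.gov/data/api"


def build_sample_query(email, key, param, bdate, edate, state):
    """Return a URL requesting sample data for one parameter in one state."""
    query = {
        "email": email,  # address used when registering (placeholder here)
        "key": key,      # key issued at registration (placeholder here)
        "param": param,  # AQS parameter code, e.g. "44201" for ozone
        "bdate": bdate,  # begin date, YYYYMMDD
        "edate": edate,  # end date, YYYYMMDD
        "state": state,  # two-digit state FIPS code
    }
    return f"{BASE_URL}/sampleData/byState?{urlencode(query)}"


url = build_sample_query(
    "user@example.com", "hypothetical-key", "44201", "20170601", "20170607", "37"
)
print(url)
```

Building the query from a dictionary keeps the selection criteria explicit and lets `urlencode` handle the escaping of characters such as "@".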

2.3. Data Files

EPA also posts files of data at AirData for the convenience of users who would like large sets of the data without having to use the API. The files are organized by parameter (or parameter class), temporal aggregation level, and collection year. The temporal aggregation levels are hourly, 8-hour (for ozone), daily, and annual. Each file contains all data for the nation.

These files are updated two times per year, nominally in May and November. The May update is intended to be the first complete set of data for the prior calendar year. The November update is intended to be the first complete data set for the ozone season of the current year. Keep in mind that, given the submission schedules, allowable resubmission windows, etc., more data may arrive or the posted data may change after its initial release. Each file has a "last update date", which is updated if, during the regular update cycle, any data in the file has been added or changed.

The data files are suitable for use by people who are familiar with the monitoring program and understand how the EPA collects and organizes data and are comfortable handling large (multi-million row) data files.
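Because a single file can run to millions of rows, it often helps to stream it row by row rather than load it whole. The sketch below shows one way to do that with the standard library. The column names ("State Code", "Parameter Code", "Sample Measurement") are assumptions about the file layout, and the rows are made-up stand-ins for a downloaded file.

```python
import csv
import io


def count_rows_by_state(lines, state_column="State Code"):
    """Stream CSV rows and count them per state without loading the whole file."""
    counts = {}
    for row in csv.DictReader(lines):
        state = row[state_column]
        counts[state] = counts.get(state, 0) + 1
    return counts


# Tiny in-memory stand-in for a downloaded file; the values are invented.
sample = io.StringIO(
    "State Code,Parameter Code,Sample Measurement\n"
    "37,44201,0.031\n"
    "37,44201,0.034\n"
    "06,44201,0.045\n"
)
print(count_rows_by_state(sample))
```

For a real file, the same function can be given a file object opened with `open(path, newline="")`, so memory use stays flat regardless of file size.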

2.4. Real Time Data

Real-time data (data collected today) is not available from AQS (see the timeliness of data section below). If real-time information is needed, please visit the AirNow API site and direct all questions there.

2.5. Toxics Data

Toxics data (also called HAP or Hazardous Air Pollutant data) is available from all of the sources listed above. However, it is suggested that you get toxics data from the EPA Toxics Archive. This is a value-added product that includes data from sources other than AQS, has additional EPA-applied quality assurance, and has less stringent data reduction policies (e.g., rounding) applied. Toxics data is included in the sources above for those who need it in a format consistent with the other outputs.

3. The AQS Data Set

This section provides a statistical abstract of the data available in the data set. The three dimensions appropriate to a discussion of the data set as a whole are time, geography and parameter.

3.1. Time

The earliest sample in the data set is from 1957. Since then, the number of parameters sampled and the frequency at which they are sampled have generally increased. Changes in regulatory and health focus have caused variability in the number of monitors in operation over time. Table 1 shows the number of distinct parameters sampled, the number of operating monitors, and the number of individual samples for each year in the data set.

Table 1. The number of parameters, monitors, and samples by year in the AQS data set.

Year   Parameters   Monitors   Sample Measurements
1957           19        369                 9,175
1958           25        737                47,653
1959           25        618                45,970
1960           26      1,010                52,255
1961           26      1,068                74,405
1962           29      1,348               230,012
1963           30      1,510               826,563
1964           30      2,355               997,277
1965           33      4,116             1,279,624
1966           52      5,233             1,608,604
1967           53      6,759             1,983,060
1968           69     10,276             2,859,647
1969           65     10,694             3,078,881
1970           53      9,018             3,020,440
1971           66     10,749             5,661,916
1972           73     14,365             8,729,757
1973           88     16,848            12,433,216
1974           87     19,883            17,702,113
1975           84     24,826            24,096,286
1976           83     22,935            25,358,854
1977           82     21,894            24,480,124
1978           82     20,472            23,293,294
1979           92     18,799            23,312,933
1980           93     18,614            25,252,283
1981           94     17,781            26,105,665
1982           80     13,056            27,138,649
1983           85     12,575            27,427,318
1984           82     12,009            27,922,711
1985          130     13,573            27,306,787
1986          117     13,251            27,403,933
1987          166     15,116            28,160,396
1988          219     17,302            30,230,646
1989          203     16,825            32,848,446
1990          310     17,755            34,578,502
1991          304     18,117            38,155,521
1992          362     20,298            40,211,030
1993          400     25,913            42,928,330
1994          425     29,349            47,887,909
1995          487     36,448            53,507,747
1996          467     40,211            55,708,476
1997          455     39,348            57,726,410
1998          482     40,509            60,347,372
1999          506     52,396            62,219,311
2000          564     58,968            66,425,708
2001          671     67,554            70,797,391
2002          623     77,428            74,079,404
2003          542     79,158            75,192,843
2004          528     80,952            77,176,377
2005          601     85,386            80,014,872
2006          595     81,947            82,461,706
2007          542     77,282            85,726,466
2008          543     75,217            85,540,102
2009          551     78,237            88,291,514
2010          683     71,977            96,722,999
2011          683     70,490           113,467,240
2012          594     68,761           114,862,878
2013          579     67,025           115,825,571
2014          562     65,582           116,351,413
2015          555     61,692           115,247,883
2016          563     58,824           117,424,252
2017          557     53,999           120,551,163
2018          479     52,286           117,373,293
2019          222      9,143            10,851,134

Table current as of May 21st, 2019.

As can be seen, the number of samples is on a generally upward trend over time. The number of parameters and monitors varies as areas of focus change. The 2019 figures are low because the table was compiled partway through that year. While the number of unique parameters reported in any given year is fewer than 700, the total number of unique parameters reported across all years is 1,167.

3.1.1. The significance of 1980

1980 marked a revision to the ozone monitoring program (44 FR 8202) that included implementation of revised monitor calibration procedures. Data from prior years is available, but the user should understand that the total uncertainty and spatial variability that are artifacts of the measurements are higher than in later years. 1980 marks the beginning of nationally consistent operational and quality assurance procedures.

3.1.2. The significance of 1999

1999 marked the beginning of required PM2.5 (particulate matter of 2.5 microns in aerodynamic diameter or less) and PM2.5 speciated monitoring (62 FR 38652). PM10 and TSP data is available in prior years, but 1999 is the first year with national FRM and non-FRM PM2.5 monitoring. This is reflected in the large jump in the number of monitors in 1999.

3.2. Sample Durations

Each monitor reports data at a specific sample duration. The sample value is the average atmospheric concentration of the parameter in the time window beginning at the sample begin time and lasting for the sample duration.
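The relationship between begin time, duration, and the interval a sample value covers can be sketched as follows. The mapping from duration labels to lengths of time is a hypothetical helper written for this illustration, not a structure provided by AQS.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from a few duration labels to lengths of time.
DURATIONS = {
    "5 MINUTE": timedelta(minutes=5),
    "1 HOUR": timedelta(hours=1),
    "24 HOUR": timedelta(hours=24),
}


def sample_window(begin, duration_label):
    """Return the (begin, end) of the interval a sample value averages over."""
    return begin, begin + DURATIONS[duration_label]


begin, end = sample_window(datetime(2017, 6, 1, 13, 0), "1 HOUR")
print(begin, "to", end)
```

Under this reading, a 1-hour sample with a begin time of 13:00 represents the average concentration between 13:00 and 14:00.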

Sample duration is not to be confused with a calculated temporal aggregation (8-hour, daily summary, annual summary, etc.). EPA calculates many summaries (see the data aggregation section below) at different temporal scales. Sample duration applies only to the data that is reported to EPA by the monitoring organizations. For example, 8-hour ozone averages are not referred to as "samples" since they are calculated from the 1-hour sample measurements. Summary (temporal aggregate) data is available at our system-defined calculation levels and is not included in the table below. Table 2 profiles how much data is available at the various sample durations.

Table 2. The frequency of different sample durations in the AQS data set.

Sample Duration                    Samples   Percentage   Latest Year Reported
5 MINUTE                       177,827,887         6.40   2019
15 MINUTE                          669,038         0.02   2018
1 HOUR                       2,483,985,932        89.39   2019
2 HOUR                          10,084,178         0.36   2018
3 HOUR                          11,877,659         0.43   2019
4 HOUR                             205,895         0.01   2012
5 HOUR                               2,706         0.00   2012
6 HOUR                              10,435         0.00   2012
8 HOUR                               6,302         0.00   2018
12 HOUR                            106,704         0.00   2017
24 HOUR                         93,433,895         3.36   2019
1 WEEK                              96,796         0.00   2012
1 MONTH                            150,588         0.01   1996
3 MONTH                             15,952         0.00   1969
COMPOSITE DATA                     222,399         0.01   2019
INTEGRATED PASSIVE 4-WEEKS              42         0.00   2008
INTEGRATED PASSIVE 2-WEEKS              53         0.00   2008
INTEGRATED PASSIVE 3-WEEKS              81         0.00   2008

Table current as of May 21st, 2019.

Highlights of Table 2 include the following:

  • Beginning in 2010, 5-minute SO2 data was required to be reported. Because of the relatively large number of samples reported per year, it has grown quickly and now ranks second in number of samples.

  • Hourly data is by far the most voluminous in AQS, with over 2 billion samples.

  • The 2-hour and 3-hour data is largely carbon speciation and ozone precursor data, respectively.

  • The 24-hour data is the third largest component, as many particulate and toxics samples are reported at this duration and have been for a long time.

  • Composite data are concentration values derived from two or more air samples obtained at different times that are analyzed together. These separate samples may span any time period from one week to one year. All such composite data is for total suspended particulate (TSP) metal species.
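To make the distinction between a reported sample and a calculated aggregate concrete, the sketch below derives 8-hour averages from a run of 1-hour values. It is a simplified illustration only: it assumes every hour is present and does not reproduce EPA's actual calculation rules, and the hourly values are invented.

```python
def eight_hour_means(hourly, window=8):
    """Average each run of `window` consecutive hourly values.

    The result at index i covers hours i through i + window - 1.
    Simplification: every hour is assumed present; handling of
    missing data is left out.
    """
    return [
        sum(hourly[i:i + window]) / window
        for i in range(len(hourly) - window + 1)
    ]


# Ten made-up hourly ozone values in ppm.
hourly = [0.030, 0.032, 0.035, 0.040, 0.044, 0.046, 0.045, 0.040, 0.036, 0.033]
means = eight_hour_means(hourly)
print(len(means))
```

The hourly values are the samples a monitor would report at the 1 HOUR duration; the averages exist only as a calculation on top of them, which is why they do not appear as a sample duration in Table 2.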

3.3. Geography

In 2017, data was collected at 2,761 AQS sites. An AQS site is a distinct geographic location that has one or more monitors. Not every site measures the same parameters. The location of these sites within the continental United States is shown in Figure 1.

Figure 1. AQS CONUS sites reporting data in 2017.