AQS Data Mart

The AQS Data Mart is designed to make air quality data more accessible and useful to the scientific and technical community.

Accessing the AQS Data Mart

All access to the AQS Data Mart is through the AirData site.

The Query AirData interface allows users to extract data from the database. You must register (details are at the link) to use that site; your Direct Interface or AQS account will not work there.

Query AirData is the API (Application Programming Interface) developed for accessing the AQS Data Mart. It contains a suite of REST web services, plus an interface for users who would rather construct queries manually. The services are described in detail there.

Files of data also exist at the AirData File Download page.

Please visit the AirData homepage for additional data products.

The Direct Interface Java applet is no longer available.


About the AQS Data Mart


Basic Information

The AQS Data Mart is a database containing all of the information from AQS. It has every measured value the EPA has collected via the national ambient air monitoring program. It also includes the associated aggregate values calculated by EPA (8-hour, daily, annual, etc.). The AQS Data Mart is a copy of AQS, refreshed once per week and made accessible to the public through web-based applications. The intended users of the Data Mart are air quality data analysts in the regulatory, academic, and health research communities. It is intended for those who need to download large volumes of detailed technical data stored at EPA; it does not provide any interactive analytical tools. It serves as the back-end database for several Agency interactive tools that could not fully function without it: AirData, AirCompare, the Remote Sensing Information Gateway, the Map Monitoring Sites KML page, etc.

AQS must maintain constant readiness to accept data and must meet high data integrity requirements, so it is limited in the number of users and queries it can serve. The Data Mart, as a read-only copy, can allow wider access.

The most commonly requested aggregation levels of data (and key metrics in each) are:

Sample Values (2.4 billion values back as far as 1957, national consistency begins in 1980, data for 500 substances routinely collected)
  • The sample value converted to standard units of measure (generally 1-hour averages as reported to EPA, sometimes 24-hour averages)
  • Local Standard Time (LST) and GMT timestamps
  • Measurement method
  • Measurement uncertainty, where known
  • Any exceptional events affecting the data
NAAQS Averages
  • NAAQS average values (8-hour averages for ozone and CO, 24-hour averages for PM2.5)
Daily Summary Values (each monitor has the following calculated each day)
  • Observation count
  • Observation percent (of expected observations)
  • Arithmetic mean of observations
  • Max observation and time of max
  • AQI (air quality index) where applicable
  • Number of observations > Standard where applicable
Annual Summary Values (each monitor has the following calculated each year)
  • Observation count and percent
  • Valid days
  • Required observation count
  • Null observation count
  • Exceptional values count
  • Arithmetic Mean and Standard Deviation
  • 1st - 4th maximum (highest) observations
  • Percentiles (99, 98, 95, 90, 75, 50)
  • Number of observations > Standard
Site and Monitor Information
  • FIPS State Code (the first 5 items on this list make up the AQS Monitor Identifier)
  • FIPS County Code
  • Site Number (unique within the county)
  • Parameter Code (what is measured)
  • POC (Parameter Occurrence Code) to distinguish between different samplers measuring the same parameter at the same site
  • Latitude
  • Longitude
  • Measurement method information
  • Owner / operator / data-submitter information
  • Monitoring Network to which the monitor belongs
  • Exemptions from regulatory requirements
  • Operational dates
  • City and CBSA where the monitor is located
Quality Assurance Information
  • Various data fields related to the 19 different QA assessments possible
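As a rough illustration of how the first five Site and Monitor fields combine into the AQS Monitor Identifier, here is a minimal Python sketch. The digit widths (2-digit state, 3-digit county, 4-digit site, 5-digit parameter code) and the hyphen-separated text form are assumptions for illustration, not an official AQS format, and the class name MonitorId is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorId:
    """Hypothetical container for the five fields that identify an AQS monitor."""
    state: str      # FIPS state code (assumed 2 digits)
    county: str     # FIPS county code (assumed 3 digits)
    site: str       # site number, unique within the county (assumed 4 digits)
    parameter: str  # parameter code, i.e. what is measured (assumed 5 digits)
    poc: int        # Parameter Occurrence Code

    def __str__(self) -> str:
        # Assumed display form: SS-CCC-NNNN-PPPPP-POC
        return f"{self.state}-{self.county}-{self.site}-{self.parameter}-{self.poc}"

    @classmethod
    def parse(cls, text: str) -> "MonitorId":
        """Parse the assumed hyphenated form, restoring dropped leading zeros."""
        state, county, site, parameter, poc = text.strip().split("-")
        return cls(state.zfill(2), county.zfill(3), site.zfill(4),
                   parameter.zfill(5), int(poc))
```

Keeping the codes as strings rather than integers is deliberate: spreadsheets and CSV readers often turn a state code like "06" into the number 6, and zfill restores the padding when such a value is parsed back (for example, the illustrative string "6-37-1103-44201-1" becomes "06-037-1103-44201-1").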

A note about monthly data: EPA does not calculate or provide any monthly data (except for a rolling 3-month average for lead, as required by the standards). We know monthly aggregates are very useful for certain analyses; however, data users will have to calculate their own monthly aggregates from sample values or daily summary values.
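Since monthly aggregates must be computed by the user, here is a minimal Python sketch of one possible approach: averaging daily summary arithmetic means into calendar-month means. The input layout (a list of (date, value) pairs), the function name monthly_means, and the min_days completeness threshold are hypothetical choices; EPA defines no monthly completeness rule, so choose one that suits your analysis.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def monthly_means(daily: Iterable[Tuple[date, float]],
                  min_days: int = 1) -> Dict[Tuple[int, int], float]:
    """Average daily values into (year, month) buckets.

    Months with fewer than `min_days` daily values are dropped. The threshold
    is the analyst's choice, not an EPA rule.
    """
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {ym: mean(vals) for ym, vals in sorted(buckets.items())
            if len(vals) >= min_days}
```

A mean of daily means is only one option; depending on the pollutant and the question, a monthly maximum or a mean computed directly from sample values may be more appropriate.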

Not all of this data is available (yet) via the AirData interfaces. If you cannot find what you are looking for, please email your question to aqsdatamart@epa.gov.


Additional Documentation

The REST API Documentation for Query AirData contains information on registering, constructing queries, and descriptions of the output formats available.

A full Data Dictionary for AQS data is available that describes all the fields. (Warning: the link is to a 425-page PDF document. An HTML version is in the works.)

All questions about the Data Mart should be directed to aqsdatamart@epa.gov


Query Considerations: Volume of Data and Retrieval Times

The AQS Data Mart has approximately 3.4 billion rows of data (measurements, NAAQS averages, daily summary, and annual summary; site, monitor and method descriptions; quality assurance results, etc.) available for query. The web services have been structured to perform as well as possible. However, access to this data is not restricted, so very large queries may run for a long time and affect the operations of the Data Mart. We ask that all users limit single queries to 2 million rows. Users may use the Profile web service to determine how many sample values would meet their selection criteria. Also, users who automate retrievals (e.g., with WGET calls from a script) should limit service requests to 1 every 5 seconds (or longer!), with an expected pull rate below 2 million rows per hour.
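For scripted retrievals, the three limits above (2 million rows per query, at least 5 seconds between requests, under 2 million rows per hour) can be enforced on the client side. The Python sketch below is one way to do that, assuming you obtain an expected row count before each request (for example from the Profile service) and know the actual count afterward. PoliteThrottle and its methods are hypothetical names; the clock and sleep functions are injectable so the pacing can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple

MAX_ROWS_PER_QUERY = 2_000_000   # limit single queries to 2 million rows
MIN_INTERVAL_S = 5.0             # at most 1 request every 5 seconds
MAX_ROWS_PER_HOUR = 2_000_000    # keep the pull rate below 2 million rows/hour
HOUR_S = 3600.0


class PoliteThrottle:
    """Client-side pacing for automated Data Mart retrievals (hypothetical helper)."""

    def __init__(self,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None
        # (request time, rows returned) for requests in the trailing hour
        self._history: Deque[Tuple[float, int]] = deque()

    def _rows_in_last_hour(self, now: float) -> int:
        # Drop entries at least one hour old, then total what remains.
        while self._history and now - self._history[0][0] >= HOUR_S:
            self._history.popleft()
        return sum(rows for _, rows in self._history)

    def wait_before_request(self, expected_rows: int) -> None:
        """Block until a request expected to return `expected_rows` may be sent."""
        if expected_rows > MAX_ROWS_PER_QUERY:
            raise ValueError("query exceeds 2 million rows; split it up")
        now = self._clock()
        # Rule 1: keep at least MIN_INTERVAL_S between consecutive requests.
        if self._last_request is not None:
            remaining = MIN_INTERVAL_S - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        # Rule 2: wait for old requests to age out of the hourly row budget.
        while self._rows_in_last_hour(now) + expected_rows > MAX_ROWS_PER_HOUR:
            self._sleep(self._history[0][0] + HOUR_S - now)
            now = self._clock()
        self._last_request = now

    def record(self, rows: int) -> None:
        """Record how many rows the request just sent actually returned."""
        stamp = self._last_request if self._last_request is not None else self._clock()
        self._history.append((stamp, rows))
```

Typical use is to call wait_before_request with the estimated row count, issue the request, then call record with the number of rows received. Note that the two pacing rules interact: at one request every 5 seconds (720 per hour), staying under 2 million rows per hour means averaging fewer than about 2,800 rows per request if requests run back to back.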

In a typical year, the API is queried 4,000 times and serves 3 billion rows of data.


Timeliness of Data

The data in the AQS Data Mart is updated from AQS every Sunday night.

Most data in AQS is required to be submitted by the end of the calendar quarter following the quarter in which it was collected. Some types of data (those collected with non-continuous methods) are allowed an additional calendar quarter to be assembled and reported. However, AQS is updated practically every day as reporting agencies have data ready to submit. A key milestone in reporting is May 1, by which all data for the prior year should be complete and correct.
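The reporting schedule above can be expressed as simple date arithmetic, which helps when judging how complete a recent period is likely to be. This Python sketch assumes the rule exactly as summarized on this page (end of the following quarter, one extra quarter for non-continuous methods, May 1 of the next year for annual completeness). It is a simplification, not a statement of the regulatory deadlines, and the function names are hypothetical.

```python
from datetime import date, timedelta


def _quarter_end(year: int, quarter: int) -> date:
    """Last calendar day of `quarter` (1-4) in `year`."""
    last_month = 3 * quarter
    # First day of the month after the quarter, minus one day.
    first_after = date(year + last_month // 12, last_month % 12 + 1, 1)
    return first_after - timedelta(days=1)


def submission_due(sample_date: date, non_continuous: bool = False) -> date:
    """End of the quarter after the collection quarter (+1 quarter if non-continuous)."""
    quarters_after = 2 if non_continuous else 1
    # Count quarters since year 0 so the year rolls over naturally.
    index = sample_date.year * 4 + (sample_date.month - 1) // 3 + quarters_after
    year, q0 = divmod(index, 4)
    return _quarter_end(year, q0 + 1)


def annual_milestone(data_year: int) -> date:
    """May 1 of the following year, when prior-year data should be complete."""
    return date(data_year + 1, 5, 1)
```

For example, a sample collected on 2017-11-03 would be due by 2018-03-31, or by 2018-06-30 if it came from a non-continuous method, and all 2017 data should be complete and correct by 2018-05-01.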

Historical data can change at any time. Many quality assurance review processes are performed on an entire year's worth of data, so a submitter may not finish the final review of, and changes to, last year's data until the middle of this year. Also, historical monitoring or calculation methods may be found to be problematic and require that older data be changed. Finally, there is no "versioning" or freezing of data in the Data Mart, so if others may need the data exactly as it was retrieved to verify or continue an analysis, the user must preserve a copy.
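One lightweight way to preserve a copy is to save each retrieval under a dated file name together with a checksum, so a later reader can confirm the file has not changed since it was pulled. The Python sketch below assumes the data arrives as bytes and is stored as CSV; the function names, the .csv extension, and the file-naming scheme are hypothetical choices.

```python
import hashlib
from datetime import date
from pathlib import Path


def archive_retrieval(data: bytes, out_dir: Path, name: str, retrieved: date) -> Path:
    """Save a retrieved file under a dated name, plus a SHA-256 sidecar file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    data_path = out_dir / f"{name}_{retrieved.isoformat()}.csv"
    data_path.write_bytes(data)
    digest = hashlib.sha256(data).hexdigest()
    # "<hash>  <filename>" is the layout that `sha256sum -c` can check.
    sidecar = out_dir / (data_path.name + ".sha256")
    sidecar.write_text(f"{digest}  {data_path.name}\n")
    return data_path


def verify_archive(data_path: Path) -> bool:
    """Recompute the file's hash and compare it with the stored sidecar value."""
    sidecar = data_path.parent / (data_path.name + ".sha256")
    expected = sidecar.read_text().split()[0]
    return hashlib.sha256(data_path.read_bytes()).hexdigest() == expected
```

The retrieval date in the file name also supplies the date needed for the citation described at the end of this page.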

Real-time data. Real-time data (data collected today) is not available in the Data Mart. If real-time information is needed, please visit the AirNow data gateway site and direct all questions there.


Interpretation of Data

The air quality data collected by EPA has grown into a complex structure of data and concepts as statutes, regulations, technology, and our understanding of health effects have evolved. There are many monitoring networks with different structures and quality assurance requirements that report data. Different pollutants can only be measured using certain methods, and this dictates certain sample collection times. Health standards have been set that require measurements to be averaged over periods different from the sample collection times. And so on.

Many of the data handling procedures are specific to a pollutant or a method. It is imperative for the user to understand the meaning of the data elements and techniques for aggregating data, as they are not always mathematically obvious.
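To make the point about averaging periods concrete, here is a deliberately naive Python sketch of 8-hour running averages computed from hourly values. It is not the NAAQS procedure: the official data handling rules for each pollutant (window alignment, rounding or truncation, data substitution) are more involved and should be taken from the regulations and AQS documentation. The min_valid=6 completeness threshold is an assumption chosen for illustration, and rolling_8hr is a hypothetical name.

```python
from typing import List, Optional, Sequence


def rolling_8hr(hourly: Sequence[Optional[float]],
                min_valid: int = 6) -> List[Optional[float]]:
    """Naive 8-hour running means; None marks a missing hour.

    Window i covers hours i..i+7. A window with fewer than `min_valid`
    non-missing hours yields None instead of an average.
    """
    averages: List[Optional[float]] = []
    for start in range(len(hourly) - 7):
        window = [v for v in hourly[start:start + 8] if v is not None]
        averages.append(sum(window) / len(window) if len(window) >= min_valid else None)
    return averages
```

Even this toy version has to make choices (how to treat missing hours, which hours a window covers) that change the result, which is why the official procedures should be consulted before comparing computed values with the NAAQS averages stored in the Data Mart.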

Learning, understanding, and keeping up to date with changes in the data is a challenge for the analyst. For detailed information, please refer to the AQS Data Dictionary or the REST API documentation (both linked in the documentation section above). The documentation will be expanded to include interpretation guides for the analyst.


Accuracy of Data

The data in AQS is considered to be of the highest quality. It is submitted by tribal, state, and local agencies and must pass several quality control tests before it can be saved. Additionally, each monitor must pass several quality assurance assessments and audits (outlined in 40 CFR Part 58 Appendix A). The raw data collected during these assessments is available in the Data Mart itself, and summaries are available at our quality assurance website. Finally, each submitting agency annually certifies that the data they submit is correct (and the certification status of the data is also in the Data Mart).

Please note that EPA does not (usually) unilaterally change any data submitted to us. If data is found to be in error, correcting it may require the state, local, or tribal government that provided the data to EPA to make the change.


Keeping up with Data Mart News

The best way to get news related to air quality data and the AirData system (under which the Data Mart operates) is to subscribe to the AirData RSS feed at http://www.epa.gov/airquality/airdata/rssairdata.xml. It is recommended that all users subscribe to this feed. Notifications of system enhancements, downtimes, data changes, etc. are posted there.
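For users who would rather check the feed from a script than from a feed reader, here is a minimal Python sketch using only the standard library. It assumes the feed follows the common RSS 2.0 layout (item elements with title, link, and pubDate children) and is still served at the URL above; both are assumptions, and the function names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# URL as given on this page; the feed location may have changed since.
FEED_URL = "http://www.epa.gov/airquality/airdata/rssairdata.xml"


def parse_rss_items(xml_data: Union[str, bytes]) -> List[Tuple[str, str, str]]:
    """Return (title, link, pubDate) for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext("title", default=""),
         item.findtext("link", default=""),
         item.findtext("pubDate", default=""))
        for item in root.iter("item")
    ]


def fetch_feed(url: str = FEED_URL, timeout: float = 30.0) -> List[Tuple[str, str, str]]:
    """Download the feed and parse it (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Pass raw bytes so the parser honors the document's encoding declaration.
        return parse_rss_items(resp.read())
```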


Citation

Users who publish findings based on this data should use the following citation (with the date changed to reflect the save date on the data file):

US Environmental Protection Agency. Air Quality System Data Mart [internet database] available at http://www.epa.gov/ttn/airs/aqsdatamart. Accessed Month DD, YYYY.

This is the AMA database citation format. In 2015 or 2016, a paper will be published on data access which will provide a proper citation for use.
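When many files are cited, filling in the date by hand is error-prone. This Python sketch fills the "Month DD, YYYY" slot of the citation above from a file's save date. The template text is copied from this page; the function name is hypothetical, and the month name assumes the default (English) C locale.

```python
from datetime import date

CITATION_TEMPLATE = (
    "US Environmental Protection Agency. Air Quality System Data Mart "
    "[internet database] available at http://www.epa.gov/ttn/airs/aqsdatamart. "
    "Accessed {accessed}."
)


def aqs_citation(save_date: date) -> str:
    """Fill the 'Month DD, YYYY' slot with the data file's save date."""
    # %B is the full month name (English under the default C locale).
    return CITATION_TEMPLATE.format(accessed=save_date.strftime("%B %d, %Y"))
```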
This page last updated on 2018-09-05