Imputation methods for addressing missing data in short-term monitoring of air pollutants
Affiliation: Univ Arizona, Mel & Enid Zuckerman Coll Publ Hlth
Univ Arizona, Interdisciplinary Program Appl Math
Citation: Hadeed, S. J., O'Rourke, M. K., Burgess, J. L., Harris, R. B., & Canales, R. A. (2020). Imputation methods for addressing missing data in short-term monitoring of air pollutants. Science of The Total Environment, 139140. https://doi.org/10.1016/j.scitotenv.2020.139140
Journal: SCIENCE OF THE TOTAL ENVIRONMENT
Rights: Copyright © 2020 Elsevier B.V. All rights reserved.
Collection Information: This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at email@example.com.
Abstract: Monitoring of environmental contaminants is a critical part of exposure sciences research and public health practice. Missing data are often encountered when performing short-term monitoring (<24 h) of air pollutants with real-time monitors, especially in resource-limited areas. Approaches for handling consecutive periods of missing and incomplete data in this context remain unclear. Our aim is to evaluate existing imputation methods for handling missing data for real-time monitors operating for short durations. In a current field study, real-time PM2.5 monitors were placed outside of 20 households and ran for 24 hours. Missing data were simulated in these households at four consecutive periods of missingness (20%, 40%, 60%, 80%). Univariate (Mean, Median, Last Observation Carried Forward, Kalman Filter, Random, Markov) and multivariate time-series (Predictive Mean Matching, Row Mean Method) methods were used to impute missing concentrations, and performance was evaluated using five error metrics (Absolute Bias, Percent Absolute Error in Means, R² Coefficient of Determination, Root Mean Square Error, Mean Absolute Error). The univariate Markov, random, and mean imputation methods performed best, yielding 24-hour mean concentrations with the lowest error and highest R² values across all levels of missingness. When evaluating error metrics minute by minute, Kalman filter, median, and Markov methods performed well at low levels of missingness (20-40%). However, at higher levels of missingness (60-80%), Markov, random, median, and mean imputation performed best on average. Multivariate methods were the worst-performing imputation methods across all levels of missingness. Imputation using univariate methods may provide a reasonable solution for addressing missing data in short-term monitoring of air pollutants, especially in resource-limited areas.
Further efforts are needed to evaluate imputation methods that are generalizable across a diverse range of study environments.
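
The evaluation design described in the abstract (mask one consecutive block of a 24-hour series, impute it, then score the result) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' code: the series is synthetic rather than study data, the function names are hypothetical, "Random" is assumed to mean resampling from the observed values, and only three of the five metrics are shown with commonly used definitions. The Kalman filter, Markov, and multivariate methods are omitted.

```python
import numpy as np

def simulate_consecutive_gap(series, fraction, rng):
    """Copy `series` and blank out one consecutive block covering `fraction` of it."""
    n = len(series)
    gap = int(round(fraction * n))
    start = rng.integers(0, n - gap + 1)  # upper bound is exclusive
    out = np.asarray(series, dtype=float).copy()
    out[start:start + gap] = np.nan
    return out

def impute_mean(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def impute_median(x):
    out = x.copy()
    out[np.isnan(out)] = np.nanmedian(x)
    return out

def impute_locf(x):
    """Last observation carried forward; a leading gap takes the first observed value."""
    observed = ~np.isnan(x)
    idx = np.where(observed, np.arange(len(x)), -1)
    idx = np.maximum.accumulate(idx)           # index of the most recent observation
    idx[idx < 0] = np.flatnonzero(observed)[0]  # back-fill any leading gap
    return x[idx]

def impute_random(x, rng):
    """Assumed definition: draw replacements from the observed values with replacement."""
    out = x.copy()
    missing = np.isnan(out)
    out[missing] = rng.choice(x[~missing], size=missing.sum(), replace=True)
    return out

def rmse(truth, est):
    return float(np.sqrt(np.mean((truth - est) ** 2)))

def mae(truth, est):
    return float(np.mean(np.abs(truth - est)))

def pct_abs_error_in_means(truth, est):
    return float(100.0 * abs(est.mean() - truth.mean()) / truth.mean())

# Synthetic minute-by-minute "PM2.5" series for one 24-hour period (not real data).
rng = np.random.default_rng(0)
minutes = np.arange(1440)
truth = np.clip(12 + 5 * np.sin(2 * np.pi * minutes / 1440)
                + rng.normal(0, 2, size=1440), 0, None)

methods = {
    "mean": impute_mean,
    "median": impute_median,
    "locf": impute_locf,
    "random": lambda x: impute_random(x, rng),
}

for fraction in (0.2, 0.4, 0.6, 0.8):
    masked = simulate_consecutive_gap(truth, fraction, rng)
    for name, method in methods.items():
        est = method(masked)
        print(f"{int(fraction * 100)}% {name:>6}: "
              f"RMSE={rmse(truth, est):.2f} MAE={mae(truth, est):.2f} "
              f"%err(mean)={pct_abs_error_in_means(truth, est):.2f}")
```

The numbers printed for this synthetic series say nothing about the study's findings; the sketch only shows how the 24-hour mean comparison and the minute-by-minute comparison mentioned in the abstract could both be computed from the same imputed series.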
Note: 24 month embargo; published online: 3 May 2020
Version: Final accepted manuscript
- A novel scaling methodology to reduce the biases associated with missing data from commercial activity monitors.
- Authors: O'Driscoll R, Turicchi J, Duarte C, Michalowska J, Larsen SC, Palmeira AL, Heitmann BL, Horgan GW, Stubbs RJ
- Issue date: 2020
- Spatial imputation for air pollutants data sets via low rank matrix completion algorithm.
- Authors: Liu X, Wang X, Zou L, Xia J, Pang W
- Issue date: 2020 Jun
- Handling of Missing Outcome Data in Acute Stroke Trials: Advantages of Multiple Imputation Using Baseline and Postbaseline Variables.
- Authors: Young-Saver DF, Gornbein J, Starkman S, Saver JL
- Issue date: 2018 Dec
- Outcome-sensitive multiple imputation: a simulation study.
- Authors: Kontopantelis E, White IR, Sperrin M, Buchan I
- Issue date: 2017 Jan 9
- Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study.
- Authors: Marshall A, Altman DG, Holder RL
- Issue date: 2010 Dec 31