
Probability sampling is regarded as the gold standard in survey statistics for finite population inference. Fundamentally, probability samples are selected under known sampling designs and, therefore, are representative of the target population. Because the selection probability is known, the subsequent inference from a probability sample is often design-based and respects the way in which the data were collected; see Särndal et al. (2003), Cochran (1977) and Fuller (2009) for textbook discussions. Kalton (2019) provided a comprehensive overview of survey sampling research over the last 60 years.

However, many practical challenges arise in collecting and analyzing probability sample data (Baker et al.). Large-scale survey programs continually face heightened demands coupled with reduced resources. Demands include requests for estimates for domains with small sample sizes and desires for more timely estimates. Simultaneously, program budget cuts force reductions in sample sizes, and decreasing response rates make non-response bias an important concern.

Data integration is a new area of research that aims to provide a timely solution to the above challenges. The goal is multi-fold: (1) minimize the cost associated with surveys, (2) minimize the respondent burden, and (3) maximize the statistical information or, equivalently, the efficiency of survey estimation. Narrowly speaking, survey integration means combining separate probability samples into one survey instrument (Bycroft 2010). Broadly speaking, one can consider combining probability samples with non-probability samples. Recently, in survey statistics, non-probability data have become increasingly available for research purposes and provide unprecedented opportunities for new scientific discovery; however, they also present additional challenges such as heterogeneity, selection bias, and high dimensionality. The past years have seen immense progress in theories, methods, and algorithms for surmounting important challenges arising from non-probability data analysis. This article provides a systematic review of data integration for combining probability samples, probability and non-probability samples, and probability and big data samples.
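The design-based idea mentioned in this passage, inference that relies on the known selection probabilities, can be made concrete with a small sketch. The example below uses the Horvitz-Thompson estimator of a population total, a standard design-based estimator that the text does not name. The population, the sample size, and the function name `horvitz_thompson_total` are all hypothetical and chosen only for illustration.

```python
import random


def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each sampled value by 1 / pi_i."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    if any(not 0 < p <= 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


# Hypothetical finite population of N = 1000 units (illustrative data only).
rng = random.Random(0)
population = [rng.gauss(50, 10) for _ in range(1000)]

# Simple random sampling without replacement: every unit has pi_i = n / N.
n = 100
sample = rng.sample(population, n)
inclusion_probs = [n / len(population)] * n

estimate = horvitz_thompson_total(sample, inclusion_probs)
true_total = sum(population)
print(f"estimated total: {estimate:.1f}, true total: {true_total:.1f}")
```

Each sampled unit is weighted by the inverse of its inclusion probability, so the estimate depends only on the sampling design and not on a model for the data. With a non-probability sample these probabilities are unknown, which is one reason selection bias becomes a concern.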