3.4 Potential Problems with Sampling

6 min read • December 29, 2022

Aly Moosa

Kanya Shah

Jed Quiaoit


For the next couple of sections, be sure to hammer this definition down, which is why I'll be repeating the same sentence again and again. Repeat after me:
Bias occurs when certain responses are systematically favored over others. 👎
Bias occurs when certain responses are systematically favored over others. 👎
Bias occurs when certain responses are systematically favored over others. 👎
Bias occurs when certain responses are systematically favored over others. 👎
Bias occurs when certain responses are systematically favored over others, either intentionally or unintentionally.
That's right! Previously, we discussed how to sample well; now it's time to discuss how not to gather data. A statistical study is biased if it is likely to systematically underestimate or overestimate the value you are looking for. Bias can produce a sample or a study that is not representative of the population being studied, which can lead to inaccurate or misleading conclusions. 😕
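To see what "systematically" means here, it can help to simulate it. The sketch below is only an illustration under made-up assumptions: the population values, the seed, and the rule that larger values are more likely to be picked are all invented for this example. It compares a simple random sample with a selection method that favors larger values, repeating each many times.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values (say, hours of sleep per week).
population = [rng.gauss(50, 8) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Assumed biased mechanism: the chance of being picked grows with the value itself.
weights = [max(x, 0.01) for x in population]

def random_sample_mean(n):
    """Mean of a simple random sample of size n (everyone equally likely)."""
    return statistics.mean(rng.sample(population, n))

def favoring_sample_mean(n):
    """Mean of a sample (drawn with replacement) that favors larger values."""
    return statistics.mean(rng.choices(population, weights=weights, k=n))

# Repeat each method many times and look at where the estimates center.
random_estimates = [random_sample_mean(50) for _ in range(500)]
favoring_estimates = [favoring_sample_mean(50) for _ in range(500)]

random_bias = statistics.mean(random_estimates) - true_mean
favoring_bias = statistics.mean(favoring_estimates) - true_mean

print(f"true mean:               {true_mean:.2f}")
print(f"average error, random:   {random_bias:+.2f}")
print(f"average error, favoring: {favoring_bias:+.2f}")
```

Any single random sample can miss high or low, but the misses should roughly cancel out over many repetitions. With the favoring method, the estimates should land too high again and again, and that consistent tilt in one direction is the bias.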

Bias in Sampling 

Voluntary Response

Voluntary response bias, also known as self-selection bias, occurs when a sample is made up entirely of volunteers, that is, people who choose for themselves whether to participate in a study. This can occur when participants are recruited through open invitations such as online surveys or television or radio programs. 🙋‍♂️
Because voluntary response samples are not randomly selected, they are often not representative of the population being studied. For example, people who are more interested in the topic being studied or who have strong opinions on the topic may be more likely to volunteer to participate in the study. This can result in a sample that is biased and does not accurately represent the views or characteristics of the population.

Example

You are a researcher who is interested in studying the attitudes of young adults towards social media. You decide to conduct an online survey to gather data on this topic. You create a link to the survey and post it on your personal social media accounts, as well as on several online forums and discussion groups related to social media. You also share the link with your friends and ask them to share it with their friends.
Within a few days, you receive a large number of responses to your survey. However, as you begin to analyze the data, you realize that there are some problems with the sample. Many of the respondents are young adults who are very active on social media and have strong opinions about it, while others are older adults who do not use social media as much. There are also a disproportionate number of responses from people who live in urban areas, while responses from people in rural areas are much less common.
Upon further investigation, you realize that the sample for your survey is not representative of the population of young adults in the United States. Instead, it is made up mostly of volunteers who chose to participate in the survey, and these volunteers do not reflect the overall population. 😔
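Here is a rough simulation of that idea. Everything in it is an assumption made for illustration: the 1-to-5 opinion scale, the mix of opinions in the population, and especially the `response_chance` values, which simply encode the guess that people with strong opinions are more likely to click a survey link.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: opinion scores from 1 (strongly dislike) to 5 (strongly like).
population = rng.choices([1, 2, 3, 4, 5], weights=[10, 20, 40, 20, 10], k=20_000)

# Assumed chance that a person with each score answers an open survey link.
response_chance = {1: 0.30, 2: 0.05, 3: 0.02, 4: 0.05, 5: 0.30}

# Nobody is selected; people decide for themselves whether to respond.
volunteers = [score for score in population if rng.random() < response_chance[score]]

def share_strong(scores):
    """Fraction of scores that are strong opinions (1 or 5)."""
    return sum(score in (1, 5) for score in scores) / len(scores)

pop_share = share_strong(population)
vol_share = share_strong(volunteers)

print(f"people in population:        {len(population)}")
print(f"people who volunteered:      {len(volunteers)}")
print(f"strong opinions, population: {pop_share:.0%}")
print(f"strong opinions, volunteers: {vol_share:.0%}")
```

Under these assumptions, strong opinions should make up a much larger share of the volunteers than of the population, even though you still end up with plenty of responses. A big pile of responses does not make the sample representative.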

Undercoverage

Undercoverage bias occurs when a portion of the population has a reduced chance of being included in the sample, resulting in a sample that is not representative of the population. This can occur for a variety of reasons, such as when the sampling frame is incomplete or when certain groups are more difficult to reach or sample than others. 😡

Example

If you're studying the attitudes of college students towards climate change and you use a sampling frame that only includes students at four-year colleges and universities, you are introducing undercoverage bias into your sample. This is because students at two-year colleges and trade schools have no chance of being selected, even though they are part of the overall population of college students in the United States. 😔
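A small simulation can make the effect of an incomplete sampling frame concrete. This is a sketch under invented assumptions: the enrollment counts and the 80% and 40% concern rates are made up, and real groups may not differ this much (or at all).

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

def make_students(school_type, count, p_concerned):
    """Build (school_type, is_concerned) records with an assumed concern rate."""
    return [(school_type, rng.random() < p_concerned) for _ in range(count)]

# Hypothetical population of college students (all numbers are assumptions).
population = (
    make_students("four-year", 6_000, 0.80)
    + make_students("two-year", 4_000, 0.40)
)

# The sampling frame leaves out two-year students completely.
frame = [student for student in population if student[0] == "four-year"]

def proportion_concerned(students):
    """Fraction of students who are concerned about climate change."""
    return sum(concerned for _, concerned in students) / len(students)

true_p = proportion_concerned(population)
full_p = proportion_concerned(rng.sample(population, 2_000))  # random sample, full list
frame_p = proportion_concerned(rng.sample(frame, 2_000))      # random sample, frame only

print(f"true proportion concerned:     {true_p:.3f}")
print(f"sample from full population:   {full_p:.3f}")
print(f"sample from four-year frame:   {frame_p:.3f}")
```

Notice that both samples are chosen randomly. The random selection is not the problem; the list being sampled from is. If the left-out group thinks differently, the estimate from the frame should miss in the same direction every time.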

Nonresponse

Nonresponse bias occurs when individuals who are chosen for the sample but cannot be reached (or who refuse to respond) differ in a meaningful way from those who do respond. This can happen whenever data is collected through methods that people can ignore or decline, such as surveys or interviews. 👻

Example

If you're conducting a survey of college students' attitudes towards climate change and you send the survey to a random sample of 1000 students, but only 500 students complete and return the survey, you may be introducing nonresponse bias into your sample. This is because the students who completed and returned the survey may differ in some way from the students who did not respond, even though both groups were chosen for the sample.
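The scenario above can be sketched in a few lines. The response rates here are pure assumptions chosen so that about half of the 1000 students reply: concerned students are assumed to respond 80% of the time and everyone else 20% of the time.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical random sample of 1000 students; True means "concerned about climate change".
sample = [rng.random() < 0.5 for _ in range(1000)]

def responds(concerned):
    """Assumed behavior: concerned students are far more likely to reply."""
    return rng.random() < (0.8 if concerned else 0.2)

# Only the students who reply end up in the data you can analyze.
respondents = [concerned for concerned in sample if responds(concerned)]

sample_p = sum(sample) / len(sample)
respondent_p = sum(respondents) / len(respondents)

print(f"students sampled:               {len(sample)}")
print(f"students who responded:         {len(respondents)}")
print(f"concerned, everyone sampled:    {sample_p:.2f}")
print(f"concerned, respondents only:    {respondent_p:.2f}")
```

The original sample was chosen randomly, so it should look a lot like the population. The trouble starts afterward: if responding is related to the attitude you are measuring, the returned surveys should overstate concern under these assumptions.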

Wording in Questions

Question wording bias occurs when the wording of a survey question or the way a question is asked influences the response that is given. This can happen when the question is confusing, misleading, or ambiguous, or when the question asks for too much or too little information. ✉️

Example

Consider the following two questions:
  • "Do you think that the government should provide free healthcare for all citizens?"
  • "Would you support a government healthcare program that would provide free healthcare for all citizens?"
These two questions are asking about the same topic, but they are worded differently. The first question uses a leading phrase ("Do you think that") that may bias the respondent towards agreeing with the statement, while the second question presents the idea of a government healthcare program more neutrally. As a result, the responses to these two questions may differ, even though they are asking about the same topic.

Convenience Sampling

Convenience sampling is a method in which the sample is selected based on ease of access or availability, rather than through a random selection process. This means that the sample is unlikely to be representative of the population and may be biased. 🥪

Example

Let's go back to the researcher who wants to study the attitudes of college students towards climate change!
Rather than using a random sampling method to select a representative sample of students, the researcher decides to conduct the study on a college campus that is close to their home. The researcher then interviews students who are available and willing to participate, such as students who are passing by or who are hanging out in a common area.
In this example, the sample is not representative of all college students in the United States, but rather a convenience sample of students from a single college campus. This sample may be biased because it only includes students who are available and willing to participate, and it does not include students who are not on campus or who are not interested in participating in the study. As a result, the results of the study may not accurately represent the attitudes of college students towards climate change.
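It's tempting to think you can fix a convenience sample by interviewing more people. The sketch below suggests otherwise. All the numbers are invented: a hypothetical nearby campus where attitude scores center around 70, and other campuses where they center around 50.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the run is repeatable

# Hypothetical attitude scores (0-100 scale); the nearby campus is assumed to differ.
nearby_campus = [rng.gauss(70, 10) for _ in range(5_000)]
other_campuses = [rng.gauss(50, 10) for _ in range(45_000)]
population = nearby_campus + other_campuses
true_mean = statistics.mean(population)

convenience_error = {}
random_error = {}
for n in (50, 500, 5_000):
    # Convenience sample: only students from the campus close to home.
    convenience_error[n] = statistics.mean(rng.sample(nearby_campus, n)) - true_mean
    # Simple random sample: every student at every campus is equally likely.
    random_error[n] = statistics.mean(rng.sample(population, n)) - true_mean

print(f"true mean: {true_mean:.1f}")
for n in convenience_error:
    print(f"n={n:>5}  convenience error: {convenience_error[n]:+6.1f}"
          f"   random error: {random_error[n]:+6.1f}")
```

Under these assumptions, the error of the random sample should shrink as the sample gets bigger, while the error of the convenience sample should stay about the same size. A larger sample reduces random variation, but it does nothing about bias.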

Why does this matter, then? 

If sampling is done without considering the factors that could invalidate your results, then the resulting data and the conclusions you draw from it will not be representative of the population.
https://firebasestorage.googleapis.com/v0/b/fiveable-92889.appspot.com/o/images%2FCiOtO9fVEAAB3RU.jpg?alt=media&token=0091d591-9f31-4acc-91a8-0f87767537ae

Courtesy of Twitter

Practice Problem

You are a researcher who is interested in studying the attitudes of small business owners towards the economy. You decide to conduct a survey to gather data on this topic.
To collect data for the survey, you create a list of all the small business owners in your city and send the survey to a random sample of 1000 business owners. However, you only receive responses from 500 of the business owners, and many of the respondents are from larger businesses with more than 50 employees.
Upon further analysis, you realize that your sample is not representative of the population of small business owners in your city.
a) Identify the type of bias present in this sample.
b) Explain how this bias may have affected the results of the survey.

Answer

a) In this situation, the sample is suffering from nonresponse bias because not all of the business owners who were chosen for the sample responded to the survey. This means that the sample is not representative of the population of small business owners in the city, as it only includes responses from 500 business owners, many of whom are from larger businesses.
b) This bias may have affected the results of the survey in several ways. For example, the attitudes of small business owners who did not respond to the survey may differ from those who did respond, even though both groups were chosen for the sample.
Additionally, the overrepresentation of larger businesses among the respondents may have skewed the results away from the attitudes of small business owners overall. As a result, the survey may not accurately represent the attitudes of small business owners towards the economy.