7.9 The Questions
7.9.1 Set 1
- Do economics majors watch more or less TV than computer science majors?
## Warning: Removed 11 rows containing non-finite values (`stat_bin()`).
What kinds of tests could be reasonable to conduct? For what part of the data would we conduct these tests?
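One possible way to set up this comparison is sketched below. The real GSS extract is not reproduced here, so the sketch simulates a small data frame: the name `toy_gss`, the column names `tvhours` and `major1`, and all values are assumptions made for illustration. It first restricts the data to the two majors the question is about, then runs a Welch t-test and a rank-based alternative so the two can be compared.

```r
library(dplyr)

set.seed(1)

# Hypothetical stand-in for the GSS extract: names and values are assumptions,
# not the real survey data.
toy_gss <- data.frame(
  major1  = sample(c("economics", "computer science", "history"),
                   size = 300, replace = TRUE),
  tvhours = rpois(300, lambda = 3)
)

# Keep only the part of the data the question is about.
majors <- toy_gss %>%
  filter(major1 %in% c("economics", "computer science")) %>%
  filter(!is.na(tvhours))

# Welch t-test: compares mean TV hours between the two groups.
welch_result <- t.test(tvhours ~ major1, data = majors)

# Rank-based alternative that relies only on the ordering of the values.
rank_result <- wilcox.test(tvhours ~ major1, data = majors, exact = FALSE)

welch_result
rank_result
```

Which of the two is more defensible depends on whether hours of TV can be treated as metric and on how skewed the responses are, which is exactly what the question above asks you to argue.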
- Do Americans with pets watch more or less TV than Americans without pets?
7.9.2 Set 2
- Do Americans spend more time emailing or using the web?
##
## Paired t-test
##
## data: wwwhr and emailhr
## t = 13.44, df = 1360, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 5.530219 7.420553
## sample estimates:
## mean difference
## 6.475386
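The chunk that produced this output is not shown. The `data: wwwhr and emailhr` line suggests a call along the lines of `with(GSS, t.test(wwwhr, emailhr, paired = TRUE))`, but that is a guess. The sketch below uses simulated data (the data frame `toy` and its values are assumptions) to show what the paired test does: it is the same as a one-sample t-test on each respondent's difference in hours.

```r
set.seed(2)

# Hypothetical stand-in data: each row is one respondent who reports both
# weekly email hours and weekly web hours. All values are simulated.
n <- 200
toy <- data.frame(emailhr = rgamma(n, shape = 2, scale = 3))
toy$wwwhr <- pmax(0, toy$emailhr + rnorm(n, mean = 5, sd = 8))

# Paired t-test: works on the within-respondent difference wwwhr - emailhr.
paired_result <- with(toy, t.test(wwwhr, emailhr, paired = TRUE))

# Equivalent one-sample t-test of the differences against a mean of 0.
diff_result <- t.test(toy$wwwhr - toy$emailhr, mu = 0)

paired_result
all.equal(unname(paired_result$statistic), unname(diff_result$statistic))
```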
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 986 rows containing non-finite values (`stat_bin()`).
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 929 rows containing non-finite values (`stat_bin()`).
##
## Welch Two Sample t-test
##
## data: GSS$wwwhr and GSS$emailhr
## t = 12.073, df = 2398.5, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 5.657397 7.851614
## sample estimates:
## mean of x mean of y
## 13.906021 7.151515
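The two outputs differ in how they treat the same two columns. The paired test reports `df = 1360`, which corresponds to 1361 respondents who answered both questions. The Welch test treats each column as an independent sample and uses every non-missing value in it. A minimal sketch on simulated data (the data frame `toy` and the missingness pattern are assumptions, not the GSS) makes the difference in sample use visible:

```r
set.seed(3)

# Hypothetical stand-in data with simulated values.
n <- 500
toy <- data.frame(emailhr = rgamma(n, shape = 2, scale = 3))
toy$wwwhr <- toy$emailhr + rnorm(n, mean = 5, sd = 8)

# Knock out some answers separately in each column (simulated non-response).
toy$emailhr[sample(n, 100)] <- NA
toy$wwwhr[sample(n, 150)]   <- NA

# Paired test: only rows with both answers contribute.
paired_result <- t.test(toy$wwwhr, toy$emailhr, paired = TRUE)

# Welch test: each column is treated as its own independent sample.
welch_result <- t.test(toy$wwwhr, toy$emailhr)

complete_pairs <- sum(complete.cases(toy))
c(complete_pairs = complete_pairs,
  paired_df      = unname(paired_result$parameter),
  welch_df       = unname(welch_result$parameter))
```

Because both answers come from the same person, the paired version is the one that matches how these data were collected. The Welch version ignores that link.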
- Do Americans spend more evenings with neighbors or with relatives?
Code
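library(tidyverse)  # provides %>%, select(), mutate()

# Response scale shared by both questions, from most to least frequent contact
contact_levels <- c('almost daily', 'sev times a week',
                    'sev times a mnth', 'once a month',
                    'sev times a year', 'once a year', 'never')

# socrel: evenings spent with relatives. In the GSS codebook socommun records
# evenings spent with a neighbor; it is stored here as friends_ordered.
wilcox_test_data <- GSS %>%
  select(socrel, socommun) %>%
  mutate(
    family_ordered  = factor(x = socrel,   levels = contact_levels),
    friends_ordered = factor(x = socommun, levels = contact_levels))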
To begin this investigation, we need to look at the data and see what is in it. If you look below, it might at first seem that people are spending more time with their family… but actually, no. They're "hanging out" with their friends rather than taking their mothers out to dinner.
Code
wilcox_test_data %>%
  select(friends_ordered, family_ordered) %>%
  rename(
    Friends = friends_ordered,
    Family  = family_ordered
  ) %>%
  drop_na() %>%
  # One row per respondent and relationship, so both share the same axis
  pivot_longer(cols = c(Friends, Family)) %>%
  ggplot() +
  aes(x = value, fill = name) +
  # geom_bar() counts each category; geom_histogram(stat = 'count') draws the
  # same bars but warns about ignored binning parameters
  geom_bar(position = 'dodge', alpha = 0.7) +
  scale_fill_manual(values = c('#003262', '#FDB515')) +
  labs(
    title    = 'Do Americans Spend Time With Friends or Family?',
    subtitle = 'A cutting analysis.',
    fill     = 'Friends or Family',
    x        = 'Amount of Time Spent') +
  scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
  theme_minimal()
With this plot created, we can ask whether what we observe could be the product of sampling error alone, or whether it would be unlikely to arise if the null hypothesis were true. What is the null hypothesis? Well, let's suppose that, if we didn't know anything about the data, we would expect no difference between the amount of time spent with friends and the amount of time spent with family.
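One common way to write this null hypothesis down for a rank-based comparison is given below. It is offered as an assumption about the intended framing, not as something the text states: a randomly chosen answer about family is as likely to rank above a randomly chosen answer about friends as below it. The symbols $R$ (a response about relatives) and $F$ (a response about friends) are introduced only for this illustration.

```latex
H_0:\; P(R > F) = P(R < F)
\qquad\text{versus}\qquad
H_a:\; P(R > F) \neq P(R < F)
```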
##
## Wilcoxon rank sum test with continuity correction
##
## data: as.numeric(family_ordered) and as.numeric(friends_ordered)
## W = 716676, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
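The call behind this output is not shown. Judging from the `data:` line, it applies `wilcox.test()` to the numeric codes of the two ordered factors. The sketch below reproduces that pattern on simulated responses. The data frame `toy`, the sampling probabilities, and the vector name `freq_levels` are assumptions for illustration only.

```r
set.seed(4)

# Response categories, ordered from most to least frequent contact.
freq_levels <- c('almost daily', 'sev times a week',
                 'sev times a mnth', 'once a month',
                 'sev times a year', 'once a year', 'never')

# Hypothetical stand-in data: simulated answers, not the real GSS responses.
n <- 400
toy <- data.frame(
  family_ordered  = factor(sample(freq_levels, n, replace = TRUE,
                                  prob = c(1, 2, 3, 4, 5, 3, 2)),
                           levels = freq_levels),
  friends_ordered = factor(sample(freq_levels, n, replace = TRUE,
                                  prob = c(3, 5, 4, 3, 2, 1, 1)),
                           levels = freq_levels)
)

# as.numeric() on a factor returns the position of each level, so
# 'almost daily' becomes 1 and 'never' becomes 7.
never_code <- as.numeric(factor('never', levels = freq_levels))

# Rank-sum test on the level positions; only their order matters.
rank_sum <- with(toy, wilcox.test(as.numeric(family_ordered),
                                  as.numeric(friends_ordered)))
rank_sum
```

Because the levels run from most to least frequent, smaller codes mean more evenings together, so the direction of any shift has to be read with that ordering in mind. Since the same respondents answer both questions, it may also be worth asking whether a paired procedure would fit these data better than the unpaired rank-sum test.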