Question 1: [4 points]
- Use your census dataset sample to estimate the NUMBER (not the percentage) of people who commute to work using public transit. Use the variable mode and be sure to exclude people who do not commute to work. Assume the total population of Canada is exactly 35 million.
- Indicate how far from the true number of residents who commute by public transit you would expect your estimate to be, 19 times out of 20. Your answer should be a number, not a percentage or proportion. Calculate this using the formula for the standard error of a proportion, then convert the result to a number of people, as you did for the estimate in the first part of this question.
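As a reminder of the formula involved, the calculation can be sketched in symbols. The symbols are introduced here for illustration and do not come from the assignment: \hat{p} is the sample proportion of commuters who use public transit, n is the number of commuters in your sample, and N is the population count you scale the proportion up to.

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
\text{margin (19 times out of 20)} \approx 1.96 \times \mathrm{SE}(\hat{p}),
\qquad
\text{margin in people} \approx N \times 1.96 \times \mathrm{SE}(\hat{p})
\]
```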
Question 2: [3 points]
Find the average total income of people who commute by public transit vs any other mode of transport. Exclude those who do not commute and remove missing values for income. Do not recode values that are $0 or negative. Report your comparison of these two means in a smoothly worded paragraph that summarizes the findings for a reader. Offer an explanation for any difference you observe. You do not need to discuss the p-value for this question.
Now switch to the American National Election Study. The file is “anes_timeseries_2020_for 380.dta”
DO THIS FIRST: You need to draw a random sample of 1500 cases from the dataset. That way you’ll all get different samples that I can have my computer replicate.
First, set the random number seed by typing: set seed studentnumber (where you replace studentnumber with your numeric student number).
Use the command sample: sample 1500, count. (If you do not include “count” in your command, Stata thinks you want 1500% of your sample and won’t be able to do anything.)
Now use the separate command count to double check that you now have 1500 cases to work with:
type count in the command window
Stata should simply report 1500. (If it’s close, it’s ok).
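Put together, the sampling steps above might look like this in a do-file. This is a sketch only: 123456789 is a placeholder student number, and the use command assumes the data file sits in your current working directory.

```stata
* Sketch only: 123456789 is a placeholder, use your own student number
* Assumes the .dta file is in the current working directory
use "anes_timeseries_2020_for 380.dta", clear

* Fix the seed so the same random sample can be drawn again
set seed 123456789

* Keep 1500 randomly chosen cases (count = number of cases, not a percentage)
sample 1500, count

* Confirm how many cases remain
count
```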
Question 3: [6 points]
In this question you will answer the question: is presidential vote choice (in 2020) related to attitudes towards Covid experts?
First, create an index of feelings toward the following groups/people: 1) The Centers for Disease Control and Prevention (CDC), 2) The World Health Organization, 3) Scientists, and 4) Anthony Fauci. Combine the four variables into a single scale that runs from 0 to 100.
If you have difficulty finding them, you can always use the lookfor command, e.g.: lookfor feeling
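One possible way to build such an index is to average the four items, sketched below. The variable names ft_cdc, ft_who, ft_sci and ft_fauci are hypothetical stand-ins for the variables you locate with lookfor, and the sketch assumes each item is a 0 to 100 rating with any other code meaning missing. Check the codebook before relying on either assumption.

```stata
* Hypothetical names: replace with the actual variables you find with lookfor
local therms ft_cdc ft_who ft_sci ft_fauci

* Assumption: valid ratings run 0-100; any other code is treated as missing
foreach v of local therms {
    replace `v' = . if `v' < 0 | `v' > 100
}

* The average of four 0-100 items is itself on a 0-100 scale
egen covid_index = rowmean(`therms')

* rowmean averages whatever items are available, so drop partial responders
egen n_missing = rowmiss(`therms')
replace covid_index = . if n_missing > 0

tabulate covid_index
```

Averaging is only one choice. Summing the items and rescaling to 0 to 100 gives the same result when all four items are present.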
3a. Paste in the Stata commands from this part of your do-file so we can see how you created the Covid experts index. Then paste in the tabulation of your new variable. (e.g. tabulate my_var).
3b. Using the presidential vote choice variable (V202073), summarize the distribution of your Covid experts index for people who voted for Biden compared to those who voted for Trump. Your answer should explain what the index variable measures, the range and meaning of the numeric values, the mean and standard deviation of the variable, and anything interesting about the shape of the distributions. Do not paste any Stata results in this answer. Write it up as it would appear in a newspaper or political analysis blog.
Question 4: [5 points]
Are Americans who live with children more likely to support a ban on ‘assault-style rifles’?
Recode the variable V201567 to create a variable that classifies people as living with: no kids, one kid, or more than one kid. Remove missing values for that variable and for V202342.
Run a crosstab. Report the results in a smoothly worded paragraph. Results should tell us how support for a ban varies across categories of the children variable. For example, “among individuals who live with more than one child, _____ opposed a ban…”. Do not report results from all 9 cells in the cross tab. Be sure to answer the question posed above.
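The recode and crosstab steps above could be sketched as follows. The coding assumptions are not from the assignment: the sketch assumes V201567 is a count of children, and that negative codes on both variables mean missing. The name kids3 is hypothetical. Check the codebook or run tabulate on each variable first.

```stata
* Assumption: V201567 is a count of children; negative codes mean missing
recode V201567 (-9/-1 = .) (0 = 0 "no kids") (1 = 1 "one kid") ///
    (2/max = 2 "more than one kid"), generate(kids3)

* Assumption: negative codes on V202342 also mean missing
* Row percentages show the spread of opinion within each category of kids3
tabulate kids3 V202342 if V202342 >= 0, row
```

Row percentages are used here so that each category of the children variable sums to 100%, which matches the "among individuals who live with..." wording of the write-up.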
Question 5: [5 points]
Use the Quality of Government dataset. Select a subset of cases by doing the same thing as you did before starting Question 3.
First, set the random number seed by typing: set seed studentnumber (where you replace studentnumber with your numeric student number). Use the command sample: sample 170, count. Note that you are selecting 170 cases, not 1,500 as you did in Question 3.
Recode gol_pr to classify countries as having majoritarian or proportional electoral systems (hint: values 1 through 4 are majoritarian, 5 through 28 are PR).
Run a t-test of the difference in the average proportion of legislators in national parliaments who are women (wdi_wip). Do this only for countries in Latin America (use ht_region and an if statement).
Report the two means, their difference, and the p-value from the t-test. Do all that in a nice smooth, informative paragraph where you start by explaining in a simple way what the research question is, then how the variable is measured/calculated, what the results are, and what they tell us.
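The Stata steps for this question might be sketched as below. The student number 123456789 is a placeholder, pr_system is a hypothetical variable name, and the region code for Latin America is an assumption you should confirm with tabulate ht_region before running the test.

```stata
* 123456789 is a placeholder; the QoG dataset is assumed to be already open
set seed 123456789
sample 170, count

* Cut points from the hint: 1-4 majoritarian, 5-28 proportional
recode gol_pr (1/4 = 0 "majoritarian") (5/28 = 1 "proportional"), generate(pr_system)

* Assumption: ht_region == 2 is Latin America; confirm with: tabulate ht_region
ttest wdi_wip if ht_region == 2, by(pr_system)
```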