Submit to the Assignment 10.1 link above.
Assignment Deliverables:
- Part A: Compare the mean scores of male and female applicants, and of minority and non-minority applicants, on the objective test and the performance test to check for adverse impact. Note any statistically significant findings.
- Part B: What can you conclude based on the results in Part A? What can’t you conclude?
- Part C: You conclude you must have a discussion with the evaluators from each of the three teams. Prepare for the conversation. Briefly discuss your approach to the conversation. Prepare an outline of questions you will ask during the discussion. Provide a rationale for each of your questions.
- Note: The point of this activity is to ensure that you do not jump to conclusions or presume that a bias exists, and that you structure the conversation to promote open communication and inquiry.
Scenario
You are on the leadership team at a multi-site manufacturing facility operating in eastern Nebraska.
Nan, the Vice President of Operations, walks into your office with an angry scowl. “I’ve had it!” She raises her hand to her forehead. “I have had it up to here with these frivolous claims of discrimination.”
“Are you sure they’re frivolous?”
She looks at you with angry incredulity. “What are you saying? We don’t tolerate bigots and misogynists here.”
You realize you made a potentially large slip with that comment. Quickly, you work out how to recover the conversation. “Of course not, Nan. And I apologize if I implied otherwise. But I know that sometimes the data can tell a different story. All I am saying is that it is possible that patterns can emerge from data that tell a story that we may not even be aware of. We should never discount a claim of adverse impact out of hand. That might be a recipe for a costly lawsuit. Why don’t you tell me a bit more about the issue?”
After a deep sigh, she nods and begins to tell you her problem. In the past two weeks, the company has received three complaints of pay discrimination against women and minorities, particularly as it pertains to employee selection.
After chatting for a while, you and Nan agree that since the selection process hasn’t been evaluated in a while, an analysis of the process should be performed. You assure Nan that you’ll let her know what you find. She thanks you and leaves your office, leaving you to dwell on the problem.
Details
After working on the selection process for a time and reviewing the text of the complaints, it appears the largest concern centers on the applicant testing regimen. You conclude that you need to evaluate it. A data set from the most recent pool of hires is provided.
Adverse selection problem dataset.csv
Employee ID | Protected Sex/ Gender Identity | Minority Status | Field Evaluation Team | Objective Test Score | Performance Test Aggregate Score |
100005 | 0 | 1 | 2 | 12 | 16 |
100034 | 0 | 0 | 2 | 14 | 18 |
100078 | 0 | 0 | 2 | 20 | 18 |
100105 | 0 | 1 | 3 | 15 | 11 |
100118 | 0 | 1 | 2 | 20 | 17 |
100171 | 0 | 0 | 1 | 11 | 14 |
100350 | 0 | 0 | 2 | 14 | 15 |
100414 | 0 | 0 | 1 | 24 | 21 |
100428 | 0 | 1 | 3 | 16 | 13 |
100454 | 1 | 0 | 2 | 22 | 18 |
100461 | 1 | 1 | 2 | 15 | 19 |
100463 | 0 | 1 | 2 | 10 | 14 |
100525 | 0 | 1 | 2 | 21 | 14 |
100538 | 0 | 0 | 2 | 22 | 19 |
100600 | 0 | 0 | 1 | 15 | 19 |
100616 | 0 | 1 | 3 | 10 | 14 |
100633 | 0 | 0 | 3 | 14 | 18 |
100671 | 0 | 0 | 1 | 19 | 20 |
100750 | 0 | 0 | 2 | 18 | 13 |
100832 | 0 | 0 | 1 | 14 | 15 |
101135 | 1 | 1 | 3 | 20 | 15 |
101500 | 0 | 1 | 2 | 12 | 15 |
101502 | 1 | 0 | 2 | 22 | 19 |
101528 | 0 | 1 | 3 | 19 | 14 |
101560 | 0 | 0 | 1 | 20 | 18 |
101591 | 0 | 1 | 3 | 14 | 13 |
101770 | 0 | 1 | 1 | 15 | 21 |
101856 | 0 | 1 | 2 | 12 | 9 |
102257 | 0 | 1 | 3 | 22 | 17 |
102530 | 0 | 1 | 2 | 24 | 19 |
102659 | 0 | 1 | 3 | 20 | 17 |
102772 | 1 | 1 | 3 | 22 | 16 |
102876 | 0 | 0 | 3 | 25 | 17 |
102904 | 1 | 0 | 3 | 22 | 15 |
103183 | 0 | 1 | 2 | 24 | 18 |
103207 | 0 | 0 | 1 | 11 | 14 |
103491 | 1 | 0 | 2 | 24 | 19 |
103687 | 1 | 1 | 3 | 19 | 16 |
103869 | 0 | 1 | 2 | 11 | 16 |
103902 | 0 | 0 | 3 | 13 | 16 |
104071 | 0 | 0 | 2 | 21 | 20 |
104084 | 1 | 0 | 1 | 14 | 18 |
104092 | 0 | 0 | 1 | 14 | 16 |
104094 | 0 | 1 | 3 | 20 | 17 |
104112 | 0 | 1 | 3 | 14 | 17 |
104133 | 0 | 0 | 1 | 14 | 18 |
104177 | 0 | 1 | 3 | 14 | 16 |
104191 | 1 | 0 | 1 | 22 | 18 |
104243 | 0 | 1 | 1 | 23 | 20 |
104290 | 1 | 0 | 1 | 23 | 17 |
104301 | 0 | 1 | 3 | 25 | 20 |
104318 | 0 | 1 | 3 | 19 | 15 |
104395 | 1 | 1 | 3 | 21 | 16 |
104431 | 0 | 0 | 1 | 18 | 16 |
104438 | 0 | 0 | 2 | 23 | 21 |
104458 | 0 | 0 | 1 | 24 | 20 |
104461 | 1 | 1 | 1 | 15 | 14 |
104490 | 0 | 0 | 1 | 15 | 17 |
104491 | 1 | 1 | 2 | 18 | 19 |
104495 | 1 | 0 | 2 | 12 | 17 |
104558 | 0 | 0 | 1 | 23 | 19 |
104583 | 0 | 0 | 1 | 15 | 20 |
104585 | 0 | 1 | 3 | 11 | 14 |
104586 | 0 | 1 | 2 | 14 | 20 |
104600 | 0 | 0 | 1 | 23 | 23 |
104601 | 0 | 0 | 3 | 23 | 21 |
104622 | 0 | 0 | 1 | 16 | 21 |
104654 | 1 | 1 | 1 | 22 | 18 |
104665 | 0 | 1 | 3 | 22 | 18 |
104669 | 1 | 0 | 1 | 12 | 15 |
104672 | 1 | 1 | 3 | 20 | 16 |
104745 | 0 | 1 | 2 | 12 | 16 |
The testing regimen is conducted in two phases: an objective test of cognitive ability and a five-part performance test that measures interpersonal behaviors, preparation of paperwork, safety, the use of tools and mechanical aptitude, and the maintenance of tools. The objective test is administered virtually via an online portal and has a maximum score of 25 points. The performance test is administered by three (3) geographically dispersed three-person teams (see map below). On each team, one evaluator assesses behaviors, one evaluates quality of paperwork and safe practices, and one assesses tool use and maintenance. Each part of the performance test is rated on a 5-point ordinal scale: “5 – Excellent”, “4 – Very Good”, “3 – Good”, “2 – Standard”, and “1 – Substandard”. The five part scores are summed to create an aggregate performance score with a maximum of 25 points.
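The aggregate score described above can be sketched in a few lines. This is only an illustration: the function name `aggregate_score` and the example ratings are hypothetical and do not come from the dataset, which records only the summed score.

```python
# Labels for the 5-point ordinal scale described in the text.
RATING_LABELS = {
    5: "Excellent",
    4: "Very Good",
    3: "Good",
    2: "Standard",
    1: "Substandard",
}


def aggregate_score(ratings):
    """Sum five 1-5 part ratings into the aggregate performance score (max 25)."""
    if len(ratings) != 5:
        raise ValueError("expected exactly five part ratings")
    for rating in ratings:
        if rating not in RATING_LABELS:
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return sum(ratings)


# Hypothetical ratings for one applicant, in the order: behaviors,
# paperwork, safety, tool use, tool maintenance.
example_ratings = [4, 3, 5, 3, 4]
print(aggregate_score(example_ratings))  # 19
```

Because each part is an ordinal rating, the sum treats the gaps between adjacent ratings as equal. That is worth keeping in mind when interpreting differences in aggregate scores.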
Problem Guidance
Adverse Selection Problem Guidance
This problem can be hard to understand at the outset. That is perfectly normal. This note is intended to provide guidance for the assignment.
First, the dataset you have been provided is reasonably close to what you might expect to see in a real-life scenario.
- The first column is an employee ID (often called a participant ID). This is simply a piece of nominal data that serves as a reference to the participant’s record. The employee ID has no use in the analysis itself, but it can be used for a number of business purposes, such as follow-up interviews or training.
- Columns B and C are designations for protected sex and minority status. They are filled with what are known in programming as Boolean values, where 1 = true and 0 = false. For column B, for example, a value of 1 indicates that the participant has a protected sex/gender status. A zero indicates not having that status. Similarly, in column C a 1 indicates the participant has a protected minority status and a 0 indicates non-minority.
- Column D indicates the training team that evaluated the performance test for the participant. There are three teams assigned to certain geographic areas indicated on the map posted on Blackboard.
- Column E indicates the participant’s score on an objective, multiple-choice test that was required of all candidates.
- Column F indicates the participant score on the performance test, observed by the training team.
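For readers who prefer code to a spreadsheet, the column layout above could be represented roughly as follows. This is a sketch, not part of the assignment: the `Applicant` class, its field names, and `parse_row` are illustrative assumptions, and the two rows are copied from the dataset table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    employee_id: int      # column A: nominal reference value only
    protected_sex: bool   # column B: 1 = protected sex/gender status
    minority: bool        # column C: 1 = protected minority status
    team: int             # column D: evaluating team (1, 2, or 3)
    objective: int        # column E: objective test score (max 25)
    performance: int      # column F: aggregate performance score (max 25)


def parse_row(line):
    """Parse one pipe-delimited row of the dataset into an Applicant."""
    cells = [cell.strip() for cell in line.split("|") if cell.strip()]
    emp_id, sex, minority, team, objective, performance = (int(c) for c in cells)
    return Applicant(emp_id, bool(sex), bool(minority), team, objective, performance)


# Two rows copied from the dataset table.
rows = [
    "100005 | 0 | 1 | 2 | 12 | 16 |",
    "100454 | 1 | 0 | 2 | 22 | 18 |",
]
applicants = [parse_row(row) for row in rows]
for applicant in applicants:
    print(applicant)
```

Converting the 0/1 flags to `bool` makes the Boolean meaning of columns B and C explicit, while the team number stays an integer because it is a category label rather than a true/false flag.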
Step 1:
Your task is essentially to determine whether any statistically significant differences exist between the subgroups defined by protected sex and by minority status on either the objective test or the performance test. Assume an alpha level of 0.05.
This will require you to split the dataset by sex and then by minority status. (Excel’s Sort function may come in very handy here.) NOTE: In this first step, do not break the groups down further by training team. You may need to do so later, but not in this step.
A word of caution: never work on the raw dataset. Always save the data in a new file before manipulating it. That way, if you make a mistake, you can always revert to the original data file.
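The same two ideas, splitting by a flag and leaving the raw data untouched, can be sketched in Python. The helper `split_by` and the column constants are hypothetical names. The six rows are a small subset copied from the dataset for illustration only, so the counts say nothing about the full data.

```python
# (protected_sex, minority, objective, performance) for six rows copied
# from the dataset. A tuple is used so the raw data cannot be modified.
RAW = (
    (0, 1, 12, 16),  # 100005
    (0, 0, 14, 18),  # 100034
    (1, 0, 22, 18),  # 100454
    (1, 1, 15, 19),  # 100461
    (0, 1, 10, 14),  # 100463
    (1, 1, 20, 15),  # 101135
)

SEX, MINORITY, OBJECTIVE, PERFORMANCE = 0, 1, 2, 3


def split_by(records, column):
    """Return (rows with flag 1, rows with flag 0) as new lists."""
    flagged = [row for row in records if row[column] == 1]
    unflagged = [row for row in records if row[column] == 0]
    return flagged, unflagged


protected, non_protected = split_by(RAW, SEX)
minority, non_minority = split_by(RAW, MINORITY)

print(len(protected), len(non_protected))  # 3 3
print(len(minority), len(non_minority))    # 4 2
```

Each split starts again from `RAW`, which mirrors the advice to keep the original file intact and work on copies.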
Step 2:
After sorting, you will be able to easily break up the list into two distinct groups (protected sex vs. non-protected and later minority and non-minority). You will then be able to compare these subgroups using tools explored in previous weeks. Remember that disparate outcomes are not necessarily evidence of disparate treatment. A significant effect only indicates that a difference exists. It doesn’t tell you why it exists.
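The text does not name the comparison tool. The sketch below assumes it is an independent two-sample t-test and uses Welch's unequal-variance form, which is a choice made here and not something the assignment specifies. The function name `welch_t` is hypothetical. The scores are the performance test scores of the first five protected-sex rows and the first five non-protected rows in the table, so the result does not reflect the full dataset.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    # Squared standard error contributed by each group.
    se_a = variance(a) / len(a)
    se_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df


# Performance scores for a small illustrative subset of the dataset.
protected_scores = [18, 19, 15, 19, 16]      # IDs 100454, 100461, 101135, 101502, 102772
non_protected_scores = [16, 18, 18, 11, 17]  # IDs 100005, 100034, 100078, 100105, 100118

t_stat, dof = welch_t(protected_scores, non_protected_scores)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

To reach a decision at alpha = 0.05, compare the absolute value of t against the two-tailed critical value for the computed degrees of freedom, or obtain a p-value from a statistics package (for example, Excel's `T.TEST` with type 3, or SciPy's `scipy.stats.ttest_ind` with `equal_var=False`).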
Hopefully, you will find no significant differences between these groups on either the objective or performance tests. However, if you do…
Step 3:
If the difference is on the performance test, one possible explanation is the training team that did the scoring. In that case, you might need to dig further by comparing scores across the three training teams. This would mean comparing Team 1 and Team 2, Team 2 and Team 3, and Team 1 and Team 3.
(Incidentally, there is another statistical tool called ANOVA that can perform this same analysis in one step, but we will not cover that in this course.)
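The three pairwise comparisons can be enumerated mechanically, as in the sketch below. The dictionary name is hypothetical, and the scores are the performance test scores of the first four rows for each team in the table. They are a subset for illustration, so the differences printed here are not the assignment's answer.

```python
from itertools import combinations
from statistics import mean

# Performance scores for the first four rows of each team in the dataset.
performance_by_team = {
    1: [14, 21, 19, 20],  # IDs 100171, 100414, 100600, 100671
    2: [16, 18, 18, 17],  # IDs 100005, 100034, 100078, 100118
    3: [11, 13, 14, 18],  # IDs 100105, 100428, 100616, 100633
}

# Every unordered pair of teams: (1, 2), (1, 3), (2, 3).
pairs = list(combinations(sorted(performance_by_team), 2))

for first, second in pairs:
    diff = mean(performance_by_team[first]) - mean(performance_by_team[second])
    print(f"Team {first} vs Team {second}: mean difference = {diff:.2f}")
```

A difference in means is only the starting point. Each pair would still need the same two-sample test used in Step 2 to judge whether the difference is statistically significant.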
Step 4:
Once you’ve concluded where the data suggests differences exist (if any), write up what remedies you would propose. From there you will have access to part two of the assignment.
So, although this might seem a bit complicated, you will only need to perform several iterations of the same statistical tool. I hope you won’t find it as difficult as it seems at first glance.