# Statistics for Social Science

Lead Author(s): **Stephen Hayward**

Student Price: **Contact us to learn more**

Statistics for Social Science takes a fresh approach to the introductory class. With learning check questions, embedded videos and interactive simulations, students engage in active learning as they read. An emphasis on real-world and academic applications helps ground the concepts presented. Designed for students taking an introductory statistics course in psychology, sociology or any other social science discipline.

**8,525 students**

## What is a Top Hat Textbook?

Top Hat has reimagined the textbook as one that is designed to improve student readership through interactivity, is updated with the newest information by a community of collaborating professors, and can be accessed online from anywhere, at any time.

- Top Hat Textbooks are built full of embedded videos, interactive timelines, charts, graphs, and video lessons from the authors themselves
- High-quality and affordable, at a fraction of the cost of traditional publisher textbooks

## Key features in this textbook

## Comparison of Social Sciences Textbooks

Consider adding Top Hat’s Statistics for Social Sciences textbook to your upcoming course. We’ve put together a textbook comparison to make it easy for you in your upcoming evaluation.

| | Top Hat | Pearson | Cengage | Sage |
|---|---|---|---|---|
| **Textbook** | Steve Hayward et al., *Statistics for Social Sciences* (only one edition needed) | Agresti, *Statistical Methods for the Social Sciences*, 5th Edition | Gravetter et al., *Essentials of Statistics for the Behavioral Sciences*, 9th Edition | Gregory Privitera, *Essential Statistics for the Behavioral Sciences*, 2nd Edition |
| **Pricing** (average price of textbook across most common format) | Up to 40-60% more affordable; lifetime access on any device | $200.83 (hardcover print text only) | $239.95 (hardcover print text only) | $92 (hardcover print text only) |

### Always up-to-date content, constantly revised by a community of professors

Content is constantly revised and updated by a community of professors with the latest material

### In-Book Interactivity

Includes embedded multimedia files and integrated software to enhance the visual presentation of concepts directly in the textbook

Pearson, Cengage, Sage: only available with supplementary resources at additional cost

### Customizable

Ability to revise, adjust, and adapt content to meet the needs of the course and instructor

### All-in-one Platform

Access to additional questions, test banks, and slides available within one platform


## About this textbook

### Lead Authors

#### Steve Hayward, Rio Salado College

A lifelong learner, Steve focused on statistics and research methodology during his graduate training at the University of New Mexico. He later founded and served as CEO of Center for Performance Technology, providing instructional design and training development support to larger client organizations throughout the United States. Steve is presently lead faculty member for statistics at Rio Salado College in Tempe, Arizona.

#### Joseph F. Crivello, PhD, University of Connecticut

Joseph Crivello has taught Anatomy & Physiology for over 34 years, and is currently a Teaching Fellow and Premedical Advisor of the HMMI/Hemsley Summer Teaching Institute.

### Contributing Authors

#### Susan Bailey, University of Wisconsin

#### Deborah Carroll, Southern Connecticut State University

#### Alistair Cullum, Creighton University

#### William Jerry Hauselt, Southern Connecticut State University

#### Karen Kampen, University of Manitoba

#### Adam Sullivan, Brown University

## Explore this textbook

Read the fully unlocked textbook below, and if you’re interested in learning more, get in touch to see how you can use this textbook in your course today.

# Introduction to Hypothesis Testing

It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong. - Richard Feynman, Educator and Physicist

- What is a Hypothesis?
- The Deductive/Inductive Process
- Elements of an Experiment
- The Scientific Method
- Elements of a Hypothesis
- Preparing to Write a Hypothesis Statement
- The Cutoff Score
- The Alpha or Significance Level
- The *p*-Value Approach
- Effect Size
- Cohen’s *d* as a Measure of Effect Size
- Decision Error
- Assumptions
- Power of a Test
- Steps of Hypothesis Testing
- Case Study: Attention Span

## Chapter Objectives

Upon completing this chapter, you will be able to:

- Correctly formulate and state null and alternative hypotheses
- Identify Type I and Type II errors and interpret the level of significance
- Make and interpret a decision based on the outcome of a statistical test
- Interpret *p*-values to make a statistical decision
- Calculate effect size using Cohen’s *d*
- Interpret the power of a statistical test

Hypothesis testing is something we routinely use every day, usually without thinking about it! You might weigh yourself (test) to see if your weight is lower in the morning compared to the afternoon. Given a closed container, you lift it to sample the weight and reject the hypothesis that it is empty. You might even test to see whether tequila or beer leads to a worse hangover, or whether acetaminophen or ibuprofen provides quicker relief from the ensuing headache.

All of these things involve elements of hypothesis testing. In this chapter, you’ll learn more about the foundations of hypothesis testing and further hone your developing statistical skills. Hypothesis testing is key not only to making inferences about events in everyday life, but also to almost any professional role in the social sciences. In a way, it is more about learning and adopting a mindset that builds on, and in turn supports, critical thinking skills.

Hypothesis testing has a long history as part of the Scientific Method, which was used even in ancient times by scholars of the day. It was first documented in more or less its present form by Sir Francis Bacon in the late 16th or early 17th century.

## What is a Hypothesis?

A **hypothesis**, by definition, is a tentative statement about a relationship or relationships between or among two or more variables. It is tentative because it expresses what a researcher expects to happen (or hopes will happen) in a given situation.

A particular experiment might be one of a kind, or it might be one of several on a related topic, where experimenters attempt to narrow down the possible range of causal effects through a process of testing and elimination. The hypothesis sets the stage for this exploration.

### Hypothesis Testing

**Hypothesis testing** is a statistical procedure for testing whether a research hypothesis provides a plausible explanation for experimental findings. After a hypothesis regarding a population parameter is proposed, sample statistics are used to assess the probability that the hypothesis in fact represents the true state of affairs. The process actually involves creating two competing hypotheses that are structured so as to provide a test of the original research hypothesis. The overall process is based on probability theory and the Central Limit Theorem, topics covered in the earlier chapters of the text to lay the foundation for this one.

## The Deductive / Inductive Process

Hypothesis testing involves a cycle of deductive reasoning, observation, and inductive reasoning.

### Core Logic

The core logic of hypothesis testing is built around the concept of *proof by contradiction*. Two competing hypotheses are initially proposed; data is then gathered and statistical tests carried out to determine which of the two hypotheses can be rejected or supported based on that data analysis.

Hypothesis testing considers the probability that the outcome of a study could have occurred in the absence of any effect of the experimental procedure or any difference between the populations being measured. If this probability is low, the researcher may conclude that the observed differences are most likely a result of the experimental procedure or reliable differences between the populations.
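A minimal sketch of this logic, assuming made-up scores for a hypothetical treatment group and control group: the permutation test below (one of several ways to estimate such a probability, chosen here for illustration) shuffles the group labels many times, as if the treatment had no effect, and counts how often the shuffled groups differ at least as much as the real ones did. The function name `permutation_p_value` and all data values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=1):
    """Estimate how often a mean difference at least as large as the observed one
    arises when group labels are assigned at random (i.e., under "no effect")."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical scores, for illustration only
treatment = [78, 85, 82, 90, 88, 84]
control = [72, 75, 80, 70, 74, 77]
p = permutation_p_value(treatment, control)
print(f"Estimated probability of a difference this large under 'no effect': {p:.4f}")
```

A small value means a difference this large would rarely arise if the null hypothesis were true. With these particular made-up numbers, only 4 of the 924 possible ways to split the 12 scores into two groups of 6 produce a difference at least as large as the observed one, so the estimate should land near 0.004.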

### Rationale

When interpreting results (data) from experimental studies, there is always the question of whether the observed results were entirely an effect of the manipulation of the independent variable. The question arises because it is impossible to say with 100% certainty that an effect in an experimental situation is due to a particular cause; it is never possible to completely rule out, or “control for,” the possible effect of **extraneous variables** and other sources of error or bias. The closest a researcher can come to “proving” an effect is to claim that an observed effect or outcome is reliable within some known (or at least estimated) margin of error. Since there is always some probability of error, an experimental outcome never actually “proves” anything; instead, it provides incremental support for the research hypothesis.

### Lady Tasting Tea

In 1920s Cambridge, England, something momentous happened. This video chronicles how a tea-drinking British lady inspired Sir Ronald Fisher, whose name the *F*-test honors, to think about probability and the likelihood of outcomes. Could a series of correct answers be due only to guessing? How could anyone tell?
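In the version of the tea-tasting experiment that Fisher described, the lady was presented with eight cups, four with milk poured first and four with tea poured first, and she knew there were four of each. A short calculation, sketched here in Python with a hypothetical helper name, shows how unlikely a perfect score would be if she were only guessing.

```python
from math import comb


def prob_all_correct_by_guessing(n_cups=8, n_milk_first=4):
    """Probability of naming exactly the right set of milk-first cups by pure guessing,
    when the taster knows how many milk-first cups there are."""
    # Under guessing, every choice of which cups are milk-first is equally likely,
    # and only one of those choices is entirely correct.
    return 1 / comb(n_cups, n_milk_first)


p = prob_all_correct_by_guessing()
print(f"C(8, 4) = {comb(8, 4)}; P(all correct by guessing) = {p:.4f}")  # 70; 0.0143
```

Because a perfect score by guessing has a probability of only about 1.4%, such a score would be hard to attribute to chance alone, which is exactly the kind of reasoning hypothesis testing formalizes.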

## Elements of an Experiment

Not all statistical analyses involve data from experiments, but the experimental paradigm provides a good starting frame of reference. To provide a framework for testing the relationship suggested by the research question, the question must first be restated as a testable research hypothesis and a complementary null hypothesis. The hypotheses are clearly stated (at least they should be), and the conditions governing how and what data should be collected are also clearly set forth. Once you’ve become familiar with the basic experimental process, it is easy enough to generalize to other situations involving the collection and analysis of data.

Key elements, or components, of an experiment include:

- A research hypothesis that summarizes the outcome to be tested.
- A null hypothesis that is complementary to the research hypothesis and summarizes the state of affairs if the research hypothesis, a.k.a. the alternative hypothesis, cannot be supported.
- Descriptions of the variables being tested, including the independent variable (IV), the dependent variable (DV), and an explanation of control measures for extraneous variables that have the potential to become confounding variables.
- A description of the population of interest and how this population will be represented in the experiment.
- Identification of the experimental group and the control group.
- A statement defining how the results are to be measured and interpreted, i.e., what constitutes a reliable outcome.

The core logic of hypothesis testing is built around the concept of _______.

proof of an effect due to the manipulation of the IV

the experimental method

proof by contradiction

the scientific method

The null hypothesis is stated so as to be _________ to the alternative hypothesis.

Which of the following are key components of an experiment? (Select all that apply)

A null hypothesis and an alternative hypothesis

A statement defining how the results are to be measured and interpreted

Identification of the population of interest

Descriptions of the variables being tested

## The Scientific Method

The **Scientific Method** provides the model for conducting research. It sets forth the process whereby researchers investigate phenomena and thereby develop new knowledge, correct errors and mistakes, and test theories. The steps of the scientific method include:

**Step 1:** Ask a question.

All scientific studies, whether concerning the physical or the social sciences, begin with an unanswered question about some observable phenomenon or a problem to solve.

**Step 2:** Do background research.

Social scientists often consult peer-reviewed journal articles describing previous research to see what other researchers have found in similar circumstances or what other explanations have been suggested to answer the question at hand.

**Step 3:** Construct a hypothesis.

A hypothesis is basically an educated guess about a relationship stated such that the proposed explanation can be tested, usually by conducting an experiment and/or analyzing existing data.

**Step 4:** Test the hypothesis.

An experiment may be designed to test whether the proposed explanation fits with the outcome that is observed, or data may be collected from other research to see if those data can also be applied to answer the present research question.

**Step 5:** Analyze the data and draw conclusions.

Statistical analysis of the data should be carried out to determine whether the data do in fact provide reliable evidence either in support of or not in support of the explanation proposed in the hypothesis statement. If not, a new hypothesis may be formulated and tested.

**Step 6:** Communicate the results.

It’s important that the results be communicated in context so that other scientists can benefit from the new knowledge and use it to support additional investigations that further develop the topic. This brings the process full circle back to steps 1 and 2.

As you can see, hypothesis testing is a key element of the scientific method! In fact, hypothesis testing really involves all the steps of the method.

Constructing a testable hypothesis is one step in the _____ ______.

Sort these items into the correct order according to the scientific method.

Ask a question

Test the hypothesis

Construct a hypothesis

Do background research

Analyze the data and draw conclusions

Communicate the results

### The Competing Hypotheses

The intelligence community (think C.I.A.) has developed elaborate methods of testing **competing hypotheses** to ensure that all possible explanations for the often thorny and always very complex problems they are routinely confronted with are taken into account. In fact, there is an entire model developed for this use, termed ACH or Analysis of Competing Hypotheses. We do not need to take things to such an extreme here, but we do find it useful to establish competing hypotheses when trying to answer a research question.

Identification of the competing hypotheses is actually a critical first step in the scientific method. This step lays the groundwork for the process that we’ve termed “proof by contradiction.” Having said that, it is also worth repeating that any one experiment does not provide “proof” of a cause-and-effect relationship—the most that can be concluded is that any given hypothesis is either rejected or supported. That may seem like a fine point, but it is nevertheless an important one to keep in mind.

The “competing” hypotheses that you will learn to use here are

- the null hypothesis and
- the alternative hypothesis

### The Null Hypothesis

The **null hypothesis**, represented as $H_0$, is a statement about a population parameter that is assumed to be true unless there is convincing evidence to the contrary. This is the cornerstone of the proof by contradiction process that was identified as the core logic of hypothesis testing.

The null hypothesis states that the expected difference between the groups being compared is nonexistent. It is sometimes referred to as the “no difference” hypothesis; the “null” hypothesis is so called because it describes a null difference in the outcome. (Null means *without value*, *effect*, *consequence* or *significance*.)

The process of proof by contradiction provides incremental support for the alternative to the null hypothesis by rejecting the (null) hypothesis that claims the expected difference between the groups being compared is nonexistent. If this “no difference” statement can be rejected, then support can be inferred for the alternative, i.e., that there is a real or reliable difference between the groups.

### The Alternative Hypothesis

The **alternative hypothesis**, represented as $H_1$, is a statement that is directly contradictory to the null hypothesis. It expresses the alternative to the null, thus the name. It is the statement that summarizes the expected or predicted outcome of the investigation at hand. For that reason, it is also frequently referred to as the **research hypothesis**.

### Complementary and Mutually Exclusive Events

In another sense, the alternative hypothesis is the *complement* to the null hypothesis. The two are said to be **complementary** because, taken together, they cover every possible outcome: if you summed the probabilities of the outcomes described by the two hypothesis statements, the total would equal one. They are also **mutually exclusive**, because both cannot be true at once. Together, these two properties mean that if one hypothesis is false, the other must be true.

### What is Tested

In most cases, what is actually being tested is not the alternative hypothesis but rather the null hypothesis. The logic is that if the null hypothesis can be rejected, then support accrues to the alternative hypothesis. The research does not directly test the alternative hypothesis, and the alternative hypothesis is never “proved” in a single experiment (or disproved, for that matter).

The purpose of hypothesis testing is to establish the truth of the research hypothesis.

True

False

In an experimental situation, support is said to accrue to the alternative hypothesis under what condition?

The alternative is unable to be rejected.

The null is proven to be false.

The alternative is proven to be true.

The null is rejected.

## Elements of a Hypothesis

A complete, testable hypothesis includes several defining elements:

- It explicitly identifies the group or groups being compared.
- It specifies the level of measurement to be used.
- It states the direction of the effect or the nature of the expected difference in terms of the dependent variable.

### Stating the Alternative Hypothesis

The alternative hypothesis is a *completely stated *prediction about an expected difference. It is stated in terms of measurable outcomes and spells out precisely what differences are expected to be found between the groups that are being compared.

### Directionality

When comparing two outcome values, the difference specified in the hypothesis may be either directional or non-directional.

Directionality is important because it will determine whether the test of significance is to be one-tailed or two-tailed, and it can also have serious implications for the interpretation of the results.

- A **directional hypothesis** states that one measure will be more than or less than a comparison measure. It specifies the direction of the expected difference.

The hypothesis may predict that population 1 will score higher or lower than population 2 on some measure, in which case it is directional—it specifies the direction of the expected difference in the measurements being taken. It argues that the results will be positive or negative, in terms of higher scores or lower scores, or more or less of something. In other words, it is a directional change.

- A **non-directional hypothesis** states that two measures will be different from each other, but does not specify the direction of the difference.

The hypothesis may simply predict that the two populations will be different on some measure, without specifying a direction of that difference, in which case it is non-directional; it anticipates a difference without specific concern for the directionality of the difference.

**The majority of real-world research hypotheses are non-directional in nature!**
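To see how directionality changes the computation, here is a minimal sketch assuming a test statistic that follows a standard normal (z) comparison distribution; the function name `p_value` and the value z = 1.8 are illustrative only. A directional alternative looks at one tail, while a non-directional alternative counts both tails, so the same statistic yields a two-tailed p-value twice as large as the one-tailed value.

```python
from statistics import NormalDist


def p_value(z, alternative="two-sided"):
    """p-value for a z statistic under a standard normal comparison distribution.

    alternative: "greater"   for H1: mu1 > mu2 (one-tailed, upper tail)
                 "less"      for H1: mu1 < mu2 (one-tailed, lower tail)
                 "two-sided" for H1: mu1 != mu2 (two-tailed)
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


z = 1.8
print(round(p_value(z, "greater"), 4))    # one-tailed: 0.0359
print(round(p_value(z, "two-sided"), 4))  # two-tailed: 0.0719
```

The doubling is why the choice between a directional and a non-directional hypothesis must be made before the data are examined: it can change how the same result is interpreted.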

### Writing the Alternative Hypothesis

The form of the hypothesis statement may vary slightly depending on the nature of the comparisons to be made.

### Experimental Study

Recall that an experimental study typically involves the introduction or manipulation of an independent variable (IV), sometimes referred to as the predictor variable, while measures are taken on a dependent variable (DV) to determine what changes are observed as a result of the manipulation of the IV. Active manipulation of the IV is a defining feature of an experimental study.

The hypothesis should state who or what is being measured (the population(s) of interest), what is being manipulated (the IV), what is expected to change as a result of that manipulation (the DV), what the expected change is (i.e., “higher,” “lower,” or “different from”), and how that expected change is to be measured (test scores or some other measurable outcome).

### Non-Experimental Study

Non-experimental research most often relies on pre-existing differences in an independent (predictor) variable and may make comparisons relating those differences to measures on a dependent variable. In a non-experimental study, the IV may be said to be “passively” manipulated. (See example 2 below.)

### Action Statement

The alternative hypothesis is usually in the form of an action statement, and is written as a complete sentence. Inferences about the results should be reserved for the conclusion section of the research report.

**Example 1: Experimental and Directional Hypothesis Statement**

A developer of video training materials wants to see whether a video coaching program will help students score higher on the GRE than students who do not receive the coaching support. The elements of the comparison are:

- **Population 1:** Students who receive video coaching
- **Population 2:** Students in general
- **Independent variable:** Video coaching
- **Dependent variable:** Scores on the GRE
- **Expected outcome:** Population 1 score > population 2 score
- **Summary statement:** $H_1: \mu_1 > \mu_2$

*Incorrect alternative hypothesis:*

Video coaching improves scores on the GRE. (Stated as a conclusion, not a hypothesis, and does not identify the populations.)

*Correct alternative hypothesis:*

College students who receive video coaching will score higher on the GRE than students who do not receive video coaching.

**Example 2: Non-Experimental and Non-Directional Hypothesis Statement**

An admissions counselor at a large university is interested in whether admitted students from private preparatory schools tend to score differently on the GRE than students from public high schools. The elements of the comparison are:

- **Population 1:** Students from private preparatory schools
- **Population 2:** Students from public high schools
- **Independent variable:** Type of school graduated from
- **Dependent variable:** Scores on the GRE
- **Expected outcome:** Population 1 score ≠ population 2 score
- **Summary statement:** $H_1: \mu_1 \neq \mu_2$

*Incorrect alternative hypothesis:*

Type of school affects performance on the GRE. (Stated as a conclusion, not a hypothesis, and does not identify the populations.)

*Correct alternative hypothesis:*

Students from private preparatory schools will score differently on the GRE than students from public high schools.

What is the difference between an experimental and non-experimental study?

In a non-experimental study the IV is actively manipulated while in an experimental study the DV is actively manipulated.

A non-experimental study does not actually test a hypothesis statement.

In an experimental study the IV is actively manipulated while in a non-experimental study the IV is not actively manipulated.

They are different names for the same testing process.

Most real-world research hypotheses are _________ in nature.

A complete, testable hypothesis includes several defining elements, including which of the following? (Select all that apply)

It explicitly identifies the groups being compared

It specifies the scale of measurement to be used

It states the direction of the effect or the nature of the expected difference in terms of the dependent variable

### Stating the Null Hypothesis

The null hypothesis is the logical opposite of the alternative hypothesis, and is basically a statement of no effect—it summarizes the outcome in the case that the experimental treatment has no effect or the observed effect is weak and not reliable as a predictor of future outcomes. If you “combined” the alternative and null hypotheses, you would have covered all of the possible outcomes, so they are complementary in a very real sense.

### Directionality

An issue that frequently crops up is a misstatement of the null when the alternative hypothesis is directional in nature—i.e., when it predicts a “less than” or “more than” outcome instead of a “different from” outcome. The confusion is due to the null being considered as the “no effect” outcome. When writing the null statement, researchers often simply state the null as the condition where the experimental group’s outcome as measured on the variable of interest is the same as the control group’s outcome, i.e., “no difference.”

This “no difference” statement may be correct for a non-directional hypothesis—one in which the researcher is predicting a difference but not specifying the direction of the difference—but it is not correct for a directional hypothesis. If the alternative hypothesis is directional, the null should predict that the populations are not different *in the way predicted* by the alternative hypothesis.

While this may seem like a minor distinction, it can be important in real-world hypothesis testing. The thing to remember is that a directional hypothesis requires a different null statement than a non-directional hypothesis, and the two statements together must be complementary.

Here is a decision table to use:

### Writing the Null Hypothesis

| Alternative hypothesis | Type of test | Correct null hypothesis |
|---|---|---|
| $H_1: \mu_1 < \mu_2$ | One-tailed | $H_0: \mu_1 \geq \mu_2$ |
| $H_1: \mu_1 \neq \mu_2$ | Two-tailed | $H_0: \mu_1 = \mu_2$ |
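The complementarity requirement can be spot-checked mechanically. This sketch, with hypothetical names `NULL_FOR` and `is_complementary`, writes each hypothesis in terms of the difference $\mu_1 - \mu_2$ and confirms that, for a handful of test differences, exactly one of $H_0$ and $H_1$ holds. Checking a few values is an illustration, not a proof.

```python
import operator

# Hypothetical lookup for this sketch: each alternative's correct (complementary) null
NULL_FOR = {"<": ">=", ">": "<=", "!=": "=="}

# Map comparison symbols to Python's comparison functions
OPS = {
    "<": operator.lt, ">": operator.gt, "!=": operator.ne,
    "<=": operator.le, ">=": operator.ge, "==": operator.eq,
}


def is_complementary(alt_op, null_op, diffs=(-2.0, -0.5, 0.0, 0.5, 2.0)):
    """Check that for each tested difference d = mu1 - mu2, exactly one of
    H1 (d alt_op 0) and H0 (d null_op 0) holds."""
    for d in diffs:
        h1_holds = OPS[alt_op](d, 0)
        h0_holds = OPS[null_op](d, 0)
        if h1_holds == h0_holds:  # both true = overlap; both false = outcome left uncovered
            return False
    return True


for alt, null in NULL_FOR.items():
    print(f"H1: mu1 {alt} mu2  |  H0: mu1 {null} mu2  |  complementary: {is_complementary(alt, null)}")

# A "same score" null for a directional alternative leaves lower scores uncovered
print(is_complementary(">", "=="))  # False
```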

**Examples**

*Directional alternative hypothesis:*

College students who receive video coaching will score higher on the GRE than students who do not receive video coaching.

*Incorrect null hypothesis:*

College students who receive video coaching will score the same on the GRE as students who do not receive video coaching. (What if the students score significantly lower on the test? What would your conclusion be then?)

*Correct null hypothesis:*

College students who receive video coaching will score equal to or lower than students who do not receive video coaching on the GRE.

- Null outcome: Population 1 score ≤ population 2 score
- Summary statement: $H_0: \mu_1 \leq \mu_2$

Given the statement, H1: µ1 > µ2, what is the correct null hypothesis?

H0: µ1 = µ2

H0: µ1 ≤ µ2

H0: µ1 ≥ µ2

H0: µ1 ≠ µ2

Given a research hypothesis that the average test score of students in online class sections will be different from the average test score of students in classroom-based sections, identify the correct null hypothesis:

H0: µ1 = µ2

H0: µ1 ≤ µ2

H0: µ1 ≥ µ2

H0: µ1 ≠ µ2

The null hypothesis is considered the ______ outcome.

no effect

significant effect

alternative effect

## Preparing to Write a Hypothesis Statement

As you prepare to write a set of hypothesis statements, here are some things to keep in mind:

**Step 1:** Identify the populations.

State what population is represented by the sample(s) being studied and what population the comparison distribution is drawn from. Inferences drawn from the results must apply specifically to these populations of interest.

**Step 2:** Identify the independent and dependent variables and operationally define them.

Clearly state what is being manipulated (the *independent variable*) to test the response, in a way that operationally defines it.

Identify what is expected to change as a result of the manipulation (the *dependent variable*) and state how this change is to be measured and recorded. For example, it could be a score on a test, a response time, or various other measurable events. Generally speaking, it must be “countable” somehow.

**Note:** To **operationally** define something is to spell out how it is to be used or measured in a particular instance.

**Step 3:** Identify the effect.

State the expected outcome in terms of the dependent variable. What is the nature of the change that is expected to occur in the dependent variable?

### The Comparison Distribution

The whole point of going through the process of constructing hypothesis statements is to be able to identify exactly what comparisons to make and how to make them. In essence, we are identifying two distributions, one of which is (in most scenarios) expected to be statistically significantly different from the other in some respect.

The **comparison distribution** is the statistical distribution to which the results of a study are to be compared. It represents the situation or the true state of affairs in the case where the null hypothesis is true.

The comparison distribution provides the “baseline” to which the measure being taken can be compared. This allows researchers to determine whether it is reliably different from a similar measure, or has been changed in some measurable way as a result of the experimental treatment or in terms of some other expected difference in its characteristics.

The comparison distribution takes different forms depending on the nature of the comparison to be made and the test to be carried out. Some of these tests have not been discussed yet, so you might save this for reference later.

**Example**

A researcher was interested in determining whether corporate executives who exercised regularly scored lower than corporate executives in general on a scale of stress symptoms. For corporate executives in general, the average on this test is 80 with a standard deviation of 12. The researcher then measured 20 executives who exercised and found them to have a mean score of 72. The statistical test would be to determine whether this difference is significant.

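In this example, the scores of corporate executives in general can be assumed to be normally distributed. Because the researcher is comparing the *mean* of a sample of 20 exercising executives, not an individual score, the comparison distribution is a distribution of sample means (means of samples of 20 drawn from the general population of executives). It has the same mean as the population, 80, but a smaller spread: its standard deviation, the standard error of the mean, is 12/√20 ≈ 2.68.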
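A minimal sketch of this comparison, using only the values given in the example (μ = 80, σ = 12, *n* = 20, sample mean = 72), builds the distribution of means from the standard error σ/√*n* and locates the sample mean on it as a *z*-score:

```python
import math

# Population of corporate executives in general (values from the example)
mu = 80           # population mean stress score
sigma = 12        # population standard deviation
n = 20            # number of exercising executives measured
sample_mean = 72  # their mean stress score

# The comparison distribution is a distribution of sample means:
# centred on mu, with standard deviation (standard error) sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

# Position of the observed sample mean on that distribution, as a z-score.
z = (sample_mean - mu) / standard_error

print(f"standard error = {standard_error:.2f}")  # 2.68
print(f"z = {z:.2f}")                            # -2.98
```

The distribution of means is much narrower than the distribution of individual scores (2.68 versus 12), which is why comparing the sample mean against individual scores would understate how unusual a mean of 72 is.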

What is the function of the comparison distribution?

It provides the statistical distribution to which the results of a study are to be compared.

It represents the situation if the alternative hypothesis is true.

It is significantly different from the experimental distribution.

The appropriate comparison distribution for comparing an individual score to a known population parameter is a distribution of sample means.

True

False

The appropriate comparison distribution to use when comparing a sample mean to a known population mean is a distribution of ______.

## The Cutoff Score

In order to make a decision about almost anything, there first has to be a decision point. For example, at what point do you decide to hang it up and quit studying for the day, or to email your professor and ask for additional time on an assignment that's due tomorrow? Almost every decision is keyed by a decision point, whether it’s fatigue from hitting the books, a looming deadline, or anything else.

In carrying out statistical analyses and making decisions about probabilities associated with experimental (and non-experimental) outcomes, the decision point is known as the cutoff score. It is the point at which we are willing to say that the outcome was most likely due to the experimental manipulation of the IV, but with some predetermined (known) probability that the conclusion could be wrong.

The **cutoff score** is the “**critical value**” that marks certain areas of the comparison distribution used as a reference for **tests of significance**.

*Z*-scores and the areas they define can be looked up in the Standard Normal Table. The table shows the proportion of scores in the distribution that can be expected to fall above or below a given *z*-score.

A *z*-score that cuts off a given proportion of the distribution’s scores in one of the “tails” of the distribution acts as a marker for that area. It is the cutoff score, or critical value, that “cuts off” a given area under the normal curve.
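In place of a printed Standard Normal Table, Python's standard-library `statistics.NormalDist` gives the same areas. The sketch below is an illustration added here (the helper names are hypothetical); it finds the proportion of scores below or above a *z*-score and, working the other way, the *z*-score that cuts off a given area in the upper tail:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_below(z):
    """Proportion of the standard normal distribution below z (a table lookup)."""
    return standard_normal.cdf(z)

def area_above(z):
    """Proportion of the distribution above z, i.e. the upper-tail area."""
    return 1 - standard_normal.cdf(z)

def cutoff_for_upper_tail(area):
    """z-score (cutoff score) that cuts off the given proportion in the upper tail."""
    return standard_normal.inv_cdf(1 - area)

print(round(area_above(1.645), 3))            # 0.05
print(round(cutoff_for_upper_tail(0.05), 3))  # 1.645
```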

Other distributions, such as the *t*-distribution, chi-square and the *F*-distribution, can have cutoff scores identified in a similar manner, except that the tables are set up differently.

## The Alpha or Significance Level

The **significance level** of a hypothesis test is represented by **alpha** (**α**) and corresponds to the size of the rejection region. A test with a significance level, or alpha, of .05, for example, would use a cutoff score that cuts off the most extreme 5% of the distribution. A score that falls into that area would be expected to occur less than 5% of the time if the null hypothesis were true, and therefore provides evidence to support the alternative hypothesis. The rationale is that the probability of such a result is so low that we are willing to say it is most likely due to an effect of the experimental manipulation or, in the case of a non-experimental outcome, due to pre-existing differences in the populations.

You will often see the significance level expressed as, for example, α = .05.

### One-Tailed vs. Two-Tailed Tests

Hypothesis tests are often characterized according to the location of the rejection region that will be used in making a decision about the outcome of the test. Since a critical value cuts off the most extreme area of the distribution (that area having been determined by the setting for alpha), it is always in one or both tails of the distribution.

### One-Tailed Test

An alternative hypothesis that predicts a “more than” outcome or a “less than” outcome—i.e., predicts a higher or lower score on whatever measure is being reported—leads to a one-tailed test. Since the predicted outcome is in a specific direction, only a result in that direction can justify rejection of the null hypothesis, and the test is therefore confined to that specific “tail” of the comparison distribution.

In a one-tailed test, the entire area set by alpha is assigned to one tail of the distribution.

#### Right-Tailed vs. Left-Tailed Test

A one-tailed test (directional hypothesis) can be further defined as either right-tailed or left-tailed, depending on the direction of the difference predicted by the alternative hypothesis. A prediction of a higher score, or more of something, leads to a right-tailed test, while a prediction of a lower score, or less of something, leads to a left-tailed test.

### Two-Tailed Test

A hypothesis that predicts a “different from” outcome and does not specify a direction of the effect leads to a two-tailed test, because an outcome in either direction can justify rejection of the null.

In a two-tailed test, the area set by alpha is split between the two tails of the distribution.
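The difference between assigning all of alpha to one tail and splitting it between two can be sketched with `statistics.NormalDist`. This is an illustration for *z*-tests only; the function name and the tail labels `"left"`, `"right"`, and `"two"` are hypothetical:

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Return the z cutoff score(s) for a test with the given alpha.

    tail: "left", "right", or "two" (labels chosen for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, SD 1
    if tail == "right":
        # All of alpha goes in the upper tail.
        return (z.inv_cdf(1 - alpha),)
    if tail == "left":
        # All of alpha goes in the lower tail.
        return (z.inv_cdf(alpha),)
    if tail == "two":
        # Alpha is split evenly: alpha / 2 in each tail.
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

for tail in ("right", "left", "two"):
    print(tail, [round(c, 3) for c in critical_values(0.05, tail)])
# right [1.645]
# left [-1.645]
# two [-1.96, 1.96]
```

Because the two-tailed test splits alpha, each of its cutoffs (±1.96 at α = .05) is more extreme than the single one-tailed cutoff (1.645) at the same alpha.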

Here is a decision table to use:

Which of the following determines alpha?

The probability of rejecting the null hypothesis

The size of the rejection region

Whether the test is one-tailed or two-tailed

Whether the result will be significant

A research hypothesis that makes a prediction of a lower score on a measure of the DV would require a ________.

Right-tailed test

Two-tailed test

Left-tailed test

Non-directional test

A two-tailed test is appropriate for a hypothesis that does not predict a $\_\_\_\_\_$ outcome.

different from

more than or less than

significantly different

### The Rejection Region

The area that lies beyond the cutoff score in the extreme tail of the distribution is the rejection region. Below is a graph showing the relationship of the critical values, the rejection regions, and the non-rejection region.

### To Reject or Not to Reject

As you’ve seen, **alpha** (α) sets the significance level of the test and determines the size of the rejection region. In psychology experiments, α is most often set at either .05 or .01, meaning that the null will be rejected if the observed outcome would be expected to occur less than 5% or 1% of the time, respectively, if the null hypothesis were actually true.

There are two approaches to making a decision to reject the null hypothesis. One involves using test statistics and critical values. The second involves comparing probability values, or *p*-values, to the size of the rejection region.

### The Critical Value Approach

A test statistic (*z*-score, *t*-statistic, etc.) that exceeds the predetermined critical value provides evidence for rejecting the null hypothesis in favor of supporting the alternative hypothesis.

### Rejection Charts

Here is a set of rejection charts, based on *z*-scores, showing rejection regions bounded by cutoff scores for two common values of alpha.

## The *p*-Value Approach

The ***p*-value** is the calculated probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. It is determined by calculating the test statistic and then looking up the associated probability in a table or using a technological tool to calculate it. If the *p*-value of the statistic is less than α, which sets the size (area) of the rejection region, the null hypothesis is rejected. The rationale is that if the probability of the result occurring when the null hypothesis is true is less than the limit set by alpha, the null hypothesis can be rejected.
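Here is a minimal sketch of the *p*-value approach for a *z* statistic, again using Python's `statistics.NormalDist`. The function names, tail labels, and the observed *z* of 2.10 are hypothetical, chosen only to illustrate:

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """Probability of a z at least as extreme as the observed one, if H0 is true."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "two":
        # "At least as extreme" counts outcomes in both tails.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

def decide(p, alpha):
    """Reject H0 when the p-value falls at or below alpha."""
    return "reject H0" if p <= alpha else "fail to reject H0"

p = p_value_from_z(2.10, "two")
print(round(p, 4), decide(p, alpha=0.05))  # 0.0357 reject H0
```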

*p*-Value Logic

Consider that there are two possible outcomes—the *p*-value is either ≤ α or > α.

Here is a decision table to use:

Experiment with the settings in the demonstration below to see the effect of varying levels of significance (alpha) and the values of the test statistic on the *p*-value.

### Examples

**Example 1: Critical Value Example**

A health professional gathers data from a large sample of medical service users in order to compare their average days in hospital to a population with a known mean and standard deviation. She planned a two-tailed test with an alpha level of .01 and calculated the *z*-score of the sample mean to be 0.074. Because 0.074 is not more extreme than the critical value of ±2.576, which defines the boundary of the rejection region, she could not reject the null hypothesis stating that the populations were the same.
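The numbers in Example 1 can be checked with a short sketch using Python's standard-library `statistics.NormalDist` (an illustration added here, not the author's procedure). It recomputes the two-tailed cutoff for α = .01 and, for comparison, the two-tailed *p*-value of *z* = 0.074:

```python
from statistics import NormalDist

alpha = 0.01
z_observed = 0.074  # z-score of the sample mean, from Example 1

std_normal = NormalDist()
# Two-tailed test: alpha / 2 in each tail.
z_critical = std_normal.inv_cdf(1 - alpha / 2)
p_two_tailed = 2 * (1 - std_normal.cdf(abs(z_observed)))

# Critical value approach: reject only if |z| is more extreme than the cutoff.
reject = abs(z_observed) > z_critical
print(f"critical value = ±{z_critical:.3f}, reject H0: {reject}")
print(f"two-tailed p = {p_two_tailed:.3f}")
```

The cutoff comes out at about 2.576 and the *p*-value at about .94, far above .01, so both approaches agree that the null cannot be rejected.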

**Example 2: *p*-Value Example**

A researcher conducted an experiment with a one-tailed alpha level of .05. The *p*-value of the outcome was calculated to be .035, meaning that a result at least this extreme would be expected only 3.5% of the time if the null hypothesis were true. Because .035 is below the limit of .05 set by alpha, the null can be rejected in favor of supporting the alternative hypothesis.

### Statistical Significance vs. Practical Significance

A downside of conventional hypothesis testing based on either the critical value approach or the *p*-value approach is that we may get a statistically significant outcome that has little or no practical importance. This is especially likely when the research is conducted with a very large sample. For the same underlying difference, a larger sample is more likely to produce a statistically significant result than a smaller one, simply because of the way the test statistics are calculated. That leaves the question of “does it matter?”

With a two-tailed test, the rejection region is __________.

Larger than for a one-tailed test

Two times alpha

Split between the two tails of the distribution

Dependent on sample size

The *p*-value of an outcome is the probability of obtaining this result given that the null hypothesis is _________.

Correctly stated

True

Not true

Proven

Along with whether a result has statistical significance, a researcher should also be concerned with whether the result has $\_\_\_\_\_\_\_\_$ significance.

false

borderline

practical

theoretical

## Effect Size

**Effect size** is really just a way of quantifying the size of the difference between two groups so that we can better judge the practical significance of the results. It has the advantage of being easy to calculate and easy to understand. It helps to quantify the effectiveness of a particular treatment or intervention relative to a comparison group. In doing so, it takes the research conclusion beyond the basic question of “does it work?” and contributes additional information about “how well does it work?”

An issue with hypothesis testing using only critical values or *p*-values without considering effect size is that sample size directly affects the calculation of the results. All else being equal, the larger the sample size, the larger the calculated value of the test statistic for a given difference. A small absolute difference between two means, for example, is more likely to be reported as statistically significant if it is based on a sample *n* of 1000, say, than on a sample *n* of 10. (You’ll see more about this in coming chapters.) Taking effect size into account provides a more complete picture of the results by quantifying the size of the difference.
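To see the point about sample size concretely, the sketch below holds a small difference fixed (2 points on a scale with σ = 15, hypothetical numbers chosen only for illustration) and computes a one-sample *z* statistic and two-tailed *p*-value for *n* = 10 and *n* = 1000, alongside the standardized difference, which does not depend on *n*:

```python
import math
from statistics import NormalDist

# Hypothetical values chosen for illustration only
mean_difference = 2.0  # sample mean minus population mean, in raw units
sigma = 15.0           # known population standard deviation

def z_and_p(n):
    """One-sample z statistic and two-tailed p-value for a sample of size n."""
    standard_error = sigma / math.sqrt(n)
    z = mean_difference / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

d = mean_difference / sigma  # standardized effect size, the same for every n
for n in (10, 1000):
    z, p = z_and_p(n)
    print(f"n={n:5d}  z={z:5.2f}  p={p:.4f}  d={d:.2f}")
```

The same 2-point difference gives *z* ≈ 0.42 (not significant) with 10 cases but *z* ≈ 4.22 (highly significant) with 1000, while the standardized difference stays at about 0.13 either way.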

The comparison is conceptually easy to grasp. It is just the difference between the two groups being compared, although that difference can be approached in two ways:

- absolute (raw) difference between group means
- standardized difference between group means

The absolute difference between group means, or the **absolute effect size**, is simply the difference between the means expressed in the distributions’ native units of measurement. It is useful when the variables have intrinsic and well-understood meanings (e.g., number of calories consumed). The downside to this is that it is analogous to comparing two raw scores within a distribution without taking into account the spread of the distribution.

The **standardized effect size**, figured as the **standardized difference between group means**, is useful when the measurements have no intrinsic meaning (e.g., Likert scales) or involve different scales of measurement. It also provides a basis for comparing differences across studies, and has been used extensively in post-hoc **meta-analyses** of studies. In this case, the difference between means is divided by the standard deviation to yield the standardized mean difference between the groups.

Using the standardized mean difference provides a “common ground” for comparisons. Just as converting raw scores to *z*-scores standardizes those scores and enables comparisons to be made in terms of a common scale, so does standardizing the difference between the group means.

Consider what would happen if we were to compare the distributions in either of the examples below. The two distributions on the left show the same amount of difference between their means as the two on the right, but look at the amount of overlap between the two pairs of curves.

The pair on the left has a smaller standard deviation (less spread) with correspondingly less overlap, while the pair on the right has a greater standard deviation (more spread) with more overlap. While comparisons of both pairs might show statistically significant differences, the difference in the pair on the left is more likely to be of practical significance. In fact, the difference in the pair on the right might be hardly noticeable and have no practical significance at all.

This “difference in the differences” is quantifiable as the effect size. The comparison above shows why the standard deviation needs to be included in the calculation: to correct for differences in spread. The absolute effect size might be the same in either case, but the standardized difference between the group means would provide a more realistic assessment of the effect. Best practice is to report both the absolute effect size and the standardized effect size so as to present the most complete picture possible.

## Cohen’s *d* as a Measure of Effect Size

There are several measures of effect size available, one of the more commonly used being Cohen’s *d*. It can be used when comparing the mean scores of two groups (as you’ll do in the next chapters). It is easy to compute and intuitive to interpret, being simply the difference between the group means divided by their pooled standard deviation, a kind of average of the two groups’ standard deviations.

Standardizing the measure in this way simplifies interpretation. A *d* of 1 tells us that the two groups’ means differ by one standard deviation, a *d* of 0.5 indicates a difference of one half a standard deviation, etc. This can be extended to making comparisons in terms of percentiles. An effect size of 0 would place the mean of group 2 at the 50th percentile of group 1, so that the distributions overlap completely—there is no difference. An effect size of 0.4 would place the mean of group 2 at the 66th percentile of group 1, so an individual with an average score in group 2 (at the mean) would have scored higher than 66% of the individuals in group 1. In like fashion, an effect size of 0.8 would place the mean of group 2 at the 79th percentile of group 1, so an individual with an average score in group 2 (at the mean) would have scored higher than 79% of the individuals in group 1.
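The percentile interpretation follows from the normal curve: if both groups are normally distributed with the same standard deviation and group 2’s mean is *d* standard deviations higher, that mean sits at the Φ(*d*) percentile of group 1. A short sketch under those assumptions (the function name is hypothetical) reproduces the figures above:

```python
from statistics import NormalDist

def percentile_of_group1(d):
    """Percentile of group 1 at which the mean of group 2 falls, assuming both
    groups are normal with equal SDs and group 2's mean is d SDs higher."""
    return 100 * NormalDist().cdf(d)

for d in (0.0, 0.4, 0.8, 1.0):
    print(d, round(percentile_of_group1(d)))
# 0.0 50
# 0.4 66
# 0.8 79
# 1.0 84
```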

Cohen suggests that effect sizes be classified according to size:

| Effect size | Cohen’s *d* |
|---|---|
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |

### Calculating Cohen’s *d*

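The value for *d* is nothing more than the difference between two group means divided by their pooled standard deviation:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_{pooled}}$$

The formula for the pooled standard deviation (for two groups of equal size) is:

$$s_{pooled} = \sqrt{\frac{s_1^2 + s_2^2}{2}}$$

Combined, the two formulas can be shown as:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{2}}}$$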

**Example**

A researcher compared two groups of elementary school students on a measure of time-to-acquisition of a simple memory task, where the independent variable was the teaching method: either a textbook example or a video example. He found that the mean score for group 1 (textbook) was 24 with a standard deviation of 5, and the mean score for group 2 (video) was 20 with a standard deviation of 4. The researcher calculated his effect size as follows:

$$d = \frac{24 - 20}{\sqrt{\frac{5^2 + 4^2}{2}}} = \frac{4}{\sqrt{20.5}} = \frac{4}{4.53} \approx 0.88$$

The absolute effect size was reported as four points. The standardized effect size based on Cohen’s *d* was reported as .88, a large effect.
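The same computation can be written as a short sketch. It assumes the pooled standard deviation is the square root of the average of the two variances (the equal-group-size case, which matches the .88 reported above; the example does not give the group sizes). `pooled_sd` and `cohens_d` are illustrative names:

```python
import math

def pooled_sd(s1, s2):
    """Pooled SD as the square root of the mean of the two variances
    (appropriate when the two groups are the same size)."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / 2)

def cohens_d(mean1, s1, mean2, s2):
    """Standardized mean difference between group 1 and group 2."""
    return (mean1 - mean2) / pooled_sd(s1, s2)

# Values from the teaching-method example
absolute_effect = 24 - 20
d = cohens_d(24, 5, 20, 4)
print(absolute_effect, round(d, 2))  # 4 0.88
```

Reporting both numbers, as the researcher did, gives the raw difference in the measure’s own units alongside the standardized one.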

An effect size of zero would indicate __________.

No effect of the IV

No effect of the DV

A small effect

Significance of the outcome

Cohen’s *d* is used to calculate a $\_\_\_\_\_\_\_$ effect size.

Calculate Cohen’s *d* for an experiment in which the means and SDs for the two groups being compared were as follows:

- Group 1: x̅ = 56, *s* = 18
- Group 2: x̅ = 50, *s* = 14

## Decision Error

Since experimental conclusions are based on probability, no hypothesis test results in 100% certainty. There is always some probability of error in the results; therefore, nothing is ever “proven” beyond the shadow of a doubt. If the null is rejected in favor of the alternative hypothesis, it is still not “disproved”; it is only rejected in that instance. At the same time, the alternative hypothesis is not proven but rather “supported,” because there still remains some probability of error in the outcome.

This probability of error can be considered as taking one of two forms. They are referred to as Type I or Type II errors.

### Type I Error

Rejecting the null hypothesis when it is in fact true constitutes a **Type I error**.

The level of significance of a statistical test also sets the maximum probability of making a Type I decision error. Using a significance level of .05, for example, means that we will reject the null if the result would be expected to occur no more than 5% of the time if the null hypothesis were actually true. That gives us a 5% maximum probability of making a Type I error, and this built-in chance of error is also why we never “prove” or “disprove” the hypotheses with 100% certainty.

By logical extension, then, alpha also determines the probability of making a Type I error:

α = significance level = probability of Type I error
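One way to see this equivalence is by simulation. The sketch below assumes a hypothetical population in which the null hypothesis is exactly true (μ = 100, σ = 15), repeatedly draws samples of 25, runs a two-tailed *z*-test at α = .05, and counts how often the true null is rejected:

```python
import math
import random
from statistics import NormalDist

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population in which H0 (mu = 100) is exactly true
mu, sigma, n = 100, 15, 25
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff, about 1.96

trials = 10_000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # a Type I error: H0 is true, but we rejected it

type_i_rate = rejections / trials
print(f"observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate lands close to .05, the value set by alpha; rerunning with `alpha = 0.01` brings it down to about .01.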

### Type II Error

Failing to reject the null when it is in fact false constitutes a **Type II** error.

Other things being equal, the probability of a Type II error increases as the probability of a Type I error decreases. In other words, it is inversely related to the level of significance. It cannot usually be computed exactly because it depends on the true population mean, which is usually unknown. It can be computed, however, for assumed (given) values of µ, σ² and *n*.

The probability of a Type II error is represented as beta (β).

The probability of correctly rejecting the null (i.e., rejecting the null when it is false) is equal to 1 – β, also known as the power of the test.
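Under assumed values like these, β and power have a closed form in the simplest case, a right-tailed one-sample *z*-test with known σ. The sketch below is an illustration under those assumptions; the null mean of 100, true mean of 105, σ = 15, and *n* = 36 are hypothetical:

```python
import math
from statistics import NormalDist

def beta_and_power(mu0, mu1, sigma, n, alpha):
    """Type II error probability (beta) and power for a right-tailed one-sample
    z-test of H0: mu = mu0, when the true mean is actually mu1 (> mu0)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)        # cutoff set under H0
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))  # true shift, in standard-error units
    beta = std_normal.cdf(z_crit - shift)         # chance z still falls short of the cutoff
    return beta, 1 - beta

beta, power = beta_and_power(mu0=100, mu1=105, sigma=15, n=36, alpha=0.05)
print(f"beta = {beta:.3f}, power = {power:.3f}")  # beta = 0.361, power = 0.639
```

Lowering alpha to .01 with everything else unchanged raises β, which is the inverse relationship described above.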

The table below summarizes the error types and their probabilities.

**Example**

In a justice system based on jury trials, there are two types of errors: **1)** the person is innocent but the jury finds the person guilty, and **2)** the person is guilty but the jury declares the person is not guilty. In this system of justice, the first error is considered more serious than the second error. These two errors along with the correct decisions are shown in the next table:

With respect to hypothesis testing, the two errors that can occur are: **1)** the null hypothesis is true but the decision based on the testing process is that the null hypothesis should be rejected, and **2)** the null hypothesis is false but the testing process fails to reject it. These two errors are called Type I and Type II errors, respectively. As in the jury trial situation, a Type I error is usually considered more serious than a Type II error.

## Assumptions

- In a jury trial, the person accused of the crime is assumed innocent at the beginning of the trial, and unless the jury can find overwhelming evidence to the contrary, he or she should be judged not guilty at the end of the trial.
- In hypothesis testing, the null hypothesis is assumed to be true, and unless the test shows overwhelming evidence that the null hypothesis is not true, the null hypothesis is not rejected.

## Power of a Test

The **power** of a statistical test is equal to the probability of making a correct decision and rejecting the null hypothesis when it is in fact false. It is equal to the probability of not making a Type II error, or 1 - β. Several factors affect power:

- Sample size.
  - The larger the *n*, the greater the power of the test.
- Significance level.
  - The higher the value of α, the higher the power of the test. Increasing α has the effect of increasing the size of the rejection region and decreasing the size of the non-rejection region. This translates to a greater likelihood of rejecting the null hypothesis and, therefore, a reduced likelihood of failing to reject the null when it is false, i.e., a reduced probability of making a Type II error. As a result, beta is decreased and the power of the test is increased.
- The “true” value of the parameter being tested.
  - The greater the difference between the true value of a parameter and that specified in the null hypothesis, the greater the power of the test. This in turn relates to effect size, as we saw earlier.

An important use of power in hypothesis testing is to help ensure that the sample size being considered is large enough for the purpose of the test. If it is not, the results are likely to be inconclusive and the effort and resources devoted to the process will have been wasted.

The calculations for power are rather complicated and best done using a statistical package like R.
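For the simplest case, though (a right-tailed one-sample *z*-test with known σ), a short sketch can search for the smallest sample size that reaches a target power. The target of .80 and all population values below are hypothetical, and the function names are illustrative:

```python
import math
from statistics import NormalDist

def power_right_tailed(mu0, mu1, sigma, n, alpha):
    """Power of a right-tailed one-sample z-test when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    return 1 - std_normal.cdf(z_crit - shift)

def smallest_n_for_power(mu0, mu1, sigma, alpha, target_power=0.80, max_n=100_000):
    """Smallest n whose power reaches target_power, or None if max_n is not enough."""
    for n in range(1, max_n + 1):
        if power_right_tailed(mu0, mu1, sigma, n, alpha) >= target_power:
            return n
    return None

n_needed = smallest_n_for_power(mu0=100, mu1=105, sigma=15, alpha=0.05)
print(n_needed)  # 56
```

With a 5-point true difference and σ = 15 (an effect of one third of a standard deviation), the search returns 56; a larger α or a larger true difference would lower that number, in line with the factors listed above.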

Experiment with the settings in the demonstration below to see the effect of various combinations of settings.

Other things being equal, which of the following will increase the power of a hypothesis test?

Reducing sample size

Reducing alpha

Increasing sample size

Increasing beta

Which of the following defines a Type I error?

The probability of rejecting the null hypothesis when it is false

The probability of accepting the null hypothesis

The probability of a false negative

The probability of rejecting the null hypothesis when it is true

A researcher is planning a study and is considering various options for alpha. Selecting a larger alpha value will result in decreased $\_\_\_\_\_\_\_.$

## Steps of Hypothesis Testing

This chapter has presented an introduction to the process of hypothesis testing. For reference, the general steps of hypothesis testing are summarized in the table below.

With reference to the table above, here are some helpful tips as you think about how to go about conducting the hypothesis test:

**Step 1:** State the claim and identify the null ($H_0$) and alternative ($H_1$) hypotheses.

Remember, $H_0$ is the ‘no difference’ hypothesis and always includes an equality. Be sure to state the hypotheses so that they are complementary, as shown in the table below.

**Step 2:** Determine the alpha level (α).

Is there a standard alpha level in the field? Consider the consequences of making a Type I or Type II error when determining the appropriate alpha level. Are you conducting a 1-tailed or a 2-tailed test?

**Step 3:** Determine the appropriate test to use.

Test determination will depend upon your answers to specific questions: What type of sample data do you have? A mean? A proportion? What do you know about the population parameters? The mean? Standard error of the mean? Proportion of participants who comprise a particular group? Answers to questions like these will guide your choice of statistical tests and computational formulas.

**Step 4:** Identify the critical value of the test statistic.

How will that be calculated? What table or technology tool will you use?

**Step 5:** Calculate the test statistic.

This step involves using the computational formula to calculate the appropriate test statistic, or using a technological tool, either a statistical software package or a statistics calculator that can calculate and display the desired statistic.

**Step 6:** Compare the calculated test statistic to the critical value.

*Traditional approach:*

If the calculated value of the test statistic is more extreme than the critical value, reject $H_0$; otherwise, fail to reject $H_0$.

*p-value approach:*

An alternative approach is to determine the probability of a value of the statistic at least this extreme, given that the null hypothesis is true, also known as the ***p*-value**. If the *p*-value is ≤ α, reject $H_0$; if the *p*-value is > α, fail to reject $H_0$.

**Step 7:** Interpret the decision.

Summarize the finding in context. State a conclusion and make any inferences that are appropriate.
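As an illustration, the seven steps can be traced in code for the corporate executives example from earlier in the chapter. This is a sketch under stated assumptions: the original example gives no alpha, so α = .05 is assumed here, and the test is taken as left-tailed because the researcher predicted lower stress scores for executives who exercise:

```python
import math
from statistics import NormalDist

# Step 1: H0: mu_exercise = 80;  H1: mu_exercise < 80 (exercisers score lower)
mu0, sigma = 80, 12

# Step 2: alpha (assumed), one-tailed (left) because H1 predicts a lower score
alpha = 0.05

# Step 3: population mean and SD are known and we have one sample mean,
# so a one-sample z-test on the distribution of means is appropriate.
n, sample_mean = 20, 72
std_normal = NormalDist()

# Step 4: critical value for a left-tailed test
z_crit = std_normal.inv_cdf(alpha)

# Step 5: test statistic
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

# Step 6: compare, using both the traditional and the p-value approach
p = std_normal.cdf(z)
reject_by_critical_value = z < z_crit
reject_by_p_value = p <= alpha

# Step 7: interpret the decision
decision = "reject H0" if reject_by_critical_value else "fail to reject H0"
print(f"z = {z:.2f}, cutoff = {z_crit:.3f}, p = {p:.4f}: {decision}")
```

Under these assumptions, both approaches lead to rejecting $H_0$, which supports the claim that executives who exercise score lower on the stress scale than executives in general.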

Arrange the hypothesis testing steps in their correct order.

State the null and alternative hypotheses

Determine the alpha level

Calculate the test statistic

Identify the critical values of the test statistic

Decide to reject or fail to reject $H_0$

Interpret the decision in terms of the original claim

Determine the appropriate statistical test

## Case Study: Attention Span

One of the more intriguing studies done recently is one by the Microsoft Corporation, in which it was reported that humans now have a shorter attention span than goldfish! Wait, did you really say goldfish? Hmm, let’s take a closer look at this!

The point of the study was to show how attention span has deteriorated since the mobile revolution began. The Microsoft study indicated that people’s attention span has fallen from about 12 seconds in the year 2000 to about 8 seconds currently. A goldfish, on the other hand, has an attention span of about 9 seconds. This decrease in attention has been attributed to increased use of smartphones and the need to quickly focus on content and generate a response in a matter of seconds.

To be fair, there are some related upsides to these findings, but it’s always more fun to sensationalize something, right? In this case, there are corresponding plusses; for example, while long-term focus was shown to erode with increased digital consumption, the same users were shown to have more intermittent high-intensity bursts of attention in the short term. These may represent off-setting differences.

The full report is available here. Please review it before continuing.

There have also been some criticisms of the Microsoft study. One such article, by technical writer Ken McCall, is available here. Please review the article and continue on to the discussion questions that follow.

### Case Study Question 9.01

The Microsoft study divided participants into three groups based on attentional criteria. What sampling method does this represent? Why was that appropriate? (or, why not?)

Click here to see the answer to Case Study Question 9.01.

### Case Study Question 9.02

Critique the statement that “the average human attention span is 8 seconds.”

Click here to see the answer to Case Study Question 9.02.

### Case Study Question 9.03

Mr. McCall quotes Drs. Dukette and Cornish as stating "Continuous attention span may be as short as 8 seconds. After this amount of time, it's likely an individual's eyes may shift, or a stray thought will briefly enter consciousness. However, these short lapses are only minimally distracting and do not tend to interfere with task performance. Attention spans in children may be 3 to 5 minutes increasing as we age to adults being about 20 minutes. Adults can reset their attention spans at 20-minute intervals, if they so desire." How does this square with the Microsoft conclusion?

Click here to see the answer to Case Study Question 9.03.

### Case Study Question 9.04

What is one takeaway from this?

Click here to see the answer to Case Study Question 9.04.

### References

McCall, K. (2014, April 18). 8-Second Attention Span? Retrieved from https://www.linkedin.com/pulse/20140418171300-15742110-writing-for-goldfish

Dukette, D., & Cornish, D. (2009). *The Essential 20: Twenty Components of an Excellent Health Care Team* (pp. 72–73). RoseDog Books. ISBN 1-4349-9555-0.

## Pre-Class Discussion Questions

### Class Discussion 9.01

What is a hypothesis?

Click here to see the answer to Class Discussion 9.01.

### Class Discussion 9.02

Explain the relationship between the null and alternative hypotheses.

Click here to see the answer to Class Discussion 9.02.

### Class Discussion 9.03

Explain what is meant by “proof by contradiction.”

Click here to see the answer to Class Discussion 9.03.

### Class Discussion 9.04

What is the role of alpha in significance testing?

Click here to see the answer to Class Discussion 9.04.

### Class Discussion 9.05

Why is it important to also consider effect size when making conclusions about significance of results?

Click here to see the answer to Class Discussion 9.05.

## Answers to Case Study Questions

### Answer to Case Study Question 9.01

Using stratified sampling enabled researchers to avoid confounds due to increased variance resulting from individual differences.

Click here to return to Case Study Question 9.01.

### Answer to Case Study Question 9.02

What is missing here is “attention to what and under what conditions?” That is a very general conclusion taken out of the context of just what it is that was being attended to.

Click here to return to Case Study Question 9.02.

### Answer to Case Study Question 9.03

Attention span can be defined in various ways and “tested” accordingly. Think “operationally defined”!

Click here to return to Case Study Question 9.03.

### Answer to Case Study Question 9.04

Human attention span may very well peak at about 8 seconds in a digital environment, giving some support to the claims made in the Microsoft article. This has clear implications for advertisers, who are the intended beneficiaries of the report, but should probably be taken with a grain of salt when bandied about as representing a general state of affairs of human attention span.

Click here to return to Case Study Question 9.04.

## Answers to Pre-Class Discussion Questions

### Answer to Class Discussion 9.01

A hypothesis is a tentative statement about a relationship or relationships between or among two or more variables. It is tentative in that it expresses what is expected to happen but it has not yet been confirmed by observation.

Click here to return to Class Discussion 9.01.

### Answer to Class Discussion 9.02

The null hypothesis represents a condition that is assumed to be true until (unless) there is convincing evidence to the contrary, while the alternative represents the predicted outcome of the investigation at hand. The two statements are complementary in that, between them, they summarize all possible outcomes of the investigation.

Click here to return to Class Discussion 9.02.

### Answer to Class Discussion 9.03

The logic is that if the null hypothesis can be rejected, then support goes to the alternative. This “proof” stops short of being conclusive since, even if the evidence supports rejection of the null statement, there remains some probability of error.

Click here to return to Class Discussion 9.03.

### Answer to Class Discussion 9.04

Alpha is a predetermined value that establishes the error limit, i.e., the probability of rejecting the null hypothesis when it is actually true.

Click here to return to Class Discussion 9.04.

### Answer to Class Discussion 9.05

With large sample sizes, even a small difference in outcomes might be considered significant. The actual size of the difference (the effect size) must also be considered so as to distinguish between statistically significant differences that are or are not of any practical importance.

Click here to return to Class Discussion 9.05.

### Image Credits

[1] Image courtesy of Tracy Collins under CC BY-SA 2.0.

[2] Image courtesy of the Kelly Library, University of Toronto in the Public Domain.

[3] Image under CC0 1.0 Universal.