Hypothesis testing

class: title-slide center middle
background-image: url("img/hands.png")
background-position: right
background-size: contain
background-color: white

### .red[Rachel Warnock]

### .red[09.04.2024]

---

# Today

<br>

<style>
.remark-slide table{
        width: 100%;
    }

/* Change the background color to white for shaded rows (even rows) */

.remark-slide thead, .remark-slide tr:nth-child(2n) {
        background-color: white;
    }
    
</style>

.pull-left[
<table class="table table-striped" style="width: auto !important; ">
 <thead>
  <tr>
   <th style="text-align:left;"> Schedule </th>
   <th style="text-align:center;"> Start time </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Course introduction </td>
   <td style="text-align:center;"> 10:00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Group exercise: mini reading group </td>
   <td style="text-align:center;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Break </td>
   <td style="text-align:center;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Introduction to hypothesis testing </td>
   <td style="text-align:center;"> 11:15 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Group exercise: defining hypotheses </td>
   <td style="text-align:center;"> 12:30 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Lunch (1 hr) </td>
   <td style="text-align:center;"> 13:30 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> p-values </td>
   <td style="text-align:center;"> 14:00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Parametric tests </td>
   <td style="text-align:center;"> 15:00 </td>
  </tr>
</tbody>
</table>
.footnote[*times are approximate]
]

.pull-right[
<center><img src="https://media.giphy.com/media/icVkqVBTfuBDczxyBH/giphy.gif" alt="" height="300px" /></center>
]

---

# About this course

You will learn:
* how to develop and test your own hypotheses
* how to perform basic statistical tests
* how to apply this knowledge to reproduce (and potentially improve) published results

.large[Each participant is assigned to a working group, and each group is assigned a published scientific paper.]

---

<br>


.large[
[*Statistical analysis of iron geochemical data suggests limited late Proterozoic oxygenation*](https://www.nature.com/articles/nature14589)
]

.large[
[*High coral diversity is coupled with reef-building capacity during the Late Oligocene warming*](https://riviste.unimi.it/index.php/RIPS/article/view/16332)
]

.large[
[*Isotopic and anatomical evidence of an herbivorous diet in the Early Tertiary giant bird Gastornis*](https://pubmed.ncbi.nlm.nih.gov/24563098/)
]


---

# Course evaluation

.large[You can use this [*Google Slides Template*](https://docs.google.com/presentation/d/1R_p1v3kD2eWrfU0uOmbvyE4iP0YD2_KRqfAuxwOLSMg/edit?usp=sharing) to prepare your presentation.]

---
<br>
<br>
<br>
<br>

# Will I have to use R?

.large[The focus of the course is on the **concepts** behind statistical hypothesis testing and reproducibility, not programming, but both things are (probably) easier if you use R.]

---

background-image: url("img/fully-expecting.jpeg")
background-size: contain
class: middle

---
class: middle inverse


.pull-right[
<center><img src="https://mochimochiland.com/wp-content/uploads/cheerupDRAFT1sm.gif" alt="" height="300px" /></center>
]

---
class: middle

.pull-right[
# Why does palaeobiology feature so much statistics?
.large[What do you think? &#129300;]
]

*"Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data."*

---

# Why study statistics?

.large[
* It makes the **literature** more accessible, since many papers you encounter will include statistics
]

---
class: middle
# Mini reading group

In your working groups:

Introduce yourselves and discuss your paper.
* What is the paper about?
* What was the general aim?
* Did you like / not like it?

To report back: prepare a three-sentence summary.


.pull-right[
<div class="countdown" id="timer_ad92a787" data-warn-when="120" data-update-every="1" tabindex="0" style="right:0;bottom:0;position: relative; width: min-content; margin: 1em auto;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">20</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
]
---
class: middle center

# .black[Time for a break]

---
class: middle
background-image: url("img/noahs-ark.jpg")
background-size: contain
background-position: right
name: hypothesis


---
background-image: url("img/noahs-ark.jpg")
background-size: contain
background-position: right

* Statistical hypotheses

* Null vs. alternative hypotheses

* Significance and *p*-values

---
background-image: url("img/moons.png")
background-size: contain
background-position: right

# Hypotheses

.small[Right: [Galileo's sketches](https://www.metmuseum.org/art/collection/search/778099) of the moon from 1610]


---
class: middle
background-image: url("img/Platt.png")
background-size: contain
background-position: right

1. Develop alternative hypotheses

2. Devise experiments to eliminate all but one hypothesis

3. Perform the experiments

.footnote[[Platt (1964)](https://www.science.org/doi/10.1126/science.146.3642.347)]
---
<br>
<br>
<br>

# Benefits of hypothesis-driven research

.large[Reduces the chance that researchers become attached to a single outcome and design tests on that basis (confirmation bias).]

.large[
Forces us to design tests in advance of seeing the data, rather than explaining patterns we observe later (inductive research).
]

---

# Research hypotheses

.large[The research hypothesis can include the potential mechanism or cause of a phenomenon and can be used to make predictions.]

---

# Example

.large[
**Research question** `\(-\)` What is the impact of continental configuration on extinction risk?]

.large[
**Research hypothesis** `\(-\)` Extinction intensity during the Ordovician will be higher than during the Cenozoic because fewer species are able to disperse along continental margins.]

---
class: bottom
background-image: url("img/continents.png")
background-size: 85%
background-position: center

---
<br>
<br>
<br>

# Example

.large[
**Research question** `\(-\)` How does body size affect diversification potential among mammals following the K-Pg extinction?]

.large[
**Research hypothesis** `\(-\)` Large mammals will undergo more diversification relative to other body size categories through expansion into larger body size niches previously occupied by dinosaurs (the dinosaur incumbency hypothesis).]

<style>
.my-style {
  font-size: .7em;
}
</style>
---
class: bottom
background-image: url("img/mammals.jpg")
background-size: 70%
background-position: center

[Benevento et al. 2023](https://onlinelibrary.wiley.com/doi/full/10.1111/pala.12653)
.my-style[A, total Mammalia; B, Multituberculata; C, Metatheria; D, Eutheria.]

---
class: middle

## What are other examples of research hypotheses in palaeobiology?


---
<br>
<br>

## Examples of statements that are **not** research hypotheses

.large[**There is life on other planets**] `\(-\)` .large[this is a reasonable assumption to make, but there is no feasible way to test this claim.]

.large[**Dinosaurs are cool**] `\(-\)` .large[this is a personal opinion and cannot be empirically validated or falsified.]

.large[**There will be more brachiopods than bivalves among the fossils I collect**] `\(-\)` .large[this is a claim about the data, not about a scientific theory. It actually sounds more like a statistical hypothesis...]

---
<br>
<br>
<br>

## Research vs. statistical hypotheses

.large[A research hypothesis can be a bit fuzzy, but ultimately research hypotheses are scientific claims.]

.large[A statistical hypothesis must be **mathematically precise** and correspond to specific claims about parameters that can be used to generate (or describe) the distribution of data points.]

.large[A **statistical hypothesis** is a predicted pattern in the data that should occur if the research hypothesis is true.]

---
background-image: url("img/blue-bird.png")
background-size: contain
background-size: 15%
background-position: 90% 10%

## Statistical hypothesis example

.large[Say we have a population of birds that can either be **.blue[blue]** or **.red[red]**. We want to know, does being blue confer some advantage?]

* More birds are blue.

* More birds are red.

---
background-image: url("img/blue-bird.png")
background-size: contain
background-size: 15%
background-position: 90% 10%

## Statistical hypothesis example

.large[
* Birds have the same chance of being .blue[blue] as being .red[red]. If this is true, then `\(\theta=\)` 0.5.
]

---
background-image: url("img/blue-bird.png")
background-size: contain
background-size: 15%
background-position: 90% 10%

## Statistical hypothesis example

<br>

.large[
These are examples of **statistical hypotheses** because they are statements about a "population" parameter.]

--
.large[
A statistical (hypothesis) test is a *test* of the statistical hypothesis, not the research hypothesis!
]

---

## Null vs. alternative hypotheses

.large[
The **null hypothesis** `\(H_0\)` is a concise statement expressing the concept of "no difference" between a sample and the population mean.

It corresponds to the exact opposite of the thing I want to believe!
]

.large[
The goal is not to show that the alternative hypothesis is (probably) true; the goal is to show that the null hypothesis is (probably) false!
]

---
background-image: url("img/blue-bird.png")
background-size: contain
background-size: 15%
background-position: 90% 10%

## Statistical hypothesis example

<br>

--
.large[
Evidence in favour of the alternative hypotheses would give us a hint that being blue might be advantageous.
]

---
background-image: url("img/apple.jpg")
background-size: contain
background-position: right

## Example

.pull-left[
.large[
**Research question** `\(-\)` What are the health benefits of eating an apple a day?
]

.large[
**Research hypothesis** `\(-\)` Increasing apple consumption in over-60s will result in decreasing frequency of doctor’s visits.
]

.large[
**Null hypothesis** `\(-\)` Increasing apple consumption in over-60s will have no effect on frequency of doctor’s visits.
]
]

---
background-image: url("img/plane.jpg")
background-size: contain
background-position: right

## Example

.large[
**Research hypothesis** `\(-\)` Low-cost airlines are more likely to have delays than premium airlines.
]

.large[
**Null hypothesis** `\(-\)` Low-cost and premium airlines are equally likely to have delays.
]

---
background-image: url("img/phones.jpg")
background-size: contain
background-position: right

## Example

.pull-left[
.large[
**Research question** `\(-\)` What effect does daily use of social media have on the attention span of under-16s?
]

.large[
**Research hypothesis** `\(-\)` There is a negative correlation between time spent on social media and attention span in under-16s.
]

.large[
**Null hypothesis** `\(-\)` There is no relationship between social media use and attention span in under-16s.
]
]

---
class: middle
# Hypotheses

Discuss the hypotheses in your paper.

Try to identify the following:
* The research hypothesis
* The alternative hypothesis
* The null hypothesis

Make a note to add to your presentation.
Don't worry about getting it perfect; you can go back and refine it later.

Do you agree with the authors? Is there more than one hypothesis?


.pull-right[
<div class="countdown" id="timer_4f8ccf56" data-warn-when="12" data-update-every="1" tabindex="0" style="right:0;bottom:0;position: relative; width: min-content; margin: 1em auto;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">15</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
]

---
class: middle center

# .black[Time for a break]

---
name: significance
<br>
<br>
<br>

## Sampling distributions

.large[
If the probability of being a .blue[blue] bird is `\(\theta = 0.5\)`, what would we expect the data to look like?
]

.large[
We need to determine what the **sampling distribution** of the test statistic would be if the null hypothesis were true.
]

.large[
This distribution tells us what values we can expect if `\(H_0\)` is true. We use this as a tool to assess how closely the null agrees with our data.
]

---

With `\(N = 100\)` birds, the null hypothesis predicts that `\(X\)`, the number of .blue[blue] birds, is [*binomially distributed*](https://en.wikipedia.org/wiki/Binomial_distribution).
It says `\(X = 50\)` out of 100 is the most likely outcome, and that we'd usually expect to see somewhere between 40 and 60 .blue[blue] birds.
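
As a minimal sketch of this sampling distribution, the binomial probabilities can be computed directly. It is written in Python with only the standard library; the names `binom_pmf`, `N` and `THETA_0` are illustrative and do not come from the course material.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


N = 100        # sample size implied by "50/100" (assumption)
THETA_0 = 0.5  # probability of a blue bird under the null hypothesis

# Sampling distribution of X (number of blue birds) if H0 is true.
sampling_dist = [binom_pmf(k, N, THETA_0) for k in range(N + 1)]

# The single most probable count, and the probability of landing in 40..60.
most_likely = max(range(N + 1), key=lambda k: sampling_dist[k])
prob_40_to_60 = sum(sampling_dist[40:61])

print(f"most likely X: {most_likely}")
print(f"P(40 <= X <= 60 | H0) = {prob_40_to_60:.3f}")
```

Under these assumptions the bulk of the probability sits in the 40 to 60 range, which is why counts outside it are treated as surprising.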

---

## Critical regions and critical values

.large[
If the null hypothesis is true, the sampling distribution of `\(X\)` is Binomial `\((0.5, N)\)`.
]

---
class: middle center

Our critical region consists of the most extreme values, known as the **tails** of the distribution.

---
class: middle

.large[
The **critical region** corresponds to the values of `\(X\)` for which we would reject the null hypothesis.

The **sampling distribution** allows us to calculate the probability that we would obtain a particular value of `\(X\)` if the null hypothesis were actually true.
]

.pull.right[

]

---
class: middle

The numbers 40 and 60 are our **critical values**.

If the number of blue birds is between 41 and 59, then we should retain the null hypothesis (birds have the same chance of being .blue[blue] as being .red[red]).

If the number of blue birds is between 0 and 40 **or** between 60 and 100, then we should reject the null hypothesis `\(-\)` this is a **two-tailed test**.

.pull.right[
<img src="img/samplingdist2.png" alt="" height="400px" />
]
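
To see how much probability the null hypothesis puts in this critical region, the two tails can be summed exactly. This is a rough sketch in Python that assumes `\(N = 100\)` and `\(\theta = 0.5\)`; the helper names `binom_pmf` and `rejects` are illustrative.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


N, THETA_0 = 100, 0.5   # sample size and null value (assumptions)
LOWER, UPPER = 40, 60   # critical values quoted on the slide


def rejects(x: int) -> bool:
    """Two-tailed decision rule: reject H0 for counts in either tail."""
    return x <= LOWER or x >= UPPER


# Probability of falling in the critical region when H0 is actually true.
prob_reject_under_null = sum(
    binom_pmf(k, N, THETA_0) for k in range(N + 1) if rejects(k)
)
print(f"P(reject H0 | H0 true) = {prob_reject_under_null:.4f}")
```

With these numbers the exact tail probability comes out a little under 6%, so if a 5% level is intended, 40 and 60 are best read as approximate critical values.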

---
class: middle
background-image: url("img/tails.png")
background-size: 60%
background-position: right

---
class: middle center

The critical region for a **one-sided test**. We would use this to test for `\(\theta >\)` 0.5 (i.e., more birds are .blue[blue]).
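
A one-sided critical value can be found by scanning the upper tail until its probability drops to the chosen level. The sketch below is in Python and assumes `\(N = 100\)`, `\(\theta = 0.5\)` under the null, and a level of 0.05; `upper_tail` and `one_sided_critical_value` are hypothetical helper names.

```python
from math import comb


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def upper_tail(c: int, n: int, theta: float) -> float:
    """P(X >= c) for X ~ Binomial(n, theta)."""
    return sum(binom_pmf(k, n, theta) for k in range(c, n + 1))


def one_sided_critical_value(n: int, theta: float, alpha: float) -> int:
    """Smallest c with P(X >= c) <= alpha; returns n + 1 if no such count exists."""
    for c in range(n + 1):
        if upper_tail(c, n, theta) <= alpha:
            return c
    return n + 1


N, THETA_0, ALPHA = 100, 0.5, 0.05  # all three are assumptions for illustration
critical = one_sided_critical_value(N, THETA_0, ALPHA)
print(f"reject H0 in favour of theta > 0.5 when X >= {critical}")
```

Because all of the rejection probability sits in one tail, the one-sided critical value lies closer to 50 than the upper critical value of a two-tailed test at the same level.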

---
class: middle
## Statistical significance

.large[
If the data allow us to reject the null hypothesis, we say that "the result is statistically significant", which is often shortened to "the result is significant".

This terminology reflects a time when "significant" meant something like "indicated", rather than its more recent meaning, which is closer to "important".
]

--
.large[
How do we define significance?
]
---
<br>
<br>
<br>

## Two types of errors

.large[
The goal behind statistical hypothesis testing is not to *eliminate* but to *minimize* errors.
]

---
<br>
<br>
<br>

## Two types of errors

* `\(H_0\)` is **true** and we *correctly retain* the null

* `\(H_0\)` is **true** and we *incorrectly reject* the null (type I error)

* `\(H_0\)` is **false** and we *correctly reject* the null

* `\(H_0\)` is **false** and we *incorrectly retain* the null (type II error)

---
<br>
<br>
<br>

## Two types of errors

.large[
One of the most important design principles of hypothesis testing is to control the probability `\(\alpha\)` of a **type I error**. ]

.large[
`\(\alpha\)` is called the significance level. By convention, we often use an `\(\alpha\)` of 0.05, 0.01, or 0.001.]

.large[
A hypothesis test is said to have significance level `\(\alpha\)` if its type I error rate is no larger than `\(\alpha\)`.
]
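
The "no larger than `\(\alpha\)`" condition can be illustrated with the bird example. The Python sketch below assumes `\(N = 100\)` and `\(\theta = 0.5\)`, and uses a symmetric two-tailed rule (reject when `\(X\)` is at least `d` away from 50); the function names are illustrative. Because the counts are discrete, the achieved type I error rate usually falls below the nominal level rather than matching it exactly.

```python
from math import comb

N, THETA_0 = 100, 0.5  # sample size and null value (assumptions)


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def type_i_error(d: int) -> float:
    """P(reject H0 | H0 true) for the rule: reject if X is at least d away from N / 2."""
    return sum(
        binom_pmf(k, N, THETA_0) for k in range(N + 1) if abs(k - N / 2) >= d
    )


def smallest_distance(alpha: float) -> int:
    """Smallest d whose rejection rule keeps the type I error rate at or below alpha."""
    d = 0
    while type_i_error(d) > alpha:
        d += 1
    return d


# For each conventional level, store the rule and its exact type I error rate.
results = {}
for alpha in (0.05, 0.01, 0.001):
    d = smallest_distance(alpha)
    results[alpha] = (d, type_i_error(d))
    print(f"alpha = {alpha}: reject if |X - 50| >= {d}, exact rate = {results[alpha][1]:.4f}")
```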

---
background-image: url("img/blue-bird.png")
background-size: contain
background-size: 10%
background-position: 90% 5%

# *p*-values

.large[
*p* can be defined as the smallest type I error rate `\(\alpha\)` that you have to be willing to tolerate if you want to reject the null hypothesis.

In the bird example, `\(X=\)` 62 .blue[blue] birds gives us *p* = 0.021. The results can be interpreted as shown in the table, given *p* = 0.021.

`\(X=\)` 97 .blue[blue] birds would give us *p* = `\(1.36 \times 10^{-25}\)`, which is a tiny, tiny type I error rate.
]
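
The *p*-value for `\(X = 62\)` can be reproduced with a short calculation. This Python sketch assumes `\(N = 100\)`, `\(\theta = 0.5\)` under the null, and a two-sided definition of "extreme" (counts at least as far from 50 as the observed one); `two_sided_p_value` is a hypothetical helper name.

```python
from math import comb

N, THETA_0 = 100, 0.5  # sample size and null value (assumptions)


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def two_sided_p_value(x: int) -> float:
    """P(|X - N/2| >= |x - N/2|) under H0: counts at least as extreme as x, in both tails."""
    distance = abs(x - N / 2)
    return sum(
        binom_pmf(k, N, THETA_0)
        for k in range(N + 1)
        if abs(k - N / 2) >= distance
    )


p_62 = two_sided_p_value(62)
print(f"X = 62: p = {p_62:.3f}")
```

Read against the conventional levels, a result like this would be rejected at `\(\alpha = 0.05\)` but not at `\(\alpha = 0.01\)`.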


---
# *p*-values

Recall that the critical region corresponds to the tails (extremes) of the distribution.

*p* can therefore also be defined as the probability of observing a test statistic that is (at least) as extreme as the one we actually get.

If the data are extremely implausible according to the null hypothesis, then the null hypothesis is probably wrong.


.pull.right[

<center><img src="img/samplingdist2.png" alt="" height="350px" /></center>
]

---
class: middle
background-image: url("img/P-values.jpg")
background-size: contain
background-position: center

---
<br>
<br>
<br>

# Reporting *p*-values

.large[
**Option 1** `\(-\)` you can state only that *p* `\(< \alpha\)` for a significance level that you chose in advance, e.g., *p* < .05.

But this means we're being forced to treat *p* = .051 in a fundamentally different way to *p* = .049.]

.large[
**Option 2** `\(-\)` just report the actual *p*-value and let readers make up their own minds about what an acceptable type I error rate is.

But if you get *p* = .062, then it means that you have to be willing to tolerate a type I error rate of 6.2% to justify rejecting the null. If you personally find 6.2% intolerable, then you retain the null.
]

---
<br>
<br>
<br>

## Two types of errors

<br>

We refer to the **power** of the test, which is the probability that we reject the null hypothesis when it really is false. This is `\(1 - \beta\)`, where `\(\beta\)` is the probability of a type II error.

A powerful test has a small value of `\(\beta\)`. Note that, unlike `\(\alpha\)`, there is no conventional level for `\(\beta\)`.
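
Power depends on how far the truth is from the null, so it has to be computed for a specific alternative. The Python sketch below uses the bird example with the critical values 40 and 60 and asks how often that rule rejects the null when the true probability of a blue bird is 0.6; the value 0.6 and the function names are purely illustrative assumptions.

```python
from math import comb

N = 100                 # sample size (assumption)
LOWER, UPPER = 40, 60   # critical values from the bird example


def binom_pmf(k: int, n: int, theta: float) -> float:
    """Probability of exactly k blue birds among n, each blue with probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def power(theta_true: float) -> float:
    """P(X falls in the critical region) when the true probability of blue is theta_true."""
    return sum(
        binom_pmf(k, N, theta_true)
        for k in range(N + 1)
        if k <= LOWER or k >= UPPER
    )


theta_alt = 0.6                # hypothetical true value, chosen for illustration
test_power = power(theta_alt)  # 1 - beta
beta = 1 - test_power          # type II error rate at this alternative
print(f"theta = {theta_alt}: power = {test_power:.3f}, beta = {beta:.3f}")
```

Calling `power` with values further from 0.5 gives a higher power, while `power(0.5)` simply returns the type I error rate of the rule.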

<br>

---
class: middle
# Interpreting *p*-values

How have the authors in your paper interpreted the *p* values?

Spend 15 minutes putting together the 'Scientific background' and 'Hypothesis' slides.

.pull-right[
<div class="countdown" id="timer_8afc408f" data-warn-when="12" data-update-every="1" tabindex="0" style="right:0;bottom:0;position: relative; width: min-content; margin: 1em auto;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">15</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>
]

---

# .black[Time for a break]