Homework Assignment 8: Exploring Hypotheses

Study background

For my thesis, I am interested in studying the effects of temperature on survival of different species in a hybrid zone of Mytilus mussels. We are testing the hypothesis, “hybrid Mytilus have novel temperature tolerance traits that facillitate their survival,” and testing this hypothesis by looking at survival of larval mussels in different temperatures. The different species I will be looking at are Mytilus galloprovinciallis (MG), Mytilus planulatus (MP), and introgressed offspring between the two (MGxMP). I will be subjecting larvae for each of these to two temperature treatments, 18 deggrees celcius and 25 degreese celcius. For this assignment, I am using information about MG and MP temperature tolerance to simulate mean survival rate in the different species. To get approximates for temperature tolerance, I am using a paper by Iva Popovic and Cynthia Riginos titiled “Comparative genomics reveals divergent thermal selection in warm- and cold-tolerant mussels.” For this simulation, I am assuming the following thermal optima for each species: MG thermal optimum: 19.7 degrees C MP thermal optimum: 11.2 degrees C

A null hypothesis for this study is that “Hybrid mussels have intermediate thermal tolerances.” This claim has been evaluated in fruit flies in my lab and found to be true. I ran these analyses assuming this trend.

Load in relevant libraries

library(tidyverse)
library(ggplot2)
library(MASS)

Data simulation

Given MG’s higher thermal tolerance, I will be assuming that larvae at higher temperatures will survive at a higher proportion. I am simulating 50 replicates of each larval experiment, and then combining into a data frame. For this assignment, I will assume that hybrid MG x MP have an intermediate survival rate.

# simulated MG
mg_sur_25 <- rnorm(n = 50, mean = 0.65, sd = 0.048)
mg_sur_18 <- rnorm(n = 50, mean = 0.85, sd = 0.076)

# simulated MP
mp_sur_25 <- rnorm(n = 50, mean = 0.37, sd = 0.086)
mp_sur_18 <- rnorm(n = 50, mean = 0.71, sd = 0.053)

# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=50, mean = 0.52, sd = 0.041)
hy_sur_18 <- rnorm(n=50, mean = 0.75, sd = 0.076)

# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)

temp_treatment <- c(rep(18, 150), rep(25, 150))

species <- c(rep("MP", 50), rep("MG", 50), rep("HY", 50),rep("MP", 50), rep("MG", 50), rep("HY", 50))

Data frame construction

I combined my simulated vectors into a data frame. A major troubleshooting factor I needed to consider was that temperature is being input into the data frame as two numbers, but these are factors for the purpose of the data analysis. Up front, it is important to define temperature as a factor.

dat <- data.frame(index=1:length(surv_rate),
                   surv_rate=surv_rate,
                   temp=temp_treatment,
                   species=species)
dat$temp<-as.factor(dat$temp)

# Define temperature as.factor for later analysis. Otherwise, it will be treated as a continuous variable, which isn't useful for this analysis.

# check out structure of data frame and make sure it's how I expect
head(dat)
##   index surv_rate temp species
## 1     1 0.6776006   18      MP
## 2     2 0.6186098   18      MP
## 3     3 0.5923216   18      MP
## 4     4 0.6837407   18      MP
## 5     5 0.7113084   18      MP
## 6     6 0.6050999   18      MP
str(dat)
## 'data.frame':    300 obs. of  4 variables:
##  $ index    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ surv_rate: num  0.678 0.619 0.592 0.684 0.711 ...
##  $ temp     : Factor w/ 2 levels "18","25": 1 1 1 1 1 1 1 1 1 1 ...
##  $ species  : chr  "MP" "MP" "MP" "MP" ...

Analyze data using ANOVA

Given that my data is separated into distinct groupings (species) and we are interested in evaluating the effects of two temperatures, the best way to analyze this data is using an ANOVA. I ran an anova to look at survival rate as an effect of temperature, as an effect of species, and as an effect of temperatuer and species.

# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)

# Check out ANOVA
print(ANOmodel)
## Call:
##    aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
## 
## Terms:
##                     temp  species temp:species Residuals
## Sum of Squares  3.963180 1.859141     0.268414  1.124818
## Deg. of Freedom        1        2            2       294
## 
## Residual standard error: 0.06185395
## Estimated effects may be unbalanced
print(summary(ANOmodel))
##               Df Sum Sq Mean Sq F value   Pr(>F)    
## temp           1  3.963   3.963 1035.88  < 2e-16 ***
## species        2  1.859   0.930  242.97  < 2e-16 ***
## temp:species   2  0.268   0.134   35.08 2.18e-14 ***
## Residuals    294  1.125   0.004                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
##  $ :Classes 'anova' and 'data.frame':    4 obs. of  5 variables:
##   ..$ Df     : num [1:4] 1 2 2 294
##   ..$ Sum Sq : num [1:4] 3.963 1.859 0.268 1.125
##   ..$ Mean Sq: num [1:4] 3.96318 0.92957 0.13421 0.00383
##   ..$ F value: num [1:4] 1035.9 243 35.1 NA
##   ..$ Pr(>F) : num [1:4] 2.33e-98 5.18e-63 2.18e-14 NA
##  - attr(*, "class")= chr [1:2] "summary.aov" "listof"

Graph the data

I used a boxplot in ggplot to visualize my data. This makes sense because it allows comparison of different survival rates among the different species contextualized by temperature. It allows the viewer to discern these relationships between categorical data.

boxplot <-ggplot(data=dat) +
  aes(x=temp, y=surv_rate, fill=species) +
  geom_boxplot() +
  xlab("Temperature (°C)") +
  ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())

Data simulation manipulation: randomly generated means

For this manipulation, I will randomly generate 6 means with a uniform distribution and input those for my mean values to see how this alters data understanding.

I set a seed so that my results, though random, will be consistently the same upon replication of this document. By chance and as an effect of the standard deviations, there were still statistical differences in my data. This is interesting. When I reduce the size of the difference in means to between 0.4 and 0.75, I no longer see a statistically significant relationship with temperature.

# Randombly generated means
set.seed(04199968)
mean <- runif(n=6, min = 0.4, max = 0.75)

# simulated MG
mg_sur_25 <- rnorm(n = 50, mean = mean[1], sd = 0.048)
mg_sur_18 <- rnorm(n = 50, mean = mean[2], sd = 0.076)

# simulated MP
mp_sur_25 <- rnorm(n = 50, mean = mean[3], sd = 0.086)
mp_sur_18 <- rnorm(n = 50, mean = mean[4], sd = 0.053)

# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=50, mean = mean[5], sd = 0.041)
hy_sur_18 <- rnorm(n=50, mean = mean[6], sd = 0.076)

# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)

temp_treatment <- c(rep(18, 150), rep(25, 150))

species <- c(rep("MP", 50), rep("MG", 50), rep("HY", 50),rep("MP", 50), rep("MG", 50), rep("HY", 50))
dat <- data.frame(index=1:length(surv_rate),
                   surv_rate=surv_rate,
                   temp=temp_treatment,
                   species=species)
dat$temp<-as.factor(dat$temp)
# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)

# Check out ANOVA
print(ANOmodel)
## Call:
##    aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
## 
## Terms:
##                      temp   species temp:species Residuals
## Sum of Squares  0.0055134 2.4945718    0.6737904 1.3440424
## Deg. of Freedom         1         2            2       294
## 
## Residual standard error: 0.06761341
## Estimated effects may be unbalanced
print(summary(ANOmodel))
##               Df Sum Sq Mean Sq F value Pr(>F)    
## temp           1 0.0055  0.0055   1.206  0.273    
## species        2 2.4946  1.2473 272.835 <2e-16 ***
## temp:species   2 0.6738  0.3369  73.694 <2e-16 ***
## Residuals    294 1.3440  0.0046                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
##  $ :Classes 'anova' and 'data.frame':    4 obs. of  5 variables:
##   ..$ Df     : num [1:4] 1 2 2 294
##   ..$ Sum Sq : num [1:4] 0.00551 2.49457 0.67379 1.34404
##   ..$ Mean Sq: num [1:4] 0.00551 1.24729 0.3369 0.00457
##   ..$ F value: num [1:4] 1.21 272.84 73.69 NA
##   ..$ Pr(>F) : num [1:4] 2.73e-01 1.01e-67 1.14e-26 NA
##  - attr(*, "class")= chr [1:2] "summary.aov" "listof"
boxplot <-ggplot(data=dat) +
  aes(x=temp, y=surv_rate, fill=species) +
  geom_boxplot() +
  xlab("Temperature (°C)") +
  ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())

## Sample size manipulation Even when I reduce my sample size from 50 to 5, with the data I input there is a statistically significant relationship. This is likely due to my mean and SD differences.

# simulated MG
mg_sur_25 <- rnorm(n = 5, mean = 0.65, sd = 0.048)
mg_sur_18 <- rnorm(n = 5, mean = 0.85, sd = 0.076)

# simulated MP
mp_sur_25 <- rnorm(n = 5, mean = 0.37, sd = 0.086)
mp_sur_18 <- rnorm(n = 5, mean = 0.71, sd = 0.053)

# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=5, mean = 0.52, sd = 0.041)
hy_sur_18 <- rnorm(n=5, mean = 0.75, sd = 0.076)

# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)

temp_treatment <- c(rep(18, (5*6)/2), rep(25, (5*6)/2))

species <- c(rep("MP", 5), rep("MG", 5), rep("HY", 5),rep("MP", 5), rep("MG", 5), rep("HY", 5))
dat <- data.frame(index=1:length(surv_rate),
                   surv_rate=surv_rate,
                   temp=temp_treatment,
                   species=species)
dat$temp<-as.factor(dat$temp)
# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)

# Check out ANOVA
print(ANOmodel)
## Call:
##    aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
## 
## Terms:
##                      temp   species temp:species Residuals
## Sum of Squares  0.4495932 0.2135946    0.0560142 0.1031879
## Deg. of Freedom         1         2            2        24
## 
## Residual standard error: 0.06557055
## Estimated effects may be unbalanced
print(summary(ANOmodel))
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## temp          1 0.4496  0.4496 104.569 3.17e-10 ***
## species       2 0.2136  0.1068  24.839 1.43e-06 ***
## temp:species  2 0.0560  0.0280   6.514   0.0055 ** 
## Residuals    24 0.1032  0.0043                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
##  $ :Classes 'anova' and 'data.frame':    4 obs. of  5 variables:
##   ..$ Df     : num [1:4] 1 2 2 24
##   ..$ Sum Sq : num [1:4] 0.45 0.214 0.056 0.103
##   ..$ Mean Sq: num [1:4] 0.4496 0.1068 0.028 0.0043
##   ..$ F value: num [1:4] 104.57 24.84 6.51 NA
##   ..$ Pr(>F) : num [1:4] 3.17e-10 1.43e-06 5.50e-03 NA
##  - attr(*, "class")= chr [1:2] "summary.aov" "listof"
boxplot <-ggplot(data=dat) +
  aes(x=temp, y=surv_rate, fill=species) +
  geom_boxplot() +
  xlab("Temperature (°C)") +
  ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())