For my thesis, I am interested in studying the effects of temperature on survival of different species in a hybrid zone of Mytilus mussels. We are testing the hypothesis, “hybrid Mytilus have novel temperature tolerance traits that facillitate their survival,” and testing this hypothesis by looking at survival of larval mussels in different temperatures. The different species I will be looking at are Mytilus galloprovinciallis (MG), Mytilus planulatus (MP), and introgressed offspring between the two (MGxMP). I will be subjecting larvae for each of these to two temperature treatments, 18 deggrees celcius and 25 degreese celcius. For this assignment, I am using information about MG and MP temperature tolerance to simulate mean survival rate in the different species. To get approximates for temperature tolerance, I am using a paper by Iva Popovic and Cynthia Riginos titiled “Comparative genomics reveals divergent thermal selection in warm- and cold-tolerant mussels.” For this simulation, I am assuming the following thermal optima for each species: MG thermal optimum: 19.7 degrees C MP thermal optimum: 11.2 degrees C
A null hypothesis for this study is that “Hybrid mussels have intermediate thermal tolerances.” This claim has been evaluated in fruit flies in my lab and found to be true. I ran these analyses assuming this trend.
library(tidyverse)
library(ggplot2)
library(MASS)
Given MG’s higher thermal tolerance, I will be assuming that larvae at higher temperatures will survive at a higher proportion. I am simulating 50 replicates of each larval experiment, and then combining into a data frame. For this assignment, I will assume that hybrid MG x MP have an intermediate survival rate.
# simulated MG
mg_sur_25 <- rnorm(n = 50, mean = 0.65, sd = 0.048)
mg_sur_18 <- rnorm(n = 50, mean = 0.85, sd = 0.076)
# simulated MP
mp_sur_25 <- rnorm(n = 50, mean = 0.37, sd = 0.086)
mp_sur_18 <- rnorm(n = 50, mean = 0.71, sd = 0.053)
# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=50, mean = 0.52, sd = 0.041)
hy_sur_18 <- rnorm(n=50, mean = 0.75, sd = 0.076)
# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)
temp_treatment <- c(rep(18, 150), rep(25, 150))
species <- c(rep("MP", 50), rep("MG", 50), rep("HY", 50),rep("MP", 50), rep("MG", 50), rep("HY", 50))
I combined my simulated vectors into a data frame. A major troubleshooting factor I needed to consider was that temperature is being input into the data frame as two numbers, but these are factors for the purpose of the data analysis. Up front, it is important to define temperature as a factor.
dat <- data.frame(index=1:length(surv_rate),
surv_rate=surv_rate,
temp=temp_treatment,
species=species)
dat$temp<-as.factor(dat$temp)
# Define temperature as.factor for later analysis. Otherwise, it will be treated as a continuous variable, which isn't useful for this analysis.
# check out structure of data frame and make sure it's how I expect
head(dat)
## index surv_rate temp species
## 1 1 0.6776006 18 MP
## 2 2 0.6186098 18 MP
## 3 3 0.5923216 18 MP
## 4 4 0.6837407 18 MP
## 5 5 0.7113084 18 MP
## 6 6 0.6050999 18 MP
str(dat)
## 'data.frame': 300 obs. of 4 variables:
## $ index : int 1 2 3 4 5 6 7 8 9 10 ...
## $ surv_rate: num 0.678 0.619 0.592 0.684 0.711 ...
## $ temp : Factor w/ 2 levels "18","25": 1 1 1 1 1 1 1 1 1 1 ...
## $ species : chr "MP" "MP" "MP" "MP" ...
Given that my data is separated into distinct groupings (species) and we are interested in evaluating the effects of two temperatures, the best way to analyze this data is using an ANOVA. I ran an anova to look at survival rate as an effect of temperature, as an effect of species, and as an effect of temperatuer and species.
# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)
# Check out ANOVA
print(ANOmodel)
## Call:
## aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
##
## Terms:
## temp species temp:species Residuals
## Sum of Squares 3.963180 1.859141 0.268414 1.124818
## Deg. of Freedom 1 2 2 294
##
## Residual standard error: 0.06185395
## Estimated effects may be unbalanced
print(summary(ANOmodel))
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 3.963 3.963 1035.88 < 2e-16 ***
## species 2 1.859 0.930 242.97 < 2e-16 ***
## temp:species 2 0.268 0.134 35.08 2.18e-14 ***
## Residuals 294 1.125 0.004
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
## $ :Classes 'anova' and 'data.frame': 4 obs. of 5 variables:
## ..$ Df : num [1:4] 1 2 2 294
## ..$ Sum Sq : num [1:4] 3.963 1.859 0.268 1.125
## ..$ Mean Sq: num [1:4] 3.96318 0.92957 0.13421 0.00383
## ..$ F value: num [1:4] 1035.9 243 35.1 NA
## ..$ Pr(>F) : num [1:4] 2.33e-98 5.18e-63 2.18e-14 NA
## - attr(*, "class")= chr [1:2] "summary.aov" "listof"
I used a boxplot in ggplot to visualize my data. This makes sense because it allows comparison of different survival rates among the different species contextualized by temperature. It allows the viewer to discern these relationships between categorical data.
boxplot <-ggplot(data=dat) +
aes(x=temp, y=surv_rate, fill=species) +
geom_boxplot() +
xlab("Temperature (°C)") +
ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())
For this manipulation, I will randomly generate 6 means with a uniform distribution and input those for my mean values to see how this alters data understanding.
I set a seed so that my results, though random, will be consistently the same upon replication of this document. By chance and as an effect of the standard deviations, there were still statistical differences in my data. This is interesting. When I reduce the size of the difference in means to between 0.4 and 0.75, I no longer see a statistically significant relationship with temperature.
# Randombly generated means
set.seed(04199968)
mean <- runif(n=6, min = 0.4, max = 0.75)
# simulated MG
mg_sur_25 <- rnorm(n = 50, mean = mean[1], sd = 0.048)
mg_sur_18 <- rnorm(n = 50, mean = mean[2], sd = 0.076)
# simulated MP
mp_sur_25 <- rnorm(n = 50, mean = mean[3], sd = 0.086)
mp_sur_18 <- rnorm(n = 50, mean = mean[4], sd = 0.053)
# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=50, mean = mean[5], sd = 0.041)
hy_sur_18 <- rnorm(n=50, mean = mean[6], sd = 0.076)
# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)
temp_treatment <- c(rep(18, 150), rep(25, 150))
species <- c(rep("MP", 50), rep("MG", 50), rep("HY", 50),rep("MP", 50), rep("MG", 50), rep("HY", 50))
dat <- data.frame(index=1:length(surv_rate),
surv_rate=surv_rate,
temp=temp_treatment,
species=species)
dat$temp<-as.factor(dat$temp)
# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)
# Check out ANOVA
print(ANOmodel)
## Call:
## aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
##
## Terms:
## temp species temp:species Residuals
## Sum of Squares 0.0055134 2.4945718 0.6737904 1.3440424
## Deg. of Freedom 1 2 2 294
##
## Residual standard error: 0.06761341
## Estimated effects may be unbalanced
print(summary(ANOmodel))
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 0.0055 0.0055 1.206 0.273
## species 2 2.4946 1.2473 272.835 <2e-16 ***
## temp:species 2 0.6738 0.3369 73.694 <2e-16 ***
## Residuals 294 1.3440 0.0046
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
## $ :Classes 'anova' and 'data.frame': 4 obs. of 5 variables:
## ..$ Df : num [1:4] 1 2 2 294
## ..$ Sum Sq : num [1:4] 0.00551 2.49457 0.67379 1.34404
## ..$ Mean Sq: num [1:4] 0.00551 1.24729 0.3369 0.00457
## ..$ F value: num [1:4] 1.21 272.84 73.69 NA
## ..$ Pr(>F) : num [1:4] 2.73e-01 1.01e-67 1.14e-26 NA
## - attr(*, "class")= chr [1:2] "summary.aov" "listof"
boxplot <-ggplot(data=dat) +
aes(x=temp, y=surv_rate, fill=species) +
geom_boxplot() +
xlab("Temperature (°C)") +
ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())
## Sample size manipulation Even when I reduce my sample size from 50 to 5, with the data I input there is a statistically significant relationship. This is likely due to my mean and SD differences.
# simulated MG
mg_sur_25 <- rnorm(n = 5, mean = 0.65, sd = 0.048)
mg_sur_18 <- rnorm(n = 5, mean = 0.85, sd = 0.076)
# simulated MP
mp_sur_25 <- rnorm(n = 5, mean = 0.37, sd = 0.086)
mp_sur_18 <- rnorm(n = 5, mean = 0.71, sd = 0.053)
# simulated hybrid MG X MP
hy_sur_25 <- rnorm(n=5, mean = 0.52, sd = 0.041)
hy_sur_18 <- rnorm(n=5, mean = 0.75, sd = 0.076)
# combine Atomic vectors so they can go into DF
surv_rate <- c(mp_sur_18, mg_sur_18, hy_sur_18, mp_sur_25, mg_sur_25, hy_sur_25)
temp_treatment <- c(rep(18, (5*6)/2), rep(25, (5*6)/2))
species <- c(rep("MP", 5), rep("MG", 5), rep("HY", 5),rep("MP", 5), rep("MG", 5), rep("HY", 5))
dat <- data.frame(index=1:length(surv_rate),
surv_rate=surv_rate,
temp=temp_treatment,
species=species)
dat$temp<-as.factor(dat$temp)
# Run ANOVA
ANOmodel <- aov(surv_rate~temp + species + temp:species,data=dat)
# Check out ANOVA
print(ANOmodel)
## Call:
## aov(formula = surv_rate ~ temp + species + temp:species, data = dat)
##
## Terms:
## temp species temp:species Residuals
## Sum of Squares 0.4495932 0.2135946 0.0560142 0.1031879
## Deg. of Freedom 1 2 2 24
##
## Residual standard error: 0.06557055
## Estimated effects may be unbalanced
print(summary(ANOmodel))
## Df Sum Sq Mean Sq F value Pr(>F)
## temp 1 0.4496 0.4496 104.569 3.17e-10 ***
## species 2 0.2136 0.1068 24.839 1.43e-06 ***
## temp:species 2 0.0560 0.0280 6.514 0.0055 **
## Residuals 24 0.1032 0.0043
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
z <- summary(ANOmodel)
str(z)
## List of 1
## $ :Classes 'anova' and 'data.frame': 4 obs. of 5 variables:
## ..$ Df : num [1:4] 1 2 2 24
## ..$ Sum Sq : num [1:4] 0.45 0.214 0.056 0.103
## ..$ Mean Sq: num [1:4] 0.4496 0.1068 0.028 0.0043
## ..$ F value: num [1:4] 104.57 24.84 6.51 NA
## ..$ Pr(>F) : num [1:4] 3.17e-10 1.43e-06 5.50e-03 NA
## - attr(*, "class")= chr [1:2] "summary.aov" "listof"
boxplot <-ggplot(data=dat) +
aes(x=temp, y=surv_rate, fill=species) +
geom_boxplot() +
xlab("Temperature (°C)") +
ylab("Survival Rate")
boxplot + labs(fill = "Species") + theme(panel.background = element_blank())