setwd("~/Desktop/Coding-Boot-Camp/Data-viz-basics")
T-tests
Getting everything set up
Set Working Directory
Change to your own working directory (WD) to save things like plots. You can do that by modifying the file path or by going to Session (on the upper bar) –> Set Working Directory. Working directories are important in R because they tell the computer where to look to grab information and where to save things like results. This can vary by project, script, etc., so it’s important to consistently have the appropriate WD. If you are unsure what your current WD is, you can run the getwd
command in the console (usually the lower left-hand pane) to get your WD.
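For example (the path below is just a placeholder; swap in a folder that actually exists on your machine):
getwd()                          # print the current working directory
# setwd("~/Desktop/my-project")  # hypothetical path -- change to your own folder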
Load Packages
if (!require("pacman")) install.packages("pacman") #run this if you don't have pacman
library(pacman)
pacman::p_load(tidyverse, ggpubr, rstatix, caret, broom, kableExtra, reactable, Hmisc, datarium, car, DT, install = T) #use pacman to load packages quickly
One of the great things about R is that it is super flexible. This flexibility comes from R’s ability to use different packages. You can load packages into your current work environment by using the library(PACKAGE) function. It is important to note that in order to library a package you must first have it installed. To install a package you can use the install.packages("PACKAGE") command. You can learn more about the different types of packages hosted on the Comprehensive R Archive Network (CRAN) here! One other important thing is that some packages have similarly named commands (e.g., plyr and Hmisc both have a summarize function) that get masked, meaning that you may call a function and not get the function you expect. To get around this you can use PACKAGE::FUNCTION to call the package-specific function.
For this script, and here forward, we use pacman to load in all of our packages rather than the iterative if (!require("PACKAGE")) install.packages("PACKAGE") set-up. There’s still some merit to that approach if loading packages in a certain order creates issues (e.g., tidyverse and brms).
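Here is a minimal sketch of the masking idea using a pair of functions we know share a name, dplyr::filter() and stats::filter() (a different pair than the plyr/Hmisc example above, but the same point):
library(dplyr)                        # dplyr's filter() masks stats::filter()
dplyr::filter(mtcars, mpg > 25)       # the data-wrangling verb from dplyr
stats::filter(1:10, rep(1/3, 3))      # the base time-series filter with the same name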
Get our plot aesthetics set-up
This is a super quick and easy way to style our plots without introducing a huge number of code lines in each chunk!
= c("#3B9AB2", "#EBCC2A", "#F21A00")
palette_map = c("#ee9b00", "#bb3e03", "#005f73")
palette_condition
= theme_classic() + #
plot_aes theme(legend.position = "top",
legend.text = element_text(size = 12),
text = element_text(size = 16, family = "Futura Medium"),
axis.text = element_text(color = "black"),
axis.line = element_line(colour = "black"),
axis.ticks.y = element_blank())
Build Relevant Functions
Using summary functions like this allows us to present results in a clean, organized manner. For example, we can trim superfluous information from model output when sharing with collaborators, among other things.
mystats <- function(x, na.omit=FALSE){
  if (na.omit)
    x <- x[!is.na(x)]             # drop missing values if requested
  m <- mean(x)                    # mean
  n <- length(x)                  # sample size
  s <- sd(x)                      # standard deviation
  skew <- sum((x-m)^3/s^3)/n      # skewness
  kurt <- sum((x-m)^4/s^4)/n - 3  # excess kurtosis
  return(c(n=n, mean=m, stdev=s, skew=skew, kurtosis=kurt))
}
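As a quick sanity check (simulated data, not part of the original script), you can run mystats() on values you generate yourself; skewness and kurtosis should land near zero for normally distributed draws:
set.seed(123)
mystats(rnorm(1000))                         # skew and kurtosis should be close to 0
mystats(c(rnorm(99), NA), na.omit = TRUE)    # remove the NA before computing the stats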
Load data
Since we are using an existing dataset in R, we don’t need to do anything fancy here. However, when you normally load in data you can use a few different approaches. In most reproducible scripts you’ll see people use nomenclature like df, data, dataframe, etc. to denote a dataframe. If you are working with multiple datasets, it’s advisable to give each one an intuitive name that tells you what the data actually are. For example, if I am working with two different corpora (e.g., Atlantic and NYT Best-Sellers) I will probably call the Atlantic dataframe atlantic and the NYT Best-Sellers dataframe NYT for simplicity and so I don’t accidentally write over files.
For example, if your WD is already set and the data exist within said directory you can use: df <- read_csv("MY_CSV.csv").
If the data live somewhere like GitHub you can use: df <- read_csv('https://raw.githubusercontent.com/scm1210/Language_Lab_Repro/main/Atlantic_Cleaned_all_vars.csv') #read in the data.
If you are working in one directory and need to call something from another directory you can do something like: Atlantic_FK <- read_csv("~/Desktop/working-with-lyle/Atlantic/Atlantic_flesch_kinkaid_scores.csv")
There are also other packages/functions that allow you to read in files with different extensions, such as haven::read_sav() to read in a file from SPSS or rjson::fromJSON(file = "data.json") to read in a JSON file. If you want to learn more about reading in different files you can take a peek at this site.
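Hedging on file names (neither of these files exists in this project), those calls would look something like:
# Hypothetical file names -- swap in paths that actually exist on your machine
spss_df <- haven::read_sav("my_survey.sav")      # SPSS .sav file
json_dat <- rjson::fromJSON(file = "data.json")  # JSON file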
# Load the data
data("genderweight", package = "datarium")
genderweight <- as.data.frame(genderweight)
# Show a sample of the data by group
Brief Description
Now that we’ve got everything set up, we are going to get into our analyses. For the first part we are going to be looking at gender differences in weight between males and females using the genderweight dataset from the datarium package in R! To quantify this, we’ll use an independent t-test. An independent t-test is used when we have one IV with only two levels or categories (e.g., gender: male and female) and one DV that is continuous (e.g., weight ranging from 0 to 250). If the IV has more than two levels we need to use another type of test called an ANOVA; more on that later :).
Some other things:
Two means calculated from two samples
Null hypothesis: Means from the two samples are similar
Alternative hypothesis: Means from the two samples are different
The bigger the observed difference between the sample means, the more likely it is that the sample means truly differ (we capture this with an effect size).
Statistical Assumptions
Assumptions are also important. That is, the data need to possess certain qualities for us to be able to use this type of test. For a t-test these are:
The DV data are continuous (not ordinal or nominal).
The sample data have been randomly sampled from a population.
There is homogeneity of variance (i.e., the variability of the data in each group is similar).
The distribution is approximately normal.
Click through the tabs to see how to check each assumption.
Continuous
We can check this by looking at the structure of our data using the class function (for one variable) or the str function (for all the variables in our dataset). We can see that weight is numeric and therefore continuous, so we can move forward with our analyses.
class(genderweight$weight)
[1] "numeric"
str(genderweight)
'data.frame': 40 obs. of 3 variables:
$ id : Factor w/ 40 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ group : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
$ weight: num 61.6 64.6 66.2 59.3 64.9 ...
Randomly Sampled
This is something you do when you design the study–we can’t do anything in R to check this.
Homogeneity of Variance
We need to make sure the variability of the data in each group is similar. We can use something called Levene’s Test for equality of error variances to do this. If we violate this assumption (p < .05 in our test) we have to use a Welch’s t-test. We do violate this assumption, so if we were actually doing a meaningful project we would need to use a different statistical test. For the sake of brevity we’ll pretend we are ok for now.
leveneTest(genderweight$weight ~ genderweight$group)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 1 6.12 0.018 *
38
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
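If we did want to respect the violated assumption, a Welch’s t-test is a one-liner in base R; here’s a minimal sketch (t.test() applies the Welch correction by default, i.e., var.equal = FALSE):
# Welch's t-test: does not assume equal variances across groups
t.test(weight ~ group, data = genderweight)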
The distribution is approximately normal.
To check the distribution of the data we can use density plots, built with ggplot2 (part of tidyverse), to visualize this. It’s also important to get some statistics behind this, and to do that we can look at skewness and kurtosis via the mystats function that we wrote earlier. You can also use psych::describe to get similar information. For skewness and kurtosis, we want skewness values to fall between −3 and +3, and kurtosis values to fall between −10 and +10.
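For instance, assuming the psych package is installed (it is not in the pacman call above), a quick call would be:
# psych is not loaded by our pacman::p_load() call, so qualify it explicitly
psych::describe(genderweight$weight)  # returns n, mean, sd, skew, kurtosis, se, etc.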
# Basic density
p <- ggplot(genderweight, aes(x=weight)) +
  geom_density(color="dodgerblue4", fill="dodgerblue3", alpha=0.2) + plot_aes +
  geom_vline(aes(xintercept=mean(weight)),
             color="dodgerblue3", linetype="dashed", size=1)

annotate_figure(p,
                top = text_grob("Density Plots for both genders", color = "black", face = "bold", size = 20),
                bottom = text_grob("Vertical line represents mean value.",
                                   color = "Black",
                                   hjust = 1.1, x = 1, face = "italic", size = 12))
We can also plot the densities by gender. Looks like we should have some interesting results!
p <- ggplot(genderweight, aes(x=weight, color=group, fill=group, alpha=0.1)) +
  geom_density() + geom_vline(aes(xintercept=mean(weight)),
                              color="blue", linetype="dashed", size=1) + plot_aes

annotate_figure(p,
                top = text_grob("Density Plots for both genders", color = "black", face = "bold", size = 20),
                bottom = text_grob("Vertical line represents mean value.",
                                   color = "Black",
                                   hjust = 1.1, x = 1, face = "italic", size = 12))
mystats(genderweight$weight)
n mean stdev skew kurtosis
40.0000 74.6624 11.7924 0.1486 -1.7419
Summary Statistics
It’s also important for us to get some summary statistics for our data (e.g., N’s, Means, SDs).
Sample Size, Means, and Standard Deviations
genderweight %>%
  group_by(group) %>%
  get_summary_stats(weight, type = "mean_sd") %>%
  reactable::reactable(striped = TRUE)
Build a simple graph to visualize
We can also visualize our data with a box plot while overlaying the jittered data points!
# Create a box plot with jittered data points
ggplot(genderweight, aes(x = group, y = weight,color = group)) +
geom_boxplot() +
geom_jitter(width = 0.2, size = 2,alpha=0.2) +
# Add axis labels
xlab("Groups") +
ylab("Weight") +
  plot_aes +
  # Add plot title
ggtitle("Weight by Groups") + theme(plot.title = element_text(hjust = 0.5))
Main Analyses
Now, we’ll conduct our independent t-test to see if there are gender differences in weight, as well as quantify the magnitude of the effect (Cohen’s d).
Independent T-test
We can see that we have a significant effect of gender on weight. That is, when conducting our independent t-test we observed a large effect [t(26.87) = -20.79, p < .001, d = -6.575], such that men (M = 85.826, SD = 4.354) had significantly greater body weight than their female counterparts (M = 63.499, SD = 2.028).
stat.test <- genderweight %>%
  t_test(weight ~ group) %>%
add_significance()
stat.test
# A tibble: 1 × 9
.y. group1 group2 n1 n2 statistic df p p.signif
<chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
1 weight F M 20 20 -20.8 26.9 4.30e-18 ****
Cohen’s D
From inspecting our output, we can see that we have a large effect of gender.
genderweight %>% cohens_d(weight ~ group, var.equal = TRUE)
# A tibble: 1 × 7
.y. group1 group2 effsize n1 n2 magnitude
* <chr> <chr> <chr> <dbl> <int> <int> <ord>
1 weight F M -6.57 20 20 large
Visualize our results using GG-Plot with stats
plot <- ggplot(genderweight, aes(x = group, y = weight, color = group)) +
  geom_boxplot() +
  geom_jitter(width = 0.2, size = 2) +
  # Add axis labels
  xlab("Gender") +
  ylab("Weight") +
  plot_aes +
  # Add plot title
  ggtitle("Weight by Gender") + theme(plot.title = element_text(hjust = 0.5))

stat.test <- stat.test %>% add_xy_position(x = "group")
plot <- plot + stat_pvalue_manual(stat.test, tip.length = 0) +
  labs(subtitle = get_test_label(stat.test, detailed = TRUE))

annotate_figure(plot,
                bottom = text_grob("D = -6.575",
                                   color = "Black",
                                   hjust = 1.1, x = 1, face = "italic", size = 16))
High Level Example
Next we will see what this looks like at the highest level.
This part of the tutorial uses data and reproduces a subset of analyses reported in the following manuscript:
You can find this project’s github here.
Abstract
The COVID-19 pandemic sent shockwaves across the fabric of our society. Examining the impact of the pandemic on business leadership is particularly important to understanding how this event affected their decision-making. The present study documents the psychological effects of the COVID-19 pandemic on chief executive officers (CEOs). This was accomplished by analyzing CEOs’ language from quarterly earnings calls (N = 19,536) for a year before and after lockdown. CEOs had large shifts in language in the months immediately following the start of the pandemic lockdowns. Analytic thinking plummeted after the world went into lockdown, with CEOs’ language becoming less technical and more personal and intuitive. In parallel, CEOs’ language showed signs of increased cognitive load, as they were processing the effect of the pandemic on their business practices. Business leaders’ use of collective-focused language (we-usage) dropped substantially after the pandemic began, perhaps suggesting CEOs felt disconnected from their companies. Self-focused (I-usage) language increased, showing the increased preoccupation of business leaders. The size of the observed shifts in language during the pandemic also dwarfed responses to other events that occurred dating back to 2010, with the effect lasting around seven months.
Prep data
Load necessary packages and set Working Directory
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse,zoo,lubridate,plotrix,ggpubr, caret, broom, kableExtra, reactable, effsize, install = T)
Define aesthetics
= c("#3B9AB2", "#EBCC2A", "#F21A00")
palette_map = c("#ee9b00", "#bb3e03", "#005f73")
palette_condition
= theme_classic() +
plot_aes theme(text = element_text(size = 16, family = "Futura Medium")) +
theme(axis.text.x=element_text(angle=45, hjust=1)) +
theme(plot.title.position = 'plot',
plot.title = element_text(hjust = 0.5, face = "bold", size = 16)) +
theme(axis.text=element_text(size=16),
axis.title=element_text(size=20,face="bold"))+
theme(plot.title.position = 'plot',
plot.title = element_text(hjust = 0.5, face = "bold", size = 20)) +
theme(axis.text=element_text(size = 14),
axis.title=element_text(size = 20,face="bold"))
Write our Table Functions
baseline_ttest <- function(ttest_list) {
  # Extract relevant information from each test and store in a data frame
  ttest_df <- data.frame(
    Group1 = seq(0,0,1),
    Group2 = seq(1,24,1),
    t = sapply(ttest_list, function(x) x$statistic),
    df = sapply(ttest_list, function(x) x$parameter),
    p_value = sapply(ttest_list, function(x) x$p.value)
  )
  # Format p-values as scientific notation
  ttest_df$p_value <- format(ttest_df$p_value, scientific = T)
  # Rename columns
  colnames(ttest_df) <- c("t", "t + 1", "t-value", "Degrees of Freedom", "p-value")
  # Create table using kableExtra
  kable(ttest_df, caption = "Summary of Welch's t-Tests", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
post_pandemic_summary <- function(ttest_list) {
  # Extract relevant information from each test and store in a data frame
  ttest_df <- data.frame(
    Group1 = seq(12,23,1),
    Group2 = seq(13,24,1),
    t = sapply(ttest_list, function(x) x$statistic),
    df = sapply(ttest_list, function(x) x$parameter),
    p_value = sapply(ttest_list, function(x) x$p.value)
  )
  # Format p-values as scientific notation
  ttest_df$p_value <- format(ttest_df$p_value, scientific = T)
  # Rename columns
  colnames(ttest_df) <- c("t", "t + 1", "t-value", "Degrees of Freedom", "p-value")
  # Create table using kableExtra
  kable(ttest_df, caption = "Summary of Welch's t-Tests", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
baseline_cohen_d <- function(cohen_d_list) {
  # Extract relevant information from each test and store in a data frame
  cohen_d_df <- data.frame(
    Group1 = seq(0,0,1),
    Group2 = seq(1,24,1),
    Cohen_d = sapply(cohen_d_list, function(x) x$estimate)
  )
  # Rename columns
  colnames(cohen_d_df) <- c("t", "t + 1", "Cohen's d")
  # Create table using kableExtra
  kable(cohen_d_df, caption = "Summary of Cohen's D", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
post_cohen_d <- function(cohen_d_list) {
  # Extract relevant information from each test and store in a data frame
  cohen_d_df <- data.frame(
    Group1 = seq(12,23,1),
    Group2 = seq(13,24,1),
    Cohen_d = sapply(cohen_d_list, function(x) x$estimate)
  )
  # Rename columns
  colnames(cohen_d_df) <- c("t", "t+1", "Cohen's d")
  # Create table using kableExtra
  kable(cohen_d_df, caption = "Summary of Cohen's D", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
baseline_mean_diff <- function(mean_diff_list) {
  # Extract relevant information from each mean difference calculation and store in a data frame
  mean_diff_df <- data.frame(
    Group1 = seq(0,0,1),
    Group2 = seq(1,24,1),
    mean_diff = mean_diff_list
  )
  # Rename columns
  colnames(mean_diff_df) <- c("t", "t+1", "Mean Difference")
  # Create table using kableExtra
  kable(mean_diff_df, caption = "Summary of Mean Differences", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
post_mean_diff <- function(mean_diff_list) {
  # Extract relevant information from each mean difference calculation and store in a data frame
  mean_diff_df <- data.frame(
    Group1 = seq(12,23,1),
    Group2 = seq(13,24,1),
    mean_diff = mean_diff_list
  )
  # Rename columns
  colnames(mean_diff_df) <- c("t", "t+1", "Mean Difference")
  # Create table using kableExtra
  kable(mean_diff_df, caption = "Summary of Mean Differences", booktabs = TRUE) %>%
    kableExtra::kable_styling()
}
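As a quick, hypothetical check that these helpers behave as expected (simulated data, not the real analysis), you can feed them a list of t-tests built with mapply():
set.seed(1)
fake_tests <- mapply(function(a, b) t.test(rnorm(50, a), rnorm(50, b)),
                     0, seq(1, 24, 1), SIMPLIFY = FALSE)  # 24 comparisons against month 0
baseline_ttest(fake_tests)  # should render a 24-row kable of Welch's t-tests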
Load in the Data
<- read_csv("https://raw.githubusercontent.com/scm1210/Summer-Coding/main/data/Big_CEO.csv") #read in the data from github
data
<- data["2019-03-01"<= data$Date & data$Date <= "2021-04-01",] #subsetting covid dates
data
<- data %>% filter(WC<=5400) %>% #filter out based on our exclusion criteria
data filter(WC>=25)
$month_year <- format(as.Date(data$Date), "%Y-%m") #reformat
data
<- data %>% dplyr::select(Date, Speaker, Analytic, cogproc,allnone,we,i,emo_anx) %>%
data_tidy mutate(Date = lubridate::ymd(Date),
time_month = as.numeric(Date - ymd("2019-03-01")) / 30, #centering at start of march
time_month_quad = time_month * time_month) #making our quadratic term
$Date_off <- floor(data_tidy$time_month) #rounding off dates to whole months using ceiling function (0 = 2019-03, 24 = 2021-04)
data_tidy$Date_covid <- as.factor(data_tidy$Date_off) #factorize data_tidy
Create Tidy Data for Graphs
<- read_csv("https://raw.githubusercontent.com/scm1210/Language_Lab_Repro/main/Big_CEO.csv")#put code here to read in Big CEO data
df <- df %>% filter(WC<=5400) %>%
df filter(WC>=25)
$month_year <- format(as.Date(df$Date), "%Y-%m") ###extracting month and year to build fiscal quarter graphs, need a new variable bc if not it'll give us issues
df
<- df %>%#converting our dates to quarterly dates
df2 group_by(month_year) %>% ###grouping by the Top100 tag and date
summarise_at(vars("Date","WC","Analytic","cogproc",'we','i'), funs(mean, std.error),) #pulling the means and SEs for our variables of interest
<- df2["2019-01"<= df2$month_year & df2$month_year <= "2021-03",] #covid dates df2
Write our Stats Functions
We were interested in how language changed relative to baseline (one year pre-pandemic), as well as how language changed after the pandemic began.
As a result we ran two separate sets of analyses: comparing t (time zero) to t[i], and comparing each month from t (12 months after our centered data point) to t + 1. The groups you see are centered on 03/2019. That is, 12 = 03/2020, 13 = 04/2020, etc.
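To make the pairing explicit, here is a small sketch (labels only, no real data) of the index pairs the mapply() calls below walk through:
paste(0, "vs", seq(1, 24, 1))                 # baseline: month 0 against every later month
paste(seq(12, 23, 1), "vs", seq(13, 24, 1))   # post-pandemic: each month against the next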
Analytic Thinking
analytic_my.t = function(fac1, fac2){
  t.test(data_tidy$Analytic[data_tidy$Date_covid==fac1],
         data_tidy$Analytic[data_tidy$Date_covid==fac2])
  #writing our t-test function to compare t to t[i]
}

analytic_my.d = function(fac1, fac2){
  cohen.d(data_tidy$Analytic[data_tidy$Date_covid==fac1],
          data_tidy$Analytic[data_tidy$Date_covid==fac2])
  #function for cohen's d
}

analytic_mean <- function(fac1, fac2){
  mean(data_tidy$Analytic[data_tidy$Date_covid==fac1])-
    mean(data_tidy$Analytic[data_tidy$Date_covid==fac2])
  #function to do mean differences
}
Cognitive Processing
cogproc_my.t = function(fac1, fac2){
  t.test(data_tidy$cogproc[data_tidy$Date_covid==fac1],
         data_tidy$cogproc[data_tidy$Date_covid==fac2])
  #writing our t-test function to compare t to t[i]
}

cogproc_my.d = function(fac1, fac2){
  cohen.d(data_tidy$cogproc[data_tidy$Date_covid==fac1],
          data_tidy$cogproc[data_tidy$Date_covid==fac2])
  #function for cohen's d
}

cogproc_mean <- function(fac1, fac2){
  mean(data_tidy$cogproc[data_tidy$Date_covid==fac1])-
    mean(data_tidy$cogproc[data_tidy$Date_covid==fac2])
  #function to do mean differences
}
I-words
i_my.t = function(fac1, fac2){
  t.test(data_tidy$i[data_tidy$Date_covid==fac1],
         data_tidy$i[data_tidy$Date_covid==fac2])
  #writing our t-test function to compare t to t + 1
}

i_my.d = function(fac1, fac2){
  cohen.d(data_tidy$i[data_tidy$Date_covid==fac1],
          data_tidy$i[data_tidy$Date_covid==fac2])
  #function for cohen's d
}

i_mean <- function(fac1, fac2){
  mean(data_tidy$i[data_tidy$Date_covid==fac1])-
    mean(data_tidy$i[data_tidy$Date_covid==fac2])
  #function to do mean differences
}
We-words
we_my.t = function(fac1, fac2){
  t.test(data_tidy$we[data_tidy$Date_covid==fac1],
         data_tidy$we[data_tidy$Date_covid==fac2])
}

we_my.d = function(fac1, fac2){
  cohen.d(data_tidy$we[data_tidy$Date_covid==fac1],
          data_tidy$we[data_tidy$Date_covid==fac2])
  #function for cohen's d
}

we_mean <- function(fac1, fac2){
  mean(data_tidy$we[data_tidy$Date_covid==fac1])-
    mean(data_tidy$we[data_tidy$Date_covid==fac2])
  #function to do mean differences
}
Tidy data
Data transformations
- None
Exclusions
- Excluded texts that were shorter than **25 words** or longer than **5,400 words**!
Summary of the Data
Range of Dates
range(data$Date)
[1] "2019-03-01" "2021-04-01"
Number of Speakers
speakers <- data %>%
  select(Speaker) %>%
  unique() %>%
  dplyr::summarize(n = n()) %>%
  DT::datatable()
speakers
Number of Transcripts
transcripts <- data %>%
  select(1) %>%
  dplyr::summarize(n = n()) %>%
  DT::datatable()
transcripts
Mean Word Count
word_count <- data %>%
  select(WC) %>%
  dplyr::summarize(mean = mean(WC)) %>%
  DT::datatable()
word_count
How did language change after the Pandemic?
Analytic Thinking
T-test
analytic_ttest <- mapply(analytic_my.t,seq(12,23,1), seq(13,24,1),SIMPLIFY=F) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
post_pandemic_summary(analytic_ttest)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
12 | 13 | 5.0849 | 525.79 | 5.124e-07 |
13 | 14 | -2.5948 | 373.06 | 9.839e-03 |
14 | 15 | -1.6726 | 252.04 | 9.565e-02 |
15 | 16 | 1.9242 | 377.62 | 5.508e-02 |
16 | 17 | -2.2122 | 200.57 | 2.808e-02 |
17 | 18 | -1.6872 | 218.93 | 9.298e-02 |
18 | 19 | 0.6199 | 262.61 | 5.358e-01 |
19 | 20 | 0.8738 | 128.22 | 3.839e-01 |
20 | 21 | -1.5398 | 230.76 | 1.250e-01 |
21 | 22 | 1.9533 | 94.32 | 5.374e-02 |
22 | 23 | -1.1498 | 55.55 | 2.552e-01 |
23 | 24 | -1.7179 | 2141.37 | 8.596e-02 |
Cohen’s D
analytic_d <- mapply(analytic_my.d,seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE)
post_cohen_d(analytic_d)
t | t+1 | Cohen's d |
---|---|---|
12 | 13 | 0.3275 |
13 | 14 | -0.1598 |
14 | 15 | -0.1320 |
15 | 16 | 0.1936 |
16 | 17 | -0.1617 |
17 | 18 | -0.1481 |
18 | 19 | 0.0710 |
19 | 20 | 0.0899 |
20 | 21 | -0.1246 |
21 | 22 | 0.2682 |
22 | 23 | -0.1598 |
23 | 24 | -0.0739 |
Mean Differences
analytic_meandiff <- mapply(analytic_mean, seq(12,23,1), seq(13,24,1)) #comparing each month to the following month
post_mean_diff(analytic_meandiff)
t | t+1 | Mean Difference |
---|---|---|
12 | 13 | 4.7346 |
13 | 14 | -2.1905 |
14 | 15 | -1.8443 |
15 | 16 | 2.7483 |
16 | 17 | -2.2318 |
17 | 18 | -2.1013 |
18 | 19 | 1.1589 |
19 | 20 | 1.2765 |
20 | 21 | -1.7791 |
21 | 22 | 4.0651 |
22 | 23 | -2.0756 |
23 | 24 | -0.9941 |
Cogproc
T-test
cogproc_ttest <- mapply(cogproc_my.t, seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
post_pandemic_summary(cogproc_ttest)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
12 | 13 | -4.3161 | 534.57 | 1.893e-05 |
13 | 14 | 1.4046 | 366.54 | 1.610e-01 |
14 | 15 | 4.0193 | 257.87 | 7.665e-05 |
15 | 16 | -3.1317 | 367.30 | 1.877e-03 |
16 | 17 | 0.9868 | 199.24 | 3.249e-01 |
17 | 18 | 4.1804 | 223.61 | 4.178e-05 |
18 | 19 | -1.1984 | 285.88 | 2.318e-01 |
19 | 20 | -1.4930 | 133.62 | 1.378e-01 |
20 | 21 | 3.2109 | 234.85 | 1.508e-03 |
21 | 22 | -1.7045 | 87.35 | 9.183e-02 |
22 | 23 | 0.9968 | 55.38 | 3.232e-01 |
23 | 24 | -0.9994 | 2145.13 | 3.177e-01 |
Cohen’s D
cogproc_d <- mapply(cogproc_my.d, seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE)
post_cohen_d(cogproc_d)
t | t+1 | Cohen's d |
---|---|---|
12 | 13 | -0.2755 |
13 | 14 | 0.0887 |
14 | 15 | 0.3007 |
15 | 16 | -0.3205 |
16 | 17 | 0.0733 |
17 | 18 | 0.3436 |
18 | 19 | -0.1329 |
19 | 20 | -0.1294 |
20 | 21 | 0.2477 |
21 | 22 | -0.2453 |
22 | 23 | 0.1405 |
23 | 24 | -0.0430 |
Mean Differences
cogproc_meandiff <- mapply(cogproc_mean, seq(12,23,1), seq(13,24,1)) #comparing each month to the following month
post_mean_diff(cogproc_meandiff)
t | t+1 | Mean Difference |
---|---|---|
12 | 13 | -0.6107 |
13 | 14 | 0.1785 |
14 | 15 | 0.6095 |
15 | 16 | -0.6540 |
16 | 17 | 0.1560 |
17 | 18 | 0.7442 |
18 | 19 | -0.2962 |
19 | 20 | -0.2746 |
20 | 21 | 0.5305 |
21 | 22 | -0.5358 |
22 | 23 | 0.2776 |
23 | 24 | -0.0887 |
I-words
T-test
i_ttest <- mapply(i_my.t, seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
post_pandemic_summary(i_ttest)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
12 | 13 | -5.1026 | 477.85 | 4.842e-07 |
13 | 14 | 2.9683 | 362.97 | 3.194e-03 |
14 | 15 | 2.7352 | 261.20 | 6.661e-03 |
15 | 16 | -3.5895 | 336.98 | 3.805e-04 |
16 | 17 | 1.7614 | 191.52 | 7.976e-02 |
17 | 18 | 3.4394 | 240.73 | 6.870e-04 |
18 | 19 | -2.6019 | 255.11 | 9.813e-03 |
19 | 20 | 0.4503 | 134.91 | 6.532e-01 |
20 | 21 | 1.5059 | 248.77 | 1.334e-01 |
21 | 22 | 2.0159 | 84.28 | 4.700e-02 |
22 | 23 | -3.8068 | 57.56 | 3.437e-04 |
23 | 24 | 4.4095 | 2135.84 | 1.088e-05 |
Cohen’s D
i_d <- mapply(i_my.d,seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE)
post_cohen_d(i_d)
t | t+1 | Cohen's d |
---|---|---|
12 | 13 | -0.3468 |
13 | 14 | 0.1902 |
14 | 15 | 0.1991 |
15 | 16 | -0.3758 |
16 | 17 | 0.1452 |
17 | 18 | 0.2370 |
18 | 19 | -0.3007 |
19 | 20 | 0.0378 |
20 | 21 | 0.1020 |
21 | 22 | 0.2972 |
22 | 23 | -0.4622 |
23 | 24 | 0.1900 |
Mean Differences
i_meandiff <- mapply(i_mean,seq(12,23,1), seq(13,24,1)) #comparing each month to the following month
post_mean_diff(i_meandiff)
t | t+1 | Mean Difference |
---|---|---|
12 | 13 | -0.2878 |
13 | 14 | 0.1551 |
14 | 15 | 0.1625 |
15 | 16 | -0.3242 |
16 | 17 | 0.1289 |
17 | 18 | 0.2083 |
18 | 19 | -0.2364 |
19 | 20 | 0.0329 |
20 | 21 | 0.0886 |
21 | 22 | 0.2293 |
22 | 23 | -0.3912 |
23 | 24 | 0.1657 |
We-words
T-test
we_ttest <- mapply(we_my.t, seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
post_pandemic_summary(we_ttest)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
12 | 13 | 4.1038 | 527.08 | 4.709e-05 |
13 | 14 | 0.9117 | 378.82 | 3.625e-01 |
14 | 15 | -3.3226 | 253.14 | 1.023e-03 |
15 | 16 | 2.4647 | 373.96 | 1.416e-02 |
16 | 17 | -0.3375 | 197.52 | 7.361e-01 |
17 | 18 | -4.2759 | 229.50 | 2.794e-05 |
18 | 19 | 2.5510 | 262.60 | 1.131e-02 |
19 | 20 | -0.1422 | 131.79 | 8.871e-01 |
20 | 21 | -1.9395 | 238.21 | 5.362e-02 |
21 | 22 | -0.2952 | 84.06 | 7.685e-01 |
22 | 23 | 0.8557 | 55.76 | 3.958e-01 |
23 | 24 | -0.3495 | 2137.77 | 7.267e-01 |
Cohen’s D
we_d <- mapply(we_my.d, seq(12,23,1), seq(13,24,1),SIMPLIFY=FALSE)
post_cohen_d(we_d)
t | t+1 | Cohen's d |
---|---|---|
12 | 13 | 0.2639 |
13 | 14 | 0.0550 |
14 | 15 | -0.2595 |
15 | 16 | 0.2501 |
16 | 17 | -0.0256 |
17 | 18 | -0.3276 |
18 | 19 | 0.2920 |
19 | 20 | -0.0130 |
20 | 21 | -0.1444 |
21 | 22 | -0.0436 |
22 | 23 | 0.1170 |
23 | 24 | -0.0151 |
Mean Differences
we_meandiff <- mapply(we_mean, seq(12,23,1), seq(13,24,1)) #comparing each month to the following month
post_mean_diff(we_meandiff)
t | t+1 | Mean Difference |
---|---|---|
12 | 13 | 0.3778 |
13 | 14 | 0.0763 |
14 | 15 | -0.3676 |
15 | 16 | 0.3649 |
16 | 17 | -0.0365 |
17 | 18 | -0.4711 |
18 | 19 | 0.4169 |
19 | 20 | -0.0183 |
20 | 21 | -0.2042 |
21 | 22 | -0.0609 |
22 | 23 | 0.1583 |
23 | 24 | -0.0210 |
How did language change relative to baseline (one year before the pandemic; 03/2019)?
Analytic Thinking
T-test
analytic_ttest_baseline <- mapply(analytic_my.t,0, seq(1,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
baseline_ttest(analytic_ttest_baseline)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
0 | 1 | 1.5025 | 1161.5 | 1.332e-01 |
0 | 2 | 0.6860 | 1036.8 | 4.929e-01 |
0 | 3 | 0.2508 | 245.1 | 8.022e-01 |
0 | 4 | 2.6728 | 1120.1 | 7.631e-03 |
0 | 5 | 0.4785 | 1004.8 | 6.324e-01 |
0 | 6 | 1.0343 | 280.4 | 3.019e-01 |
0 | 7 | 2.6675 | 1049.9 | 7.760e-03 |
0 | 8 | 1.4046 | 993.4 | 1.605e-01 |
0 | 9 | 1.0147 | 328.1 | 3.110e-01 |
0 | 10 | 1.5505 | 286.2 | 1.221e-01 |
0 | 11 | 1.9738 | 1061.6 | 4.867e-02 |
0 | 12 | 1.3054 | 1272.1 | 1.920e-01 |
0 | 13 | 5.7770 | 623.9 | 1.201e-08 |
0 | 14 | 5.1516 | 929.5 | 3.153e-07 |
0 | 15 | 1.4219 | 370.2 | 1.559e-01 |
0 | 16 | 3.9258 | 316.9 | 1.061e-04 |
0 | 17 | 3.2572 | 918.1 | 1.166e-03 |
0 | 18 | 0.1171 | 302.2 | 9.068e-01 |
0 | 19 | 0.8463 | 164.4 | 3.986e-01 |
0 | 20 | 3.7364 | 920.4 | 1.981e-04 |
0 | 21 | 0.6393 | 331.8 | 5.231e-01 |
0 | 22 | 2.6168 | 63.2 | 1.109e-02 |
0 | 23 | 3.7687 | 1112.0 | 1.727e-04 |
0 | 24 | 2.4326 | 1125.2 | 1.515e-02 |
Cohen’s D
analytic_D_baseline <- mapply(analytic_my.d,0, seq(1,24,1),SIMPLIFY=FALSE)
baseline_cohen_d(analytic_D_baseline)
t | t + 1 | Cohen's d |
---|---|---|
0 | 1 | 0.0880 |
0 | 2 | 0.0330 |
0 | 3 | 0.0206 |
0 | 4 | 0.1587 |
0 | 5 | 0.0235 |
0 | 6 | 0.0867 |
0 | 7 | 0.1621 |
0 | 8 | 0.0687 |
0 | 9 | 0.0806 |
0 | 10 | 0.1283 |
0 | 11 | 0.1024 |
0 | 12 | 0.0694 |
0 | 13 | 0.3954 |
0 | 14 | 0.2534 |
0 | 15 | 0.1138 |
0 | 16 | 0.3057 |
0 | 17 | 0.1588 |
0 | 18 | 0.0102 |
0 | 19 | 0.0861 |
0 | 20 | 0.1803 |
0 | 21 | 0.0530 |
0 | 22 | 0.3237 |
0 | 23 | 0.2019 |
0 | 24 | 0.1263 |
Mean Differences
analytic_mean_baseline <- mapply(analytic_mean, 0, seq(1,24,1)) #across all of the months comparing to time zero
baseline_mean_diff(analytic_mean_baseline)
t | t+1 | Mean Difference |
---|---|---|
0 | 1 | 1.3114 |
0 | 2 | 0.4935 |
0 | 3 | 0.3040 |
0 | 4 | 2.3251 |
0 | 5 | 0.3412 |
0 | 6 | 1.3028 |
0 | 7 | 2.3954 |
0 | 8 | 0.9976 |
0 | 9 | 1.1987 |
0 | 10 | 1.9189 |
0 | 11 | 1.4369 |
0 | 12 | 1.0438 |
0 | 13 | 5.7785 |
0 | 14 | 3.5880 |
0 | 15 | 1.7437 |
0 | 16 | 4.4920 |
0 | 17 | 2.2602 |
0 | 18 | 0.1590 |
0 | 19 | 1.3178 |
0 | 20 | 2.5943 |
0 | 21 | 0.8152 |
0 | 22 | 4.8803 |
0 | 23 | 2.8046 |
0 | 24 | 1.8106 |
Cogproc
T-test
cogproc_ttest_baseline <- mapply(cogproc_my.t, 0, seq(1,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
baseline_ttest(cogproc_ttest_baseline)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
0 | 1 | -0.5097 | 1156.51 | 6.103e-01 |
0 | 2 | -0.7179 | 1035.97 | 4.730e-01 |
0 | 3 | -0.2391 | 218.72 | 8.112e-01 |
0 | 4 | -1.8417 | 1119.70 | 6.579e-02 |
0 | 5 | -0.3764 | 1051.94 | 7.067e-01 |
0 | 6 | 0.2442 | 282.79 | 8.072e-01 |
0 | 7 | -1.7142 | 1029.21 | 8.680e-02 |
0 | 8 | -0.9538 | 1076.64 | 3.404e-01 |
0 | 9 | 1.0446 | 320.31 | 2.970e-01 |
0 | 10 | -0.8169 | 255.26 | 4.148e-01 |
0 | 11 | -0.7245 | 1147.57 | 4.689e-01 |
0 | 12 | -2.0280 | 1307.90 | 4.276e-02 |
0 | 13 | -5.7012 | 609.25 | 1.855e-08 |
0 | 14 | -6.5911 | 924.04 | 7.329e-11 |
0 | 15 | -0.3856 | 395.99 | 7.000e-01 |
0 | 16 | -4.0812 | 298.22 | 5.758e-05 |
0 | 17 | -5.4650 | 949.00 | 5.916e-08 |
0 | 18 | 0.9265 | 310.67 | 3.549e-01 |
0 | 19 | -0.5797 | 184.74 | 5.628e-01 |
0 | 20 | -3.7994 | 936.81 | 1.544e-04 |
0 | 21 | 0.7639 | 341.61 | 4.455e-01 |
0 | 22 | -1.3820 | 61.97 | 1.719e-01 |
0 | 23 | -1.0691 | 1140.02 | 2.853e-01 |
0 | 24 | -1.8593 | 1172.33 | 6.323e-02 |
Cohen’s D
cogproc_D_baseline <- mapply(cogproc_my.d, 0, seq(1,24,1),SIMPLIFY=FALSE)
baseline_cohen_d(cogproc_D_baseline)
t | t + 1 | Cohen's d |
---|---|---|
0 | 1 | -0.0299 |
0 | 2 | -0.0345 |
0 | 3 | -0.0213 |
0 | 4 | -0.1094 |
0 | 5 | -0.0180 |
0 | 6 | 0.0204 |
0 | 7 | -0.1048 |
0 | 8 | -0.0446 |
0 | 9 | 0.0841 |
0 | 10 | -0.0732 |
0 | 11 | -0.0364 |
0 | 12 | -0.1070 |
0 | 13 | -0.3939 |
0 | 14 | -0.3256 |
0 | 15 | -0.0298 |
0 | 16 | -0.3292 |
0 | 17 | -0.2601 |
0 | 18 | 0.0789 |
0 | 19 | -0.0527 |
0 | 20 | -0.1809 |
0 | 21 | 0.0622 |
0 | 22 | -0.1778 |
0 | 23 | -0.0568 |
0 | 24 | -0.0951 |
Mean Differences
cogproc_mean_baseline <- mapply(cogproc_mean, 0, seq(1,24,1)) # comparing time zero [3/2019] across all of the months
baseline_mean_diff(cogproc_mean_baseline)
t | t+1 | Mean Difference |
---|---|---|
0 | 1 | -0.6107 |
0 | 2 | 0.1785 |
0 | 3 | 0.6095 |
0 | 4 | -0.6540 |
0 | 5 | 0.1560 |
0 | 6 | 0.7442 |
0 | 7 | -0.2962 |
0 | 8 | -0.2746 |
0 | 9 | 0.5305 |
0 | 10 | -0.5358 |
0 | 11 | 0.2776 |
0 | 12 | -0.0887 |
0 | 13 | -0.6107 |
0 | 14 | 0.1785 |
0 | 15 | 0.6095 |
0 | 16 | -0.6540 |
0 | 17 | 0.1560 |
0 | 18 | 0.7442 |
0 | 19 | -0.2962 |
0 | 20 | -0.2746 |
0 | 21 | 0.5305 |
0 | 22 | -0.5358 |
0 | 23 | 0.2776 |
0 | 24 | -0.0887 |
I-words
T-test
i_ttest_baseline <- mapply(i_my.t, 0, seq(1,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
baseline_ttest(i_ttest_baseline)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
0 | 1 | -3.3450 | 1143.82 | 8.495e-04 |
0 | 2 | -1.1963 | 1155.18 | 2.318e-01 |
0 | 3 | -0.1911 | 213.55 | 8.486e-01 |
0 | 4 | -4.1439 | 1114.31 | 3.672e-05 |
0 | 5 | -0.6477 | 1056.56 | 5.173e-01 |
0 | 6 | -1.6111 | 278.03 | 1.083e-01 |
0 | 7 | -3.3533 | 1035.23 | 8.274e-04 |
0 | 8 | -2.0582 | 1066.96 | 3.981e-02 |
0 | 9 | -1.4168 | 265.19 | 1.577e-01 |
0 | 10 | -2.7747 | 284.30 | 5.891e-03 |
0 | 11 | -1.9849 | 1154.30 | 4.739e-02 |
0 | 12 | -0.3320 | 1263.50 | 7.399e-01 |
0 | 13 | -5.0280 | 571.49 | 6.644e-07 |
0 | 14 | -3.7093 | 958.88 | 2.198e-04 |
0 | 15 | 0.2214 | 390.58 | 8.249e-01 |
0 | 16 | -3.9255 | 253.44 | 1.116e-04 |
0 | 17 | -4.4733 | 1005.42 | 8.580e-06 |
0 | 18 | 0.4135 | 350.62 | 6.795e-01 |
0 | 19 | -2.6460 | 180.60 | 8.864e-03 |
0 | 20 | -4.3779 | 986.11 | 1.326e-05 |
0 | 21 | -1.3222 | 371.13 | 1.869e-01 |
0 | 22 | 1.3508 | 63.34 | 1.816e-01 |
0 | 23 | -5.6223 | 1250.84 | 2.322e-08 |
0 | 24 | -1.8931 | 1254.80 | 5.858e-02 |
Cohen’s D
i_D_baseline <- mapply(i_my.d, 0, seq(1,24,1),SIMPLIFY=FALSE)
baseline_cohen_d(i_D_baseline)
t | t + 1 | Cohen's d |
---|---|---|
0 | 1 | -0.1966 |
0 | 2 | -0.0544 |
0 | 3 | -0.0174 |
0 | 4 | -0.2467 |
0 | 5 | -0.0310 |
0 | 6 | -0.1358 |
0 | 7 | -0.2047 |
0 | 8 | -0.0967 |
0 | 9 | -0.1296 |
0 | 10 | -0.2305 |
0 | 11 | -0.0996 |
0 | 12 | -0.0177 |
0 | 13 | -0.3562 |
0 | 14 | -0.1786 |
0 | 15 | 0.0172 |
0 | 16 | -0.3537 |
0 | 17 | -0.2047 |
0 | 18 | 0.0327 |
0 | 19 | -0.2453 |
0 | 20 | -0.2011 |
0 | 21 | -0.1028 |
0 | 22 | 0.1664 |
0 | 23 | -0.2904 |
0 | 24 | -0.0945 |
Mean Differences
i_mean_baseline <- mapply(i_mean, 0, seq(1,24,1)) # comparing time zero [3/2019] across all of the months
baseline_mean_diff(i_mean_baseline)
t | t+1 | Mean Difference |
---|---|---|
0 | 1 | -0.1748 |
0 | 2 | -0.0504 |
0 | 3 | -0.0149 |
0 | 4 | -0.2082 |
0 | 5 | -0.0266 |
0 | 6 | -0.1159 |
0 | 7 | -0.1744 |
0 | 8 | -0.0846 |
0 | 9 | -0.1162 |
0 | 10 | -0.1958 |
0 | 11 | -0.0843 |
0 | 12 | -0.0150 |
0 | 13 | -0.3028 |
0 | 14 | -0.1477 |
0 | 15 | 0.0147 |
0 | 16 | -0.3094 |
0 | 17 | -0.1805 |
0 | 18 | 0.0278 |
0 | 19 | -0.2086 |
0 | 20 | -0.1757 |
0 | 21 | -0.0871 |
0 | 22 | 0.1422 |
0 | 23 | -0.2490 |
0 | 24 | -0.0833 |
We-words
T-test
we_ttest_baseline <- mapply(we_my.t, 0, seq(1,24,1),SIMPLIFY=FALSE) #compare t (first parentheses) to t[i] (second parentheses), increasing by 1
baseline_ttest(we_ttest_baseline)
t | t + 1 | t-value | Degrees of Freedom | p-value |
---|---|---|---|---|
0 | 1 | 0.5718 | 1161.88 | 5.676e-01 |
0 | 2 | 1.5919 | 1008.45 | 1.117e-01 |
0 | 3 | -1.0685 | 214.75 | 2.865e-01 |
0 | 4 | 0.6154 | 1116.23 | 5.384e-01 |
0 | 5 | 0.9396 | 979.10 | 3.476e-01 |
0 | 6 | -1.1796 | 280.32 | 2.392e-01 |
0 | 7 | -0.2036 | 1067.88 | 8.387e-01 |
0 | 8 | 0.6497 | 972.54 | 5.160e-01 |
0 | 9 | -0.6307 | 351.29 | 5.286e-01 |
0 | 10 | -0.9677 | 309.04 | 3.340e-01 |
0 | 11 | -0.9268 | 1073.79 | 3.543e-01 |
0 | 12 | -0.4002 | 1197.17 | 6.891e-01 |
0 | 13 | 3.3604 | 676.59 | 8.220e-04 |
0 | 14 | 5.6601 | 890.34 | 2.040e-08 |
0 | 15 | 0.4232 | 395.82 | 6.724e-01 |
0 | 16 | 3.3898 | 317.82 | 7.876e-04 |
0 | 17 | 5.1356 | 889.20 | 3.457e-07 |
0 | 18 | -0.7164 | 361.98 | 4.742e-01 |
0 | 19 | 2.3094 | 191.38 | 2.199e-02 |
0 | 20 | 4.1802 | 873.54 | 3.205e-05 |
0 | 21 | 0.8667 | 390.06 | 3.866e-01 |
0 | 22 | 0.2288 | 64.77 | 8.198e-01 |
0 | 23 | 2.5427 | 1081.13 | 1.114e-02 |
0 | 24 | 2.2873 | 1080.95 | 2.237e-02 |
Cohen’s D
we_D_baseline <- mapply(we_my.d, 0, seq(1,24,1),SIMPLIFY=FALSE)
baseline_cohen_d(we_D_baseline)
t | t + 1 | Cohen's d |
---|---|---|
0 | 1 | 0.0334 |
0 | 2 | 0.0778 |
0 | 3 | -0.0967 |
0 | 4 | 0.0362 |
0 | 5 | 0.0469 |
0 | 6 | -0.0989 |
0 | 7 | -0.0123 |
0 | 8 | 0.0322 |
0 | 9 | -0.0483 |
0 | 10 | -0.0764 |
0 | 11 | -0.0479 |
0 | 12 | -0.0216 |
0 | 13 | 0.2229 |
0 | 14 | 0.2874 |
0 | 15 | 0.0327 |
0 | 16 | 0.2636 |
0 | 17 | 0.2567 |
0 | 18 | -0.0557 |
0 | 19 | 0.2040 |
0 | 20 | 0.2103 |
0 | 21 | 0.0657 |
0 | 22 | 0.0271 |
0 | 23 | 0.1374 |
0 | 24 | 0.1205 |
Mean Differences
we_mean_baseline <- mapply(we_mean, 0, seq(1,24,1)) # comparing time zero [3/2019] across all of the months
baseline_mean_diff(we_mean_baseline)
t | t+1 | Mean Difference |
---|---|---|
0 | 1 | 0.0531 |
0 | 2 | 0.1227 |
0 | 3 | -0.1575 |
0 | 4 | 0.0545 |
0 | 5 | 0.0718 |
0 | 6 | -0.1605 |
0 | 7 | -0.0191 |
0 | 8 | 0.0495 |
0 | 9 | -0.0766 |
0 | 10 | -0.1218 |
0 | 11 | -0.0731 |
0 | 12 | -0.0335 |
0 | 13 | 0.3443 |
0 | 14 | 0.4207 |
0 | 15 | 0.0531 |
0 | 16 | 0.4180 |
0 | 17 | 0.3815 |
0 | 18 | -0.0896 |
0 | 19 | 0.3273 |
0 | 20 | 0.3090 |
0 | 21 | 0.1048 |
0 | 22 | 0.0439 |
0 | 23 | 0.2022 |
0 | 24 | 0.1813 |
Build our Graphs
Analytic Thinking
Analytic <- ggplot(data=df2, aes(x=Date_mean, y=Analytic_mean, group=1)) +
  geom_line(colour = "dodgerblue3") +
  scale_x_date(date_breaks = "3 month", date_labels = "%Y-%m") +
  geom_ribbon(aes(ymin=Analytic_mean-Analytic_std.error, ymax=Analytic_mean+Analytic_std.error), alpha=0.2) +
  ggtitle("Analytic Thinking") +
  labs(x = "Month", y = 'Standardized score') +
  plot_aes + #here's our plot aes object
  geom_vline(xintercept = as.numeric(as.Date("2020-03-01")), linetype = 1) +
geom_rect(data = df2, #summer surge
aes(xmin = as.Date("2020-06-15", "%Y-%m-%d"),
xmax = as.Date("2020-07-20", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009) +
geom_rect(data = df2, #winter surge
aes(xmin = as.Date("2020-11-15", "%Y-%m-%d"),
xmax = as.Date("2021-01-01", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009)
Analytic <- Analytic + annotate(geom="text",x=as.Date("2020-07-01"),
                                y=43,label="Summer 2020 surge", size = 3) +
annotate(geom="text",x=as.Date("2020-12-03"),
y=43,label="Winter 2020 surge", size = 3)
Analytic
Cogproc
Cogproc <- ggplot(data=df2, aes(x=Date_mean, y=cogproc_mean, group=1)) +
  geom_line(colour = "dodgerblue3") +
  scale_x_date(date_breaks = "3 month", date_labels = "%Y-%m") +
  geom_ribbon(aes(ymin=cogproc_mean-cogproc_std.error, ymax=cogproc_mean+cogproc_std.error), alpha=0.2) +
  ggtitle("Cognitive Processing") +
  labs(x = "Month", y = '% Total Words') +
  plot_aes + #here's our plot aes object
  geom_vline(xintercept = as.numeric(as.Date("2020-03-01")), linetype = 1) +
geom_rect(data = df2, #summer surge
aes(xmin = as.Date("2020-06-15", "%Y-%m-%d"),
xmax = as.Date("2020-07-20", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009) +
geom_rect(data = df2, #winter surge
aes(xmin = as.Date("2020-11-15", "%Y-%m-%d"),
xmax = as.Date("2021-01-01", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009)
Cogproc <- Cogproc + annotate(geom="text",x=as.Date("2020-07-01"),
                              y=12.5,label="Summer 2020 surge", size = 3) +
annotate(geom="text",x=as.Date("2020-12-03"),
y=12.5,label="Winter 2020 surge", size = 3)
Cogproc
I-words
i <- ggplot(data=df2, aes(x=Date_mean, y=i_mean, group=1)) +
  geom_line(colour = "dodgerblue3") +
  scale_x_date(date_breaks = "3 month", date_labels = "%Y-%m") +
  geom_ribbon(aes(ymin=i_mean-i_std.error, ymax=i_mean+i_std.error), alpha=0.2) +
  ggtitle("I-usage") +
  labs(x = "Month", y = '% Total Words') +
  plot_aes + #here's our plot aes object
  geom_vline(xintercept = as.numeric(as.Date("2020-03-01")), linetype = 1) +
geom_rect(data = df2, #summer surge
aes(xmin = as.Date("2020-06-15", "%Y-%m-%d"),
xmax = as.Date("2020-07-20", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009) +
geom_rect(data = df2, #winter surge
aes(xmin = as.Date("2020-11-15", "%Y-%m-%d"),
xmax = as.Date("2021-01-01", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009)
i <- i + annotate(geom="text",x=as.Date("2020-07-01"),
                  y=1.95,label="Summer 2020 surge", size = 3) +
annotate(geom="text",x=as.Date("2020-12-03"),
y=1.95,label="Winter 2020 surge", size = 3)
i
We-words
we <- ggplot(data=df2, aes(x=Date_mean, y=we_mean, group=1)) +
  geom_line(colour = "dodgerblue3") +
  scale_x_date(date_breaks = "3 month", date_labels = "%Y-%m") +
  geom_ribbon(aes(ymin=we_mean-we_std.error, ymax=we_mean+we_std.error), alpha=0.2) +
  ggtitle("We-usage") +
  labs(x = "Month", y = '% Total Words') +
  plot_aes + #here's our plot aes object
  geom_vline(xintercept = as.numeric(as.Date("2020-03-01")), linetype = 1) +
geom_rect(data = df2, #summer surge
aes(xmin = as.Date("2020-06-15", "%Y-%m-%d"),
xmax = as.Date("2020-07-20", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009) +
geom_rect(data = df2, #winter surge
aes(xmin = as.Date("2020-11-15", "%Y-%m-%d"),
xmax = as.Date("2021-01-01", "%Y-%m-%d"),
ymin = -Inf,
ymax = Inf),
fill = "gray",
alpha = 0.009)
we <- we + annotate(geom="text",x=as.Date("2020-07-01"),
                    y=6.5,label="Summer 2020 surge", size = 3) +
annotate(geom="text",x=as.Date("2020-12-03"),
y=6.5,label="Winter 2020 surge", size = 3)
we
Tie them all together
graphs <- ggpubr::ggarrange(Analytic,Cogproc,i,we,ncol=2, nrow=2, common.legend = TRUE, legend = "bottom")
annotate_figure(graphs,
                top = text_grob("CEOs' Language Change", color = "black", face = "bold", size = 20),
                bottom = text_grob("Note. Vertical Line Represents the onset of the pandemic. \n\ Horizontal shading represents Standard Error. Vertical bars represent virus surges.",
                                   color = "Black",
                                   hjust = 1.1, x = 1, face = "italic", size = 16))
Package Citations
report::cite_packages()
- Fox J, Weisberg S (2019). _An R Companion to Applied Regression_, Third edition. Sage, Thousand Oaks CA. <https://www.john-fox.ca/Companion/>.
- Fox J, Weisberg S, Price B (2022). _carData: Companion to Applied Regression Data Sets_. doi:10.32614/CRAN.package.carData <https://doi.org/10.32614/CRAN.package.carData>, R package version 3.0-5, <https://CRAN.R-project.org/package=carData>.
- Grolemund G, Wickham H (2011). "Dates and Times Made Easy with lubridate." _Journal of Statistical Software_, *40*(3), 1-25. <https://www.jstatsoft.org/v40/i03/>.
- Harrell Jr F (2025). _Hmisc: Harrell Miscellaneous_. doi:10.32614/CRAN.package.Hmisc <https://doi.org/10.32614/CRAN.package.Hmisc>, R package version 5.2-3, <https://CRAN.R-project.org/package=Hmisc>.
- J L (2006). "Plotrix: a package in the red light district of R." _R-News_, *6*(4), 8-12.
- Kassambara A (2019). _datarium: Data Bank for Statistical Analysis and Visualization_. doi:10.32614/CRAN.package.datarium <https://doi.org/10.32614/CRAN.package.datarium>, R package version 0.1.0, <https://CRAN.R-project.org/package=datarium>.
- Kassambara A (2023). _rstatix: Pipe-Friendly Framework for Basic Statistical Tests_. doi:10.32614/CRAN.package.rstatix <https://doi.org/10.32614/CRAN.package.rstatix>, R package version 0.7.2, <https://CRAN.R-project.org/package=rstatix>.
- Kassambara A (2025). _ggpubr: 'ggplot2' Based Publication Ready Plots_. doi:10.32614/CRAN.package.ggpubr <https://doi.org/10.32614/CRAN.package.ggpubr>, R package version 0.6.1, <https://CRAN.R-project.org/package=ggpubr>.
- Kuhn, Max (2008). "Building Predictive Models in R Using the caret Package." _Journal of Statistical Software_, *28*(5), 1–26. doi:10.18637/jss.v028.i05 <https://doi.org/10.18637/jss.v028.i05>, <https://www.jstatsoft.org/index.php/jss/article/view/v028i05>.
- Lin G (2023). _reactable: Interactive Data Tables for R_. doi:10.32614/CRAN.package.reactable <https://doi.org/10.32614/CRAN.package.reactable>, R package version 0.4.4, <https://CRAN.R-project.org/package=reactable>.
- Müller K, Wickham H (2025). _tibble: Simple Data Frames_. doi:10.32614/CRAN.package.tibble <https://doi.org/10.32614/CRAN.package.tibble>, R package version 3.3.0, <https://CRAN.R-project.org/package=tibble>.
- R Core Team (2025). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
- Rinker TW, Kurkiewicz D (2018). _pacman: Package Management for R_. version 0.5.0, <http://github.com/trinker/pacman>.
- Robinson D, Hayes A, Couch S (2025). _broom: Convert Statistical Objects into Tidy Tibbles_. doi:10.32614/CRAN.package.broom <https://doi.org/10.32614/CRAN.package.broom>, R package version 1.0.10, <https://CRAN.R-project.org/package=broom>.
- Sarkar D (2008). _Lattice: Multivariate Data Visualization with R_. Springer, New York. ISBN 978-0-387-75968-5, <http://lmdvr.r-forge.r-project.org>.
- Torchiano M (2020). _effsize: Efficient Effect Size Computation_. doi:10.5281/zenodo.1480624 <https://doi.org/10.5281/zenodo.1480624>, R package version 0.8.1, <https://CRAN.R-project.org/package=effsize>.
- Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_. Springer-Verlag New York. ISBN 978-3-319-24277-4, <https://ggplot2.tidyverse.org>.
- Wickham H (2023). _forcats: Tools for Working with Categorical Variables (Factors)_. doi:10.32614/CRAN.package.forcats <https://doi.org/10.32614/CRAN.package.forcats>, R package version 1.0.0, <https://CRAN.R-project.org/package=forcats>.
- Wickham H (2025). _stringr: Simple, Consistent Wrappers for Common String Operations_. doi:10.32614/CRAN.package.stringr <https://doi.org/10.32614/CRAN.package.stringr>, R package version 1.5.2, <https://CRAN.R-project.org/package=stringr>.
- Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to the tidyverse." _Journal of Open Source Software_, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
- Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar of Data Manipulation_. doi:10.32614/CRAN.package.dplyr <https://doi.org/10.32614/CRAN.package.dplyr>, R package version 1.1.4, <https://CRAN.R-project.org/package=dplyr>.
- Wickham H, Henry L (2025). _purrr: Functional Programming Tools_. doi:10.32614/CRAN.package.purrr <https://doi.org/10.32614/CRAN.package.purrr>, R package version 1.1.0, <https://CRAN.R-project.org/package=purrr>.
- Wickham H, Hester J, Bryan J (2024). _readr: Read Rectangular Text Data_. doi:10.32614/CRAN.package.readr <https://doi.org/10.32614/CRAN.package.readr>, R package version 2.1.5, <https://CRAN.R-project.org/package=readr>.
- Wickham H, Vaughan D, Girlich M (2024). _tidyr: Tidy Messy Data_. doi:10.32614/CRAN.package.tidyr <https://doi.org/10.32614/CRAN.package.tidyr>, R package version 1.3.1, <https://CRAN.R-project.org/package=tidyr>.
- Xie Y, Cheng J, Tan X, Aden-Buie G (2025). _DT: A Wrapper of the JavaScript Library 'DataTables'_. doi:10.32614/CRAN.package.DT <https://doi.org/10.32614/CRAN.package.DT>, R package version 0.34.0, <https://CRAN.R-project.org/package=DT>.
- Zeileis A, Grothendieck G (2005). "zoo: S3 Infrastructure for Regular and Irregular Time Series." _Journal of Statistical Software_, *14*(6), 1-27. doi:10.18637/jss.v014.i06 <https://doi.org/10.18637/jss.v014.i06>.
- Zhu H (2024). _kableExtra: Construct Complex Table with 'kable' and Pipe Syntax_. doi:10.32614/CRAN.package.kableExtra <https://doi.org/10.32614/CRAN.package.kableExtra>, R package version 1.4.0, <https://CRAN.R-project.org/package=kableExtra>.
All credit goes to the great Dani Cosme for teaching me how to make these! You can find her github here!