This file illustrates the normalization and differential analysis of
RNAseq data on a real life dataset. The data are provided by courtesy of
the transcriptomic platform of IPS2 and the name of the genes were
shuffled. Hence the results are not biologically interpretable. They are
composed of three files, all included in the directory
data
:
D1-counts.txt
contains the raw counts of the
experiment (13 columns: the first one contains the gene names, the
others correspond to 12 different samples);
D1-genesLength.txt
contains information about gene
lengths;
D1-targets.txt
contains information about the sample
and the experimental design.
In the rest of this file, we:
1/ first import and describe the data;
2/ then perform and compare different normalizations;
3/ finally perform a differential analysis using different methods and models.
We advice that you use the file by reading the R
code while trying to make sense of it. Then, you run it and analyze the
results. The packages devtools
, ggplot2
,
gridExtra
, reshape2
, mixOmics
,
RColorBrewer
, edgeR
and
VennDiagram
are required to compile this file. Information
about my system (including package and R versions) are
provided at the end of this file (HTML format), in Section “Session
information”.
The packages are loaded with:
library(ggplot2)
library(gridExtra)
library(reshape2)
library(mixOmics)
library(RColorBrewer)
library(FactoMineR)
library(factoextra)
library(edgeR)
library(VennDiagram)
library(devtools)
library(plotly)
This command line is used to create a subdirectory to store results of the analyses:
dir.create("results")
## Warning in dir.create("results"): 'results' already exists
The data used in this practical session correspond to 12 samples of RNAseq obtained from microdissections on plants. Plants have four different genotypes: the wild type (“wt”) plant and three types of mutants and are observed three times in the same conditions. for information Mutants 1 and 2 have a similar phenotype (more complicated leaves than the wild type) whereas mutant 3 has the opposite phenotype (simpler leaves than the wild type).
The datasets are included in the repository data
located
at the root of the material for this practical session. They can be
loaded using:
raw_counts <- read.table("data/D1-counts.txt", header = TRUE, row.names = 1)
raw_counts <- as.matrix(raw_counts)
design <- read.table("data/D1-targets.txt", header = TRUE)
gene_lengths <- scan("data/D1-genesLength.txt")
The dimensions of the raw count matrix are equal to:
dim(raw_counts)
## [1] 50897 12
which shows that the data contains 12 columns (samples) and 50897 rows (genes). And a quick look is obtained by:
head(raw_counts)
## wt_1 wt_2 wt_3 mut1_1 mut1_2 mut1_3 mut2_1 mut2_2 mut2_3
## Medtr0001s0010.1 0 0 0 1 0 1 0 0 0
## Medtr0001s0070.1 0 0 0 0 0 0 0 0 0
## Medtr0001s0100.1 0 0 0 0 0 0 0 0 0
## Medtr0001s0120.1 0 0 0 0 0 0 0 0 0
## Medtr0001s0160.1 0 0 0 0 0 0 0 0 0
## Medtr0001s0190.1 0 0 0 0 0 0 0 0 0
## mut3_1 mut3_2 mut3_3
## Medtr0001s0010.1 0 1 0
## Medtr0001s0070.1 0 0 0
## Medtr0001s0100.1 0 0 0
## Medtr0001s0120.1 0 0 0
## Medtr0001s0160.1 0 0 0
## Medtr0001s0190.1 0 0 0
which shows that the row names are gene names and that the data are made of counts. Many of these counts are equal to 0. A basic exploratory analysis of the count data is provided in the next section.
The experimental design is described in the object
design
:
design
## labels group replicat
## 1 wt_1 wt repbio1
## 2 wt_2 wt repbio2
## 3 wt_3 wt repbio3
## 4 mut1_1 mut1 repbio1
## 5 mut1_2 mut1 repbio2
## 6 mut1_3 mut1 repbio3
## 7 mut2_1 mut2 repbio1
## 8 mut2_2 mut2 repbio2
## 9 mut2_3 mut2 repbio3
## 10 mut3_1 mut3 repbio1
## 11 mut3_2 mut3 repbio2
## 12 mut3_3 mut3 repbio3
that contains 3 columns: the first one labels
is the
sample name, the second one group
is the group of the
sample (wild type or one of three types of mutant) and the last one
replicat
is the biological replicate number (that we will
suppose to be corresponding from one group to another).
We start the analysis by filtering out the genes for which no count has been found:
raw_counts_wn <- raw_counts[rowSums(raw_counts) > 0, ]
dim(raw_counts_wn)
## [1] 27916 12
The number of genes which have been filtered out is thus equal to 22981.
It is often useful, to visualize the count distribution, to compute “pseudo counts”, which are log-transformed counts:
pseudo_counts <- log2(raw_counts_wn + 1)
head(pseudo_counts)
## wt_1 wt_2 wt_3 mut1_1 mut1_2 mut1_3 mut2_1
## Medtr0001s0010.1 0.000000 0.000000 0.000000 1.000000 0.000000 1.000000 0.000000
## Medtr0001s0200.1 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## Medtr0001s0260.1 6.357552 6.768184 6.643856 6.169925 6.507795 6.189825 6.209453
## Medtr0001s0360.1 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
## Medtr0001s0490.1 1.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## Medtr0002s0040.1 7.707359 6.918863 7.826548 2.584963 2.000000 4.643856 6.643856
## mut2_2 mut2_3 mut3_1 mut3_2 mut3_3
## Medtr0001s0010.1 0.000000 0.000000 0.000000 1.000000 0.000000
## Medtr0001s0200.1 0.000000 0.000000 0.000000 0.000000 0.000000
## Medtr0001s0260.1 6.392317 6.672425 6.820179 6.554589 6.539159
## Medtr0001s0360.1 1.584963 0.000000 1.000000 0.000000 0.000000
## Medtr0001s0490.1 0.000000 1.584963 1.000000 0.000000 1.000000
## Medtr0002s0040.1 7.651052 8.044394 3.169925 3.459432 1.584963
df_raw <- melt(pseudo_counts, id = rownames(raw_counts_wn))
names(df_raw)[1:2]<- c("id", "sample")
df_raw$method <- rep("Raw counts", nrow(df_raw))
head(df_raw)
## id sample value method
## 1 Medtr0001s0010.1 wt_1 0.000000 Raw counts
## 2 Medtr0001s0200.1 wt_1 0.000000 Raw counts
## 3 Medtr0001s0260.1 wt_1 6.357552 Raw counts
## 4 Medtr0001s0360.1 wt_1 0.000000 Raw counts
## 5 Medtr0001s0490.1 wt_1 1.000000 Raw counts
## 6 Medtr0002s0040.1 wt_1 7.707359 Raw counts
The object df_raw
will be used later to compare the
effect of different normalization on the count distribution in the
different samples.
First, we provide an overview of the distribution of the first sample (without the genes that have been filtered out) by plotting the histograms of raw counts and pseudo counts:
df <- data.frame(rcounts = raw_counts_wn[ ,1], prcounts = pseudo_counts[ ,1])
p <- ggplot(data=df, aes(x = rcounts, y = ..density..)) +
geom_histogram(fill = "lightblue") + theme_bw() +
ggtitle(paste0("count distribution '", design$labels[1], "'")) +
xlab("counts")
p2 <- ggplot(data=df, aes(x = prcounts, y = ..density..)) +
geom_histogram(fill = "lightblue") + theme_bw() +
ggtitle(paste0("count distribution - '", design$labels[1], "'")) +
xlab(expression(log[2](counts + 1)))
grid.arrange(p, p2, ncol = 2)
This figure shows that the count distribution is highly skewed with a large proportion of genes with a count equal to 0 and a few number of genes with an extreme count (more that \(2^{15}\)).
An alternative representation can be obtained with boxplots (all samples, colored by group)
df <- cbind(design, t(pseudo_counts))
df <- melt(df, id.vars = c("labels", "group", "replicat"))
p <- ggplot(data = df, aes(x = labels, y = value, fill = group)) +
geom_boxplot() + theme_bw() + ggtitle("count distributions") +
xlab("sample") + ylab("counts") +
theme(axis.text.x = element_text(angle = 90))
p
df <- data.frame("A" = (pseudo_counts[ ,3] + pseudo_counts[ ,4])/2,
"M" = pseudo_counts[ ,3] - pseudo_counts[ ,4])
p <- ggplot(data = df, aes(x = A, y = M)) + geom_point(alpha = 0.2) +
theme_bw() + geom_smooth(colour = "red")
p
df <- as.matrix(dist(t(pseudo_counts)))
df
## wt_1 wt_2 wt_3 mut1_1 mut1_2 mut1_3 mut2_1 mut2_2
## wt_1 0.0000 126.2933 130.8818 135.1886 125.2456 146.5416 120.6482 129.58103
## wt_2 126.2933 0.0000 120.4416 141.6461 121.6069 146.8200 114.4024 132.22552
## wt_3 130.8818 120.4416 0.0000 124.3985 124.1820 127.5267 137.5582 116.34194
## mut1_1 135.1886 141.6461 124.3985 0.0000 107.3116 118.6190 135.4935 112.08930
## mut1_2 125.2456 121.6069 124.1820 107.3116 0.0000 123.9054 116.4351 120.51413
## mut1_3 146.5416 146.8200 127.5267 118.6190 123.9054 0.0000 153.2298 120.09793
## mut2_1 120.6482 114.4024 137.5582 135.4935 116.4351 153.2298 0.0000 131.61151
## mut2_2 129.5810 132.2255 116.3419 112.0893 120.5141 120.0979 131.6115 0.00000
## mut2_3 132.7267 128.8963 123.6715 117.8781 119.8149 124.7136 127.9787 99.19577
## mut3_1 127.5728 129.2516 133.7004 138.1283 122.7834 146.2628 131.1581 138.35110
## mut3_2 124.5903 126.7231 120.9488 127.7823 123.4257 138.5271 127.2142 115.29926
## mut3_3 130.4482 127.8773 144.7460 146.4226 127.1282 167.7606 121.2774 145.41084
## mut2_3 mut3_1 mut3_2 mut3_3
## wt_1 132.72674 127.5728 124.5903 130.4482
## wt_2 128.89625 129.2516 126.7231 127.8773
## wt_3 123.67150 133.7004 120.9488 144.7460
## mut1_1 117.87812 138.1283 127.7823 146.4226
## mut1_2 119.81491 122.7834 123.4257 127.1282
## mut1_3 124.71359 146.2628 138.5271 167.7606
## mut2_1 127.97870 131.1581 127.2142 121.2774
## mut2_2 99.19577 138.3511 115.2993 145.4108
## mut2_3 0.00000 138.6242 120.2381 144.2214
## mut3_1 138.62420 0.0000 120.1136 118.8938
## mut3_2 120.23807 120.1136 0.0000 118.7948
## mut3_3 144.22141 118.8938 118.7948 0.0000
cim_color <- colorRampPalette(rev(brewer.pal(9, "Reds")))(16)
cim(df, color = cim_color, symkey = FALSE)
Finally, a PCA is performed to identify the main sources of variability in the dataset. Pseudo-counts are used to perform this PCA.
resPCA <- PCA(t(pseudo_counts), ncp = 12, graph = FALSE)
fviz_eig(resPCA)
The first component explains more variance than the other components which almost all equally reproduce variance. The projection of the samples on the first two PCs is obtained with:
p <- fviz_pca_ind(resPCA, col.ind = design$group)
p
plotly::ggplotly(p)
This figure shows that mutants 1 and 3 are well separated from the other wild type but that wild type is not well separated from mutant 2.
This section performs different normalization approaches for RNAseq data. These normalization are performed using edgeR (TC, RPKM, UQ, TMM). The final count distribution is compared to the raw count distribution in the last part of this section.
In this section, the package edgeR is perform to compare the other normalization approaches. To do so, a DGEList object has to be created from the count data:
dge2 <- DGEList(raw_counts_wn)
dge2
## An object of class "DGEList"
## $counts
## wt_1 wt_2 wt_3 mut1_1 mut1_2 mut1_3 mut2_1 mut2_2 mut2_3
## Medtr0001s0010.1 0 0 0 1 0 1 0 0 0
## Medtr0001s0200.1 0 1 0 0 0 0 0 0 0
## Medtr0001s0260.1 81 108 99 71 90 72 73 83 101
## Medtr0001s0360.1 0 0 1 0 0 0 0 2 0
## Medtr0001s0490.1 1 3 0 0 0 0 0 0 2
## mut3_1 mut3_2 mut3_3
## Medtr0001s0010.1 0 1 0
## Medtr0001s0200.1 0 0 0
## Medtr0001s0260.1 112 93 92
## Medtr0001s0360.1 1 0 0
## Medtr0001s0490.1 1 0 1
## 27911 more rows ...
##
## $samples
## group lib.size norm.factors
## wt_1 1 5932414 1
## wt_2 1 7129204 1
## wt_3 1 6223759 1
## mut1_1 1 5577595 1
## mut1_2 1 6272344 1
## 7 more rows ...
The library sizes and the normalization factors for all samples are obtained by:
dge2$samples
## group lib.size norm.factors
## wt_1 1 5932414 1
## wt_2 1 7129204 1
## wt_3 1 6223759 1
## mut1_1 1 5577595 1
## mut1_2 1 6272344 1
## mut1_3 1 6634542 1
## mut2_1 1 5897356 1
## mut2_2 1 5785184 1
## mut2_3 1 5767730 1
## mut3_1 1 7106881 1
## mut3_2 1 5872052 1
## mut3_3 1 6035692 1
Normalization by Total Count (TC) is obtained directly by the
function cpm
. Pseudo-counts (\(\log_2\) transformed counts) are computed
and stored in a data frame for later use.
norm_lib <- cpm(dge2)
pseudo_TC <- log2(cpm(dge2) + 1)
df_TC <- melt(pseudo_TC, id = rownames(raw_counts_wn))
names(df_TC)[1:2] <- c ("id", "sample")
df_TC$method <- rep("TC", nrow(df_TC))
RPKM needs information about gene lengths which have to be passed to
the function rpkm
. Again, pseudo-counts are computed and
stored in a data frame for further use.
gene_lengths_wn <- gene_lengths[rowSums(raw_counts) > 0]
pseudo_RPKM <- log2(rpkm(dge2, gene.length = gene_lengths_wn) + 1)
df_RPKM <- melt(pseudo_RPKM, id = rownames(raw_counts_wn))
names(df_RPKM)[1:2] <- c ("id", "sample")
df_RPKM$method <- rep("RPKM", nrow(df_RPKM))
Upper quartile normalization is obtained with the function
calcNormFactors
. This function changes the variable
norm.factors
in the DGEList object and thus also the way
the functions cpm
and rpkm
are computing
counts: now, a normalized library size, equal to the original library
size multiplied by the normalization factor, is used to compute
normalized counts: \(\tilde{K}_{gj} =
\frac{K_{gj}}{\mbox{modified LS}}*10^6\). The normalization
factors are computed using:
dge2 <- calcNormFactors(dge2, method = "upperquartile")
dge2$samples
## group lib.size norm.factors
## wt_1 1 5932414 1.0047308
## wt_2 1 7129204 0.9810476
## wt_3 1 6223759 1.0130558
## mut1_1 1 5577595 1.0068753
## mut1_2 1 6272344 0.9942229
## mut1_3 1 6634542 1.0593861
## mut2_1 1 5897356 0.9873347
## mut2_2 1 5785184 1.0124342
## mut2_3 1 5767730 1.0453656
## mut3_1 1 7106881 0.9792811
## mut3_2 1 5872052 0.9857220
## mut3_3 1 6035692 0.9361638
The previous formula is compared to the result of the function
cpm
in the following command lines:
test_normcount <- sweep(dge2$counts * 10^6, 2,
dge2$samples$lib.size * dge2$samples$norm.factors,
"/")
range(as.vector(test_normcount - cpm(dge2)))
## [1] -9.094947e-13 1.818989e-12
which shows no difference. Finally, pseudo counts are obtained and stored in a data frame:
pseudo_UQ <- log2(cpm(dge2) + 1)
df_UQ <- melt(pseudo_UQ, id = rownames(raw_counts_wn))
names(df_UQ)[1:2] <- c ("id", "sample")
df_UQ$method <- rep("UQ", nrow(df_UQ))
dge2 <- calcNormFactors(dge2, method = "RLE")
dge2$samples
## group lib.size norm.factors
## wt_1 1 5932414 1.0068551
## wt_2 1 7129204 0.9870920
## wt_3 1 6223759 1.0023952
## mut1_1 1 5577595 1.0065636
## mut1_2 1 6272344 1.0081697
## mut1_3 1 6634542 1.0556197
## mut2_1 1 5897356 1.0035566
## mut2_2 1 5785184 1.0130438
## mut2_3 1 5767730 1.0433540
## mut3_1 1 7106881 0.9784525
## mut3_2 1 5872052 0.9669348
## mut3_3 1 6035692 0.9337173
pseudo_RLE <- log2(cpm(dge2) + 1)
df_RLE <- melt(pseudo_RLE, id = rownames(raw_counts_wn))
names(df_RLE)[1:2] <- c ("id", "sample")
df_RLE$method <- rep("RLE", nrow(df_RLE))
TMM normalization works similarly as UQ and updates the value of the
variable norm.factors
:
dge2 <- calcNormFactors(dge2, method = "TMM")
dge2$samples
## group lib.size norm.factors
## wt_1 1 5932414 1.0082044
## wt_2 1 7129204 0.9850930
## wt_3 1 6223759 1.0047944
## mut1_1 1 5577595 1.0064979
## mut1_2 1 6272344 1.0094894
## mut1_3 1 6634542 1.0593265
## mut2_1 1 5897356 0.9977013
## mut2_2 1 5785184 1.0096719
## mut2_3 1 5767730 1.0387295
## mut3_1 1 7106881 0.9791254
## mut3_2 1 5872052 0.9637238
## mut3_3 1 6035692 0.9429276
Normalized pseudo counts are obtained with the function
cpm
and stored in a data frame:
pseudo_TMM <- log2(cpm(dge2) + 1)
df_TMM <- melt(pseudo_TMM, id = rownames(raw_counts_wn))
names(df_TMM)[1:2] <- c ("id", "sample")
df_TMM$method <- rep("TMM", nrow(df_TMM))
This last section shows the comparison between all normalization methods (and original raw data) for this data set. First, the distributions of all samples and all normalization methods are compared with boxplots and density plots:
df_allnorm <- rbind(df_raw, df_TC, df_RPKM, df_UQ, df_TMM, df_RLE)
df_allnorm$method <- factor(df_allnorm$method,
levels = c("Raw counts", "TC", "RPKM", "TMM", "UQ", "RLE"))
p <- ggplot(data=df_allnorm, aes(x=sample, y=value, fill=method)) +
geom_boxplot() + theme_bw() +
ggtitle("Boxplots of normalized pseudo counts\n for all samples by normalization methods") + facet_grid(. ~ method) +
ylab(expression(log[2] ~ (normalized ~ count + 1))) + xlab("") +
theme(title = element_text(size=10), axis.text.x = element_blank(),
axis.ticks.x = element_blank())
print(p)
p <- ggplot(data = df_allnorm, aes(x = value, colour = sample)) +
geom_density() + theme_bw() +
ggtitle("Density of normalized pseudo counts\n for all samples by normalization methods") +
facet_grid(. ~ method) + ylab("density") + xlab("") +
theme(title = element_text(size=10), axis.text.x = element_blank(),
axis.ticks.x = element_blank())
print(p)
Visually, DESeq, UQ and TMM seem to behave similarly and provide comparable distributions between samples. Finally, a PCA is performed on the pseudo counts obtained after TMM normalization:
resPCA <- PCA(t(pseudo_TMM), ncp = 12, graph = FALSE)
fviz_eig(resPCA)
The first component explains more variance than the other components which almost all equally reproduce variance. The projection of the samples on the first two PCs is obtained with:
fviz_pca_ind(resPCA, col.ind = design$group)
This figure shows that a better separation between wild type samples and mutant 2 samples. On the contrary mutants 2 and 1 are less well separated than in raw count data. This is a biologically plausible result since the organization of the different mutants are concordant with their phenotypes. The normalization has thus improved the data quality.
df <- as.matrix(dist(t(pseudo_TMM)))
df
## wt_1 wt_2 wt_3 mut1_1 mut1_2 mut1_3 mut2_1 mut2_2
## wt_1 0.00000 71.28860 78.90104 84.55244 70.64068 88.26819 69.53304 81.23892
## wt_2 71.28860 0.00000 70.97388 88.57248 68.29427 93.19227 60.76116 83.92985
## wt_3 78.90104 70.97388 0.00000 74.50747 74.80749 74.06644 86.66028 68.40858
## mut1_1 84.55244 88.57248 74.50747 0.00000 60.14370 63.17140 87.75813 64.73119
## mut1_2 70.64068 68.29427 74.80749 60.14370 0.00000 73.07235 64.22984 73.08702
## mut1_3 88.26819 93.19227 74.06644 63.17140 73.07235 0.00000 95.41909 62.47330
## mut2_1 69.53304 60.76116 86.66028 87.75813 64.22984 95.41909 0.00000 86.65133
## mut2_2 81.23892 83.92985 68.40858 64.73119 73.08702 62.47330 86.65133 0.00000
## mut2_3 83.23634 81.67530 74.63152 68.53364 72.88852 67.17804 82.96370 50.16135
## mut3_1 71.38379 76.36299 81.36898 84.32391 69.76380 93.05559 76.37973 86.49078
## mut3_2 72.65982 72.80861 67.96187 78.93174 72.95065 77.86109 79.56811 67.63906
## mut3_3 74.59898 69.04147 89.09842 96.24201 72.17156 103.20172 70.93649 96.58503
## mut2_3 mut3_1 mut3_2 mut3_3
## wt_1 83.23634 71.38379 72.65982 74.59898
## wt_2 81.67530 76.36299 72.80861 69.04147
## wt_3 74.63152 81.36898 67.96187 89.09842
## mut1_1 68.53364 84.32391 78.93174 96.24201
## mut1_2 72.88852 69.76380 72.95065 72.17156
## mut1_3 67.17804 93.05559 77.86109 103.20172
## mut2_1 82.96370 76.37973 79.56811 70.93649
## mut2_2 50.16135 86.49078 67.63906 96.58503
## mut2_3 0.00000 87.55700 71.88621 95.17747
## mut3_1 87.55700 0.00000 65.23875 58.12088
## mut3_2 71.88621 65.23875 0.00000 73.23508
## mut3_3 95.17747 58.12088 73.23508 0.00000
cim_color <- colorRampPalette(rev(brewer.pal(9, "Blues")))(16)
cim(df, color = cim_color, symkey = FALSE)
This section will compare the results of different types of approach to obtain genes which are differentially expressed between the wild type plants and each of the mutants:
a standard NB exact test between two conditions;
a GLM with only the genotype effect;
a GLM with the plant (individual) and genotype effects.
All analyses are performed with the R package edgeR. A last section compares the results obtained using the different methods.
The following command lines loop over the different mutant types
(stored in the vector all_conditions
) and perform the
following operations:
first, a DGEList object is created with the wild type samples and the current mutant type samples. The “wt” condition is chosen as the reference level. Normalization is performed using TMM;
then, the functions estimateCommonDisp
and
estimateTagwiseDisp
respectively estimate a common
dispersion parameter and uses it as a prior to estimate gene specific
dispersions \(\phi_g\) for all \(g\). The “Biological Coefficient of
Variation” plot is also displayed which displays \(\phi_g\) versus the average (normalized)
count for all genes;
finally, an exact test is performed with the function
exactTest
to find genes which are differentially expressed
between the two conditions (wild type and mutant). The output of the
function is printed and p-values and adjusted p-values (Benjamini and
Hochberg approach) are stored in pvals_pairwiseExact
. The
function topTags
is used to print the genes with the
smallest p-values and the function decideTestsDGE
is used
to extract genes whose adjusted p-value is smaller than 5%. The names of
those genes as well as the fact that they are up/down regulated in the
mutant type are saved in DGE_pairwiseExact
.
cur_counts <- raw_counts_wn[ ,1:6]
cur_design <- design[1:6, ]
cur_dge <- DGEList(cur_counts, group = cur_design$group, remove.zeros = TRUE)
## Removing 994 rows with all zero counts
cur_dge <- calcNormFactors(cur_dge, method = "TMM")
pseudo_counts <- log2(cpm(cur_dge) + 1)
# estimate dispersion
cur_dge <- estimateDisp(cur_dge)
## Using classic mode.
plotBCV(cur_dge, main = paste0("BCV plot"))
# perform test
res_et <- exactTest(cur_dge)
p_adjust <- p.adjust(res_et$table$PValue, method = "BH")
df <- data.frame("pvalue" = c(res_et$table$PValue, p_adjust),
"type" = rep(c("raw", "adjusted"), each = nrow(res_et$table)))
df$type <- factor(df$type, levels = c("raw", "adjusted"))
p <- ggplot(df, aes(x = pvalue)) + geom_histogram() + theme_bw() +
facet_wrap(. ~ type) + xlab("p-value")
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# extracting results
to_export <- topTags(res_et, n = nrow(res_et$table))
write.table(to_export, file = "results/mesjolisresultats.csv", sep = ";",
row.names = TRUE, col.names = TRUE, dec = ",")
topTags(res_et, adjust.method = "bonferroni")
## Comparison of groups: wt-mut1
## logFC logCPM PValue FWER
## Medtr7g079200.1 -10.429058 5.646475 5.610042e-140 1.510336e-135
## Medtr6g034805.1 5.621576 3.157061 5.045605e-39 1.358378e-34
## Medtr2g028530.1 9.381832 2.800090 5.143291e-31 1.384677e-26
## Medtr3g092475.1 2.761049 4.927626 5.894048e-30 1.586796e-25
## Medtr7g116150.1 -6.313914 2.897514 5.512949e-29 1.484196e-24
## Medtr6g051975.1 -8.734510 2.175164 1.126573e-28 3.032961e-24
## Medtr6g445300.1 8.201772 1.703605 9.031220e-21 2.431385e-16
## Medtr2g436330.1 2.304948 3.858789 1.101598e-20 2.965721e-16
## Medtr2g054640.1 -1.999636 6.178289 6.286447e-19 1.692437e-14
## Medtr0024s0070.1 -6.839291 2.166794 6.391235e-18 1.720648e-13
cur_res <- decideTestsDGE(res_et, adjust.method = "BH", p.value = 0.05)
cur_res
## TestResults matrix
## wt-mut1
## Medtr0001s0010.1 0
## Medtr0001s0200.1 0
## Medtr0001s0260.1 0
## Medtr0001s0360.1 0
## Medtr0001s0490.1 0
## 26917 more rows ...
sel_deg <- which(cur_res[ ,1] != 0)
length(sel_deg)
## [1] 275
DEG_pairwiseExact <- rownames(cur_res)[sel_deg]
head(DEG_pairwiseExact)
## [1] "Medtr0002s0040.1" "Medtr0002s0530.1" "Medtr0003s0010.1" "Medtr0007s0160.1"
## [5] "Medtr0024s0070.1" "Medtr0026s0200.1"
# explore results
df <- data.frame("logCPM" = res_et$table$logCPM,
"logFC" = res_et$table$logFC,
"selected" = unname(ifelse(cur_res[ ,1] != 0, "DEG", "not DE")))
p <- ggplot(df, aes(x = logCPM, y = logFC, colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
df$adjpval <- p.adjust(res_et$table$PValue, "BH")
p <- ggplot(df, aes(x = logFC, y = -log10(adjpval), colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
df <- pseudo_counts[sel_deg, ]
df <- scale(t(as.matrix(df)))
cim_color <- colorRampPalette(rev(brewer.pal(11, "RdYlGn")))(256)
res_cim <- cim(df, color = cim_color, cluster = "col", col.names = FALSE,
dist.method = c("euclidean", "correlation"))
Extracting dendrogram associated with these genes are groups is easy with:
res_hac <- as.hclust(res_cim$ddc)
plot(res_hac, labels = FALSE, xlab = "", sub = "", main = "Groups of genes")
rect.hclust(res_hac, k = 3, border = "red")
genes_group <- cutree(res_hac, k = 3)
head(genes_group)
## Medtr0002s0040.1 Medtr0002s0530.1 Medtr0003s0010.1 Medtr0007s0160.1
## 1 2 2 2
## Medtr0024s0070.1 Medtr0026s0200.1
## 2 2
table(genes_group)
## genes_group
## 1 2 3
## 35 214 26
In this section, we fit a generalized linear model to account for the group effect. The model writes \(K_{gj} \sim \mbox{NB}(\mu_{gj}, \phi_g)\) with \(\mathbb{E}(\log K_{gj}) = \log(s_j) + \log(\lambda_{gj})\). Here, \(j\) denotes one of the four genotypes (wild type or mutant 1, 2 and 3) and \(\lambda_{gj}\) is explained by: \(\log(\lambda_{g,\textrm{mutant}}) = \lambda_{g0} + \beta_{g1} \mathbb{I}_{\textrm{mutant}_1} + \beta_{g2} \mathbb{I}_{\textrm{mutant}_2} + \beta_{g3} \mathbb{I}_{\textrm{mutant}_3}\). In this model, the wild type plant is the reference genotype and thus, testing if the second (resp., the third, the fourth) coefficient in the model is equal to zero is equivalent to find DEGs between the wild type and mutant 1 (resp. 2, 3).
First, a DGEList object is created with all samples and is normalized by TMM:
# create dataset with 'wt' and current mutant and normalize (TMM)
cur_dge <- DGEList(raw_counts_wn, group = design$group)
cur_dge <- calcNormFactors(cur_dge, method = "TMM")
Then, a design matrix is created: the group condition is stored in a
variable group
in which the reference level is set to “wt”.
Then, this variable is used in the function model.matrix
to
obtain the design matrix:
# design matrix
group <- relevel(as.factor(design$group), ref = "wt")
design_matrix <- model.matrix(~ group)
design_matrix
## (Intercept) groupmut1 groupmut2 groupmut3
## 1 1 0 0 0
## 2 1 0 0 0
## 3 1 0 0 0
## 4 1 1 0 0
## 5 1 1 0 0
## 6 1 1 0 0
## 7 1 0 1 0
## 8 1 0 1 0
## 9 1 0 1 0
## 10 1 0 0 1
## 11 1 0 0 1
## 12 1 0 0 1
## attr(,"assign")
## [1] 0 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
# estimate dispersion
cur_dge <- estimateDisp(cur_dge, design_matrix)
plotBCV(cur_dge, main = paste0("BCV plot"))
# fit model and perform test
fit <- glmFit(cur_dge, design_matrix)
res_GLM1 <- glmLRT(fit, coef = 2)
table_res <- res_GLM1$table
p_adjust <- p.adjust(table_res$PValue, method = "BH")
table_res$padj <- p_adjust
df <- data.frame("pvalue" = c(res_GLM1$table$PValue, p_adjust),
"type" = rep(c("raw", "adjusted"), each = nrow(res_GLM1$table)))
df$type <- factor(df$type, levels = c("raw", "adjusted"))
p <- ggplot(df, aes(x = pvalue)) + geom_histogram() + theme_bw() +
facet_wrap(. ~ type) + xlab("p-value")
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# extracting results
cur_res <- decideTestsDGE(res_GLM1, adjust.method = "BH", p.value = 0.05)
cur_res
## TestResults matrix
## groupmut1
## Medtr0001s0010.1 0
## Medtr0001s0200.1 0
## Medtr0001s0260.1 0
## Medtr0001s0360.1 0
## Medtr0001s0490.1 0
## 27911 more rows ...
sel_deg <- which(cur_res[ ,1] != 0)
length(sel_deg)
## [1] 305
DEG_GLM1 <- rownames(cur_res)[sel_deg]
head(DEG_GLM1)
## [1] "Medtr0002s0040.1" "Medtr0002s0530.1" "Medtr0003s0010.1" "Medtr0007s0160.1"
## [5] "Medtr0024s0070.1" "Medtr0026s0200.1"
# explore results
df <- data.frame("logCPM" = res_GLM1$table$logCPM,
"logFC" = res_GLM1$table$logFC,
"selected" = unname(ifelse(cur_res[ ,1] != 0, "DEG", "not DE")))
p <- ggplot(df, aes(x = logCPM, y = logFC, colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
df$adjpval <- p.adjust(res_GLM1$table$PValue, "BH")
p <- ggplot(df, aes(x = logFC, y = -log10(adjpval), colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
Compared to what was found with the Exact test, the number of obtained DEGs is of the same order (only a bit larger). A more precise comparison is made in the last part of this section.
The difference between mutant 1 and mutant 3 can be assessed using contrasts on the same fitted model:
# fit model and perform test
contrasts <- c(0, 1, 0, -1)
res <- glmLRT(fit, contrast = contrasts)
# extracting results
cur_res <- decideTestsDGE(res, adjust.method = "BH", p.value = 0.05)
sel_deg <- which(cur_res[ ,1] != 0)
length(sel_deg)
## [1] 1362
DEG_GLM1c <- rownames(cur_res)[sel_deg]
head(DEG_GLM1c)
## [1] "Medtr0002s0330.1" "Medtr0003s0010.1" "Medtr0003s0090.1" "Medtr0003s0350.1"
## [5] "Medtr0007s0160.1" "Medtr0008s0110.1"
The previous model does not take into account all information about the experimental design. Only the group effect has been tested whereas an additional effect might influence the expression level: the replicat effect. In this section we fit a model with an additive contribution of both effects. This problem is equivalent to introduce a (fixed) individual effect in the model. In this situation, the average level of \(\log\) normalized counts is expressed as: \(\log(\lambda_{g,\textrm{rep},\textrm{mutant}}) = \lambda_{g0} + \beta_{g1} \mathbb{I}_{\textrm{rep}_1} + \beta_{g2} \mathbb{I}_{\textrm{rep}_2} + \beta_{g3} \mathbb{I}_{\textrm{mutant}_1} + \beta_{g4} \mathbb{I}_{\textrm{mutant}_2} + \beta_{g5} \mathbb{I}_{\textrm{mutant}_3}\) and DEGs for each of the three mutant types are obtained by testing coefficients \(\beta_{g3}\), \(\beta_{g4}\) and \(\beta_{g5}\).
The method is performed similarly as before: a DGEList is created and normalization factors are obtained with the TMM approach. Then a design matrix that incorporates the two factors in an additive model is created:
# create dataset with 'wt' and current mutant and normalize (TMM)
cur_dge <- DGEList(raw_counts_wn, group = group)
cur_dge <- calcNormFactors(cur_dge, method = "TMM")
# create design matrix
design_matrix <- model.matrix(~ design$replicat + group)
design_matrix
## (Intercept) design$replicatrepbio2 design$replicatrepbio3 groupmut1
## 1 1 0 0 0
## 2 1 1 0 0
## 3 1 0 1 0
## 4 1 0 0 1
## 5 1 1 0 1
## 6 1 0 1 1
## 7 1 0 0 0
## 8 1 1 0 0
## 9 1 0 1 0
## 10 1 0 0 0
## 11 1 1 0 0
## 12 1 0 1 0
## groupmut2 groupmut3
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## 7 1 0
## 8 1 0
## 9 1 0
## 10 0 1
## 11 0 1
## 12 0 1
## attr(,"assign")
## [1] 0 1 1 2 2 2
## attr(,"contrasts")
## attr(,"contrasts")$`design$replicat`
## [1] "contr.treatment"
##
## attr(,"contrasts")$group
## [1] "contr.treatment"
The dispersion is then estimated using the same approach as in the previous section:
cur_dge <- estimateDisp(cur_dge, design_matrix)
plotBCV(cur_dge, main = paste0("BCV plot"))
And finally, the model is fitted with glmFit
and tests
are performed on the coefficients of interest with glmLRT
.
Results extracted with topTags
and
decideTestsDGE
are saved in pvals_GLM2
and
listDEG_GLM2
:
# GLM fit
fit <- glmFit(cur_dge, design_matrix)
res_GLM2 <- glmLRT(fit, coef = 4)
df <- data.frame("pvalue" = c(res_GLM2$table$PValue,
p.adjust(res_GLM2$table$PValue, method = "BH")),
"type" = rep(c("raw", "adjusted"), each = nrow(res_GLM2$table)))
df$type <- factor(df$type, levels = c("raw", "adjusted"))
p <- ggplot(df, aes(x = pvalue)) + geom_histogram() + theme_bw() +
facet_wrap(. ~ type) + xlab("p-value")
p
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# extracting results
cur_res <- decideTestsDGE(res_GLM2, adjust.method = "BH", p.value = 0.05)
cur_res
## TestResults matrix
## groupmut1
## Medtr0001s0010.1 0
## Medtr0001s0200.1 0
## Medtr0001s0260.1 0
## Medtr0001s0360.1 0
## Medtr0001s0490.1 0
## 27911 more rows ...
sel_deg <- which(cur_res[ ,1] != 0)
length(sel_deg)
## [1] 313
DEG_GLM2 <- rownames(cur_res)[sel_deg]
head(DEG_GLM2)
## [1] "Medtr0002s0040.1" "Medtr0003s0010.1" "Medtr0007s0160.1" "Medtr0024s0070.1"
## [5] "Medtr0026s0200.1" "Medtr0027s0090.1"
# explore results
df <- data.frame("logCPM" = res_GLM2$table$logCPM,
"logFC" = res_GLM2$table$logFC,
"selected" = unname(ifelse(cur_res[ ,1] != 0, "DEG", "not DE")))
p <- ggplot(df, aes(x = logCPM, y = logFC, colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
df$adjpval <- p.adjust(res_GLM2$table$PValue, "BH")
p <- ggplot(df, aes(x = logFC, y = -log10(adjpval), colour = selected)) +
geom_point(alpha = 0.5) + theme_bw() +
scale_colour_manual(name = "test result", values = c("red", "black"))
p
In this last part, a Venn diagram is displayed for the results (unique genes for mutant type 1 versus reference) of the three approaches:
vd <- venn.diagram(x=list("Exact test" = DEG_pairwiseExact,
"GLM\n group" = DEG_GLM1,
"GLM\n group + replicate" = DEG_GLM2),
fill = brewer.pal(3, "Set3"),
cat.col = c("darkgreen", "black", "darkblue"),
cat.cex = 1.5, fontface = "bold", filename = NULL)
grid.draw(vd)
This chart yields to several conclusions:
first, a large number of genes (\(201\) out of approximately 400) are in common between all three approaches. As expected, the two more consistent methods are the two GLM approaches (with an additional number of \(57\) DEGs in common and \(46+28+9+19 = 102\) DEGs that are specific of a given model). The Exact Test has 42 DEGs that are not found by one of the other two approaches and only \(19+9=28\) DEGs in common with one of the two GLM models only;
the 46 DEGs that were found exclusively by GLM with two additive factors might be interesting because they are DEGs which can not be found if the genotype effect is not included in the model.
To obtain results similar to mines, make sure that your session has identical features than the one used to compile this document:
session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.3.3 (2024-02-29)
## os Ubuntu 22.04.4 LTS
## system x86_64, linux-gnu
## ui X11
## language en_US
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Europe/Paris
## date 2024-04-10
## pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
## backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
## BiocParallel 1.34.1 2023-05-05 [1] Bioconductor
## broom 1.0.4 2023-03-11 [1] CRAN (R 4.3.0)
## bslib 0.4.2 2022-12-16 [1] CRAN (R 4.3.0)
## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.0)
## callr 3.7.3 2022-11-02 [1] CRAN (R 4.3.0)
## car 3.1-2 2023-03-30 [1] CRAN (R 4.3.0)
## carData 3.0-5 2022-01-06 [1] CRAN (R 4.3.0)
## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
## cluster 2.1.6 2023-12-01 [3] RSPM (R 4.3.0)
## codetools 0.2-20 2024-03-31 [3] RSPM (R 4.3.0)
## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
## corpcor 1.6.10 2021-09-16 [1] CRAN (R 4.3.0)
## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
## crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.3.0)
## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.0)
## devtools * 2.4.5 2022-10-11 [1] CRAN (R 4.3.1)
## digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
## dplyr 1.1.2 2023-04-20 [1] CRAN (R 4.3.0)
## DT 0.33 2024-04-04 [3] RSPM (R 4.3.0)
## edgeR * 3.42.2 2023-05-02 [1] Bioconductor
## ellipse 0.4.5 2023-04-05 [1] CRAN (R 4.3.0)
## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.0)
## emmeans 1.10.1 2024-04-06 [3] RSPM (R 4.3.0)
## estimability 1.5 2024-02-20 [3] RSPM (R 4.3.0)
## evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
## factoextra * 1.0.7 2020-04-01 [3] CRAN (R 4.0.2)
## FactoMineR * 2.10 2024-02-29 [3] RSPM (R 4.3.0)
## fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
## flashClust 1.01-2 2012-08-21 [3] CRAN (R 4.0.2)
## formatR 1.14 2023-01-17 [1] CRAN (R 4.3.0)
## fs 1.6.2 2023-04-25 [1] CRAN (R 4.3.0)
## futile.logger * 1.4.3 2016-07-10 [1] CRAN (R 4.3.0)
## futile.options 1.0.1 2018-04-20 [1] CRAN (R 4.3.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
## ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
## ggpubr 0.6.0 2023-02-10 [1] CRAN (R 4.3.0)
## ggrepel 0.9.3 2023-02-03 [1] CRAN (R 4.3.0)
## ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.3.0)
## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
## gridExtra * 2.3 2017-09-09 [1] CRAN (R 4.3.0)
## gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
## highr 0.10 2022-12-22 [1] CRAN (R 4.3.0)
## htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
## httpuv 1.6.11 2023-05-11 [1] CRAN (R 4.3.0)
## httr 1.4.6 2023-05-08 [1] CRAN (R 4.3.0)
## igraph 1.5.1 2023-08-10 [1] CRAN (R 4.3.1)
## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.0)
## jsonlite 1.8.4 2022-12-06 [1] CRAN (R 4.3.0)
## knitr 1.42 2023-01-25 [1] CRAN (R 4.3.0)
## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.3.0)
## lambda.r 1.2.4 2019-09-18 [1] CRAN (R 4.3.0)
## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.0)
## lattice * 0.22-6 2024-03-20 [3] RSPM (R 4.3.0)
## lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.0)
## leaps 3.1 2020-01-16 [3] RSPM (R 4.2.0)
## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
## limma * 3.56.1 2023-05-07 [1] Bioconductor
## locfit 1.5-9.7 2023-01-02 [1] CRAN (R 4.3.0)
## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
## MASS * 7.3-60.0.1 2024-01-13 [3] RSPM (R 4.3.0)
## Matrix 1.6-5 2024-01-11 [3] RSPM (R 4.3.0)
## matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.3.0)
## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.0)
## mgcv 1.9-1 2023-12-21 [3] RSPM (R 4.3.0)
## mime 0.12 2021-09-28 [1] CRAN (R 4.3.0)
## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
## mixOmics * 6.24.0 2023-04-25 [1] Bioconductor (R 4.3.0)
## multcompView 0.1-10 2024-03-08 [3] RSPM (R 4.3.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
## mvtnorm 1.2-4 2023-11-27 [3] RSPM (R 4.3.0)
## nlme 3.1-164 2023-11-27 [3] RSPM (R 4.3.0)
## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.3.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
## pkgload 1.3.2 2022-11-16 [1] CRAN (R 4.3.0)
## plotly * 4.10.2 2023-06-03 [1] CRAN (R 4.3.0)
## plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.0)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.3.0)
## processx 3.8.1 2023-04-18 [1] CRAN (R 4.3.0)
## profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.0)
## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
## ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)
## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
## R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
## rARPACK 0.11-0 2016-03-10 [1] CRAN (R 4.3.0)
## RColorBrewer * 1.1-3 2022-04-03 [1] CRAN (R 4.3.0)
## Rcpp 1.0.10 2023-01-22 [1] CRAN (R 4.3.0)
## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.3.0)
## reshape2 * 1.4.4 2020-04-09 [1] CRAN (R 4.3.0)
## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
## rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0)
## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.0)
## rstatix 0.7.2 2023-02-01 [1] CRAN (R 4.3.0)
## rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
## sass 0.4.6 2023-05-03 [1] CRAN (R 4.3.0)
## scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
## scatterplot3d 0.3-44 2023-05-05 [3] RSPM (R 4.2.0)
## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
## shiny 1.7.4 2022-12-15 [1] CRAN (R 4.3.0)
## stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
## tidyr 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
## tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.0)
## usethis * 2.1.6 2022-05-25 [1] CRAN (R 4.3.0)
## utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
## vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
## VennDiagram * 1.7.3 2022-04-12 [1] CRAN (R 4.3.3)
## viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0)
## withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
## xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.0)
## yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
##
## [1] /home/nathalie/R/x86_64-pc-linux-gnu-library/4.3
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────