In this computer lab you are going to analyze data from a nutrigenomic study in mouse (Martin et al, 2007). Forty mice were studied on which two sets of variables were acquired:

Biological units (mice) were cross-classified according to two factors experimental design (4 replicates):

The goal of this computer lab is to show you two examples of a combined analysis of the gene expression data and the hepatic fatty acid data. For this purpose you will use the statistical software package R. Since we will focus on the interpretation of the results, no previous exposure to R is required. If you want to learn more about R, see our AMC Graduate School course Computing in R.

The experimental data has been made available in the package mixOmics, which can be installed and activated as follows (in order to excute R code from within RStudio, just click the ‘Play’ button in the upper right hand corner of the chunk). In case you still need to install packages on your computer, just remove the # on the first two lines:

#source("http://bioconductor.org/biocLite.R")
#biocLite("mixOmics")
library(mixOmics)

Now you can load the experimental data of the nutrigenomic study:

data(nutrimouse)

The variable nutrimouse contains four components:

You can for example inspect the fatty acid data as follows:

nutrimouse$lipid
##    C14.0 C16.0 C18.0 C16.1n.9 C16.1n.7 C18.1n.9 C18.1n.7 C20.1n.9 C20.3n.9
## 1   0.34 26.45 10.22     0.35     3.10    16.98     2.41     0.26     0.00
## 2   0.38 24.04  9.93     0.55     2.54    20.07     3.92     0.23     0.00
## 3   0.36 23.70  8.96     0.55     2.65    22.89     3.96     0.26     0.19
## 4   0.22 25.48  8.14     0.49     2.82    21.92     2.52     0.00     0.00
## 5   0.37 24.80  9.63     0.46     2.85    21.38     2.96     0.30     0.27
## 6   1.70 26.04  6.59     0.66     7.26    28.23     8.99     0.36     2.89
## 7   0.35 25.94  9.68     0.36     3.60    17.62     2.15     0.25     0.00
## 8   0.34 28.63  9.95     0.29     3.27    17.02     1.99     0.31     0.00
## 9   0.22 25.34  8.81     0.44     2.36    18.39     1.81     0.00     0.00
## 10  1.38 28.49  5.63     0.90     7.01    36.68     8.85     0.21     2.03
## 11  0.26 25.73  8.30     0.43     2.74    21.75     2.64     0.00     0.00
## 12  0.44 24.28  8.63     0.53     3.33    23.86     3.02     0.26     0.20
## 13  0.32 24.63  9.99     0.45     2.39    17.93     3.49     0.27     0.21
## 14  0.34 26.04  9.81     0.35     2.36    20.14     2.33     0.26     0.22
## 15  0.35 24.76  9.38     0.54     2.47    19.66     4.02     0.22     0.19
## 16  0.24 26.46 10.97     0.31     2.81    14.69     1.88     0.31     0.00
## 17  1.21 23.45  5.59     0.67     6.31    33.84     8.56     0.47     2.54
## 18  0.30 29.72  8.95     0.45     2.86    17.79     2.01     0.00     0.00
## 19  1.30 27.00  5.72     0.81     7.86    33.50     9.88     0.00     2.13
## 20  0.38 24.09  8.22     0.60     3.89    24.61     3.89     0.27     0.27
## 21  3.24 23.59  2.68     1.11    13.09    35.61    11.41     0.37     0.22
## 22  0.60 19.95  3.18     1.21     4.89    35.91     4.58     0.30     0.07
## 23  0.38 17.64  6.99     0.74     2.58    21.23     2.85     0.26     0.00
## 24  0.44 22.73  4.71     0.75     2.27    25.10     1.91     0.38     0.00
## 25  0.47 14.65  4.29     0.66     2.88    23.15     2.85     0.45     0.07
## 26  0.64 20.49  2.71     1.09     4.05    38.32     5.07     0.26     0.07
## 27  0.52 18.44  5.21     0.87     3.37    32.04     3.76     0.42     0.00
## 28  0.49 17.72  6.02     0.58     3.77    21.04     1.77     0.65     0.00
## 29  0.40 21.70  6.12     0.78     2.07    22.49     1.82     0.44     0.00
## 30  0.61 16.25  4.55     0.61     5.42    21.43     2.22     0.25     0.00
## 31  3.19 22.91  3.60     0.99    13.90    33.55    10.64     0.30     0.07
## 32  2.26 23.27  1.68     1.50    10.38    41.23    15.03     0.20     0.23
## 33  0.60 20.25  3.09     1.18     3.66    36.59     4.75     0.36     0.07
## 34  0.54 20.18  4.51     0.71     3.55    24.82     3.77     0.39     0.00
## 35  0.39 20.71  6.41     0.71     2.18    24.09     1.83     0.40     0.00
## 36  2.92 21.79  3.56     1.23    10.66    38.27    11.55     0.38     0.33
## 37  0.63 21.57  3.97     0.82     5.14    25.53     3.18     0.33     0.00
## 38  0.36 25.23 10.33     0.45     1.59    16.49     1.53     0.37     0.00
## 39  0.64 16.20  5.77     0.67     4.10    25.12     2.11     0.21     0.00
## 40  0.40 20.70  7.40     0.63     2.72    19.97     3.13     0.40     0.00
##    C18.2n.6 C18.3n.6 C20.2n.6 C20.3n.6 C20.4n.6 C22.4n.6 C22.5n.6 C18.3n.3
## 1      8.93     0.00     0.00     0.78     3.07     0.00     0.00     5.97
## 2     14.98     0.30     0.30     1.64    15.34     0.58     2.10     0.00
## 3     16.06     0.27     0.33     1.51    13.27     0.54     1.77     0.00
## 4     13.89     0.00     0.00     1.10     3.92     0.00     0.00     0.49
## 5     14.55     0.27     0.23     1.58    11.85     0.32     0.44     0.42
## 6      3.47     2.66     0.00     0.81     5.09     0.00     0.56     0.00
## 7      8.73     0.00     0.00     0.68     2.57     0.00     0.00     8.40
## 8      7.75     0.00     0.00     0.72     2.64     0.00     0.00     6.01
## 9     15.65     0.00     0.00     1.07     4.23     0.00     0.00     0.55
## 10     2.31     0.00     0.00     0.59     3.67     0.00     0.39     0.00
## 11    12.64     0.00     0.00     1.05     3.75     0.00     0.00     0.43
## 12    15.02     0.23     0.21     1.32    10.30     0.27     0.29     0.53
## 13    15.74     0.28     0.36     1.64    15.76     0.72     2.52     0.00
## 14    15.53     0.30     0.19     1.37    11.76     0.31     0.45     0.54
## 15    15.65     0.47     0.35     1.60    15.07     0.66     1.89     0.00
## 16     9.38     0.00     0.00     0.78     3.12     0.00     0.00     6.22
## 17     2.77     5.07     0.00     0.67     3.87     0.00     0.43     0.00
## 18    11.89     0.00     0.00     1.00     3.83     0.00     0.00     0.00
## 19     3.56     0.00     0.00     0.82     4.96     0.00     0.00     0.00
## 20    13.54     0.24     0.23     1.44    10.60     0.31     0.36     0.43
## 21     5.65     0.17     0.05     0.20     1.55     0.00     0.17     0.09
## 22    21.10     0.41     0.31     0.53     2.98     0.18     0.08     1.00
## 23    33.14     0.66     0.61     0.74     8.67     0.65     0.98     0.10
## 24    22.65     0.00     0.27     0.38     1.27     0.00     0.21     1.16
## 25    40.02     0.85     0.83     1.04     4.72     0.73     1.13     0.14
## 26    21.52     0.28     0.32     0.35     1.85     0.14     0.07     1.17
## 27    23.93     0.32     0.09     0.64     4.76     0.26     0.19     1.16
## 28    13.84     0.09     0.12     0.22     1.39     0.11     0.00    21.50
## 29    22.05     0.00     0.24     0.47     1.87     0.00     0.19     1.31
## 30    14.75     0.12     0.10     0.22     1.52     0.00     0.00    21.62
## 31     5.45     0.18     0.00     0.40     2.97     0.07     0.38     0.07
## 32     2.79     0.14     0.00     0.11     0.75     0.00     0.07     0.00
## 33    22.93     0.31     0.33     0.41     2.00     0.17     0.07     1.31
## 34    31.31     0.46     0.54     0.73     5.47     0.60     0.95     0.17
## 35    23.09     0.00     0.24     0.40     1.90     0.00     0.00     1.45
## 36     5.60     0.18     0.07     0.26     1.83     0.05     0.17     0.07
## 37    12.60     0.08     0.11     0.17     0.96     0.06     0.00    18.11
## 38    18.25     0.00     0.26     0.57     3.02     0.00     0.00     0.71
## 39    21.06     0.21     0.13     0.14     1.84     0.00     0.00    14.39
## 40    27.33     0.43     0.59     0.89    11.18     0.63     1.62     0.00
##    C20.3n.3 C20.5n.3 C22.5n.3 C22.6n.3
## 1      0.37     8.62     1.75    10.39
## 2      0.00     0.00     0.48     2.61
## 3      0.00     0.00     0.22     2.51
## 4      0.00     2.99     1.04    14.99
## 5      0.00     0.30     0.35     6.69
## 6      0.00     0.00     2.13     2.56
## 7      0.42     7.37     2.05     9.84
## 8      0.39     7.96     2.33    10.40
## 9      0.00     3.13     1.65    16.36
## 10     0.00     0.00     0.00     1.86
## 11     0.00     2.90     1.16    16.21
## 12     0.00     0.25     0.43     6.61
## 13     0.00     0.00     0.00     3.27
## 14     0.00     0.23     0.45     7.04
## 15     0.00     0.00     0.00     2.71
## 16     0.37     9.48     2.01    10.96
## 17     0.00     0.00     2.58     1.99
## 18     0.00     2.79     1.08    17.35
## 19     0.00     0.00     0.00     2.44
## 20     0.00     0.35     0.31     5.97
## 21     0.00     0.15     0.00     0.64
## 22     0.07     0.25     0.24     2.16
## 23     0.00     0.09     0.00     1.70
## 24     0.00     2.17     2.04    11.56
## 25     0.00     0.07     0.06     0.91
## 26     0.07     0.16     0.15     1.22
## 27     0.00     0.30     0.28     3.44
## 28     0.64     4.10     1.94     4.02
## 29     0.00     2.65     2.16    13.26
## 30     0.48     3.86     1.55     4.45
## 31     0.00     0.18     0.00     1.16
## 32     0.00     0.07     0.00     0.28
## 33     0.07     0.20     0.23     1.41
## 34     0.00     0.00     0.21     1.11
## 35     0.00     2.82     1.80    11.57
## 36     0.00     0.13     0.30     0.64
## 37     0.52     2.66     1.27     2.29
## 38     0.00     2.91     1.64    16.28
## 39     0.24     2.31     0.98     3.87
## 40     0.00     0.13     0.00     1.83

The column names of this table indicate the names of the 21 hepatic FAs, see Wikipedia for a clear explanation of the nomenclature; note that ‘n.’ stands for \(\omega\).

The first three rows of the gene expression data can be inspected as follows:

head(nutrimouse$gene,3)
##   X36b4 ACAT1 ACAT2  ACBP  ACC1  ACC2 ACOTH ADISP ADSS1 ALDH3  AM2R   AOX
## 1 -0.42 -0.65 -0.84 -0.34 -1.29 -1.13 -0.93 -0.98 -1.19 -0.68 -0.59 -0.16
## 2 -0.44 -0.68 -0.91 -0.32 -1.23 -1.06 -0.99 -0.97 -1.00 -0.62 -0.58 -0.12
## 3 -0.48 -0.74 -1.10 -0.46 -1.30 -1.09 -1.06 -1.08 -1.18 -0.75 -0.66 -0.16
##    BACT  BIEN  BSEP Bcl.3 C16SR  CACP  CAR1   CBS CIDEA  COX1  COX2  CPT2
## 1 -0.22 -0.89 -0.69 -1.18  1.66 -0.92 -0.97 -0.26 -1.21 -1.11 -1.18 -0.87
## 2 -0.32 -0.88 -0.60 -1.07  1.65 -0.87 -0.92 -0.36 -1.17 -1.06 -1.06 -0.87
## 3 -0.32 -0.89 -0.70 -1.17  1.57 -1.02 -0.98 -0.40 -1.29 -1.17 -1.14 -0.95
##   CYP24 CYP26 CYP27a1 CYP27b1 CYP2b10 CYP2b13 CYP2c29 CYP3A11 CYP4A10
## 1 -1.37 -1.21   -0.71   -1.31   -1.23   -1.19   -0.06   -0.09   -0.81
## 2 -1.14 -1.12   -0.62   -1.14   -1.20   -1.06   -0.20   -0.34   -0.88
## 3 -1.30 -1.22   -0.78   -1.29   -1.32   -1.25   -0.30   -0.45   -0.71
##   CYP4A14 CYP7a CYP8b1   FAS   FAT  FDFT   FXR G6PDH G6Pase    GK    GS
## 1   -0.81 -0.77  -0.77 -0.41 -1.03 -0.98 -0.93 -1.22  -0.46 -0.71 -1.24
## 2   -0.84 -0.71  -0.63 -0.37 -0.98 -0.92 -0.87 -1.09  -0.63 -0.67 -1.22
## 3   -0.98 -0.93  -0.53 -0.30 -1.03 -1.04 -1.00 -1.28  -1.06 -0.68 -1.36
##    GSTa GSTmu GSTpi2 HMGCoAred HPNCL  IL.2 L.FABP   LCE  LDLr   LPK   LPL
## 1  0.00  0.02   0.45     -0.95 -0.65 -0.94   0.24  0.09 -0.82 -0.32 -1.01
## 2 -0.05 -0.05   0.30     -0.86 -0.69 -0.94   0.27  0.06 -0.68 -0.39 -0.97
## 3 -0.13 -0.19   0.18     -0.96 -0.75 -1.16   0.17 -0.05 -0.82 -0.38 -1.11
##    LXRa  LXRb  Lpin Lpin1 Lpin2 Lpin3 M.CPT1  MCAD  MDR1  MDR2  MRP6    MS
## 1 -0.82 -1.00 -0.87 -0.85 -0.85 -1.23  -1.15 -0.60 -1.15 -0.77 -0.99 -1.11
## 2 -0.82 -0.95 -0.97 -0.99 -0.87 -1.12  -1.06 -0.62 -1.10 -0.65 -0.85 -1.06
## 3 -0.91 -1.16 -0.95 -0.94 -0.90 -1.25  -1.26 -0.70 -1.26 -0.86 -0.90 -1.20
##   MTHFR NGFiB NURR1  Ntcp OCTN2   PAL  PDK4  PECI  PLTP PMDCI   PON PPARa
## 1 -0.96 -1.21 -1.21 -0.49 -1.15 -1.32 -1.16 -0.68 -1.10 -0.52 -0.52 -0.93
## 2 -0.99 -1.08 -1.10 -0.45 -1.15 -1.25 -1.16 -0.69 -0.99 -0.52 -0.55 -0.86
## 3 -1.10 -1.24 -1.32 -0.44 -1.20 -1.16 -1.27 -0.92 -1.03 -0.60 -0.65 -0.95
##   PPARd PPARg   PXR Pex11a  RARa RARb2  RXRa RXRb2 RXRg1   S14  SHP1
## 1 -1.51 -1.06 -0.99  -1.00 -1.20 -1.19 -0.67 -0.95 -1.16 -0.93 -1.10
## 2 -1.59 -1.02 -0.96  -1.02 -1.06 -1.11 -0.59 -0.95 -1.10 -0.86 -0.97
## 3 -1.71 -1.14 -1.10  -1.20 -1.16 -1.23 -0.68 -1.07 -1.21 -0.84 -1.09
##   SIAT4c SPI1.1 SR.BI   THB THIOL   TRa   TRb Tpalpha Tpbeta  UCP2  UCP3
## 1  -1.07   1.19 -0.84 -0.79 -0.18 -1.48 -1.07   -0.69  -1.11 -1.06 -1.19
## 2  -0.97   1.15 -0.86 -0.85 -0.15 -1.46 -1.00   -0.74  -1.09 -0.99 -1.10
## 3  -1.04   1.09 -0.95 -0.92 -0.24 -1.58 -1.16   -0.81  -1.14 -1.05 -1.20
##     VDR VLDLr  Waf1   ap2 apoA.I  apoB apoC3 apoE c.fos cHMGCoAS cMOAT
## 1 -1.17 -1.08 -1.18 -1.19   0.76 -0.12 -0.49 1.08 -1.15    -1.15 -0.78
## 2 -1.17 -0.99 -1.12 -1.16   0.86 -0.13 -0.35 1.12 -1.08    -0.95 -0.73
## 3 -1.30 -1.13 -1.30 -1.37   0.82 -0.19 -0.42 1.04 -1.18    -0.93 -0.89
##   eif2g hABC1 i.BABP i.BAT i.FABP i.NOS mABC1 mHMGCoAS
## 1 -1.23 -1.16  -0.78 -1.65  -1.14 -1.24 -0.85    -0.03
## 2 -1.02 -1.11  -0.73 -1.67  -1.03 -1.20 -0.84    -0.12
## 3 -1.14 -1.25  -0.89 -1.89  -1.16 -1.35 -0.96    -0.12

Now write the gene names to a text file:

write(colnames(nutrimouse$gene),file = "geneNames.txt")

Question Use the DAVID Functional Annotation Tool to find pathways that are enriched for the genes probed in this experiment. Note that DAVID will not be able to map a large part of the genes, since non-standard gene names were used; go ahead nevertheless. Which are the most enriched pathways and do they make sense in the context of this experiment?

Let us now have a look at the data itself and in particular the correlation structure of the two datasets. Correlation matrices within and between gene expression and fatty acid data can be visualised as follows:

X <- nutrimouse$lipid
Y <- nutrimouse$gene
# You might have to increase the size of the plot window in RStudio
imgCor(X, Y, X.var.names = FALSE, Y.var.names = FALSE, interactive.dev=FALSE)
## [1] 141

Question Can you detect any structure in the correlation matrices. If so, which?

Below you will use two different methods to investigate the within and between correlation structure in a more systematic way.

WGCNA


Weighted gene co-expression network analysis (WGCNA) is an often used method for finding clusters (modules) of highly correlated genes and for relating modules to external sample traits (in this case, hepatic fatty acid profiles).

The corresponding packages can be installed (remove the # on the first two lines) and activated as follows:

#source("http://bioconductor.org/biocLite.R")
#biocLite("WGCNA")
library(fastcluster)
library(WGCNA)

Let us first have another look at the two data sets separately. First, perform hierarchical clustering on the samples:

## Clustering gene expression data using Euclidean distance and average linkage
sampleTree <- hclust(dist(Y), method="average")
plot(sampleTree,labels=with(nutrimouse,paste(genotype,diet)))

## Clustering fatty acid data using Euclidean distance and average linkage 
sampleTree <- hclust(dist(X), method="average")
plot(sampleTree,labels=with(nutrimouse,paste(genotype,diet)))

Question What can you say about the effects of diet and genotype on hepatic gene expression and FA content?

As explained in this morning’s lecture, the first step of the WGCNA approach consists of the choice of the soft thresholding power \(\beta\) to which co-expression similarity is raised to calculate adjacency. The choice of the soft thresholding power is based on the criterion of approximate scale-free topology. The following chunk of R code can be used to construct a gene co-expression network and choose a value for \(\beta\):

# Try values 1,2, ...,10,12,14,...,20 
powers <- c(c(1:10), seq(from = 12, to=20, by=2))
# Call the network topology analysis function on the gene expression profiles
sft <- pickSoftThreshold(Y, powerVector = powers, verbose = 5)
## pickSoftThreshold: will use block size 120.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 120 of 120
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.8290  1.960         0.9220  46.700   47.5000  65.20
## 2      2   0.3870  0.528         0.7840  24.400   23.7000  42.80
## 3      3   0.0436 -0.152         0.4090  14.500   13.9000  30.50
## 4      4   0.3090 -0.464         0.6020   9.310    8.2200  22.70
## 5      5   0.5970 -0.643         0.8060   6.280    5.0300  17.40
## 6      6   0.7050 -0.761         0.8790   4.400    3.2300  13.60
## 7      7   0.7550 -0.910         0.8260   3.170    2.1200  10.80
## 8      8   0.8470 -0.976         0.8870   2.340    1.4600   8.67
## 9      9   0.9030 -1.020         0.9570   1.750    1.0100   7.11
## 10    10   0.7870 -1.110         0.7790   1.340    0.7540   5.88
## 11    12   0.8270 -1.180         0.8590   0.806    0.4240   4.08
## 12    14   0.2150 -2.570         0.1270   0.506    0.2400   2.89
## 13    16   0.8510 -1.180         0.8960   0.328    0.1380   2.07
## 14    18   0.1610 -2.050        -0.0731   0.219    0.0761   1.50
## 15    20   0.8810 -1.180         0.9450   0.150    0.0430   1.10
# Plot the results:
par(mfrow = c(1,2))
cex1 <- 0.9
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n", main = paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red")
# This line corresponds to using an R^2 cut-off of h
abline(h=0.90,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n", main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")

Question Which value for \(\beta\) would you choose?

Question Can you explain why the mean connectivity is a decreasing function of the soft thresholding power?

Constructing the gene network and identifying modules is now a simple function call (type ?blockwiseModules in the R console if you want to have detailed explanation of the many different parameters of this function):

net <- blockwiseModules(Y, power = 9, TOMType = "unsigned", minModuleSize = 5, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, saveTOMs = FALSE, verbose = 0)

The resulting dendrogram can be displayed together with the color assignment using the following code:

# Convert labels to colors for plotting
mergedColors <- labels2colors(net$colors)
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]], "Module colors", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)

# Frequency table of the number of genes in each module
table(mergedColors)
## mergedColors
##      blue     brown      grey turquoise    yellow 
##        27        10        11        65         7

Question How should the grey module be interpreted?

You can extract, for example, the genes belonging to the turqoise module and write them to a text file as follows:

turquoiseGenes <- colnames(Y)[mergedColors=="turquoise"]
write(turquoiseGenes, file="turquoiseGenes.txt")

Question Without doing a full-fledged enrichment analysis, can you speculate about the main pathways these genes are involved in?

We would now like to identify modules that are significantly associated with the measured FA profiles (‘traits’). For this purpose each module is summarized by the profile of its first eigengene and then module eigengenes are correlated with external traits:

nSamples = nrow(Y)

# Recalculate module eigengenes with color labels
MEs0 = moduleEigengenes(Y, mergedColors)$eigengenes
MEs = orderMEs(MEs0)

# Calculate correlation of module eigengenes with the hepatic fatty acid profiles ('traits')
moduleTraitCor = cor(MEs, X, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)

# Display correlations and their p-values
textMatrix = paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1), ")", sep = "")
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3))
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor, xLabels = names(X), yLabels = names(MEs), ySymbols = names(MEs), colorLabels = FALSE, colors = blueWhiteRed(50), textMatrix = textMatrix, setStdMargins = FALSE, 
cex.text = 0.5, zlim = c(-1,1), main = paste("Module-trait relationships"))