In this computer lab you are going to analyze data from a nutrigenomic study in mouse (Martin et al, 2007). Forty mice were studied on which two sets of variables were acquired:

• Expression levels of 120 genes measured in liver cells, selected (among about 30,000) as potentially relevant in the context of the nutrition study
• Concentrations (in percentages) of 21 hepatic fatty acids (FA) measured by gas chromatography

Biological units (mice) were cross-classified according to two factors experimental design (4 replicates):

• Genotype: 2-levels factor, wild-type (WT) and PPAR$$\alpha$$ -/- (PPAR)
• Diet: 5-levels factor. Oils used for experimental diets preparation were corn and colza oils (50/50) for a reference diet (REF), hydrogenated coconut oil for a saturated fatty acid diet (COC), sunflower oil for an $$\omega$$-6 fatty acid-rich diet (SUN), linseed oil for an $$\omega$$-3 rich diet (LIN) and corn/colza/enriched fish oils for the FISH diet (43/43/14)

The goal of this computer lab is to show you two examples of a combined analysis of the gene expression data and the hepatic fatty acid data. For this purpose you will use the statistical software package R. Since we will focus on the interpretation of the results, no previous exposure to R is required. If you want to learn more about R, see our AMC Graduate School course Computing in R.

The experimental data has been made available in the package mixOmics, which can be installed and activated as follows (in order to excute R code from within RStudio, just click the ‘Play’ button in the upper right hand corner of the chunk). In case you still need to install packages on your computer, just remove the # on the first two lines:

#source("http://bioconductor.org/biocLite.R")
#biocLite("mixOmics")
library(mixOmics)

Now you can load the experimental data of the nutrigenomic study:

data(nutrimouse)

The variable nutrimouse contains four components:

• nutrimouse$gene: data frame, that is a table, with 40 samples (rows) and 120 gene expression variables (columns) • nutrimouse$lipid: data frame with 40 samples (rows) and 21 lipids (columns)
• nutrimouse$diet: factor of 5 levels containing 40 labels for the diet factor • nutrimouse$genotype: factor of 2 levels containing 40 labels for the diet factor

You can for example inspect the fatty acid data as follows:

nutrimouse$lipid ## C14.0 C16.0 C18.0 C16.1n.9 C16.1n.7 C18.1n.9 C18.1n.7 C20.1n.9 C20.3n.9 ## 1 0.34 26.45 10.22 0.35 3.10 16.98 2.41 0.26 0.00 ## 2 0.38 24.04 9.93 0.55 2.54 20.07 3.92 0.23 0.00 ## 3 0.36 23.70 8.96 0.55 2.65 22.89 3.96 0.26 0.19 ## 4 0.22 25.48 8.14 0.49 2.82 21.92 2.52 0.00 0.00 ## 5 0.37 24.80 9.63 0.46 2.85 21.38 2.96 0.30 0.27 ## 6 1.70 26.04 6.59 0.66 7.26 28.23 8.99 0.36 2.89 ## 7 0.35 25.94 9.68 0.36 3.60 17.62 2.15 0.25 0.00 ## 8 0.34 28.63 9.95 0.29 3.27 17.02 1.99 0.31 0.00 ## 9 0.22 25.34 8.81 0.44 2.36 18.39 1.81 0.00 0.00 ## 10 1.38 28.49 5.63 0.90 7.01 36.68 8.85 0.21 2.03 ## 11 0.26 25.73 8.30 0.43 2.74 21.75 2.64 0.00 0.00 ## 12 0.44 24.28 8.63 0.53 3.33 23.86 3.02 0.26 0.20 ## 13 0.32 24.63 9.99 0.45 2.39 17.93 3.49 0.27 0.21 ## 14 0.34 26.04 9.81 0.35 2.36 20.14 2.33 0.26 0.22 ## 15 0.35 24.76 9.38 0.54 2.47 19.66 4.02 0.22 0.19 ## 16 0.24 26.46 10.97 0.31 2.81 14.69 1.88 0.31 0.00 ## 17 1.21 23.45 5.59 0.67 6.31 33.84 8.56 0.47 2.54 ## 18 0.30 29.72 8.95 0.45 2.86 17.79 2.01 0.00 0.00 ## 19 1.30 27.00 5.72 0.81 7.86 33.50 9.88 0.00 2.13 ## 20 0.38 24.09 8.22 0.60 3.89 24.61 3.89 0.27 0.27 ## 21 3.24 23.59 2.68 1.11 13.09 35.61 11.41 0.37 0.22 ## 22 0.60 19.95 3.18 1.21 4.89 35.91 4.58 0.30 0.07 ## 23 0.38 17.64 6.99 0.74 2.58 21.23 2.85 0.26 0.00 ## 24 0.44 22.73 4.71 0.75 2.27 25.10 1.91 0.38 0.00 ## 25 0.47 14.65 4.29 0.66 2.88 23.15 2.85 0.45 0.07 ## 26 0.64 20.49 2.71 1.09 4.05 38.32 5.07 0.26 0.07 ## 27 0.52 18.44 5.21 0.87 3.37 32.04 3.76 0.42 0.00 ## 28 0.49 17.72 6.02 0.58 3.77 21.04 1.77 0.65 0.00 ## 29 0.40 21.70 6.12 0.78 2.07 22.49 1.82 0.44 0.00 ## 30 0.61 16.25 4.55 0.61 5.42 21.43 2.22 0.25 0.00 ## 31 3.19 22.91 3.60 0.99 13.90 33.55 10.64 0.30 0.07 ## 32 2.26 23.27 1.68 1.50 10.38 41.23 15.03 0.20 0.23 ## 33 0.60 20.25 3.09 1.18 3.66 36.59 4.75 0.36 0.07 ## 34 0.54 20.18 4.51 0.71 3.55 24.82 3.77 0.39 0.00 ## 35 0.39 20.71 6.41 0.71 2.18 24.09 1.83 0.40 0.00 ## 36 2.92 21.79 3.56 1.23 10.66 38.27 11.55 0.38 0.33 ## 37 0.63 21.57 3.97 0.82 5.14 25.53 3.18 0.33 0.00 ## 38 0.36 25.23 10.33 0.45 1.59 16.49 1.53 0.37 0.00 ## 39 0.64 16.20 5.77 0.67 4.10 25.12 2.11 0.21 0.00 ## 40 0.40 20.70 7.40 0.63 2.72 19.97 3.13 0.40 0.00 ## C18.2n.6 C18.3n.6 C20.2n.6 C20.3n.6 C20.4n.6 C22.4n.6 C22.5n.6 C18.3n.3 ## 1 8.93 0.00 0.00 0.78 3.07 0.00 0.00 5.97 ## 2 14.98 0.30 0.30 1.64 15.34 0.58 2.10 0.00 ## 3 16.06 0.27 0.33 1.51 13.27 0.54 1.77 0.00 ## 4 13.89 0.00 0.00 1.10 3.92 0.00 0.00 0.49 ## 5 14.55 0.27 0.23 1.58 11.85 0.32 0.44 0.42 ## 6 3.47 2.66 0.00 0.81 5.09 0.00 0.56 0.00 ## 7 8.73 0.00 0.00 0.68 2.57 0.00 0.00 8.40 ## 8 7.75 0.00 0.00 0.72 2.64 0.00 0.00 6.01 ## 9 15.65 0.00 0.00 1.07 4.23 0.00 0.00 0.55 ## 10 2.31 0.00 0.00 0.59 3.67 0.00 0.39 0.00 ## 11 12.64 0.00 0.00 1.05 3.75 0.00 0.00 0.43 ## 12 15.02 0.23 0.21 1.32 10.30 0.27 0.29 0.53 ## 13 15.74 0.28 0.36 1.64 15.76 0.72 2.52 0.00 ## 14 15.53 0.30 0.19 1.37 11.76 0.31 0.45 0.54 ## 15 15.65 0.47 0.35 1.60 15.07 0.66 1.89 0.00 ## 16 9.38 0.00 0.00 0.78 3.12 0.00 0.00 6.22 ## 17 2.77 5.07 0.00 0.67 3.87 0.00 0.43 0.00 ## 18 11.89 0.00 0.00 1.00 3.83 0.00 0.00 0.00 ## 19 3.56 0.00 0.00 0.82 4.96 0.00 0.00 0.00 ## 20 13.54 0.24 0.23 1.44 10.60 0.31 0.36 0.43 ## 21 5.65 0.17 0.05 0.20 1.55 0.00 0.17 0.09 ## 22 21.10 0.41 0.31 0.53 2.98 0.18 0.08 1.00 ## 23 33.14 0.66 0.61 0.74 8.67 0.65 0.98 0.10 ## 24 22.65 0.00 0.27 0.38 1.27 0.00 0.21 1.16 ## 25 40.02 0.85 0.83 1.04 4.72 0.73 1.13 0.14 ## 26 21.52 0.28 0.32 0.35 1.85 0.14 0.07 1.17 ## 27 23.93 0.32 0.09 0.64 4.76 0.26 0.19 1.16 ## 28 13.84 0.09 0.12 0.22 1.39 0.11 0.00 21.50 ## 29 22.05 0.00 0.24 0.47 1.87 0.00 0.19 1.31 ## 30 14.75 0.12 0.10 0.22 1.52 0.00 0.00 21.62 ## 31 5.45 0.18 0.00 0.40 2.97 0.07 0.38 0.07 ## 32 2.79 0.14 0.00 0.11 0.75 0.00 0.07 0.00 ## 33 22.93 0.31 0.33 0.41 2.00 0.17 0.07 1.31 ## 34 31.31 0.46 0.54 0.73 5.47 0.60 0.95 0.17 ## 35 23.09 0.00 0.24 0.40 1.90 0.00 0.00 1.45 ## 36 5.60 0.18 0.07 0.26 1.83 0.05 0.17 0.07 ## 37 12.60 0.08 0.11 0.17 0.96 0.06 0.00 18.11 ## 38 18.25 0.00 0.26 0.57 3.02 0.00 0.00 0.71 ## 39 21.06 0.21 0.13 0.14 1.84 0.00 0.00 14.39 ## 40 27.33 0.43 0.59 0.89 11.18 0.63 1.62 0.00 ## C20.3n.3 C20.5n.3 C22.5n.3 C22.6n.3 ## 1 0.37 8.62 1.75 10.39 ## 2 0.00 0.00 0.48 2.61 ## 3 0.00 0.00 0.22 2.51 ## 4 0.00 2.99 1.04 14.99 ## 5 0.00 0.30 0.35 6.69 ## 6 0.00 0.00 2.13 2.56 ## 7 0.42 7.37 2.05 9.84 ## 8 0.39 7.96 2.33 10.40 ## 9 0.00 3.13 1.65 16.36 ## 10 0.00 0.00 0.00 1.86 ## 11 0.00 2.90 1.16 16.21 ## 12 0.00 0.25 0.43 6.61 ## 13 0.00 0.00 0.00 3.27 ## 14 0.00 0.23 0.45 7.04 ## 15 0.00 0.00 0.00 2.71 ## 16 0.37 9.48 2.01 10.96 ## 17 0.00 0.00 2.58 1.99 ## 18 0.00 2.79 1.08 17.35 ## 19 0.00 0.00 0.00 2.44 ## 20 0.00 0.35 0.31 5.97 ## 21 0.00 0.15 0.00 0.64 ## 22 0.07 0.25 0.24 2.16 ## 23 0.00 0.09 0.00 1.70 ## 24 0.00 2.17 2.04 11.56 ## 25 0.00 0.07 0.06 0.91 ## 26 0.07 0.16 0.15 1.22 ## 27 0.00 0.30 0.28 3.44 ## 28 0.64 4.10 1.94 4.02 ## 29 0.00 2.65 2.16 13.26 ## 30 0.48 3.86 1.55 4.45 ## 31 0.00 0.18 0.00 1.16 ## 32 0.00 0.07 0.00 0.28 ## 33 0.07 0.20 0.23 1.41 ## 34 0.00 0.00 0.21 1.11 ## 35 0.00 2.82 1.80 11.57 ## 36 0.00 0.13 0.30 0.64 ## 37 0.52 2.66 1.27 2.29 ## 38 0.00 2.91 1.64 16.28 ## 39 0.24 2.31 0.98 3.87 ## 40 0.00 0.13 0.00 1.83 The column names of this table indicate the names of the 21 hepatic FAs, see Wikipedia for a clear explanation of the nomenclature; note that ‘n.’ stands for $$\omega$$. The first three rows of the gene expression data can be inspected as follows: head(nutrimouse$gene,3)
##   X36b4 ACAT1 ACAT2  ACBP  ACC1  ACC2 ACOTH ADISP ADSS1 ALDH3  AM2R   AOX
## 1 -0.42 -0.65 -0.84 -0.34 -1.29 -1.13 -0.93 -0.98 -1.19 -0.68 -0.59 -0.16
## 2 -0.44 -0.68 -0.91 -0.32 -1.23 -1.06 -0.99 -0.97 -1.00 -0.62 -0.58 -0.12
## 3 -0.48 -0.74 -1.10 -0.46 -1.30 -1.09 -1.06 -1.08 -1.18 -0.75 -0.66 -0.16
##    BACT  BIEN  BSEP Bcl.3 C16SR  CACP  CAR1   CBS CIDEA  COX1  COX2  CPT2
## 1 -0.22 -0.89 -0.69 -1.18  1.66 -0.92 -0.97 -0.26 -1.21 -1.11 -1.18 -0.87
## 2 -0.32 -0.88 -0.60 -1.07  1.65 -0.87 -0.92 -0.36 -1.17 -1.06 -1.06 -0.87
## 3 -0.32 -0.89 -0.70 -1.17  1.57 -1.02 -0.98 -0.40 -1.29 -1.17 -1.14 -0.95
##   CYP24 CYP26 CYP27a1 CYP27b1 CYP2b10 CYP2b13 CYP2c29 CYP3A11 CYP4A10
## 1 -1.37 -1.21   -0.71   -1.31   -1.23   -1.19   -0.06   -0.09   -0.81
## 2 -1.14 -1.12   -0.62   -1.14   -1.20   -1.06   -0.20   -0.34   -0.88
## 3 -1.30 -1.22   -0.78   -1.29   -1.32   -1.25   -0.30   -0.45   -0.71
##   CYP4A14 CYP7a CYP8b1   FAS   FAT  FDFT   FXR G6PDH G6Pase    GK    GS
## 1   -0.81 -0.77  -0.77 -0.41 -1.03 -0.98 -0.93 -1.22  -0.46 -0.71 -1.24
## 2   -0.84 -0.71  -0.63 -0.37 -0.98 -0.92 -0.87 -1.09  -0.63 -0.67 -1.22
## 3   -0.98 -0.93  -0.53 -0.30 -1.03 -1.04 -1.00 -1.28  -1.06 -0.68 -1.36
##    GSTa GSTmu GSTpi2 HMGCoAred HPNCL  IL.2 L.FABP   LCE  LDLr   LPK   LPL
## 1  0.00  0.02   0.45     -0.95 -0.65 -0.94   0.24  0.09 -0.82 -0.32 -1.01
## 2 -0.05 -0.05   0.30     -0.86 -0.69 -0.94   0.27  0.06 -0.68 -0.39 -0.97
## 3 -0.13 -0.19   0.18     -0.96 -0.75 -1.16   0.17 -0.05 -0.82 -0.38 -1.11
##    LXRa  LXRb  Lpin Lpin1 Lpin2 Lpin3 M.CPT1  MCAD  MDR1  MDR2  MRP6    MS
## 1 -0.82 -1.00 -0.87 -0.85 -0.85 -1.23  -1.15 -0.60 -1.15 -0.77 -0.99 -1.11
## 2 -0.82 -0.95 -0.97 -0.99 -0.87 -1.12  -1.06 -0.62 -1.10 -0.65 -0.85 -1.06
## 3 -0.91 -1.16 -0.95 -0.94 -0.90 -1.25  -1.26 -0.70 -1.26 -0.86 -0.90 -1.20
##   MTHFR NGFiB NURR1  Ntcp OCTN2   PAL  PDK4  PECI  PLTP PMDCI   PON PPARa
## 1 -0.96 -1.21 -1.21 -0.49 -1.15 -1.32 -1.16 -0.68 -1.10 -0.52 -0.52 -0.93
## 2 -0.99 -1.08 -1.10 -0.45 -1.15 -1.25 -1.16 -0.69 -0.99 -0.52 -0.55 -0.86
## 3 -1.10 -1.24 -1.32 -0.44 -1.20 -1.16 -1.27 -0.92 -1.03 -0.60 -0.65 -0.95
##   PPARd PPARg   PXR Pex11a  RARa RARb2  RXRa RXRb2 RXRg1   S14  SHP1
## 1 -1.51 -1.06 -0.99  -1.00 -1.20 -1.19 -0.67 -0.95 -1.16 -0.93 -1.10
## 2 -1.59 -1.02 -0.96  -1.02 -1.06 -1.11 -0.59 -0.95 -1.10 -0.86 -0.97
## 3 -1.71 -1.14 -1.10  -1.20 -1.16 -1.23 -0.68 -1.07 -1.21 -0.84 -1.09
##   SIAT4c SPI1.1 SR.BI   THB THIOL   TRa   TRb Tpalpha Tpbeta  UCP2  UCP3
## 1  -1.07   1.19 -0.84 -0.79 -0.18 -1.48 -1.07   -0.69  -1.11 -1.06 -1.19
## 2  -0.97   1.15 -0.86 -0.85 -0.15 -1.46 -1.00   -0.74  -1.09 -0.99 -1.10
## 3  -1.04   1.09 -0.95 -0.92 -0.24 -1.58 -1.16   -0.81  -1.14 -1.05 -1.20
##     VDR VLDLr  Waf1   ap2 apoA.I  apoB apoC3 apoE c.fos cHMGCoAS cMOAT
## 1 -1.17 -1.08 -1.18 -1.19   0.76 -0.12 -0.49 1.08 -1.15    -1.15 -0.78
## 2 -1.17 -0.99 -1.12 -1.16   0.86 -0.13 -0.35 1.12 -1.08    -0.95 -0.73
## 3 -1.30 -1.13 -1.30 -1.37   0.82 -0.19 -0.42 1.04 -1.18    -0.93 -0.89
##   eif2g hABC1 i.BABP i.BAT i.FABP i.NOS mABC1 mHMGCoAS
## 1 -1.23 -1.16  -0.78 -1.65  -1.14 -1.24 -0.85    -0.03
## 2 -1.02 -1.11  -0.73 -1.67  -1.03 -1.20 -0.84    -0.12
## 3 -1.14 -1.25  -0.89 -1.89  -1.16 -1.35 -0.96    -0.12

Now write the gene names to a text file:

write(colnames(nutrimouse$gene),file = "geneNames.txt") Question Use the DAVID Functional Annotation Tool to find pathways that are enriched for the genes probed in this experiment. Note that DAVID will not be able to map a large part of the genes, since non-standard gene names were used; go ahead nevertheless. Which are the most enriched pathways and do they make sense in the context of this experiment? Answer Among the most enriched KEGG pathways are the PPAR signaling pathway and the Fatty acid degradation pathway, which are obviously relevant to this particular experiment. Let us now have a look at the data itself and in particular the correlation structure of the two datasets. Correlation matrices within and between gene expression and fatty acid data can be visualised as follows: X <- nutrimouse$lipid
Y <- nutrimouse$gene # You might have to increase the size of the plot window in RStudio imgCor(X, Y, X.var.names = FALSE, Y.var.names = FALSE, interactive.dev=FALSE) ## [1] 141 Question Can you detect any structure in the correlation matrices. If so, which? Answer • Gene (within, red bar): It seems that overall there is quite a high correlation between the expression profiles of most genes, as indicated by the general trend towards the red color range. Since the genes were simply alphabetically ordered, it is difficult to detect clusters of highly correlating genes from this figure. Hierarchical clustering of gene expression profiles would be a suitable approach to do so. • Lipid (within, blue bar): It seems that there are some blocks of highly correlating lipid profiles (on the diagonal). Given that the FAs are roughly ordered by the number of carbon atoms and the number of double bonds, this pattern might be related to some of the structural properties of the FAs. Again hierarchical clustering of lipid profiles would be a suitable approach to detect clusters of correlating lipids. • Between: Some FAs seem to be correlating with a large number of genes, as indicated by the cells in the yellow-red color range. Also some anti-correlation can be observed (cells in the blue color range). Below you will use two different methods to investigate the within and between correlation structure in a more systematic way. ## WGCNA Weighted gene co-expression network analysis (WGCNA) is an often used method for finding clusters (modules) of highly correlated genes and for relating modules to external sample traits (in this case, hepatic fatty acid profiles). The corresponding packages can be installed (remove the # on the first two lines) and activated as follows: #source("http://bioconductor.org/biocLite.R") #biocLite("WGCNA") library(fastcluster) library(WGCNA) Let us first have another look at the two data sets separately. First, perform hierarchical clustering on the samples: ## Clustering gene expression data using Euclidean distance and average linkage sampleTree <- hclust(dist(Y), method="average") plot(sampleTree,labels=with(nutrimouse,paste(genotype,diet))) ## Clustering fatty acid data using Euclidean distance and average linkage sampleTree <- hclust(dist(X), method="average") plot(sampleTree,labels=with(nutrimouse,paste(genotype,diet))) Question What can you say about the effects of diet and genotype on hepatic gene expression and FA content? Answer • Gene: There is a strong effect of genotype as can be seen from the two clusters roughly separating the WT and PPAR mice. Within the two clusters, a relatively strong effect of the FISH diet can be observed in the WT mice. Overall dietary effects were more pronounced in wild-type mice than in PPAR mice. See also (Martin et al, 2007). • Lipid: Both diet and genotype effects are present. Mice fed a COC diet (highly enriched for saturated fatty acids, that is, FAs without double bonds) cluster separately from the other mice. As explained in this morning’s lecture, the first step of the WGCNA approach consists of the choice of the soft thresholding power $$\beta$$ to which co-expression similarity is raised to calculate adjacency. The choice of the soft thresholding power is based on the criterion of approximate scale-free topology. The following chunk of R code can be used to construct a gene co-expression network and choose a value for $$\beta$$: # Try values 1,2, ...,10,12,14,...,20 powers <- c(c(1:10), seq(from = 12, to=20, by=2)) # Call the network topology analysis function on the gene expression profiles sft <- pickSoftThreshold(Y, powerVector = powers, verbose = 5) ## pickSoftThreshold: will use block size 120. ## pickSoftThreshold: calculating connectivity for given powers... ## ..working on genes 1 through 120 of 120 ## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. ## 1 1 0.8290 1.960 0.9220 46.700 47.5000 65.20 ## 2 2 0.3870 0.528 0.7840 24.400 23.7000 42.80 ## 3 3 0.0436 -0.152 0.4090 14.500 13.9000 30.50 ## 4 4 0.3090 -0.464 0.6020 9.310 8.2200 22.70 ## 5 5 0.5970 -0.643 0.8060 6.280 5.0300 17.40 ## 6 6 0.7050 -0.761 0.8790 4.400 3.2300 13.60 ## 7 7 0.7550 -0.910 0.8260 3.170 2.1200 10.80 ## 8 8 0.8470 -0.976 0.8870 2.340 1.4600 8.67 ## 9 9 0.9030 -1.020 0.9570 1.750 1.0100 7.11 ## 10 10 0.7870 -1.110 0.7790 1.340 0.7540 5.88 ## 11 12 0.8270 -1.180 0.8590 0.806 0.4240 4.08 ## 12 14 0.2150 -2.570 0.1270 0.506 0.2400 2.89 ## 13 16 0.8510 -1.180 0.8960 0.328 0.1380 2.07 ## 14 18 0.1610 -2.050 -0.0731 0.219 0.0761 1.50 ## 15 20 0.8810 -1.180 0.9450 0.150 0.0430 1.10 # Plot the results: par(mfrow = c(1,2)) cex1 <- 0.9 # Scale-free topology fit index as a function of the soft-thresholding power plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n", main = paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red") # This line corresponds to using an R^2 cut-off of h abline(h=0.90,col="red") # Mean connectivity as a function of the soft-thresholding power plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n", main = paste("Mean connectivity")) text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red") Question Which value for $$\beta$$ would you choose? Answer We choose the power 9, which is the lowest power for which the scale-free topology fit index curve - more or less - flattens out upon reaching a high value (in this case, roughly 0.90). Note the irregular behaviour for higher powers, probably caused by the relatively small number of genes used for constructing the co-expression network. Question Can you explain why the mean connectivity is a decreasing function of the soft thresholding power? Answer The connectivity of a gene is defined is the sum of the weighted correlations with all other genes. Since raising an absolute correlation value to powers >1 can only make it smaller, the gene connectivity and mean connectivity are lower for higher powers. Constructing the gene network and identifying modules is now a simple function call (type ?blockwiseModules in the R console if you want to have detailed explanation of the many different parameters of this function): net <- blockwiseModules(Y, power = 9, TOMType = "unsigned", minModuleSize = 5, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, saveTOMs = FALSE, verbose = 0) The resulting dendrogram can be displayed together with the color assignment using the following code: # Convert labels to colors for plotting mergedColors <- labels2colors(net$colors)
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]], "Module colors", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)

# Frequency table of the number of genes in each module
table(mergedColors)
## mergedColors
##      blue     brown      grey turquoise    yellow
##        27        10        11        65         7

Question How should the grey module be interpreted?

Answer The grey label is reserved for genes outside of all genuine modules.You can clearly see this by the height of the branches for most genes belonging to the grey module. The right-most pair of genes has also been assigned to the grey module, since in our call to blockwiseModules we imposed a minimum module size of five (argument minModuleSize).

You can extract, for example, the genes belonging to the turqoise module and write them to a text file as follows:

turquoiseGenes <- colnames(Y)[mergedColors=="turquoise"]
write(turquoiseGenes, file="turquoiseGenes.txt")

Question Without doing a full-fledged enrichment analysis, can you speculate about the main pathways these genes are involved in?

• Xenobiotic metabolism is evident from the presence of a large number of cytochrome P450 genes and of the constitutive androstane receptor (CAR1), a key regulator of xenobiotic and endobiotic metabolism
• PPAR signaling
• Chlolesterol and fatty acid metabolism: LXRs, RXRs

We would now like to identify modules that are significantly associated with the measured FA profiles (‘traits’). For this purpose each module is summarized by the profile of its first eigengene and then module eigengenes are correlated with external traits:

nSamples = nrow(Y)

# Recalculate module eigengenes with color labels
MEs0 = moduleEigengenes(Y, mergedColors)\$eigengenes
MEs = orderMEs(MEs0)

# Calculate correlation of module eigengenes with the hepatic fatty acid profiles ('traits')
moduleTraitCor = cor(MEs, X, use = "p")
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples)

# Display correlations and their p-values
textMatrix = paste(signif(moduleTraitCor, 2), "\n(", signif(moduleTraitPvalue, 1), ")", sep = "")
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3))
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor, xLabels = names(X), yLabels = names(MEs), ySymbols = names(MEs), colorLabels = FALSE, colors = blueWhiteRed(50), textMatrix = textMatrix, setStdMargins = FALSE,
cex.text = 0.5, zlim = c(-1,1), main = paste("Module-trait relationships"))