0.1 R Markdown metadata

In the Markdown options above, change the author and date strings so that they accurately reflect you and your submission.

1 Overview

Today’s lab will recapitulate the same exploratory data analysis of annotated gene expression data that we carried out earlier using Python, this time using any combination of R, Bioconductor, and ggplot2 that you’d like. Again, there are intentionally strong similarities with the problem set, so keep an eye out for opportunities to constructively transfer knowledge between the activities.

We’ll continue using the same gene expression dataset as before, from PMID 17959824. As a reminder, it measures yeast (Saccharomyces cerevisiae) transcription during growth in chemostat (steady-state) continuous culture. It covers six different nutrient limitations: glucose (carbon), ammonium (nitrogen), sulfate, phosphate, and, for auxotrophic strains, the amino acid leucine or the nucleobase uracil. It also covers six different growth rates, which are the chemostat dilution rates from 0.05 to 0.3 per hour (i.e. the fraction of the culture volume replaced each hour).

1.1 Starting simple (again)

You’ve already read about this experiment and should be familiar with its basic structure and encoding as expression.tsv and map.tsv files. So let’s get them loaded into R! Start by ensuring that this file and your two input files are again in the same directory, such that you see them when you run R’s equivalent of the ls command:

dir( )
##  [1] "~$r02-r.pptx"   "expression.tsv" "k04-r.html"     "k04-r.Rmd"     
##  [5] "l04-r.html"     "l04-r.pptx"     "l04-r.Rmd"      "map.tsv"       
##  [9] "r02-r.pptx"     "r02-r.Rmd"
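If dir( ) does not list both files, read.delim will fail later with a less obvious error. The following guard is a small sketch and is not part of the original lab. It checks for the two files explicitly with file.exists. If one is missing, setwd( ) can point R at the right folder.

```r
# Input files this lab expects in the working directory
astrInputs <- c( "expression.tsv", "map.tsv" )
astrMissing <- astrInputs[!file.exists( astrInputs )]
if ( length( astrMissing ) > 0 ) {
  # message() reports the problem without halting; swap in stop() to abort instead
  message( "Not found in ", getwd( ), ": ", paste( astrMissing, collapse = ", " ) )
} else {
  message( "Both input files found." )
}
```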

1.2 Finishing complicated (again)

Now the rest is up to you again! You should do at least a few interesting things with the dataset, which can again include any combination of:

  • Loading any combination of the expression matrix or its metadata into data.frames (e.g. with read.delim), transforming them, or displaying them.
  • Plotting one or more genes’ growth rate series within or among nutrient limitation conditions.
  • Plotting one or more genes’ differences between nutrient limitations at the fastest (i.e. largest) growth rate.
  • Identifying and/or plotting the most up- or down-regulated genes within each nutrient limitation (either at the greatest growth rate, or overall).
  • Identifying and/or plotting the most up- or down-regulated genes between related nutrient conditions (e.g. elemental nutrients vs. the auxotrophic requirements for leucine or uracil).
  • Formally testing for statistically differential expression among any combination of these conditions.
  • Making sure to appropriately link and convert between metadata as necessary (e.g. relating the sample ID "G0.05" to the Glucose nutrient at growth rate 0.05, or the gene with ID YMR123W to the name PKR1).
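The sample-ID bookkeeping in the last bullet, combined with the fastest-growth-rate comparisons above it, might look something like the following minimal sketch. It assumes that the first character of each sample ID is a nutrient code (G, N, P, S, L or U, as in map.tsv) and that the remainder is the growth rate. parseSampleIDs is a hypothetical helper. The small data frame holds a handful of values copied from the expression output printed later in this document, covering three genes at the two fastest rates for phosphate and uracil.

```r
# Nutrient codes as they appear in map.tsv
astrNutrientCodes <- c( G = "Glucose", N = "Ammonium", P = "Phosphate",
                        S = "Sulfate", L = "Leucine", U = "Uracil" )

# Hypothetical helper: split IDs such as "G0.05" into nutrient and growth rate
parseSampleIDs <- function( astrIDs ) {
  data.frame(
    Nutrient = unname( astrNutrientCodes[substr( astrIDs, 1, 1 )] ),
    Rate = as.numeric( substring( astrIDs, 2 ) ),
    row.names = astrIDs, stringsAsFactors = FALSE )
}

# A few values copied from the expression table (three genes, four samples)
dfrmToy <- data.frame(
  P0.25 = c( 3.74, -0.10, -3.34 ), P0.3 = c( 3.49, -0.32, -4.20 ),
  U0.25 = c( 0.25, -3.25, -3.39 ), U0.3 = c( 0.37, -3.02, -3.23 ),
  row.names = c( "YJL012C", "YEL021W", "YFL030W" ) )

dfrmSamples <- parseSampleIDs( colnames( dfrmToy ) )

# Keep only the columns measured at the fastest (largest) growth rate
astrFastest <- rownames( dfrmSamples )[dfrmSamples$Rate == max( dfrmSamples$Rate )]
dfrmFastest <- dfrmToy[, astrFastest, drop = FALSE]

# Most up- and down-regulated gene in each of those columns
astrUp <- sapply( dfrmFastest, function( adX ) rownames( dfrmFastest )[which.max( adX )] )
astrDown <- sapply( dfrmFastest, function( adX ) rownames( dfrmFastest )[which.min( adX )] )

# Label the results by nutrient instead of by sample ID
names( astrUp ) <- dfrmSamples[astrFastest, "Nutrient"]
names( astrDown ) <- dfrmSamples[astrFastest, "Nutrient"]
astrUp
astrDown
```

On the full data, the same steps would apply to colnames( dfrmExpression )[-1]. Alternatively, you could read Nutrient and Rate directly from dfrmMetadata rather than parsing the IDs.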

Most importantly, come up with at least one or two good ideas of what to analyze - even if you don’t know how to do it yet! And then ask Google and/or ask us to figure it out. This is your opportunity to finish something you started earlier, or come up with a new or more interesting question to ask.

Add text (any combination of notes and R chunks) to this Markdown file as needed. To get you started, here’s an example. Note that RStudio isn’t as good as Jupyter at automatically abbreviating long output, so we’ll frequently use head to show only the first n elements of an object, and you’ll sometimes have to scroll to the right to see the entire output:

dfrmExpression <- read.delim( "expression.tsv", row.names = 1, stringsAsFactors = FALSE )
head( dfrmExpression, 25 )
##                                                                                                                                   NAME
## YMR123W                                   PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847
## YJL012C                               VTC4       || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730
## YIL043C                                     CBR1       || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073
## YHR161C                                                            YAP1801    || endocytosis || clathrin binding || YHR161C || 1082031
## YGR192C         TDH3       || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314
## YFL030W                           AGX1       || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273
## YMR132C                                   JLP2       || biological process unknown || molecular function unknown || YMR132C || 1084272
## YLR360W                           VPS38      || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720
## YEL021W   URA3       || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994
## YLR163C            MAS1       || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883
## YPL235W            RVB2       || regulation of transcription from RNA polymerase II promoter* || ATPase activity || YPL235W || 1084807
## YDL061C                                 RPS29B     || protein biosynthesis || structural constituent of ribosome || YDL061C || 1081582
## YGL085W                                              || biological process unknown || molecular function unknown || YGL085W || 1086743
## YKL047W                                              || biological process unknown || molecular function unknown || YKL047W || 1081106
## YDL248W                                            COS7       || biological process unknown || receptor activity || YDL248W || 1085813
## YMR086C-A                                                                                              ||  ||  || YMR086C-A || 1083761
## YNL090W                                 RHO2       || cell wall organization and biogenesis* || GTPase activity* || YNL090W || 1082726
## YDR335W                                                 MSN5       || protein-nucleus export || protein binding* || YDR335W || 1084560
## YPL244C                              HUT1       || UDP-galactose transport || UDP-galactose transporter activity || YPL244C || 1081802
## YGL094C                          PAN2       || postreplication repair* || poly(A)-specific ribonuclease activity || YGL094C || 1083948
## YOR362C                           PRE10      || ubiquitin-dependent protein catabolism || endopeptidase activity || YOR362C || 1086519
## YFR032C-A                             RPL29      || protein biosynthesis || structural constituent of ribosome || YFR032C-A || 1082417
## YLR143W                                              || biological process unknown || molecular function unknown || YLR143W || 1083634
## YLR152C                                              || biological process unknown || molecular function unknown || YLR152C || 1086273
## YMR121C                                RPL15B     || protein biosynthesis || structural constituent of ribosome* || YMR121C || 1084662
##                 G0.05      G0.1 G0.15  G0.2 G0.25        G0.3 N0.05  N0.1 N0.15
## YMR123W   -0.73000000 -0.370000 -0.46 -0.41 -0.06  0.13000000 -1.55 -1.33 -0.51
## YJL012C   -0.19000000 -0.080000 -0.16 -0.03 -0.16 -0.16000000  0.85  0.81  0.54
## YIL043C   -0.45000000 -0.160000 -0.26 -0.22 -0.19 -0.27000000 -1.15 -0.89 -0.82
## YHR161C   -0.05000000 -0.070000  0.14 -0.04 -0.29 -0.10000000  0.45  0.22 -0.50
## YGR192C   -2.20000000 -2.910000 -1.99 -1.34 -0.99 -0.85000000 -1.97 -1.47 -2.15
## YFL030W    1.70000000  1.710000  1.52  1.71  0.65 -1.08000000  0.12 -0.02 -1.72
## YMR132C    0.16000000  0.600000  0.05 -0.23 -0.03  0.16000000 -0.85 -0.03  0.93
## YLR360W   -0.09000000  0.050000  0.16  0.17 -0.05  0.07000000 -0.20 -0.12  0.58
## YEL021W   -0.60000000 -0.500000 -0.17  0.16  0.11  0.22000000 -0.66 -0.52 -0.42
## YLR163C   -0.03000000 -0.120000 -0.17  0.09  0.15  0.23000000  0.10 -0.18 -0.79
## YPL235W   -0.14000000 -0.220000 -0.05  0.06 -0.07  0.07000000  0.09  0.03  0.00
## YDL061C    0.08000000  0.110000  0.08 -0.03  0.21  0.15000000 -0.89 -0.41  0.79
## YGL085W    0.25000000  0.350000 -0.24 -0.16  0.22  0.34000000 -0.68 -0.34  0.17
## YKL047W   -0.06000000 -0.210000 -0.25 -0.17 -0.22 -0.16000000 -0.11 -0.02  0.07
## YDL248W   -0.17000000 -0.090000 -0.05 -0.04 -0.58 -0.57000000 -0.11 -0.22  0.20
## YMR086C-A -0.00506832 -0.139995 -0.47 -0.21 -0.36  0.00399855  0.62  0.52 -0.12
## YNL090W   -0.56000000 -0.050000 -0.10 -0.05  0.08  0.27000000 -0.74 -0.38 -0.17
## YDR335W    0.06000000 -0.030000  0.29  0.16  0.14  0.14000000  0.55  0.19  0.05
## YPL244C   -0.72000000 -0.480000 -0.57 -0.32 -0.13  0.00000000 -1.31 -0.99 -0.99
## YGL094C   -0.58000000 -0.410000 -0.37 -0.29 -0.27 -0.22000000 -0.28 -0.21  0.28
## YOR362C   -0.15000000 -0.040000  0.09  0.00 -0.09 -0.15000000 -0.60 -0.69 -0.35
## YFR032C-A -0.43000000 -0.030000 -0.31 -0.33  0.05 -0.11000000 -1.04 -0.51  0.41
## YLR143W   -0.48000000 -0.530000 -0.41 -0.27 -0.29  0.03000000 -0.18 -0.27 -0.04
## YLR152C    1.89000000  1.810000  1.32  0.94  0.11 -0.07000000  0.82  0.84  0.57
## YMR121C    0.69000000  0.700000  0.41  0.33  0.02  0.12000000  0.69  0.56  0.25
##            N0.2 N0.25  N0.3 P0.05  P0.1 P0.15  P0.2 P0.25  P0.3     S0.05  S0.1
## YMR123W   -0.62  0.02  0.27 -0.79 -0.44 -0.17  0.14  0.11  0.32 -1.820000 -0.95
## YJL012C    0.32  0.28  0.45  3.72  4.09  3.70  3.79  3.74  3.49  0.830000  0.73
## YIL043C   -0.61 -0.48 -0.32 -0.11 -0.08  0.19  0.35 -0.03  0.10 -0.380000 -0.73
## YHR161C   -0.87 -0.49 -0.26 -0.56 -0.52 -0.49 -0.40 -0.26 -0.27 -0.190000 -0.55
## YGR192C   -1.42 -1.07 -0.60  0.39  0.75  1.05  1.34  1.51  1.11 -2.840000 -2.56
## YFL030W   -2.01 -2.80 -2.98 -1.07 -1.19 -1.87 -2.36 -3.34 -4.20 -0.700000 -0.23
## YMR132C    0.62  0.71  0.10  0.50  0.31  0.53  0.63  0.59  0.91  0.590000  0.14
## YLR360W    0.35  0.13  0.08 -0.23 -0.29 -0.17  0.08 -0.13 -0.26  0.510000  0.24
## YEL021W   -0.32 -0.14  0.20 -0.78 -0.58 -0.44 -0.62 -0.10 -0.32 -1.060000 -0.67
## YLR163C   -0.81 -0.77 -0.33 -0.71 -0.58 -0.70 -0.85 -0.50 -0.63  0.490000 -0.37
## YPL235W    0.03  0.09 -0.06 -0.19 -0.05  0.08 -0.10  0.06  0.10  0.280000  0.04
## YDL061C    0.73  0.44  0.27 -0.63 -0.62 -0.48 -0.37 -0.40  0.12 -0.610000  0.09
## YGL085W    0.43  0.31  0.18  0.02  0.04  0.27  0.29  0.30  0.26 -0.180000 -0.02
## YKL047W    0.13 -0.05 -0.18  0.69  0.67  0.63  0.35  0.32  0.18  0.410000  0.22
## YDL248W   -0.18 -0.26 -0.64 -0.60 -0.82 -1.01 -1.10 -1.63 -1.42 -0.100000  0.27
## YMR086C-A  0.09 -0.13 -0.04 -0.18 -0.19  0.26 -0.04  0.26  0.65  0.467001 -0.21
## YNL090W    0.10  0.16  0.23 -0.33 -0.26  0.01  0.14  0.19  0.15  0.470000  0.27
## YDR335W    0.04  0.00  0.14  0.37  0.25  0.12  0.36  0.28  0.01  0.380000  0.43
## YPL244C   -0.47 -0.51  0.02 -0.76 -0.62 -0.25  0.23  0.21  0.25 -1.560000 -0.82
## YGL094C    0.00  0.18 -0.04  0.12  0.22  0.21  0.31  0.18  0.28 -0.440000 -0.07
## YOR362C   -0.28 -0.02 -0.09 -0.14 -0.63 -0.30 -0.42 -0.20 -0.09 -0.420000 -0.25
## YFR032C-A  0.40  0.34  0.00 -0.65 -0.51 -0.16  0.21  0.27  0.79  0.180000  0.07
## YLR143W   -0.14 -0.07  0.11 -0.09  0.24  0.19  0.16  0.35  0.41  0.800000 -0.42
## YLR152C   -0.03 -0.31 -0.41 -1.25 -1.39 -0.99 -0.91 -1.16 -1.17  0.640000  0.23
## YMR121C   -0.06 -0.01 -0.25  0.51  0.56  0.62  0.13  0.30  0.20  1.240000  0.75
##                S0.15  S0.2      S0.25  S0.3 L0.05  L0.1 L0.15  L0.2 L0.25  L0.3
## YMR123W   -0.1300000 -0.15 -0.2000000  0.05 -0.41 -0.25 -0.20 -0.14  0.16  0.12
## YJL012C    0.5200000  0.42  0.3200000  0.19 -0.37 -0.07  0.01  0.12  0.19  0.04
## YIL043C   -0.1700000 -0.22 -0.1500000 -0.16  0.02 -0.06 -0.16 -0.09 -0.23 -0.06
## YHR161C   -0.3500000 -0.64 -0.5100000 -0.35 -0.16 -0.25 -0.12 -0.12 -0.17 -0.31
## YGR192C   -2.1200000 -1.24 -1.4200000 -0.72 -1.74 -0.66 -0.35  0.12  0.25 -0.29
## YFL030W   -1.5700000 -1.63 -1.9400000 -1.78 -0.19 -1.21 -1.98 -2.43 -2.51 -3.17
## YMR132C    0.2600000  0.07  0.4300000  0.23 -0.46 -0.12 -0.17  0.14 -0.01  0.22
## YLR360W   -0.0100000  0.06  0.0900000  0.08 -0.33 -0.04 -0.17 -0.10 -0.05  0.02
## YEL021W   -0.2300000 -0.13 -0.1400000  0.00 -0.55 -0.42 -0.72 -0.16 -0.03  0.23
## YLR163C   -0.5800000 -0.69 -0.6700000 -0.69 -0.33 -0.33 -0.15 -0.30 -0.22 -0.14
## YPL235W    0.1800000 -0.04  0.1000000 -0.03 -0.11 -0.11  0.23  0.18  0.13  0.28
## YDL061C    0.2000000  0.27  0.2200000  0.27  0.00 -0.12 -0.09  0.08 -0.03  0.19
## YGL085W    0.2800000  0.17  0.2900000  0.25  0.00  0.20  0.02  0.30  0.11  0.41
## YKL047W    0.1800000  0.01  0.1600000  0.03 -0.10  0.25  0.06  0.10  0.09  0.09
## YDL248W   -0.0100000  0.21 -0.0500000  0.19  0.02  0.09 -0.42 -0.18 -0.01 -0.22
## YMR086C-A -0.0780547 -0.10 -0.0450466 -0.21  0.40 -0.65  0.36  0.55  0.18 -0.20
## YNL090W    0.1700000  0.31  0.2200000  0.32 -0.28  0.11  0.33  0.36  0.38  0.29
## YDR335W    0.5800000  0.39  0.2900000  0.10  0.43  0.27  0.21 -0.03  0.08  0.02
## YPL244C   -0.5400000 -0.46 -0.2900000 -0.05 -0.56 -0.28 -0.10  0.01  0.13  0.27
## YGL094C   -0.0200000  0.00 -0.0600000  0.00 -0.27 -0.36 -0.24 -0.25 -0.13 -0.08
## YOR362C    0.3100000  0.06  0.0900000 -0.19  0.13  0.05  0.07  0.06  0.02 -0.01
## YFR032C-A  0.3700000  0.44  0.4100000  0.30  0.10 -0.12  0.02  0.12  0.10  0.28
## YLR143W   -0.3800000 -0.22 -0.2400000  0.00 -0.37 -0.26 -0.02 -0.05  0.01  0.06
## YLR152C   -0.4200000 -0.27 -0.4300000 -0.31  0.51  0.01 -0.32 -0.54 -1.05 -1.39
## YMR121C    0.5600000  0.40  0.4800000  0.34  0.72  0.49  0.37  0.35  0.03  0.36
##           U0.05      U0.1 U0.15  U0.2 U0.25  U0.3
## YMR123W   -0.90 -0.340000 -0.04  0.17  0.04  0.25
## YJL012C   -0.62  0.070000  0.07  0.02  0.25  0.37
## YIL043C    0.18  0.480000 -0.12  0.05 -0.20 -0.12
## YHR161C   -0.67 -0.200000 -0.12 -0.08  0.19  0.03
## YGR192C   -3.66 -1.490000 -0.85 -0.50 -0.07  0.07
## YFL030W   -1.27 -1.680000 -2.29 -3.17 -3.39 -3.23
## YMR132C   -0.03 -0.160000  0.05  0.48  0.15  0.04
## YLR360W   -0.34  0.430000  0.07  0.08  0.22  0.01
## YEL021W   -3.07 -3.560000 -3.24 -3.00 -3.25 -3.02
## YLR163C   -0.55 -0.790000 -0.57 -0.45 -0.66 -0.56
## YPL235W    0.12  0.050000  0.08  0.28  0.03 -0.08
## YDL061C   -0.16  0.260000  0.31  0.29  0.31  0.07
## YGL085W   -0.59  0.080000  0.07  0.37  0.19  0.12
## YKL047W    0.65  0.390000  0.22  0.26  0.29  0.16
## YDL248W   -0.25  0.760000  0.64  0.04  0.39  0.31
## YMR086C-A  0.95 -0.094996 -0.50 -0.10 -0.03  0.12
## YNL090W   -0.32  0.060000  0.22  0.36  0.23  0.18
## YDR335W   -0.11 -0.040000  0.03 -0.11 -0.06 -0.09
## YPL244C   -1.01 -0.410000 -0.27  0.01 -0.03  0.15
## YGL094C   -0.28 -0.320000 -0.22 -0.26 -0.21 -0.13
## YOR362C   -0.76 -0.180000 -0.09  0.05 -0.10 -0.05
## YFR032C-A -0.03 -0.160000  0.23  0.61  0.09 -0.10
## YLR143W    0.20 -0.450000 -0.21  0.05  0.12  0.18
## YLR152C    0.01 -0.470000 -1.29 -1.12 -1.02 -1.09
## YMR121C    2.11  0.880000  0.49  0.73  0.35  0.22
dfrmMetadata <- read.delim( "map.tsv", row.names = 1, stringsAsFactors = FALSE )
dfrmMetadata
##        Nutrient Rate
## G0.05   Glucose 0.05
## G0.1    Glucose 0.10
## G0.15   Glucose 0.15
## G0.2    Glucose 0.20
## G0.25   Glucose 0.25
## G0.3    Glucose 0.30
## N0.05  Ammonium 0.05
## N0.1   Ammonium 0.10
## N0.15  Ammonium 0.15
## N0.2   Ammonium 0.20
## N0.25  Ammonium 0.25
## N0.3   Ammonium 0.30
## P0.05 Phosphate 0.05
## P0.1  Phosphate 0.10
## P0.15 Phosphate 0.15
## P0.2  Phosphate 0.20
## P0.25 Phosphate 0.25
## P0.3  Phosphate 0.30
## S0.05   Sulfate 0.05
## S0.1    Sulfate 0.10
## S0.15   Sulfate 0.15
## S0.2    Sulfate 0.20
## S0.25   Sulfate 0.25
## S0.3    Sulfate 0.30
## L0.05   Leucine 0.05
## L0.1    Leucine 0.10
## L0.15   Leucine 0.15
## L0.2    Leucine 0.20
## L0.25   Leucine 0.25
## L0.3    Leucine 0.30
## U0.05    Uracil 0.05
## U0.1     Uracil 0.10
## U0.15    Uracil 0.15
## U0.2     Uracil 0.20
## U0.25    Uracil 0.25
## U0.3     Uracil 0.30
colnames( dfrmExpression )
##  [1] "NAME"  "G0.05" "G0.1"  "G0.15" "G0.2"  "G0.25" "G0.3"  "N0.05" "N0.1" 
## [10] "N0.15" "N0.2"  "N0.25" "N0.3"  "P0.05" "P0.1"  "P0.15" "P0.2"  "P0.25"
## [19] "P0.3"  "S0.05" "S0.1"  "S0.15" "S0.2"  "S0.25" "S0.3"  "L0.05" "L0.1" 
## [28] "L0.15" "L0.2"  "L0.25" "L0.3"  "U0.05" "U0.1"  "U0.15" "U0.2"  "U0.25"
## [37] "U0.3"
head( rownames( dfrmExpression ), 100 )
##   [1] "YMR123W"   "YJL012C"   "YIL043C"   "YHR161C"   "YGR192C"   "YFL030W"  
##   [7] "YMR132C"   "YLR360W"   "YEL021W"   "YLR163C"   "YPL235W"   "YDL061C"  
##  [13] "YGL085W"   "YKL047W"   "YDL248W"   "YMR086C-A" "YNL090W"   "YDR335W"  
##  [19] "YPL244C"   "YGL094C"   "YOR362C"   "YFR032C-A" "YLR143W"   "YLR152C"  
##  [25] "YMR121C"   "YMR198W"   "YKL056C"   "YJL087C"   "YNL286W"   "YDL050C"  
##  [31] "YDL237W"   "YMR110C"   "YDL030W"   "YPL213W"   "YNL275W"   "YBL092W"  
##  [37] "YDR333C"   "YGL063W"   "YIL001W"   "YGL072C"   "YOR340C"   "YNL284C"  
##  [43] "YDR529C"   "YER029C"   "YMR176W"   "YDL235C"   "YLR130C"   "YPL092W"  
##  [49] "YJL065C"   "YBL081W"   "YIL096C"   "YDR399W"   "YEL074W"   "YBR158W"  
##  [55] "YGL208W"   "YHR194W"   "YBR167C"   "YJL045W"   "YIL076W"   "YDR302W"  
##  [61] "YER009W"   "YGL061C"   "YER018C"   "YDR388W"   "YNL253W"   "YPL081W"  
##  [67] "YLR196W"   "YIL085C"   "YDL224C"   "YER190W"   "YMR174C"   "YDR397C"  
##  [73] "YBR147W"   "YPL090C"   "YHR183W"   "YDL204W"   "YDR507C"   "YJL034W"  
##  [79] "YBR156C"   "YNL242W"   "YKL089W"   "YLR185W"   "YIL074C"   "YDR300C"  
##  [85] "YDL213C"   "YDR377W"   "YPL070W"   "YEL052W"   "YMR163C"   "YLR194C"  
##  [91] "YBR136W"   "YGL206C"   "YGL030W"   "YFL061W"   "YPL266W"   "YNL251C"  
##  [97] "YHR172W"   "YEL061C"   "YLR174W"   "YIL063C"
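The expression columns and the rows of map.tsv share the same sample IDs, so the metadata can drive formal comparisons (the "formally testing" bullet above). The following is a minimal sketch. It rebuilds a stand-in for the metadata so that it runs on its own, and it copies the URA3 (YEL021W) row from the expression output above, whose uracil columns stand out. It then runs a Welch two-sample t-test of the uracil samples against all the others. This treats the six growth rates as if they were independent replicates, which they are not. Treat the p-value as a rough screen rather than a rigorous result. Bioconductor packages such as limma model designs like this more carefully.

```r
# Stand-in for map.tsv: 6 nutrients x 6 dilution rates, in the same order as the columns
adRates <- c( 0.05, 0.1, 0.15, 0.2, 0.25, 0.3 )
astrSamples <- paste0( rep( c( "G", "N", "P", "S", "L", "U" ), each = 6 ), adRates )
dfrmMeta <- data.frame(
  Nutrient = rep( c( "Glucose", "Ammonium", "Phosphate", "Sulfate", "Leucine", "Uracil" ), each = 6 ),
  Rate = rep( adRates, times = 6 ),
  row.names = astrSamples, stringsAsFactors = FALSE )

# YEL021W (URA3) values copied from the expression output, in column order
adURA3 <- setNames( c(
  -0.60, -0.50, -0.17,  0.16,  0.11,  0.22,   # Glucose
  -0.66, -0.52, -0.42, -0.32, -0.14,  0.20,   # Ammonium
  -0.78, -0.58, -0.44, -0.62, -0.10, -0.32,   # Phosphate
  -1.06, -0.67, -0.23, -0.13, -0.14,  0.00,   # Sulfate
  -0.55, -0.42, -0.72, -0.16, -0.03,  0.23,   # Leucine
  -3.07, -3.56, -3.24, -3.00, -3.25, -3.02 ), # Uracil
  astrSamples )

# Expression values and metadata must describe the same samples in the same order
stopifnot( identical( names( adURA3 ), rownames( dfrmMeta ) ) )

# Welch two-sample t-test: uracil limitation vs. all other conditions
afUracil <- dfrmMeta$Nutrient == "Uracil"
ttURA3 <- t.test( adURA3[afUracil], adURA3[!afUracil] )
ttURA3$estimate
ttURA3$p.value
```

In the lab itself, the inputs would be unlist( dfrmExpression["YEL021W", -1] ) and dfrmMetadata. When testing many genes, p.adjust( adP, method = "BH" ) (where adP is the hypothetical vector of raw p-values) would correct for multiple comparisons.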

Again, don’t forget to scroll to the right for wide output:

head( dfrmExpression["NAME"], 25 )
##                                                                                                                                   NAME
## YMR123W                                   PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847
## YJL012C                               VTC4       || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730
## YIL043C                                     CBR1       || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073
## YHR161C                                                            YAP1801    || endocytosis || clathrin binding || YHR161C || 1082031
## YGR192C         TDH3       || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314
## YFL030W                           AGX1       || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273
## YMR132C                                   JLP2       || biological process unknown || molecular function unknown || YMR132C || 1084272
## YLR360W                           VPS38      || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720
## YEL021W   URA3       || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994
## YLR163C            MAS1       || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883
## YPL235W            RVB2       || regulation of transcription from RNA polymerase II promoter* || ATPase activity || YPL235W || 1084807
## YDL061C                                 RPS29B     || protein biosynthesis || structural constituent of ribosome || YDL061C || 1081582
## YGL085W                                              || biological process unknown || molecular function unknown || YGL085W || 1086743
## YKL047W                                              || biological process unknown || molecular function unknown || YKL047W || 1081106
## YDL248W                                            COS7       || biological process unknown || receptor activity || YDL248W || 1085813
## YMR086C-A                                                                                              ||  ||  || YMR086C-A || 1083761
## YNL090W                                 RHO2       || cell wall organization and biogenesis* || GTPase activity* || YNL090W || 1082726
## YDR335W                                                 MSN5       || protein-nucleus export || protein binding* || YDR335W || 1084560
## YPL244C                              HUT1       || UDP-galactose transport || UDP-galactose transporter activity || YPL244C || 1081802
## YGL094C                          PAN2       || postreplication repair* || poly(A)-specific ribonuclease activity || YGL094C || 1083948
## YOR362C                           PRE10      || ubiquitin-dependent protein catabolism || endopeptidase activity || YOR362C || 1086519
## YFR032C-A                             RPL29      || protein biosynthesis || structural constituent of ribosome || YFR032C-A || 1082417
## YLR143W                                              || biological process unknown || molecular function unknown || YLR143W || 1083634
## YLR152C                                              || biological process unknown || molecular function unknown || YLR152C || 1086273
## YMR121C                                RPL15B     || protein biosynthesis || structural constituent of ribosome* || YMR121C || 1084662

Recall that [] returns the slice NAME (in this case a sub-data.frame containing one column, NAME), whereas [[]] returns the element NAME (in this case a character vector):

head( dfrmExpression[["NAME"]], 10 )
##  [1] "PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847"                                
##  [2] "VTC4       || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730"                            
##  [3] "CBR1       || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073"                                  
##  [4] "YAP1801    || endocytosis || clathrin binding || YHR161C || 1082031"                                                         
##  [5] "TDH3       || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314"      
##  [6] "AGX1       || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273"                        
##  [7] "JLP2       || biological process unknown || molecular function unknown || YMR132C || 1084272"                                
##  [8] "VPS38      || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720"                        
##  [9] "URA3       || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994"
## [10] "MAS1       || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883"

For atomic vectors indexed by a single position, the two forms ([] and [[]]) return the same value, except that [[]] drops any element names and accepts only one index:

dfrmExpression[["NAME"]][1]
## [1] "PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847"
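The difference between the two forms is easiest to see on small objects. The following sketch uses a named character vector built from two gene names in the output above, a list made from it, and a one-column data frame. It shows where [] and [[]] agree and where they differ.

```r
# A small named character vector standing in for gene names
astrGenes <- c( YMR123W = "PKR1", YJL012C = "VTC4" )
astrGenes[1]     # keeps the name: YMR123W = "PKR1"
astrGenes[[1]]   # drops the name: "PKR1"
astrGenes[1:2]   # [] accepts several indices; astrGenes[[1:2]] would be an error

# For lists (and data frames, which are lists of columns) the difference matters more
lstGenes <- as.list( astrGenes )
lstGenes[1]      # a list of length one
lstGenes[[1]]    # the string itself
dfrmGenes <- data.frame( NAME = astrGenes, stringsAsFactors = FALSE )
class( dfrmGenes["NAME"] )    # "data.frame"
class( dfrmGenes[["NAME"]] )  # "character"
```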

String manipulation in base R is more verbose than in Python. To perform the same simplification of the ||-delimited human-readable gene names that we did in a couple of simple lines of Python, the magic is (initially on just the first name as an example):

strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )
## [[1]]
## [1] "PKR1       "                  " biological process unknown "
## [3] " molecular function unknown " " YMR123W "                   
## [5] " 1082847"
strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]]
## [1] "PKR1       "                  " biological process unknown "
## [3] " molecular function unknown " " YMR123W "                   
## [5] " 1082847"
trimws( strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]] )
## [1] "PKR1"                       "biological process unknown"
## [3] "molecular function unknown" "YMR123W"                   
## [5] "1082847"
trimws( strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]] )[1]
## [1] "PKR1"

Got all that? First, we split the string at every occurrence of "||". This is written "\\|\\|" because strsplit treats its split pattern as a regular expression, in which | is a special character. Next, we keep only the first element of the resulting list, which has length one because we split a single string. We then remove its whitespace. Finally, we keep the first element of that resulting vector, whose length equals the number of ||-delimited tokens in the first name.
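strsplit also accepts fixed = TRUE, which matches the pattern literally and so needs no escaping. The following sketch repeats the single-name example that way, using the first NAME string from the output above.

```r
strName <- "PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847"
# fixed = TRUE matches "||" literally, so no regular-expression escaping is needed
astrTokens <- trimws( strsplit( strName, "||", fixed = TRUE )[[1]] )
astrTokens
astrTokens[1]   # "PKR1"
astrTokens[4]   # "YMR123W", the systematic ID
```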

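Even more confusingly, to apply this to every name in the original data frame column, we add one more layer. lapply calls the extraction function "[[" with index 1 on each element of the list returned by strsplit, which keeps the first token of every name. trimws then removes the padding: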

astrNames <- trimws( lapply( strsplit( dfrmExpression[["NAME"]], "\\|\\|" ), "[[", 1 ) )
head( astrNames, 100 )
##   [1] "PKR1"    "VTC4"    "CBR1"    "YAP1801" "TDH3"    "AGX1"    "JLP2"   
##   [8] "VPS38"   "URA3"    "MAS1"    "RVB2"    "RPS29B"  ""        ""       
##  [15] "COS7"    ""        "RHO2"    "MSN5"    "HUT1"    "PAN2"    "PRE10"  
##  [22] "RPL29"   ""        ""        "RPL15B"  "CIK1"    ""        "TRL1"   
##  [29] "CUS2"    ""        ""        ""        "PRP9"    "LEA1"    ""       
##  [36] "RPL32"   ""        "PUS2"    ""        ""        "RPA43"   "MRPL10" 
##  [43] "QCR7"    "SMB1"    "ECM5"    "YPD1"    "ZRT2"    "SSU1"    "DLS1"   
##  [50] ""        ""        "HPT1"    ""        "AMN1"    "SIP2"    "MDM31"  
##  [57] "POP7"    ""        "SEC28"   "GPI11"   "NTF2"    "DUO1"    "SPC25"  
##  [64] "RVS167"  "TEX1"    "RPS9A"   "PWP1"    "KTR7"    "WHI4"    "YRF1-2" 
##  [71] "PAI3"    "NCB2"    ""        "RPS6A"   "GND1"    "RTN2"    "GIN4"   
##  [78] "KAR2"    "SLI15"   "ATG2"    "MIF2"    "RPL37A"  "SER33"   "PRO1"   
##  [85] "NOP6"    "ATP17"   "MUK1"    "AFG1"    ""        ""        "MEC1"   
##  [92] "CHC1"    "RPL30"   ""        "DIM1"    "NRD1"    "SPC97"   "CIN8"   
##  [99] "IDP2"    "YRB2"
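Some readers may find the following variant of the same extraction easier to follow. It is a sketch, and firstField is a hypothetical helper name. It uses fixed = TRUE to avoid the escapes and vapply instead of lapply, so the result is guaranteed to be a character vector with one entry per input. It also tolerates a completely empty NAME string, which would make "[[" fail with a subscript-out-of-bounds error. The three input strings follow the NAME format shown earlier, and the third mimics a gene with no common name.

```r
# Hypothetical helper: the first ||-delimited field of each string, whitespace trimmed
firstField <- function( astrRaw ) {
  lstTokens <- strsplit( astrRaw, "||", fixed = TRUE )
  # vapply insists on exactly one string per input; an empty input string splits
  # into character(0), which is mapped to NA rather than raising an error
  astrFirst <- vapply( lstTokens,
                       function( astrTok ) if ( length( astrTok ) > 0 ) astrTok[1] else NA_character_,
                       character( 1 ) )
  trimws( astrFirst )
}

astrRaw <- c(
  "PKR1       || biological process unknown || molecular function unknown || YMR123W || 1082847",
  "VTC4       || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730",
  "           ||  ||  || YMR086C-A || 1083761" )
firstField( astrRaw )   # "PKR1" "VTC4" ""
```

Applied to the lab data, this would be firstField( dfrmExpression[["NAME"]] ).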

But after all of that work, we can save the results back into the original data frame to work with later:

dfrmExpression[["NAME"]] <- astrNames
head( dfrmExpression["NAME"], 25 )
##              NAME
## YMR123W      PKR1
## YJL012C      VTC4
## YIL043C      CBR1
## YHR161C   YAP1801
## YGR192C      TDH3
## YFL030W      AGX1
## YMR132C      JLP2
## YLR360W     VPS38
## YEL021W      URA3
## YLR163C      MAS1
## YPL235W      RVB2
## YDL061C    RPS29B
## YGL085W          
## YKL047W          
## YDL248W      COS7
## YMR086C-A        
## YNL090W      RHO2
## YDR335W      MSN5
## YPL244C      HUT1
## YGL094C      PAN2
## YOR362C     PRE10
## YFR032C-A   RPL29
## YLR143W          
## YLR152C          
## YMR121C    RPL15B
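With common names now in the NAME column, converting between systematic IDs and names (the last bullet of the task list) can be done with named vectors. The following sketch uses four ID/name pairs copied from the table above. In the full data, you would build the lookup with setNames( dfrmExpression[["NAME"]], rownames( dfrmExpression ) ). Empty names are dropped from the name-to-ID direction. If a name were duplicated, [[]] would return only its first match. ifelse falls back to the systematic ID, which is handy for plot labels.

```r
# A few ID/name pairs copied from the output above, including a gene with no name
astrIDs <- c( "YMR123W", "YEL021W", "YGL085W", "YGR192C" )
astrNamesSub <- c( "PKR1", "URA3", "", "TDH3" )

# Named vectors work as lookup tables in both directions
astrID2Name <- setNames( astrNamesSub, astrIDs )
astrName2ID <- setNames( astrIDs, astrNamesSub )[astrNamesSub != ""]
astrID2Name[["YMR123W"]]  # "PKR1"
astrName2ID[["URA3"]]     # "YEL021W"

# Fall back to the systematic ID when no common name is available
astrLabels <- ifelse( astrNamesSub == "", astrIDs, astrNamesSub )
astrLabels
```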

Now, back to work…

# Drop the first (NAME) column, keeping only the numeric expression values
dfrmData <- dfrmExpression[, -1]
head( dfrmData, 25 )
##                 G0.05      G0.1 G0.15  G0.2 G0.25        G0.3 N0.05  N0.1 N0.15
## YMR123W   -0.73000000 -0.370000 -0.46 -0.41 -0.06  0.13000000 -1.55 -1.33 -0.51
## YJL012C   -0.19000000 -0.080000 -0.16 -0.03 -0.16 -0.16000000  0.85  0.81  0.54
## YIL043C   -0.45000000 -0.160000 -0.26 -0.22 -0.19 -0.27000000 -1.15 -0.89 -0.82
## YHR161C   -0.05000000 -0.070000  0.14 -0.04 -0.29 -0.10000000  0.45  0.22 -0.50
## YGR192C   -2.20000000 -2.910000 -1.99 -1.34 -0.99 -0.85000000 -1.97 -1.47 -2.15
## YFL030W    1.70000000  1.710000  1.52  1.71  0.65 -1.08000000  0.12 -0.02 -1.72
## YMR132C    0.16000000  0.600000  0.05 -0.23 -0.03  0.16000000 -0.85 -0.03  0.93
## YLR360W   -0.09000000  0.050000  0.16  0.17 -0.05  0.07000000 -0.20 -0.12  0.58
## YEL021W   -0.60000000 -0.500000 -0.17  0.16  0.11  0.22000000 -0.66 -0.52 -0.42
## YLR163C   -0.03000000 -0.120000 -0.17  0.09  0.15  0.23000000  0.10 -0.18 -0.79
## YPL235W   -0.14000000 -0.220000 -0.05  0.06 -0.07  0.07000000  0.09  0.03  0.00
## YDL061C    0.08000000  0.110000  0.08 -0.03  0.21  0.15000000 -0.89 -0.41  0.79
## YGL085W    0.25000000  0.350000 -0.24 -0.16  0.22  0.34000000 -0.68 -0.34  0.17
## YKL047W   -0.06000000 -0.210000 -0.25 -0.17 -0.22 -0.16000000 -0.11 -0.02  0.07
## YDL248W   -0.17000000 -0.090000 -0.05 -0.04 -0.58 -0.57000000 -0.11 -0.22  0.20
## YMR086C-A -0.00506832 -0.139995 -0.47 -0.21 -0.36  0.00399855  0.62  0.52 -0.12
## YNL090W   -0.56000000 -0.050000 -0.10 -0.05  0.08  0.27000000 -0.74 -0.38 -0.17
## YDR335W    0.06000000 -0.030000  0.29  0.16  0.14  0.14000000  0.55  0.19  0.05
## YPL244C   -0.72000000 -0.480000 -0.57 -0.32 -0.13  0.00000000 -1.31 -0.99 -0.99
## YGL094C   -0.58000000 -0.410000 -0.37 -0.29 -0.27 -0.22000000 -0.28 -0.21  0.28
## YOR362C   -0.15000000 -0.040000  0.09  0.00 -0.09 -0.15000000 -0.60 -0.69 -0.35
## YFR032C-A -0.43000000 -0.030000 -0.31 -0.33  0.05 -0.11000000 -1.04 -0.51  0.41
## YLR143W   -0.48000000 -0.530000 -0.41 -0.27 -0.29  0.03000000 -0.18 -0.27 -0.04
## YLR152C    1.89000000  1.810000  1.32  0.94  0.11 -0.07000000  0.82  0.84  0.57
## YMR121C    0.69000000  0.700000  0.41  0.33  0.02  0.12000000  0.69  0.56  0.25
##            N0.2 N0.25  N0.3 P0.05  P0.1 P0.15  P0.2 P0.25  P0.3     S0.05  S0.1
## YMR123W   -0.62  0.02  0.27 -0.79 -0.44 -0.17  0.14  0.11  0.32 -1.820000 -0.95
## YJL012C    0.32  0.28  0.45  3.72  4.09  3.70  3.79  3.74  3.49  0.830000  0.73
## YIL043C   -0.61 -0.48 -0.32 -0.11 -0.08  0.19  0.35 -0.03  0.10 -0.380000 -0.73
## YHR161C   -0.87 -0.49 -0.26 -0.56 -0.52 -0.49 -0.40 -0.26 -0.27 -0.190000 -0.55
## YGR192C   -1.42 -1.07 -0.60  0.39  0.75  1.05  1.34  1.51  1.11 -2.840000 -2.56
## YFL030W   -2.01 -2.80 -2.98 -1.07 -1.19 -1.87 -2.36 -3.34 -4.20 -0.700000 -0.23
## YMR132C    0.62  0.71  0.10  0.50  0.31  0.53  0.63  0.59  0.91  0.590000  0.14
## YLR360W    0.35  0.13  0.08 -0.23 -0.29 -0.17  0.08 -0.13 -0.26  0.510000  0.24
## YEL021W   -0.32 -0.14  0.20 -0.78 -0.58 -0.44 -0.62 -0.10 -0.32 -1.060000 -0.67
## YLR163C   -0.81 -0.77 -0.33 -0.71 -0.58 -0.70 -0.85 -0.50 -0.63  0.490000 -0.37
## YPL235W    0.03  0.09 -0.06 -0.19 -0.05  0.08 -0.10  0.06  0.10  0.280000  0.04
## YDL061C    0.73  0.44  0.27 -0.63 -0.62 -0.48 -0.37 -0.40  0.12 -0.610000  0.09
## YGL085W    0.43  0.31  0.18  0.02  0.04  0.27  0.29  0.30  0.26 -0.180000 -0.02
## YKL047W    0.13 -0.05 -0.18  0.69  0.67  0.63  0.35  0.32  0.18  0.410000  0.22
## YDL248W   -0.18 -0.26 -0.64 -0.60 -0.82 -1.01 -1.10 -1.63 -1.42 -0.100000  0.27
## YMR086C-A  0.09 -0.13 -0.04 -0.18 -0.19  0.26 -0.04  0.26  0.65  0.467001 -0.21
## YNL090W    0.10  0.16  0.23 -0.33 -0.26  0.01  0.14  0.19  0.15  0.470000  0.27
## YDR335W    0.04  0.00  0.14  0.37  0.25  0.12  0.36  0.28  0.01  0.380000  0.43
## YPL244C   -0.47 -0.51  0.02 -0.76 -0.62 -0.25  0.23  0.21  0.25 -1.560000 -0.82
## YGL094C    0.00  0.18 -0.04  0.12  0.22  0.21  0.31  0.18  0.28 -0.440000 -0.07
## YOR362C   -0.28 -0.02 -0.09 -0.14 -0.63 -0.30 -0.42 -0.20 -0.09 -0.420000 -0.25
## YFR032C-A  0.40  0.34  0.00 -0.65 -0.51 -0.16  0.21  0.27  0.79  0.180000  0.07
## YLR143W   -0.14 -0.07  0.11 -0.09  0.24  0.19  0.16  0.35  0.41  0.800000 -0.42
## YLR152C   -0.03 -0.31 -0.41 -1.25 -1.39 -0.99 -0.91 -1.16 -1.17  0.640000  0.23
## YMR121C   -0.06 -0.01 -0.25  0.51  0.56  0.62  0.13  0.30  0.20  1.240000  0.75
##                S0.15  S0.2      S0.25  S0.3 L0.05  L0.1 L0.15  L0.2 L0.25  L0.3
## YMR123W   -0.1300000 -0.15 -0.2000000  0.05 -0.41 -0.25 -0.20 -0.14  0.16  0.12
## YJL012C    0.5200000  0.42  0.3200000  0.19 -0.37 -0.07  0.01  0.12  0.19  0.04
## YIL043C   -0.1700000 -0.22 -0.1500000 -0.16  0.02 -0.06 -0.16 -0.09 -0.23 -0.06
## YHR161C   -0.3500000 -0.64 -0.5100000 -0.35 -0.16 -0.25 -0.12 -0.12 -0.17 -0.31
## YGR192C   -2.1200000 -1.24 -1.4200000 -0.72 -1.74 -0.66 -0.35  0.12  0.25 -0.29
## YFL030W   -1.5700000 -1.63 -1.9400000 -1.78 -0.19 -1.21 -1.98 -2.43 -2.51 -3.17
## YMR132C    0.2600000  0.07  0.4300000  0.23 -0.46 -0.12 -0.17  0.14 -0.01  0.22
## YLR360W   -0.0100000  0.06  0.0900000  0.08 -0.33 -0.04 -0.17 -0.10 -0.05  0.02
## YEL021W   -0.2300000 -0.13 -0.1400000  0.00 -0.55 -0.42 -0.72 -0.16 -0.03  0.23
## YLR163C   -0.5800000 -0.69 -0.6700000 -0.69 -0.33 -0.33 -0.15 -0.30 -0.22 -0.14
## YPL235W    0.1800000 -0.04  0.1000000 -0.03 -0.11 -0.11  0.23  0.18  0.13  0.28
## YDL061C    0.2000000  0.27  0.2200000  0.27  0.00 -0.12 -0.09  0.08 -0.03  0.19
## YGL085W    0.2800000  0.17  0.2900000  0.25  0.00  0.20  0.02  0.30  0.11  0.41
## YKL047W    0.1800000  0.01  0.1600000  0.03 -0.10  0.25  0.06  0.10  0.09  0.09
## YDL248W   -0.0100000  0.21 -0.0500000  0.19  0.02  0.09 -0.42 -0.18 -0.01 -0.22
## YMR086C-A -0.0780547 -0.10 -0.0450466 -0.21  0.40 -0.65  0.36  0.55  0.18 -0.20
## YNL090W    0.1700000  0.31  0.2200000  0.32 -0.28  0.11  0.33  0.36  0.38  0.29
## YDR335W    0.5800000  0.39  0.2900000  0.10  0.43  0.27  0.21 -0.03  0.08  0.02
## YPL244C   -0.5400000 -0.46 -0.2900000 -0.05 -0.56 -0.28 -0.10  0.01  0.13  0.27
## YGL094C   -0.0200000  0.00 -0.0600000  0.00 -0.27 -0.36 -0.24 -0.25 -0.13 -0.08
## YOR362C    0.3100000  0.06  0.0900000 -0.19  0.13  0.05  0.07  0.06  0.02 -0.01
## YFR032C-A  0.3700000  0.44  0.4100000  0.30  0.10 -0.12  0.02  0.12  0.10  0.28
## YLR143W   -0.3800000 -0.22 -0.2400000  0.00 -0.37 -0.26 -0.02 -0.05  0.01  0.06
## YLR152C   -0.4200000 -0.27 -0.4300000 -0.31  0.51  0.01 -0.32 -0.54 -1.05 -1.39
## YMR121C    0.5600000  0.40  0.4800000  0.34  0.72  0.49  0.37  0.35  0.03  0.36
##           U0.05      U0.1 U0.15  U0.2 U0.25  U0.3
## YMR123W   -0.90 -0.340000 -0.04  0.17  0.04  0.25
## YJL012C   -0.62  0.070000  0.07  0.02  0.25  0.37
## YIL043C    0.18  0.480000 -0.12  0.05 -0.20 -0.12
## YHR161C   -0.67 -0.200000 -0.12 -0.08  0.19  0.03
## YGR192C   -3.66 -1.490000 -0.85 -0.50 -0.07  0.07
## YFL030W   -1.27 -1.680000 -2.29 -3.17 -3.39 -3.23
## YMR132C   -0.03 -0.160000  0.05  0.48  0.15  0.04
## YLR360W   -0.34  0.430000  0.07  0.08  0.22  0.01
## YEL021W   -3.07 -3.560000 -3.24 -3.00 -3.25 -3.02
## YLR163C   -0.55 -0.790000 -0.57 -0.45 -0.66 -0.56
## YPL235W    0.12  0.050000  0.08  0.28  0.03 -0.08
## YDL061C   -0.16  0.260000  0.31  0.29  0.31  0.07
## YGL085W   -0.59  0.080000  0.07  0.37  0.19  0.12
## YKL047W    0.65  0.390000  0.22  0.26  0.29  0.16
## YDL248W   -0.25  0.760000  0.64  0.04  0.39  0.31
## YMR086C-A  0.95 -0.094996 -0.50 -0.10 -0.03  0.12
## YNL090W   -0.32  0.060000  0.22  0.36  0.23  0.18
## YDR335W   -0.11 -0.040000  0.03 -0.11 -0.06 -0.09
## YPL244C   -1.01 -0.410000 -0.27  0.01 -0.03  0.15
## YGL094C   -0.28 -0.320000 -0.22 -0.26 -0.21 -0.13
## YOR362C   -0.76 -0.180000 -0.09  0.05 -0.10 -0.05
## YFR032C-A -0.03 -0.160000  0.23  0.61  0.09 -0.10
## YLR143W    0.20 -0.450000 -0.21  0.05  0.12  0.18
## YLR152C    0.01 -0.470000 -1.29 -1.12 -1.02 -1.09
## YMR121C    2.11  0.880000  0.49  0.73  0.35  0.22
iGCN4 <- match( "GCN4", dfrmExpression[["NAME"]] )
iGCN4
## [1] 4646
rownames( dfrmExpression )[iGCN4]
## [1] "YEL009C"
dfrmData[iGCN4,]
##         G0.05 G0.1 G0.15 G0.2 G0.25 G0.3 N0.05 N0.1 N0.15 N0.2 N0.25 N0.3 P0.05
## YEL009C  1.15 0.97  0.97 0.32   0.2    0  1.14 1.33  1.17 1.13   0.7  0.3  1.21
##         P0.1 P0.15 P0.2 P0.25  P0.3 S0.05 S0.1 S0.15 S0.2 S0.25  S0.3 L0.05
## YEL009C 1.11  0.88 0.43  0.16 -0.01     1 0.36  0.12 0.02  0.07 -0.26  1.34
##         L0.1 L0.15 L0.2 L0.25  L0.3 U0.05 U0.1 U0.15 U0.2 U0.25  U0.3
## YEL009C 0.97  0.65 0.09 -0.18 -0.42  1.11 0.41 -0.19 -0.1 -0.17 -0.51
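match returns the position of the first exact match, or NA if the name is absent, so it is worth knowing how it behaves before relying on it. Here is a minimal sketch on a small made-up name vector. astrNames and the other names are hypothetical and are not part of the lab's data; "GCN4" appears twice only to show the tie-breaking behavior.

```r
# Hypothetical name vector standing in for dfrmExpression[["NAME"]];
# "GCN4" is duplicated here only to illustrate first-match behavior
astrNames <- c( "HXT7", "GCN4", "ACT1", "GCN4" )

# match() returns the index of the FIRST exact match...
iFirst <- match( "GCN4", astrNames )
# ...or NA when the value does not occur at all
iMissing <- match( "NOTAGENE", astrNames )
# which() returns every matching position instead
aiAll <- which( astrNames == "GCN4" )
# %in% is a convenient yes/no presence check
fPresent <- "GCN4" %in% astrNames

print( iFirst )    # [1] 2
print( iMissing )  # [1] NA
print( aiAll )     # [1] 2 4
```

If a lookup like iGCN4 came back NA, indexing a data frame with it would return a row of NAs rather than an error, so checking is.na() first is a cheap safeguard.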
plot( 1:ncol( dfrmData ), dfrmData[iGCN4,], type = "l" )

plot( 1:ncol( dfrmData ), dfrmData[iGCN4,], type = "l", xaxt = "n",
      xlab = "",
      ylab = dfrmExpression[["NAME"]][iGCN4] )
axis( 1, at = 1:ncol( dfrmData ), labels = colnames( dfrmData ), las = 2 )

library(ggplot2)
ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric(dfrmData[iGCN4,])) )

ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric(dfrmData[iGCN4,])) ) +
   scale_x_continuous( name = "", breaks = 1:ncol( dfrmData ), labels = colnames( dfrmData ) ) +
   theme( axis.text.x = element_text( angle = 45, hjust = 1 ) ) +
   ylab( dfrmExpression[["NAME"]][iGCN4] )


Let’s continue with the same additional example as before. Since data frames are already the natural R structure for this kind of tabular data, we’ll stick with the ones we built above rather than re-loading the data in another format as we did in Python.

afGlucose <- dfrmMetadata["Nutrient"] == "Glucose"
afGlucose
##       Nutrient
## G0.05     TRUE
## G0.1      TRUE
## G0.15     TRUE
## G0.2      TRUE
## G0.25     TRUE
## G0.3      TRUE
## N0.05    FALSE
## N0.1     FALSE
## N0.15    FALSE
## N0.2     FALSE
## N0.25    FALSE
## N0.3     FALSE
## P0.05    FALSE
## P0.1     FALSE
## P0.15    FALSE
## P0.2     FALSE
## P0.25    FALSE
## P0.3     FALSE
## S0.05    FALSE
## S0.1     FALSE
## S0.15    FALSE
## S0.2     FALSE
## S0.25    FALSE
## S0.3     FALSE
## L0.05    FALSE
## L0.1     FALSE
## L0.15    FALSE
## L0.2     FALSE
## L0.25    FALSE
## L0.3     FALSE
## U0.05    FALSE
## U0.1     FALSE
## U0.15    FALSE
## U0.2     FALSE
## U0.25    FALSE
## U0.3     FALSE
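Because dfrmMetadata["Nutrient"] uses single brackets, it is a one-column data frame, which is why afGlucose prints as a matrix with a Nutrient column. Double brackets return the column itself and give a plain logical vector. The matrix form happens to work as an index in the steps below, but the vector form is usually easier to reason about. Here is a minimal sketch on a tiny data frame (dfrmToyMeta is hypothetical and made up for illustration):

```r
# Hypothetical stand-in for dfrmMetadata, using the same column name
dfrmToyMeta <- data.frame( Nutrient = c( "Glucose", "Glucose", "Ammonium" ),
                           row.names = c( "G0.05", "G0.1", "N0.05" ) )

# Single brackets keep a one-column data.frame, so == returns a logical matrix
afAsMatrix <- dfrmToyMeta["Nutrient"] == "Glucose"
# Double brackets extract the column as a vector, so == returns a logical vector
afAsVector <- dfrmToyMeta[["Nutrient"]] == "Glucose"

print( dim( afAsMatrix ) )  # [1] 3 1
print( afAsVector )         # [1]  TRUE  TRUE FALSE
```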
adFirstGene <- dfrmExpression[1,-1]
adFirstGene
##         G0.05  G0.1 G0.15  G0.2 G0.25 G0.3 N0.05  N0.1 N0.15  N0.2 N0.25 N0.3
## YMR123W -0.73 -0.37 -0.46 -0.41 -0.06 0.13 -1.55 -1.33 -0.51 -0.62  0.02 0.27
##         P0.05  P0.1 P0.15 P0.2 P0.25 P0.3 S0.05  S0.1 S0.15  S0.2 S0.25 S0.3
## YMR123W -0.79 -0.44 -0.17 0.14  0.11 0.32 -1.82 -0.95 -0.13 -0.15  -0.2 0.05
##         L0.05  L0.1 L0.15  L0.2 L0.25 L0.3 U0.05  U0.1 U0.15 U0.2 U0.25 U0.3
## YMR123W -0.41 -0.25  -0.2 -0.14  0.16 0.12  -0.9 -0.34 -0.04 0.17  0.04 0.25
adFirstGeneGlucose <- adFirstGene[afGlucose]
adFirstGeneGlucose
## [1] -0.73 -0.37 -0.46 -0.41 -0.06  0.13
adFirstGeneOther <- adFirstGene[!afGlucose]
adFirstGeneOther
##  [1] -1.55 -1.33 -0.51 -0.62  0.02  0.27 -0.79 -0.44 -0.17  0.14  0.11  0.32
## [13] -1.82 -0.95 -0.13 -0.15 -0.20  0.05 -0.41 -0.25 -0.20 -0.14  0.16  0.12
## [25] -0.90 -0.34 -0.04  0.17  0.04  0.25
t.test( adFirstGeneGlucose, adFirstGeneOther )
## 
##  Welch Two Sample t-test
## 
## data:  adFirstGeneGlucose and adFirstGeneOther
## t = -0.043799, df = 12.512, p-value = 0.9658
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3536437  0.3396437
## sample estimates:
##  mean of x  mean of y 
## -0.3166667 -0.3096667
lsFirstGeneGlucoseTTest <- t.test( adFirstGeneGlucose, adFirstGeneOther )
names( lsFirstGeneGlucoseTTest )
##  [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
##  [6] "null.value"  "stderr"      "alternative" "method"      "data.name"
lsFirstGeneGlucoseTTest["p.value"]
## $p.value
## [1] 0.9657557
lsFirstGeneGlucoseTTest[["p.value"]]
## [1] 0.9657557
lsFirstGeneGlucoseTTest[[3]]
## [1] 0.9657557
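The three accessors above differ in what they return. [ keeps the list wrapper, so the result still prints as $p.value, while [[ (and its shorthand $) extract the value itself. Here is a minimal sketch on two made-up samples that are not from the expression data:

```r
# Two small made-up samples (not from the expression data)
adX <- c( 1.1, 0.9, 1.3, 1.0 )
adY <- c( 0.2, 0.4, 0.1, 0.3 )
lsTest <- t.test( adX, adY )

# Single brackets return a sub-list that still carries the element's name
lsSub <- lsTest["p.value"]
# Double brackets (or $) return the element itself: a single number
dP <- lsTest[["p.value"]]
dPDollar <- lsTest$p.value
# Positional access works too, but silently depends on the element order
dPByPosition <- lsTest[[3]]
```

Accessing by name ([["p.value"]]) is generally safer than accessing by position ([[3]]) because it does not rely on the order of the fields in the result object.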

Now we can run the same test for every gene, just as we did in Python, using a for loop to build up a vector of numeric p-values:

adPs <- c()
for( iGene in 1:nrow( dfrmData ) ) {
   lsTest <- t.test( dfrmData[iGene, afGlucose], dfrmData[iGene, !afGlucose] )
   adPs[iGene] <- lsTest[["p.value"]]
}
head( adPs, 100 )
##   [1] 9.657557e-01 3.093475e-04 4.511485e-01 1.263265e-02 3.512822e-02
##   [6] 5.439103e-04 4.375654e-01 6.083414e-01 4.992052e-03 3.966839e-05
##  [11] 5.709254e-02 2.406893e-01 8.942786e-01 2.157193e-08 7.548512e-01
##  [16] 1.586571e-02 2.329955e-01 4.500093e-01 9.816524e-01 1.028740e-03
##  [21] 8.288520e-02 1.902277e-02 1.078654e-02 6.120312e-03 4.867914e-01
##  [26] 4.331932e-04 3.274743e-01 9.914171e-02 7.952911e-01 6.006733e-01
##  [31] 3.762373e-01 8.621155e-02 1.806734e-01 2.163387e-02 1.282305e-03
##  [36] 3.295156e-01 1.283778e-02 4.274155e-01 3.993228e-01 9.056287e-01
##  [41] 7.383414e-01 2.594774e-02 3.488206e-03 5.178298e-05 7.164193e-01
##  [46] 5.925450e-01 3.882493e-03 5.010793e-02 2.762366e-01 6.641149e-01
##  [51] 2.919149e-02 9.965011e-01 3.775379e-01 1.691185e-02 3.674130e-04
##  [56] 3.957748e-01 5.492884e-01 9.411102e-01 2.068806e-02 3.217844e-01
##  [61] 7.771635e-01 1.320304e-03 6.537981e-02 4.072965e-01 1.649004e-01
##  [66] 4.827485e-01 6.532829e-01 7.985595e-04 2.079977e-01 8.271679e-02
##  [71] 6.476730e-01 7.259310e-01 1.894165e-01 3.678150e-01 6.614514e-04
##  [76] 4.892644e-01 2.401266e-02 2.444502e-02 1.314629e-02 3.368175e-02
##  [81] 6.991194e-04 1.937475e-01 1.739353e-02 9.013114e-02 7.346542e-01
##  [86] 4.241373e-02 4.936567e-01 1.958054e-03 2.434236e-03 5.758941e-02
##  [91] 8.610407e-01 8.660269e-02 4.317326e-03 8.307372e-02 2.061247e-01
##  [96] 8.750649e-02 7.978199e-03 6.482396e-06 1.709710e-03 1.621952e-01
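The loop above grows adPs by one element per iteration, because assigning to adPs[iGene] past the end of the vector extends it. That works, but a common refinement is to preallocate the result and loop over seq_len(). Here is a minimal sketch on a small made-up matrix. adToy and afGroup are hypothetical stand-ins for dfrmData and afGlucose.

```r
# Hypothetical toy data: 3 "genes" (rows) x 6 "conditions" (columns)
adToy <- matrix( c(  0.1,  0.2,  0.3, 1.1, 1.2, 1.4,
                     0.5,  0.4,  0.6, 0.5, 0.7, 0.4,
                    -1.0, -0.8, -0.9, 0.2, 0.1, 0.3 ),
                 nrow = 3, byrow = TRUE )
# Hypothetical condition mask, playing the role of afGlucose
afGroup <- c( TRUE, TRUE, TRUE, FALSE, FALSE, FALSE )

# Preallocate one slot per row rather than growing the vector from c()
adToyPs <- numeric( nrow( adToy ) )
# seq_len() also behaves correctly (zero iterations) if the matrix has no rows
for( iRow in seq_len( nrow( adToy ) ) ) {
   adToyPs[iRow] <- t.test( adToy[iRow, afGroup], adToy[iRow, !afGroup] )[["p.value"]]
}
adToyPs
```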

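However, idiomatic R generally avoids explicit for loops in favor of the apply family of functions, which call a function on each element of a collection directly. apply itself works on matrix-like data: a data.frame is first converted to a matrix, and the function is then called once per row (MARGIN = 1, as here) or once per column (MARGIN = 2):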

lsTests <- apply( dfrmData, 1, function( ls ) t.test( ls[afGlucose], ls[!afGlucose] ) )
head( lsTests )
## $YMR123W
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = -0.043799, df = 12.512, p-value = 0.9658
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.3536437  0.3396437
## sample estimates:
##  mean of x  mean of y 
## -0.3166667 -0.3096667 
## 
## 
## $YJL012C
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = -4.0853, df = 29.502, p-value = 0.0003093
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.6387894 -0.5458772
## sample estimates:
##  mean of x  mean of y 
## -0.1300000  0.9623333 
## 
## 
## $YIL043C
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = -0.76375, df = 29.181, p-value = 0.4511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.21817821  0.09951154
## sample estimates:
##  mean of x  mean of y 
## -0.2583333 -0.1990000 
## 
## 
## $YHR161C
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = 2.8284, df = 15.119, p-value = 0.01263
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.05325613 0.37807720
## sample estimates:
##   mean of x   mean of y 
## -0.06833333 -0.28400000 
## 
## 
## $YGR192C
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = -2.3995, df = 11.074, p-value = 0.03513
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.83987587 -0.08012413
## sample estimates:
##  mean of x  mean of y 
## -1.7133333 -0.7533333 
## 
## 
## $YFL030W
## 
##  Welch Two Sample t-test
## 
## data:  ls[afGlucose] and ls[!afGlucose]
## t = 5.9615, df = 7.066, p-value = 0.0005439
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.789336 4.134664
## sample estimates:
## mean of x mean of y 
##     1.035    -1.927
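With MARGIN = 1, the function above is called once per row, which means one gene at a time. Each t.test result is itself a list, so apply returns a list of results rather than a matrix. Because apply converts a data.frame with as.matrix first, all of its columns should be numeric, which is presumably why it is run on dfrmData rather than on dfrmExpression with its text NAME column. Here is a minimal sketch with made-up values (aiToy and dfrmMixed are hypothetical):

```r
# Hypothetical 2 x 3 numeric matrix; matrix() fills column by column,
# so the rows are (1, 3, 5) and (2, 4, 6)
aiToy <- matrix( 1:6, nrow = 2 )
aiRowSums <- apply( aiToy, 1, sum )  # one call per row: 9 12
aiColSums <- apply( aiToy, 2, sum )  # one call per column: 3 7 11

# apply() converts a data.frame with as.matrix() first; a single text
# column (like a gene NAME column) turns every value into a character string
dfrmMixed <- data.frame( NAME = c( "GENE1", "GENE2" ), Value = c( 1.5, 2.5 ) )
astrRowTypes <- apply( dfrmMixed, 1, function( row ) class( row[["Value"]] ) )
```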

We can then combine lsTests with lapply, which calls a function on each element of a list or vector and returns a list, to keep just each result’s p-value:

lsPs <- lapply( lsTests, "[[", "p.value" )
head( lsPs )
## $YMR123W
## [1] 0.9657557
## 
## $YJL012C
## [1] 0.0003093475
## 
## $YIL043C
## [1] 0.4511485
## 
## $YHR161C
## [1] 0.01263265
## 
## $YGR192C
## [1] 0.03512822
## 
## $YFL030W
## [1] 0.0005439103
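The "[[" argument is the extraction operator itself, passed to lapply by name. lapply forwards any further arguments (here "p.value") to it, so each element becomes element[["p.value"]]. Here is a minimal sketch with a made-up list (lsToyResults is hypothetical) showing that the shorthand and an explicit anonymous function agree:

```r
# Hypothetical list of per-gene results, each itself a named list
lsToyResults <- list( GENE1 = list( statistic = 1.2, p.value = 0.25 ),
                      GENE2 = list( statistic = 4.8, p.value = 0.001 ) )

# Passing the extraction operator "[[" by name, plus its extra argument...
lsShort <- lapply( lsToyResults, "[[", "p.value" )
# ...is equivalent to writing the anonymous function out in full
lsLong <- lapply( lsToyResults, function( lsResult ) lsResult[["p.value"]] )
```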

And finally convert lsPs to a (numeric) vector:

adPs <- as.numeric( lsPs )
head( adPs, 100 )
##   [1] 9.657557e-01 3.093475e-04 4.511485e-01 1.263265e-02 3.512822e-02
##   [6] 5.439103e-04 4.375654e-01 6.083414e-01 4.992052e-03 3.966839e-05
##  [11] 5.709254e-02 2.406893e-01 8.942786e-01 2.157193e-08 7.548512e-01
##  [16] 1.586571e-02 2.329955e-01 4.500093e-01 9.816524e-01 1.028740e-03
##  [21] 8.288520e-02 1.902277e-02 1.078654e-02 6.120312e-03 4.867914e-01
##  [26] 4.331932e-04 3.274743e-01 9.914171e-02 7.952911e-01 6.006733e-01
##  [31] 3.762373e-01 8.621155e-02 1.806734e-01 2.163387e-02 1.282305e-03
##  [36] 3.295156e-01 1.283778e-02 4.274155e-01 3.993228e-01 9.056287e-01
##  [41] 7.383414e-01 2.594774e-02 3.488206e-03 5.178298e-05 7.164193e-01
##  [46] 5.925450e-01 3.882493e-03 5.010793e-02 2.762366e-01 6.641149e-01
##  [51] 2.919149e-02 9.965011e-01 3.775379e-01 1.691185e-02 3.674130e-04
##  [56] 3.957748e-01 5.492884e-01 9.411102e-01 2.068806e-02 3.217844e-01
##  [61] 7.771635e-01 1.320304e-03 6.537981e-02 4.072965e-01 1.649004e-01
##  [66] 4.827485e-01 6.532829e-01 7.985595e-04 2.079977e-01 8.271679e-02
##  [71] 6.476730e-01 7.259310e-01 1.894165e-01 3.678150e-01 6.614514e-04
##  [76] 4.892644e-01 2.401266e-02 2.444502e-02 1.314629e-02 3.368175e-02
##  [81] 6.991194e-04 1.937475e-01 1.739353e-02 9.013114e-02 7.346542e-01
##  [86] 4.241373e-02 4.936567e-01 1.958054e-03 2.434236e-03 5.758941e-02
##  [91] 8.610407e-01 8.660269e-02 4.317326e-03 8.307372e-02 2.061247e-01
##  [96] 8.750649e-02 7.978199e-03 6.482396e-06 1.709710e-03 1.621952e-01
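Using lapply followed by as.numeric works, but vapply can do both steps at once and also checks that every result is a single number. Unlike as.numeric, it keeps the gene names. Here is a minimal sketch on two made-up tests (lsToyTests is hypothetical):

```r
# Two made-up t-tests standing in for lsTests (not from the dataset)
lsToyTests <- list( g1 = t.test( c( 1, 2, 3 ), c( 4, 5, 7 ) ),
                    g2 = t.test( c( 0.5, 0.7, 0.6 ), c( 0.4, 0.8, 0.65 ) ) )

# vapply() extracts each p-value and checks that it is exactly one number,
# returning a numeric vector that keeps the list's names
adNamedPs <- vapply( lsToyTests, "[[", numeric( 1 ), "p.value" )

# The two-step route used above gives the same values, but drops the names
adUnnamedPs <- as.numeric( lapply( lsToyTests, "[[", "p.value" ) )
```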
dMinP <- min( adPs )
dMinP
## [1] 5.524548e-16
iMinP <- match( dMinP, adPs )
iMinP
## [1] 156
strMostDifferentialGeneInGlucose <- dfrmExpression[["NAME"]][iMinP]
strMostDifferentialGeneInGlucose
## [1] "HXT7"
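Taking min and then match works, but which.min does both in one step and returns the position of the first minimum. Here is a minimal sketch on a made-up named vector (adToyPs is hypothetical):

```r
# Made-up named p-values standing in for adPs
adToyPs <- c( YMR123W = 0.97, YJL012C = 0.0003, YIL043C = 0.45 )

# min() + match() finds the value, then looks up its position...
iViaMatch <- match( min( adToyPs ), adToyPs )
# ...while which.min() returns the (first) position of the minimum directly,
# named after the corresponding element
iViaWhichMin <- which.min( adToyPs )
```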
ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric(dfrmData[iMinP,])) ) +
   scale_x_continuous( name = "", breaks = 1:ncol( dfrmData ), labels = colnames( dfrmData ) ) +
   theme( axis.text.x = element_text( angle = 45, hjust = 1 ) ) +
   ylab( strMostDifferentialGeneInGlucose )

ggplot( ) + geom_boxplot( aes( dfrmMetadata[["Nutrient"]], as.numeric(dfrmData[iMinP,]) ) )

As you might have noticed above, R does something surprising (and often irritating) with factors (categorical string values) by default. When strings are turned into a factor, either explicitly with factor or implicitly when ggplot2 places them on a discrete axis, the levels are sorted alphabetically. That’s why the nutrients appear reordered in the plot above. We can use a small R trick to fix this in our plot: unique returns the distinct values of a vector in the order in which they first occur.

unique( dfrmMetadata[["Nutrient"]] )
## [1] "Glucose"   "Ammonium"  "Phosphate" "Sulfate"   "Leucine"   "Uracil"

So we’ll build a new factor containing identical values, but with its levels in this more intuitive order:

kNutrients <- factor( dfrmMetadata[["Nutrient"]], levels = unique( dfrmMetadata[["Nutrient"]] ) )
kNutrients
##  [1] Glucose   Glucose   Glucose   Glucose   Glucose   Glucose   Ammonium 
##  [8] Ammonium  Ammonium  Ammonium  Ammonium  Ammonium  Phosphate Phosphate
## [15] Phosphate Phosphate Phosphate Phosphate Sulfate   Sulfate   Sulfate  
## [22] Sulfate   Sulfate   Sulfate   Leucine   Leucine   Leucine   Leucine  
## [29] Leucine   Leucine   Uracil    Uracil    Uracil    Uracil    Uracil   
## [36] Uracil   
## Levels: Glucose Ammonium Phosphate Sulfate Leucine Uracil
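The only difference between the two factors is the order of their levels. The values themselves are unchanged. Here is a minimal sketch on a short made-up vector (astrToy is hypothetical):

```r
astrToy <- c( "Glucose", "Ammonium", "Phosphate", "Glucose" )  # made-up values

# By default, factor() sorts the distinct values to form the levels
kDefault <- factor( astrToy )
# levels = unique(...) keeps them in order of first appearance instead
kInOrder <- factor( astrToy, levels = unique( astrToy ) )

levels( kDefault )   # "Ammonium" "Glucose" "Phosphate"
levels( kInOrder )   # "Glucose" "Ammonium" "Phosphate"
```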

And now plot it again using kNutrients (otherwise as above):

ggplot( ) + geom_boxplot( aes( kNutrients, as.numeric(dfrmData[iMinP,]),
                               color = kNutrients ) ) +
   xlab( "" ) +
   ylab( strMostDifferentialGeneInGlucose ) +
   theme( legend.position = "none" )

Bonus points! Why did R turn up a different most-differential-in-glucose gene than Python did? Is this also a reasonable result biologically?