In the Markdown options above, change the author and date strings so that they accurately reflect you and your submission.
Today’s lab will recapitulate the same exploratory data analysis of annotated gene expression data that we carried out earlier using Python, this time using any combination of R, Bioconductor, and ggplot2 that you’d like. Again, there are intentionally strong similarities with the problem set, so keep an eye out for opportunities to constructively transfer knowledge between the activities.
We’ll continue using the same gene expression dataset as before from PMID 17959824. As a reminder, it assesses yeast (Saccharomyces cerevisiae) transcription when grown in chemostat (steady-state) continuous culture under six different nutrient limitations (glucose (carbon), ammonium (nitrogen), sulfate, phosphate, and, in auxotrophic strains, uracil or leucine) and six different growth/flow rates (dilution rates from 0.05 to 0.3, in units of culture volumes per hour).
You’ve already read about this experiment and should be familiar with its basic structure and encoding as expression.tsv and map.tsv files. So let’s get them loaded into R! Start by ensuring that this file and your two input files are again in the same directory, such that you see them when you run R’s equivalent of the ls command:
dir( )
## [1] "~$r02-r.pptx" "expression.tsv" "k04-r.html" "k04-r.Rmd"
## [5] "l04-r.html" "l04-r.pptx" "l04-r.Rmd" "map.tsv"
## [9] "r02-r.pptx" "r02-r.Rmd"
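If you would rather check for the two inputs programmatically than scan the listing by eye, something like the following sketch may help. It assumes only the two file names used in this lab; the variable names astrInputs and afFound are illustrative and not part of the lab itself.

```r
# Names of the two input files this lab expects in the working directory.
astrInputs <- c( "expression.tsv", "map.tsv" )

# file.exists returns one TRUE/FALSE per path; name the results for readability.
afFound <- file.exists( astrInputs )
names( afFound ) <- astrInputs
print( afFound )

# If anything is missing, report where R is looking, since that is a common culprit.
if( !all( afFound ) ) {
  message( "Missing input(s); R is currently looking in: ", getwd( ) )
}
```

If either entry is FALSE, changing the working directory (for example with setwd) or moving the files is probably the quickest fix.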
Now the rest is up to you again! You should do at least a few interesting things with the dataset, which can again include any combination of:
- Loading data into data.frames (i.e. read.delim), transforming them, or displaying them.
- Mapping identifiers to their annotations (e.g. "G0.05" to the Glucose nutrient at growth rate 0.05, or the gene with ID YMR123W to the name PKR1).

Most importantly, come up with at least one or two good ideas of what to analyze, even if you don’t know how to do it yet! Then ask Google and/or ask us to figure it out. This is your opportunity to finish something you started earlier, or to come up with a new or more interesting question to ask.
Add text (any combination of notes and R chunks) at the end of this Markdown file as needed. To get you started, here’s an example. Note that RStudio isn’t as good as Jupyter at automatically abbreviating long output, so we’ll frequently use head to show only the first n elements of an entity, and you’ll sometimes have to scroll to the right to see the entire output:
dfrmExpression <- read.delim( "expression.tsv", row.names = 1, stringsAsFactors = FALSE )
head( dfrmExpression, 25 )
## NAME
## YMR123W PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847
## YJL012C VTC4 || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730
## YIL043C CBR1 || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073
## YHR161C YAP1801 || endocytosis || clathrin binding || YHR161C || 1082031
## YGR192C TDH3 || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314
## YFL030W AGX1 || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273
## YMR132C JLP2 || biological process unknown || molecular function unknown || YMR132C || 1084272
## YLR360W VPS38 || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720
## YEL021W URA3 || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994
## YLR163C MAS1 || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883
## YPL235W RVB2 || regulation of transcription from RNA polymerase II promoter* || ATPase activity || YPL235W || 1084807
## YDL061C RPS29B || protein biosynthesis || structural constituent of ribosome || YDL061C || 1081582
## YGL085W || biological process unknown || molecular function unknown || YGL085W || 1086743
## YKL047W || biological process unknown || molecular function unknown || YKL047W || 1081106
## YDL248W COS7 || biological process unknown || receptor activity || YDL248W || 1085813
## YMR086C-A || || || YMR086C-A || 1083761
## YNL090W RHO2 || cell wall organization and biogenesis* || GTPase activity* || YNL090W || 1082726
## YDR335W MSN5 || protein-nucleus export || protein binding* || YDR335W || 1084560
## YPL244C HUT1 || UDP-galactose transport || UDP-galactose transporter activity || YPL244C || 1081802
## YGL094C PAN2 || postreplication repair* || poly(A)-specific ribonuclease activity || YGL094C || 1083948
## YOR362C PRE10 || ubiquitin-dependent protein catabolism || endopeptidase activity || YOR362C || 1086519
## YFR032C-A RPL29 || protein biosynthesis || structural constituent of ribosome || YFR032C-A || 1082417
## YLR143W || biological process unknown || molecular function unknown || YLR143W || 1083634
## YLR152C || biological process unknown || molecular function unknown || YLR152C || 1086273
## YMR121C RPL15B || protein biosynthesis || structural constituent of ribosome* || YMR121C || 1084662
## G0.05 G0.1 G0.15 G0.2 G0.25 G0.3 N0.05 N0.1 N0.15
## YMR123W -0.73000000 -0.370000 -0.46 -0.41 -0.06 0.13000000 -1.55 -1.33 -0.51
## YJL012C -0.19000000 -0.080000 -0.16 -0.03 -0.16 -0.16000000 0.85 0.81 0.54
## YIL043C -0.45000000 -0.160000 -0.26 -0.22 -0.19 -0.27000000 -1.15 -0.89 -0.82
## YHR161C -0.05000000 -0.070000 0.14 -0.04 -0.29 -0.10000000 0.45 0.22 -0.50
## YGR192C -2.20000000 -2.910000 -1.99 -1.34 -0.99 -0.85000000 -1.97 -1.47 -2.15
## YFL030W 1.70000000 1.710000 1.52 1.71 0.65 -1.08000000 0.12 -0.02 -1.72
## YMR132C 0.16000000 0.600000 0.05 -0.23 -0.03 0.16000000 -0.85 -0.03 0.93
## YLR360W -0.09000000 0.050000 0.16 0.17 -0.05 0.07000000 -0.20 -0.12 0.58
## YEL021W -0.60000000 -0.500000 -0.17 0.16 0.11 0.22000000 -0.66 -0.52 -0.42
## YLR163C -0.03000000 -0.120000 -0.17 0.09 0.15 0.23000000 0.10 -0.18 -0.79
## YPL235W -0.14000000 -0.220000 -0.05 0.06 -0.07 0.07000000 0.09 0.03 0.00
## YDL061C 0.08000000 0.110000 0.08 -0.03 0.21 0.15000000 -0.89 -0.41 0.79
## YGL085W 0.25000000 0.350000 -0.24 -0.16 0.22 0.34000000 -0.68 -0.34 0.17
## YKL047W -0.06000000 -0.210000 -0.25 -0.17 -0.22 -0.16000000 -0.11 -0.02 0.07
## YDL248W -0.17000000 -0.090000 -0.05 -0.04 -0.58 -0.57000000 -0.11 -0.22 0.20
## YMR086C-A -0.00506832 -0.139995 -0.47 -0.21 -0.36 0.00399855 0.62 0.52 -0.12
## YNL090W -0.56000000 -0.050000 -0.10 -0.05 0.08 0.27000000 -0.74 -0.38 -0.17
## YDR335W 0.06000000 -0.030000 0.29 0.16 0.14 0.14000000 0.55 0.19 0.05
## YPL244C -0.72000000 -0.480000 -0.57 -0.32 -0.13 0.00000000 -1.31 -0.99 -0.99
## YGL094C -0.58000000 -0.410000 -0.37 -0.29 -0.27 -0.22000000 -0.28 -0.21 0.28
## YOR362C -0.15000000 -0.040000 0.09 0.00 -0.09 -0.15000000 -0.60 -0.69 -0.35
## YFR032C-A -0.43000000 -0.030000 -0.31 -0.33 0.05 -0.11000000 -1.04 -0.51 0.41
## YLR143W -0.48000000 -0.530000 -0.41 -0.27 -0.29 0.03000000 -0.18 -0.27 -0.04
## YLR152C 1.89000000 1.810000 1.32 0.94 0.11 -0.07000000 0.82 0.84 0.57
## YMR121C 0.69000000 0.700000 0.41 0.33 0.02 0.12000000 0.69 0.56 0.25
## N0.2 N0.25 N0.3 P0.05 P0.1 P0.15 P0.2 P0.25 P0.3 S0.05 S0.1
## YMR123W -0.62 0.02 0.27 -0.79 -0.44 -0.17 0.14 0.11 0.32 -1.820000 -0.95
## YJL012C 0.32 0.28 0.45 3.72 4.09 3.70 3.79 3.74 3.49 0.830000 0.73
## YIL043C -0.61 -0.48 -0.32 -0.11 -0.08 0.19 0.35 -0.03 0.10 -0.380000 -0.73
## YHR161C -0.87 -0.49 -0.26 -0.56 -0.52 -0.49 -0.40 -0.26 -0.27 -0.190000 -0.55
## YGR192C -1.42 -1.07 -0.60 0.39 0.75 1.05 1.34 1.51 1.11 -2.840000 -2.56
## YFL030W -2.01 -2.80 -2.98 -1.07 -1.19 -1.87 -2.36 -3.34 -4.20 -0.700000 -0.23
## YMR132C 0.62 0.71 0.10 0.50 0.31 0.53 0.63 0.59 0.91 0.590000 0.14
## YLR360W 0.35 0.13 0.08 -0.23 -0.29 -0.17 0.08 -0.13 -0.26 0.510000 0.24
## YEL021W -0.32 -0.14 0.20 -0.78 -0.58 -0.44 -0.62 -0.10 -0.32 -1.060000 -0.67
## YLR163C -0.81 -0.77 -0.33 -0.71 -0.58 -0.70 -0.85 -0.50 -0.63 0.490000 -0.37
## YPL235W 0.03 0.09 -0.06 -0.19 -0.05 0.08 -0.10 0.06 0.10 0.280000 0.04
## YDL061C 0.73 0.44 0.27 -0.63 -0.62 -0.48 -0.37 -0.40 0.12 -0.610000 0.09
## YGL085W 0.43 0.31 0.18 0.02 0.04 0.27 0.29 0.30 0.26 -0.180000 -0.02
## YKL047W 0.13 -0.05 -0.18 0.69 0.67 0.63 0.35 0.32 0.18 0.410000 0.22
## YDL248W -0.18 -0.26 -0.64 -0.60 -0.82 -1.01 -1.10 -1.63 -1.42 -0.100000 0.27
## YMR086C-A 0.09 -0.13 -0.04 -0.18 -0.19 0.26 -0.04 0.26 0.65 0.467001 -0.21
## YNL090W 0.10 0.16 0.23 -0.33 -0.26 0.01 0.14 0.19 0.15 0.470000 0.27
## YDR335W 0.04 0.00 0.14 0.37 0.25 0.12 0.36 0.28 0.01 0.380000 0.43
## YPL244C -0.47 -0.51 0.02 -0.76 -0.62 -0.25 0.23 0.21 0.25 -1.560000 -0.82
## YGL094C 0.00 0.18 -0.04 0.12 0.22 0.21 0.31 0.18 0.28 -0.440000 -0.07
## YOR362C -0.28 -0.02 -0.09 -0.14 -0.63 -0.30 -0.42 -0.20 -0.09 -0.420000 -0.25
## YFR032C-A 0.40 0.34 0.00 -0.65 -0.51 -0.16 0.21 0.27 0.79 0.180000 0.07
## YLR143W -0.14 -0.07 0.11 -0.09 0.24 0.19 0.16 0.35 0.41 0.800000 -0.42
## YLR152C -0.03 -0.31 -0.41 -1.25 -1.39 -0.99 -0.91 -1.16 -1.17 0.640000 0.23
## YMR121C -0.06 -0.01 -0.25 0.51 0.56 0.62 0.13 0.30 0.20 1.240000 0.75
## S0.15 S0.2 S0.25 S0.3 L0.05 L0.1 L0.15 L0.2 L0.25 L0.3
## YMR123W -0.1300000 -0.15 -0.2000000 0.05 -0.41 -0.25 -0.20 -0.14 0.16 0.12
## YJL012C 0.5200000 0.42 0.3200000 0.19 -0.37 -0.07 0.01 0.12 0.19 0.04
## YIL043C -0.1700000 -0.22 -0.1500000 -0.16 0.02 -0.06 -0.16 -0.09 -0.23 -0.06
## YHR161C -0.3500000 -0.64 -0.5100000 -0.35 -0.16 -0.25 -0.12 -0.12 -0.17 -0.31
## YGR192C -2.1200000 -1.24 -1.4200000 -0.72 -1.74 -0.66 -0.35 0.12 0.25 -0.29
## YFL030W -1.5700000 -1.63 -1.9400000 -1.78 -0.19 -1.21 -1.98 -2.43 -2.51 -3.17
## YMR132C 0.2600000 0.07 0.4300000 0.23 -0.46 -0.12 -0.17 0.14 -0.01 0.22
## YLR360W -0.0100000 0.06 0.0900000 0.08 -0.33 -0.04 -0.17 -0.10 -0.05 0.02
## YEL021W -0.2300000 -0.13 -0.1400000 0.00 -0.55 -0.42 -0.72 -0.16 -0.03 0.23
## YLR163C -0.5800000 -0.69 -0.6700000 -0.69 -0.33 -0.33 -0.15 -0.30 -0.22 -0.14
## YPL235W 0.1800000 -0.04 0.1000000 -0.03 -0.11 -0.11 0.23 0.18 0.13 0.28
## YDL061C 0.2000000 0.27 0.2200000 0.27 0.00 -0.12 -0.09 0.08 -0.03 0.19
## YGL085W 0.2800000 0.17 0.2900000 0.25 0.00 0.20 0.02 0.30 0.11 0.41
## YKL047W 0.1800000 0.01 0.1600000 0.03 -0.10 0.25 0.06 0.10 0.09 0.09
## YDL248W -0.0100000 0.21 -0.0500000 0.19 0.02 0.09 -0.42 -0.18 -0.01 -0.22
## YMR086C-A -0.0780547 -0.10 -0.0450466 -0.21 0.40 -0.65 0.36 0.55 0.18 -0.20
## YNL090W 0.1700000 0.31 0.2200000 0.32 -0.28 0.11 0.33 0.36 0.38 0.29
## YDR335W 0.5800000 0.39 0.2900000 0.10 0.43 0.27 0.21 -0.03 0.08 0.02
## YPL244C -0.5400000 -0.46 -0.2900000 -0.05 -0.56 -0.28 -0.10 0.01 0.13 0.27
## YGL094C -0.0200000 0.00 -0.0600000 0.00 -0.27 -0.36 -0.24 -0.25 -0.13 -0.08
## YOR362C 0.3100000 0.06 0.0900000 -0.19 0.13 0.05 0.07 0.06 0.02 -0.01
## YFR032C-A 0.3700000 0.44 0.4100000 0.30 0.10 -0.12 0.02 0.12 0.10 0.28
## YLR143W -0.3800000 -0.22 -0.2400000 0.00 -0.37 -0.26 -0.02 -0.05 0.01 0.06
## YLR152C -0.4200000 -0.27 -0.4300000 -0.31 0.51 0.01 -0.32 -0.54 -1.05 -1.39
## YMR121C 0.5600000 0.40 0.4800000 0.34 0.72 0.49 0.37 0.35 0.03 0.36
## U0.05 U0.1 U0.15 U0.2 U0.25 U0.3
## YMR123W -0.90 -0.340000 -0.04 0.17 0.04 0.25
## YJL012C -0.62 0.070000 0.07 0.02 0.25 0.37
## YIL043C 0.18 0.480000 -0.12 0.05 -0.20 -0.12
## YHR161C -0.67 -0.200000 -0.12 -0.08 0.19 0.03
## YGR192C -3.66 -1.490000 -0.85 -0.50 -0.07 0.07
## YFL030W -1.27 -1.680000 -2.29 -3.17 -3.39 -3.23
## YMR132C -0.03 -0.160000 0.05 0.48 0.15 0.04
## YLR360W -0.34 0.430000 0.07 0.08 0.22 0.01
## YEL021W -3.07 -3.560000 -3.24 -3.00 -3.25 -3.02
## YLR163C -0.55 -0.790000 -0.57 -0.45 -0.66 -0.56
## YPL235W 0.12 0.050000 0.08 0.28 0.03 -0.08
## YDL061C -0.16 0.260000 0.31 0.29 0.31 0.07
## YGL085W -0.59 0.080000 0.07 0.37 0.19 0.12
## YKL047W 0.65 0.390000 0.22 0.26 0.29 0.16
## YDL248W -0.25 0.760000 0.64 0.04 0.39 0.31
## YMR086C-A 0.95 -0.094996 -0.50 -0.10 -0.03 0.12
## YNL090W -0.32 0.060000 0.22 0.36 0.23 0.18
## YDR335W -0.11 -0.040000 0.03 -0.11 -0.06 -0.09
## YPL244C -1.01 -0.410000 -0.27 0.01 -0.03 0.15
## YGL094C -0.28 -0.320000 -0.22 -0.26 -0.21 -0.13
## YOR362C -0.76 -0.180000 -0.09 0.05 -0.10 -0.05
## YFR032C-A -0.03 -0.160000 0.23 0.61 0.09 -0.10
## YLR143W 0.20 -0.450000 -0.21 0.05 0.12 0.18
## YLR152C 0.01 -0.470000 -1.29 -1.12 -1.02 -1.09
## YMR121C 2.11 0.880000 0.49 0.73 0.35 0.22
dfrmMetadata <- read.delim( "map.tsv", row.names = 1, stringsAsFactors = FALSE )
dfrmMetadata
## Nutrient Rate
## G0.05 Glucose 0.05
## G0.1 Glucose 0.10
## G0.15 Glucose 0.15
## G0.2 Glucose 0.20
## G0.25 Glucose 0.25
## G0.3 Glucose 0.30
## N0.05 Ammonium 0.05
## N0.1 Ammonium 0.10
## N0.15 Ammonium 0.15
## N0.2 Ammonium 0.20
## N0.25 Ammonium 0.25
## N0.3 Ammonium 0.30
## P0.05 Phosphate 0.05
## P0.1 Phosphate 0.10
## P0.15 Phosphate 0.15
## P0.2 Phosphate 0.20
## P0.25 Phosphate 0.25
## P0.3 Phosphate 0.30
## S0.05 Sulfate 0.05
## S0.1 Sulfate 0.10
## S0.15 Sulfate 0.15
## S0.2 Sulfate 0.20
## S0.25 Sulfate 0.25
## S0.3 Sulfate 0.30
## L0.05 Leucine 0.05
## L0.1 Leucine 0.10
## L0.15 Leucine 0.15
## L0.2 Leucine 0.20
## L0.25 Leucine 0.25
## L0.3 Leucine 0.30
## U0.05 Uracil 0.05
## U0.1 Uracil 0.10
## U0.15 Uracil 0.15
## U0.2 Uracil 0.20
## U0.25 Uracil 0.25
## U0.3 Uracil 0.30
colnames( dfrmExpression )
## [1] "NAME" "G0.05" "G0.1" "G0.15" "G0.2" "G0.25" "G0.3" "N0.05" "N0.1"
## [10] "N0.15" "N0.2" "N0.25" "N0.3" "P0.05" "P0.1" "P0.15" "P0.2" "P0.25"
## [19] "P0.3" "S0.05" "S0.1" "S0.15" "S0.2" "S0.25" "S0.3" "L0.05" "L0.1"
## [28] "L0.15" "L0.2" "L0.25" "L0.3" "U0.05" "U0.1" "U0.15" "U0.2" "U0.25"
## [37] "U0.3"
head( rownames( dfrmExpression ), 100 )
## [1] "YMR123W" "YJL012C" "YIL043C" "YHR161C" "YGR192C" "YFL030W"
## [7] "YMR132C" "YLR360W" "YEL021W" "YLR163C" "YPL235W" "YDL061C"
## [13] "YGL085W" "YKL047W" "YDL248W" "YMR086C-A" "YNL090W" "YDR335W"
## [19] "YPL244C" "YGL094C" "YOR362C" "YFR032C-A" "YLR143W" "YLR152C"
## [25] "YMR121C" "YMR198W" "YKL056C" "YJL087C" "YNL286W" "YDL050C"
## [31] "YDL237W" "YMR110C" "YDL030W" "YPL213W" "YNL275W" "YBL092W"
## [37] "YDR333C" "YGL063W" "YIL001W" "YGL072C" "YOR340C" "YNL284C"
## [43] "YDR529C" "YER029C" "YMR176W" "YDL235C" "YLR130C" "YPL092W"
## [49] "YJL065C" "YBL081W" "YIL096C" "YDR399W" "YEL074W" "YBR158W"
## [55] "YGL208W" "YHR194W" "YBR167C" "YJL045W" "YIL076W" "YDR302W"
## [61] "YER009W" "YGL061C" "YER018C" "YDR388W" "YNL253W" "YPL081W"
## [67] "YLR196W" "YIL085C" "YDL224C" "YER190W" "YMR174C" "YDR397C"
## [73] "YBR147W" "YPL090C" "YHR183W" "YDL204W" "YDR507C" "YJL034W"
## [79] "YBR156C" "YNL242W" "YKL089W" "YLR185W" "YIL074C" "YDR300C"
## [85] "YDL213C" "YDR377W" "YPL070W" "YEL052W" "YMR163C" "YLR194C"
## [91] "YBR136W" "YGL206C" "YGL030W" "YFL061W" "YPL266W" "YNL251C"
## [97] "YHR172W" "YEL061C" "YLR174W" "YIL063C"
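Since the expression columns and the metadata rows are meant to describe the same 36 conditions, it can be worth confirming that they line up before relying on positional indexing. The following is a minimal sketch on toy stand-ins: dfrmExprToy and dfrmMetaToy are hypothetical, with a few values copied from the output above and NAME already shortened for brevity. The same comparison should apply to dfrmExpression and dfrmMetadata.

```r
# Toy stand-in for the expression table: two genes, three conditions.
dfrmExprToy <- data.frame(
  NAME  = c( "PKR1", "VTC4" ),
  G0.05 = c( -0.73, -0.19 ),
  G0.1  = c( -0.37, -0.08 ),
  N0.05 = c( -1.55, 0.85 ),
  row.names = c( "YMR123W", "YJL012C" ),
  stringsAsFactors = FALSE )

# Toy stand-in for the metadata table: one row per condition.
dfrmMetaToy <- data.frame(
  Nutrient = c( "Glucose", "Glucose", "Ammonium" ),
  Rate     = c( 0.05, 0.10, 0.05 ),
  row.names = c( "G0.05", "G0.1", "N0.05" ),
  stringsAsFactors = FALSE )

# Drop NAME (column 1) and compare condition IDs with the metadata row names.
fAligned <- identical( colnames( dfrmExprToy )[-1], rownames( dfrmMetaToy ) )
print( fAligned )

# Look up a single condition's nutrient by its ID.
strNutrient <- dfrmMetaToy["G0.05", "Nutrient"]
print( strNutrient )
```

When the two sets of IDs agree in both content and order, a logical vector computed from the metadata can be used directly to select expression columns.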
Again, don’t forget to scroll to the right for wide output:
head( dfrmExpression["NAME"], 25 )
## NAME
## YMR123W PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847
## YJL012C VTC4 || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730
## YIL043C CBR1 || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073
## YHR161C YAP1801 || endocytosis || clathrin binding || YHR161C || 1082031
## YGR192C TDH3 || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314
## YFL030W AGX1 || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273
## YMR132C JLP2 || biological process unknown || molecular function unknown || YMR132C || 1084272
## YLR360W VPS38 || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720
## YEL021W URA3 || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994
## YLR163C MAS1 || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883
## YPL235W RVB2 || regulation of transcription from RNA polymerase II promoter* || ATPase activity || YPL235W || 1084807
## YDL061C RPS29B || protein biosynthesis || structural constituent of ribosome || YDL061C || 1081582
## YGL085W || biological process unknown || molecular function unknown || YGL085W || 1086743
## YKL047W || biological process unknown || molecular function unknown || YKL047W || 1081106
## YDL248W COS7 || biological process unknown || receptor activity || YDL248W || 1085813
## YMR086C-A || || || YMR086C-A || 1083761
## YNL090W RHO2 || cell wall organization and biogenesis* || GTPase activity* || YNL090W || 1082726
## YDR335W MSN5 || protein-nucleus export || protein binding* || YDR335W || 1084560
## YPL244C HUT1 || UDP-galactose transport || UDP-galactose transporter activity || YPL244C || 1081802
## YGL094C PAN2 || postreplication repair* || poly(A)-specific ribonuclease activity || YGL094C || 1083948
## YOR362C PRE10 || ubiquitin-dependent protein catabolism || endopeptidase activity || YOR362C || 1086519
## YFR032C-A RPL29 || protein biosynthesis || structural constituent of ribosome || YFR032C-A || 1082417
## YLR143W || biological process unknown || molecular function unknown || YLR143W || 1083634
## YLR152C || biological process unknown || molecular function unknown || YLR152C || 1086273
## YMR121C RPL15B || protein biosynthesis || structural constituent of ribosome* || YMR121C || 1084662
Recall that [] returns the slice NAME (in this case a sub-data.frame containing one column, NAME), whereas [[]] returns the element NAME (in this case a string vector):
head( dfrmExpression[["NAME"]], 10 )
## [1] "PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847"
## [2] "VTC4 || vacuole fusion, non-autophagic || molecular function unknown || YJL012C || 1083730"
## [3] "CBR1 || electron transport || cytochrome-b5 reductase activity || YIL043C || 1086073"
## [4] "YAP1801 || endocytosis || clathrin binding || YHR161C || 1082031"
## [5] "TDH3 || glycolysis* || glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) activity || YGR192C || 1082314"
## [6] "AGX1 || glycine biosynthesis || alanine-glyoxylate transaminase activity || YFL030W || 1084273"
## [7] "JLP2 || biological process unknown || molecular function unknown || YMR132C || 1084272"
## [8] "VPS38 || late endosome to vacuole transport || molecular function unknown || YLR360W || 1082720"
## [9] "URA3 || 'de novo' pyrimidine base biosynthesis* || orotidine-5'-phosphate decarboxylase activity || YEL021W || 1085994"
## [10] "MAS1 || mitochondrial protein processing || mitochondrial processing peptidase activity || YLR163C || 1085883"
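To make the slice-versus-element distinction concrete, here is a small sketch on a toy data frame. dfrmToy is a hypothetical stand-in with two genes copied from the output above, not the real table.

```r
# Toy data frame with one annotation column and one numeric column.
dfrmToy <- data.frame(
  NAME  = c( "PKR1", "VTC4" ),
  G0.05 = c( -0.73, -0.19 ),
  row.names = c( "YMR123W", "YJL012C" ),
  stringsAsFactors = FALSE )

# Single brackets keep the container: the result is still a data.frame.
strClassSlice <- class( dfrmToy["NAME"] )
print( strClassSlice )

# Double brackets extract the contents: the result is a character vector.
strClassElement <- class( dfrmToy[["NAME"]] )
print( strClassElement )

# The $ operator is shorthand for the double-bracket form with a literal name.
fSame <- identical( dfrmToy[["NAME"]], dfrmToy$NAME )
print( fSame )
```

A rough rule of thumb: reach for [] when the next step expects a data frame, and for [[]] when it expects a plain vector.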
For vector types, the two forms ([] and [[]]) are more-or-less equivalent.
dfrmExpression[["NAME"]][1]
## [1] "PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847"
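The "more-or-less" matters mainly for named vectors and for multi-element selections. A minimal sketch on a hypothetical named vector adToy:

```r
# A small named numeric vector.
adToy <- c( a = 1.5, b = 2.5 )

# Single brackets return a sub-vector and keep the name.
adSlice <- adToy["a"]
print( adSlice )

# Double brackets return the bare element and drop the name.
dElement <- adToy[["a"]]
print( dElement )

# Single brackets can select several elements at once; double brackets
# accept only one index on an atomic vector like this.
adBoth <- adToy[c( "a", "b" )]
print( adBoth )
```

For an unnamed vector and a single index, as in the example above, the two forms give the same result.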
R is notably worse than Python at non-numeric data manipulation. To perform the same simplification of the ||-delimited human-readable gene names that we did in a couple of simple lines of Python, the magic is (initially on just the first name as an example):
strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )
## [[1]]
## [1] "PKR1 " " biological process unknown "
## [3] " molecular function unknown " " YMR123W "
## [5] " 1082847"
strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]]
## [1] "PKR1 " " biological process unknown "
## [3] " molecular function unknown " " YMR123W "
## [5] " 1082847"
trimws( strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]] )
## [1] "PKR1" "biological process unknown"
## [3] "molecular function unknown" "YMR123W"
## [5] "1082847"
trimws( strsplit( dfrmExpression[["NAME"]][1], "\\|\\|" )[[1]] )[1]
## [1] "PKR1"
Got all that? We split the string at every occurrence of "||" (which must be written as "\\|\\|" because strsplit treats its pattern as a regular expression, in which | is a special character), keep only the first element of the resulting list (which is of length one, since we passed in a single string), remove the whitespace around each token, and then keep the first element of that resulting vector (whose length equals the number of ||-delimited tokens in the first name).
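If the escaping feels awkward, strsplit also accepts fixed = TRUE, which treats the delimiter as a literal string rather than a regular expression. The sketch below compares the two on the first name from the output above; strName, astrRegex, and astrFixed are illustrative names.

```r
# The first NAME entry, copied from the output above.
strName <- "PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847"

# Regular-expression form: each | must be escaped.
astrRegex <- strsplit( strName, "\\|\\|" )[[1]]

# Literal form: no escaping needed when fixed = TRUE.
astrFixed <- strsplit( strName, "||", fixed = TRUE )[[1]]

# Both approaches yield the same five tokens.
fSame <- identical( astrRegex, astrFixed )
print( fSame )

# Trim whitespace and keep the first token, as before.
strGene <- trimws( astrFixed )[1]
print( strGene )
```

Either form is fine here; the literal form is simply harder to get wrong when the delimiter contains characters that regular expressions treat specially.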
Even more confusingly, to apply this to every name in the original data frame column, we add one more layer:
astrNames <- trimws( sapply( strsplit( dfrmExpression[["NAME"]], "\\|\\|" ), "[[", 1 ) )
head( astrNames, 100 )
## [1] "PKR1" "VTC4" "CBR1" "YAP1801" "TDH3" "AGX1" "JLP2"
## [8] "VPS38" "URA3" "MAS1" "RVB2" "RPS29B" "" ""
## [15] "COS7" "" "RHO2" "MSN5" "HUT1" "PAN2" "PRE10"
## [22] "RPL29" "" "" "RPL15B" "CIK1" "" "TRL1"
## [29] "CUS2" "" "" "" "PRP9" "LEA1" ""
## [36] "RPL32" "" "PUS2" "" "" "RPA43" "MRPL10"
## [43] "QCR7" "SMB1" "ECM5" "YPD1" "ZRT2" "SSU1" "DLS1"
## [50] "" "" "HPT1" "" "AMN1" "SIP2" "MDM31"
## [57] "POP7" "" "SEC28" "GPI11" "NTF2" "DUO1" "SPC25"
## [64] "RVS167" "TEX1" "RPS9A" "PWP1" "KTR7" "WHI4" "YRF1-2"
## [71] "PAI3" "NCB2" "" "RPS6A" "GND1" "RTN2" "GIN4"
## [78] "KAR2" "SLI15" "ATG2" "MIF2" "RPL37A" "SER33" "PRO1"
## [85] "NOP6" "ATP17" "MUK1" "AFG1" "" "" "MEC1"
## [92] "CHC1" "RPL30" "" "DIM1" "NRD1" "SPC97" "CIN8"
## [99] "IDP2" "YRB2"
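Many genes have no common name, which is why the result contains so many empty strings. One optional way to handle them, sketched here on two toy entries, is to fall back to the systematic ID. This is an assumption about what might be convenient and not something the lab requires. The names astrRaw, astrIDs, astrFirst, and astrLabels are hypothetical, and the exact leading whitespace of the unnamed entry is a guess.

```r
# Two raw NAME entries: one with a common name, one without.
astrRaw <- c(
  "PKR1 || biological process unknown || molecular function unknown || YMR123W || 1082847",
  " || biological process unknown || molecular function unknown || YGL085W || 1086743" )
astrIDs <- c( "YMR123W", "YGL085W" )

# vapply declares the expected return type (one string per entry), so any
# malformed entry fails loudly instead of silently changing the result type.
astrFirst <- vapply(
  strsplit( astrRaw, "||", fixed = TRUE ),
  function( astrTokens ) trimws( astrTokens[1] ),
  character( 1 ) )

# Replace empty common names with the systematic ID.
astrLabels <- ifelse( astrFirst == "", astrIDs, astrFirst )
print( astrLabels )
```

With the real data, rownames( dfrmExpression ) would presumably play the role of astrIDs.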
But after all of that work, we can save the results back into the original data frame to work with later:
dfrmExpression[["NAME"]] <- astrNames
head( dfrmExpression["NAME"], 25 )
## NAME
## YMR123W PKR1
## YJL012C VTC4
## YIL043C CBR1
## YHR161C YAP1801
## YGR192C TDH3
## YFL030W AGX1
## YMR132C JLP2
## YLR360W VPS38
## YEL021W URA3
## YLR163C MAS1
## YPL235W RVB2
## YDL061C RPS29B
## YGL085W
## YKL047W
## YDL248W COS7
## YMR086C-A
## YNL090W RHO2
## YDR335W MSN5
## YPL244C HUT1
## YGL094C PAN2
## YOR362C PRE10
## YFR032C-A RPL29
## YLR143W
## YLR152C
## YMR121C RPL15B
Now, back to work…
dfrmData <- dfrmExpression[,-1]
head( dfrmData, 25 )
## G0.05 G0.1 G0.15 G0.2 G0.25 G0.3 N0.05 N0.1 N0.15
## YMR123W -0.73000000 -0.370000 -0.46 -0.41 -0.06 0.13000000 -1.55 -1.33 -0.51
## YJL012C -0.19000000 -0.080000 -0.16 -0.03 -0.16 -0.16000000 0.85 0.81 0.54
## YIL043C -0.45000000 -0.160000 -0.26 -0.22 -0.19 -0.27000000 -1.15 -0.89 -0.82
## YHR161C -0.05000000 -0.070000 0.14 -0.04 -0.29 -0.10000000 0.45 0.22 -0.50
## YGR192C -2.20000000 -2.910000 -1.99 -1.34 -0.99 -0.85000000 -1.97 -1.47 -2.15
## YFL030W 1.70000000 1.710000 1.52 1.71 0.65 -1.08000000 0.12 -0.02 -1.72
## YMR132C 0.16000000 0.600000 0.05 -0.23 -0.03 0.16000000 -0.85 -0.03 0.93
## YLR360W -0.09000000 0.050000 0.16 0.17 -0.05 0.07000000 -0.20 -0.12 0.58
## YEL021W -0.60000000 -0.500000 -0.17 0.16 0.11 0.22000000 -0.66 -0.52 -0.42
## YLR163C -0.03000000 -0.120000 -0.17 0.09 0.15 0.23000000 0.10 -0.18 -0.79
## YPL235W -0.14000000 -0.220000 -0.05 0.06 -0.07 0.07000000 0.09 0.03 0.00
## YDL061C 0.08000000 0.110000 0.08 -0.03 0.21 0.15000000 -0.89 -0.41 0.79
## YGL085W 0.25000000 0.350000 -0.24 -0.16 0.22 0.34000000 -0.68 -0.34 0.17
## YKL047W -0.06000000 -0.210000 -0.25 -0.17 -0.22 -0.16000000 -0.11 -0.02 0.07
## YDL248W -0.17000000 -0.090000 -0.05 -0.04 -0.58 -0.57000000 -0.11 -0.22 0.20
## YMR086C-A -0.00506832 -0.139995 -0.47 -0.21 -0.36 0.00399855 0.62 0.52 -0.12
## YNL090W -0.56000000 -0.050000 -0.10 -0.05 0.08 0.27000000 -0.74 -0.38 -0.17
## YDR335W 0.06000000 -0.030000 0.29 0.16 0.14 0.14000000 0.55 0.19 0.05
## YPL244C -0.72000000 -0.480000 -0.57 -0.32 -0.13 0.00000000 -1.31 -0.99 -0.99
## YGL094C -0.58000000 -0.410000 -0.37 -0.29 -0.27 -0.22000000 -0.28 -0.21 0.28
## YOR362C -0.15000000 -0.040000 0.09 0.00 -0.09 -0.15000000 -0.60 -0.69 -0.35
## YFR032C-A -0.43000000 -0.030000 -0.31 -0.33 0.05 -0.11000000 -1.04 -0.51 0.41
## YLR143W -0.48000000 -0.530000 -0.41 -0.27 -0.29 0.03000000 -0.18 -0.27 -0.04
## YLR152C 1.89000000 1.810000 1.32 0.94 0.11 -0.07000000 0.82 0.84 0.57
## YMR121C 0.69000000 0.700000 0.41 0.33 0.02 0.12000000 0.69 0.56 0.25
## N0.2 N0.25 N0.3 P0.05 P0.1 P0.15 P0.2 P0.25 P0.3 S0.05 S0.1
## YMR123W -0.62 0.02 0.27 -0.79 -0.44 -0.17 0.14 0.11 0.32 -1.820000 -0.95
## YJL012C 0.32 0.28 0.45 3.72 4.09 3.70 3.79 3.74 3.49 0.830000 0.73
## YIL043C -0.61 -0.48 -0.32 -0.11 -0.08 0.19 0.35 -0.03 0.10 -0.380000 -0.73
## YHR161C -0.87 -0.49 -0.26 -0.56 -0.52 -0.49 -0.40 -0.26 -0.27 -0.190000 -0.55
## YGR192C -1.42 -1.07 -0.60 0.39 0.75 1.05 1.34 1.51 1.11 -2.840000 -2.56
## YFL030W -2.01 -2.80 -2.98 -1.07 -1.19 -1.87 -2.36 -3.34 -4.20 -0.700000 -0.23
## YMR132C 0.62 0.71 0.10 0.50 0.31 0.53 0.63 0.59 0.91 0.590000 0.14
## YLR360W 0.35 0.13 0.08 -0.23 -0.29 -0.17 0.08 -0.13 -0.26 0.510000 0.24
## YEL021W -0.32 -0.14 0.20 -0.78 -0.58 -0.44 -0.62 -0.10 -0.32 -1.060000 -0.67
## YLR163C -0.81 -0.77 -0.33 -0.71 -0.58 -0.70 -0.85 -0.50 -0.63 0.490000 -0.37
## YPL235W 0.03 0.09 -0.06 -0.19 -0.05 0.08 -0.10 0.06 0.10 0.280000 0.04
## YDL061C 0.73 0.44 0.27 -0.63 -0.62 -0.48 -0.37 -0.40 0.12 -0.610000 0.09
## YGL085W 0.43 0.31 0.18 0.02 0.04 0.27 0.29 0.30 0.26 -0.180000 -0.02
## YKL047W 0.13 -0.05 -0.18 0.69 0.67 0.63 0.35 0.32 0.18 0.410000 0.22
## YDL248W -0.18 -0.26 -0.64 -0.60 -0.82 -1.01 -1.10 -1.63 -1.42 -0.100000 0.27
## YMR086C-A 0.09 -0.13 -0.04 -0.18 -0.19 0.26 -0.04 0.26 0.65 0.467001 -0.21
## YNL090W 0.10 0.16 0.23 -0.33 -0.26 0.01 0.14 0.19 0.15 0.470000 0.27
## YDR335W 0.04 0.00 0.14 0.37 0.25 0.12 0.36 0.28 0.01 0.380000 0.43
## YPL244C -0.47 -0.51 0.02 -0.76 -0.62 -0.25 0.23 0.21 0.25 -1.560000 -0.82
## YGL094C 0.00 0.18 -0.04 0.12 0.22 0.21 0.31 0.18 0.28 -0.440000 -0.07
## YOR362C -0.28 -0.02 -0.09 -0.14 -0.63 -0.30 -0.42 -0.20 -0.09 -0.420000 -0.25
## YFR032C-A 0.40 0.34 0.00 -0.65 -0.51 -0.16 0.21 0.27 0.79 0.180000 0.07
## YLR143W -0.14 -0.07 0.11 -0.09 0.24 0.19 0.16 0.35 0.41 0.800000 -0.42
## YLR152C -0.03 -0.31 -0.41 -1.25 -1.39 -0.99 -0.91 -1.16 -1.17 0.640000 0.23
## YMR121C -0.06 -0.01 -0.25 0.51 0.56 0.62 0.13 0.30 0.20 1.240000 0.75
## S0.15 S0.2 S0.25 S0.3 L0.05 L0.1 L0.15 L0.2 L0.25 L0.3
## YMR123W -0.1300000 -0.15 -0.2000000 0.05 -0.41 -0.25 -0.20 -0.14 0.16 0.12
## YJL012C 0.5200000 0.42 0.3200000 0.19 -0.37 -0.07 0.01 0.12 0.19 0.04
## YIL043C -0.1700000 -0.22 -0.1500000 -0.16 0.02 -0.06 -0.16 -0.09 -0.23 -0.06
## YHR161C -0.3500000 -0.64 -0.5100000 -0.35 -0.16 -0.25 -0.12 -0.12 -0.17 -0.31
## YGR192C -2.1200000 -1.24 -1.4200000 -0.72 -1.74 -0.66 -0.35 0.12 0.25 -0.29
## YFL030W -1.5700000 -1.63 -1.9400000 -1.78 -0.19 -1.21 -1.98 -2.43 -2.51 -3.17
## YMR132C 0.2600000 0.07 0.4300000 0.23 -0.46 -0.12 -0.17 0.14 -0.01 0.22
## YLR360W -0.0100000 0.06 0.0900000 0.08 -0.33 -0.04 -0.17 -0.10 -0.05 0.02
## YEL021W -0.2300000 -0.13 -0.1400000 0.00 -0.55 -0.42 -0.72 -0.16 -0.03 0.23
## YLR163C -0.5800000 -0.69 -0.6700000 -0.69 -0.33 -0.33 -0.15 -0.30 -0.22 -0.14
## YPL235W 0.1800000 -0.04 0.1000000 -0.03 -0.11 -0.11 0.23 0.18 0.13 0.28
## YDL061C 0.2000000 0.27 0.2200000 0.27 0.00 -0.12 -0.09 0.08 -0.03 0.19
## YGL085W 0.2800000 0.17 0.2900000 0.25 0.00 0.20 0.02 0.30 0.11 0.41
## YKL047W 0.1800000 0.01 0.1600000 0.03 -0.10 0.25 0.06 0.10 0.09 0.09
## YDL248W -0.0100000 0.21 -0.0500000 0.19 0.02 0.09 -0.42 -0.18 -0.01 -0.22
## YMR086C-A -0.0780547 -0.10 -0.0450466 -0.21 0.40 -0.65 0.36 0.55 0.18 -0.20
## YNL090W 0.1700000 0.31 0.2200000 0.32 -0.28 0.11 0.33 0.36 0.38 0.29
## YDR335W 0.5800000 0.39 0.2900000 0.10 0.43 0.27 0.21 -0.03 0.08 0.02
## YPL244C -0.5400000 -0.46 -0.2900000 -0.05 -0.56 -0.28 -0.10 0.01 0.13 0.27
## YGL094C -0.0200000 0.00 -0.0600000 0.00 -0.27 -0.36 -0.24 -0.25 -0.13 -0.08
## YOR362C 0.3100000 0.06 0.0900000 -0.19 0.13 0.05 0.07 0.06 0.02 -0.01
## YFR032C-A 0.3700000 0.44 0.4100000 0.30 0.10 -0.12 0.02 0.12 0.10 0.28
## YLR143W -0.3800000 -0.22 -0.2400000 0.00 -0.37 -0.26 -0.02 -0.05 0.01 0.06
## YLR152C -0.4200000 -0.27 -0.4300000 -0.31 0.51 0.01 -0.32 -0.54 -1.05 -1.39
## YMR121C 0.5600000 0.40 0.4800000 0.34 0.72 0.49 0.37 0.35 0.03 0.36
## U0.05 U0.1 U0.15 U0.2 U0.25 U0.3
## YMR123W -0.90 -0.340000 -0.04 0.17 0.04 0.25
## YJL012C -0.62 0.070000 0.07 0.02 0.25 0.37
## YIL043C 0.18 0.480000 -0.12 0.05 -0.20 -0.12
## YHR161C -0.67 -0.200000 -0.12 -0.08 0.19 0.03
## YGR192C -3.66 -1.490000 -0.85 -0.50 -0.07 0.07
## YFL030W -1.27 -1.680000 -2.29 -3.17 -3.39 -3.23
## YMR132C -0.03 -0.160000 0.05 0.48 0.15 0.04
## YLR360W -0.34 0.430000 0.07 0.08 0.22 0.01
## YEL021W -3.07 -3.560000 -3.24 -3.00 -3.25 -3.02
## YLR163C -0.55 -0.790000 -0.57 -0.45 -0.66 -0.56
## YPL235W 0.12 0.050000 0.08 0.28 0.03 -0.08
## YDL061C -0.16 0.260000 0.31 0.29 0.31 0.07
## YGL085W -0.59 0.080000 0.07 0.37 0.19 0.12
## YKL047W 0.65 0.390000 0.22 0.26 0.29 0.16
## YDL248W -0.25 0.760000 0.64 0.04 0.39 0.31
## YMR086C-A 0.95 -0.094996 -0.50 -0.10 -0.03 0.12
## YNL090W -0.32 0.060000 0.22 0.36 0.23 0.18
## YDR335W -0.11 -0.040000 0.03 -0.11 -0.06 -0.09
## YPL244C -1.01 -0.410000 -0.27 0.01 -0.03 0.15
## YGL094C -0.28 -0.320000 -0.22 -0.26 -0.21 -0.13
## YOR362C -0.76 -0.180000 -0.09 0.05 -0.10 -0.05
## YFR032C-A -0.03 -0.160000 0.23 0.61 0.09 -0.10
## YLR143W 0.20 -0.450000 -0.21 0.05 0.12 0.18
## YLR152C 0.01 -0.470000 -1.29 -1.12 -1.02 -1.09
## YMR121C 2.11 0.880000 0.49 0.73 0.35 0.22
iGCN4 <- match( "GCN4", dfrmExpression[["NAME"]] )
iGCN4
## [1] 4646
rownames( dfrmExpression )[iGCN4]
## [1] "YEL009C"
dfrmData[iGCN4,]
## G0.05 G0.1 G0.15 G0.2 G0.25 G0.3 N0.05 N0.1 N0.15 N0.2 N0.25 N0.3 P0.05
## YEL009C 1.15 0.97 0.97 0.32 0.2 0 1.14 1.33 1.17 1.13 0.7 0.3 1.21
## P0.1 P0.15 P0.2 P0.25 P0.3 S0.05 S0.1 S0.15 S0.2 S0.25 S0.3 L0.05
## YEL009C 1.11 0.88 0.43 0.16 -0.01 1 0.36 0.12 0.02 0.07 -0.26 1.34
## L0.1 L0.15 L0.2 L0.25 L0.3 U0.05 U0.1 U0.15 U0.2 U0.25 U0.3
## YEL009C 0.97 0.65 0.09 -0.18 -0.42 1.11 0.41 -0.19 -0.1 -0.17 -0.51
plot( 1:ncol( dfrmData ), dfrmData[iGCN4,], type = "l" )
plot( 1:ncol( dfrmData ), dfrmData[iGCN4,], type = "l", xaxt = "n",
      xlab = "",
      ylab = dfrmExpression[["NAME"]][iGCN4] )
axis( 1, at = 1:ncol( dfrmData ), labels = colnames( dfrmData ), las = 2 )
library(ggplot2)
ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric(dfrmData[iGCN4,])) )
ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric(dfrmData[iGCN4,])) ) +
scale_x_continuous( name = "", breaks = 1:ncol( dfrmData ), labels = colnames( dfrmData ) ) +
theme( axis.text.x = element_text( angle = 45, hjust = 1 ) ) +
ylab( dfrmExpression[["NAME"]][iGCN4] )
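ggplot2 is generally happiest with "long" data, where each row is one observation and the grouping variables are ordinary columns. As a hedged sketch of that idea, the block below hand-builds a small long-format frame for GCN4 using two of the six nutrients (values copied from the YEL009C row above) and draws one line per nutrient against growth rate. dfrmPlot and pPlot are illustrative names, and the plot is drawn only if ggplot2 is installed.

```r
# GCN4 (YEL009C) values for glucose and uracil limitation, copied from above.
dfrmPlot <- data.frame(
  Nutrient   = rep( c( "Glucose", "Uracil" ), each = 6 ),
  Rate       = rep( c( 0.05, 0.10, 0.15, 0.20, 0.25, 0.30 ), times = 2 ),
  Expression = c( 1.15, 0.97, 0.97, 0.32, 0.20, 0.00,
                  1.11, 0.41, -0.19, -0.10, -0.17, -0.51 ),
  stringsAsFactors = FALSE )

# Map rate to x, expression to y, and nutrient to color: one line per nutrient.
if( requireNamespace( "ggplot2", quietly = TRUE ) ) {
  library( ggplot2 )
  pPlot <- ggplot( dfrmPlot, aes( x = Rate, y = Expression, color = Nutrient ) ) +
    geom_line( ) +
    geom_point( ) +
    ylab( "GCN4" )
  print( pPlot )
}
```

With the full dataset, the Nutrient and Rate columns could come from dfrmMetadata and the Expression column from the corresponding row of dfrmData.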
Let’s continue with the same additional example as before. Note that R doesn’t really have a good alternative to reading the original data as frames, so we’ll stick with the structures we’ve already built above (rather than re-loading them in another format like we did in Python).
afGlucose <- dfrmMetadata["Nutrient"] == "Glucose"
afGlucose
## Nutrient
## G0.05 TRUE
## G0.1 TRUE
## G0.15 TRUE
## G0.2 TRUE
## G0.25 TRUE
## G0.3 TRUE
## N0.05 FALSE
## N0.1 FALSE
## N0.15 FALSE
## N0.2 FALSE
## N0.25 FALSE
## N0.3 FALSE
## P0.05 FALSE
## P0.1 FALSE
## P0.15 FALSE
## P0.2 FALSE
## P0.25 FALSE
## P0.3 FALSE
## S0.05 FALSE
## S0.1 FALSE
## S0.15 FALSE
## S0.2 FALSE
## S0.25 FALSE
## S0.3 FALSE
## L0.05 FALSE
## L0.1 FALSE
## L0.15 FALSE
## L0.2 FALSE
## L0.25 FALSE
## L0.3 FALSE
## U0.05 FALSE
## U0.1 FALSE
## U0.15 FALSE
## U0.2 FALSE
## U0.25 FALSE
## U0.3 FALSE
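Note that comparing a one-column data frame slice to a string yields a one-column logical matrix, which is why the output above has a Nutrient header. Comparing the extracted column instead yields a plain logical vector. Both work for the indexing that follows, but the vector form is arguably easier to reason about. A toy sketch, with hypothetical names dfrmMetaToy, afMatrix, and afVector:

```r
# Toy metadata with three conditions.
dfrmMetaToy <- data.frame(
  Nutrient = c( "Glucose", "Glucose", "Ammonium" ),
  Rate     = c( 0.05, 0.10, 0.05 ),
  row.names = c( "G0.05", "G0.1", "N0.05" ),
  stringsAsFactors = FALSE )

# Single brackets: comparing a data frame gives a 3 x 1 logical matrix.
afMatrix <- dfrmMetaToy["Nutrient"] == "Glucose"
print( dim( afMatrix ) )

# Double brackets: comparing a character vector gives a logical vector.
afVector <- dfrmMetaToy[["Nutrient"]] == "Glucose"
print( afVector )

# The underlying TRUE/FALSE values are the same either way.
fSame <- identical( as.vector( afMatrix ), afVector )
print( fSame )
```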
adFirstGene <- dfrmExpression[1,-1]
adFirstGene
## G0.05 G0.1 G0.15 G0.2 G0.25 G0.3 N0.05 N0.1 N0.15 N0.2 N0.25 N0.3
## YMR123W -0.73 -0.37 -0.46 -0.41 -0.06 0.13 -1.55 -1.33 -0.51 -0.62 0.02 0.27
## P0.05 P0.1 P0.15 P0.2 P0.25 P0.3 S0.05 S0.1 S0.15 S0.2 S0.25 S0.3
## YMR123W -0.79 -0.44 -0.17 0.14 0.11 0.32 -1.82 -0.95 -0.13 -0.15 -0.2 0.05
## L0.05 L0.1 L0.15 L0.2 L0.25 L0.3 U0.05 U0.1 U0.15 U0.2 U0.25 U0.3
## YMR123W -0.41 -0.25 -0.2 -0.14 0.16 0.12 -0.9 -0.34 -0.04 0.17 0.04 0.25
adFirstGeneGlucose <- adFirstGene[afGlucose]
adFirstGeneGlucose
## [1] -0.73 -0.37 -0.46 -0.41 -0.06 0.13
adFirstGeneOther <- adFirstGene[!afGlucose]
adFirstGeneOther
## [1] -1.55 -1.33 -0.51 -0.62 0.02 0.27 -0.79 -0.44 -0.17 0.14 0.11 0.32
## [13] -1.82 -0.95 -0.13 -0.15 -0.20 0.05 -0.41 -0.25 -0.20 -0.14 0.16 0.12
## [25] -0.90 -0.34 -0.04 0.17 0.04 0.25
t.test( adFirstGeneGlucose, adFirstGeneOther )
##
## Welch Two Sample t-test
##
## data: adFirstGeneGlucose and adFirstGeneOther
## t = -0.043799, df = 12.512, p-value = 0.9658
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3536437 0.3396437
## sample estimates:
## mean of x mean of y
## -0.3166667 -0.3096667
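For anyone curious where those numbers come from, the sketch below recomputes Welch's statistic, degrees of freedom, and two-sided p-value by hand from the values printed above and checks them against t.test. The formulas are the standard Welch (unequal variance) ones; the variable names are illustrative.

```r
# Expression of the first gene (YMR123W), copied from the output above.
adGlucose <- c( -0.73, -0.37, -0.46, -0.41, -0.06, 0.13 )
adOther <- c( -1.55, -1.33, -0.51, -0.62, 0.02, 0.27, -0.79, -0.44, -0.17, 0.14,
              0.11, 0.32, -1.82, -0.95, -0.13, -0.15, -0.20, 0.05, -0.41, -0.25,
              -0.20, -0.14, 0.16, 0.12, -0.90, -0.34, -0.04, 0.17, 0.04, 0.25 )

# Squared standard error of each group mean.
dSEx2 <- var( adGlucose ) / length( adGlucose )
dSEy2 <- var( adOther ) / length( adOther )

# Welch t statistic: difference in means over the combined standard error.
dT <- ( mean( adGlucose ) - mean( adOther ) ) / sqrt( dSEx2 + dSEy2 )

# Welch-Satterthwaite approximation to the degrees of freedom.
dDF <- ( dSEx2 + dSEy2 )^2 /
  ( dSEx2^2 / ( length( adGlucose ) - 1 ) + dSEy2^2 / ( length( adOther ) - 1 ) )

# Two-sided p-value from the t distribution.
dP <- 2 * pt( -abs( dT ), df = dDF )
print( c( t = dT, df = dDF, p = dP ) )

# Compare against R's built-in implementation.
lsTest <- t.test( adGlucose, adOther )
print( all.equal( dT, unname( lsTest[["statistic"]] ) ) )
print( all.equal( dP, lsTest[["p.value"]] ) )
```

The non-integer degrees of freedom in the output are a hallmark of the Welch test, which R uses by default because it does not assume equal variances in the two groups.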
lsFirstGeneGlucoseTTest <- t.test( adFirstGeneGlucose, adFirstGeneOther )
names( lsFirstGeneGlucoseTTest )
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "stderr" "alternative" "method" "data.name"
lsFirstGeneGlucoseTTest["p.value"]
## $p.value
## [1] 0.9657557
lsFirstGeneGlucoseTTest[["p.value"]]
## [1] 0.9657557
lsFirstGeneGlucoseTTest[[3]]
## [1] 0.9657557
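The three lookups above differ in what they return, which a small sketch on a made-up list can make concrete. Here lsToy is a hypothetical stand-in for the object returned by t.test, with invented values.

```r
# Hypothetical list standing in for a t.test result (values are made up)
lsToy <- list( statistic = -0.5, p.value = 0.62 )

lsSub   <- lsToy["p.value"]     # single brackets: still a list, of length 1
dSub    <- lsToy[["p.value"]]   # double brackets: the element itself
dDollar <- lsToy$p.value        # $ also extracts the element by name

class( lsSub )                  # "list"
class( dSub )                   # "numeric"
identical( dSub, dDollar )      # TRUE
```

Double brackets (or $) are usually what you want when the next step needs a plain number rather than a list wrapped around one.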
Now we can do this next step the same way we did in Python, using a for loop to construct a vector of numeric p-values:
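# Preallocate one slot per gene rather than growing the vector inside the loop
adPs <- numeric( nrow( dfrmData ) )
for( iGene in seq_len( nrow( dfrmData ) ) ) {
    lsTest <- t.test( dfrmData[iGene, afGlucose], dfrmData[iGene, !afGlucose] )
    adPs[iGene] <- lsTest[["p.value"]]
}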
head( adPs, 100 )
## [1] 9.657557e-01 3.093475e-04 4.511485e-01 1.263265e-02 3.512822e-02
## [6] 5.439103e-04 4.375654e-01 6.083414e-01 4.992052e-03 3.966839e-05
## [11] 5.709254e-02 2.406893e-01 8.942786e-01 2.157193e-08 7.548512e-01
## [16] 1.586571e-02 2.329955e-01 4.500093e-01 9.816524e-01 1.028740e-03
## [21] 8.288520e-02 1.902277e-02 1.078654e-02 6.120312e-03 4.867914e-01
## [26] 4.331932e-04 3.274743e-01 9.914171e-02 7.952911e-01 6.006733e-01
## [31] 3.762373e-01 8.621155e-02 1.806734e-01 2.163387e-02 1.282305e-03
## [36] 3.295156e-01 1.283778e-02 4.274155e-01 3.993228e-01 9.056287e-01
## [41] 7.383414e-01 2.594774e-02 3.488206e-03 5.178298e-05 7.164193e-01
## [46] 5.925450e-01 3.882493e-03 5.010793e-02 2.762366e-01 6.641149e-01
## [51] 2.919149e-02 9.965011e-01 3.775379e-01 1.691185e-02 3.674130e-04
## [56] 3.957748e-01 5.492884e-01 9.411102e-01 2.068806e-02 3.217844e-01
## [61] 7.771635e-01 1.320304e-03 6.537981e-02 4.072965e-01 1.649004e-01
## [66] 4.827485e-01 6.532829e-01 7.985595e-04 2.079977e-01 8.271679e-02
## [71] 6.476730e-01 7.259310e-01 1.894165e-01 3.678150e-01 6.614514e-04
## [76] 4.892644e-01 2.401266e-02 2.444502e-02 1.314629e-02 3.368175e-02
## [81] 6.991194e-04 1.937475e-01 1.739353e-02 9.013114e-02 7.346542e-01
## [86] 4.241373e-02 4.936567e-01 1.958054e-03 2.434236e-03 5.758941e-02
## [91] 8.610407e-01 8.660269e-02 4.317326e-03 8.307372e-02 2.061247e-01
## [96] 8.750649e-02 7.978199e-03 6.482396e-06 1.709710e-03 1.621952e-01
However, idiomatic R avoids explicit for loops whenever possible. Instead, use the apply family of functions to iterate over every element in a collection directly. apply itself runs on matrix-form data (e.g. data.frames, which it first converts to a matrix):
lsTests <- apply( dfrmData, 1, function( ls ) t.test( ls[afGlucose], ls[!afGlucose] ) )
head( lsTests )
## $YMR123W
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = -0.043799, df = 12.512, p-value = 0.9658
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.3536437 0.3396437
## sample estimates:
## mean of x mean of y
## -0.3166667 -0.3096667
##
##
## $YJL012C
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = -4.0853, df = 29.502, p-value = 0.0003093
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.6387894 -0.5458772
## sample estimates:
## mean of x mean of y
## -0.1300000 0.9623333
##
##
## $YIL043C
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = -0.76375, df = 29.181, p-value = 0.4511
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.21817821 0.09951154
## sample estimates:
## mean of x mean of y
## -0.2583333 -0.1990000
##
##
## $YHR161C
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = 2.8284, df = 15.119, p-value = 0.01263
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.05325613 0.37807720
## sample estimates:
## mean of x mean of y
## -0.06833333 -0.28400000
##
##
## $YGR192C
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = -2.3995, df = 11.074, p-value = 0.03513
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.83987587 -0.08012413
## sample estimates:
## mean of x mean of y
## -1.7133333 -0.7533333
##
##
## $YFL030W
##
## Welch Two Sample t-test
##
## data: ls[afGlucose] and ls[!afGlucose]
## t = 5.9615, df = 7.066, p-value = 0.0005439
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.789336 4.134664
## sample estimates:
## mean of x mean of y
## 1.035 -1.927
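To see what apply's second argument controls, here is a minimal sketch on a tiny made-up matrix. The name amToy and its contents are hypothetical and unrelated to the real expression data.

```r
# Hypothetical 2-gene x 3-sample matrix; matrix() fills column by column,
# so geneA holds 1, 3, 5 and geneB holds 2, 4, 6
amToy <- matrix( 1:6, nrow = 2,
    dimnames = list( c( "geneA", "geneB" ), c( "s1", "s2", "s3" ) ) )

adRowMeans <- apply( amToy, 1, mean )   # margin 1: call the function once per row
adColSums  <- apply( amToy, 2, sum )    # margin 2: call the function once per column

adRowMeans
adColSums
```

Margin 1 is the one used above, since each row of the expression table is one gene.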
We can then combine this with lapply, which runs on linear data (lists or vectors), to keep just each result’s p-value:
lsPs <- lapply( lsTests, "[[", "p.value" )
head( lsPs )
## $YMR123W
## [1] 0.9657557
##
## $YJL012C
## [1] 0.0003093475
##
## $YIL043C
## [1] 0.4511485
##
## $YHR161C
## [1] 0.01263265
##
## $YGR192C
## [1] 0.03512822
##
## $YFL030W
## [1] 0.0005439103
And finally convert that to a (numeric) vector:
adPs <- as.numeric( lsPs )
head( adPs, 100 )
## [1] 9.657557e-01 3.093475e-04 4.511485e-01 1.263265e-02 3.512822e-02
## [6] 5.439103e-04 4.375654e-01 6.083414e-01 4.992052e-03 3.966839e-05
## [11] 5.709254e-02 2.406893e-01 8.942786e-01 2.157193e-08 7.548512e-01
## [16] 1.586571e-02 2.329955e-01 4.500093e-01 9.816524e-01 1.028740e-03
## [21] 8.288520e-02 1.902277e-02 1.078654e-02 6.120312e-03 4.867914e-01
## [26] 4.331932e-04 3.274743e-01 9.914171e-02 7.952911e-01 6.006733e-01
## [31] 3.762373e-01 8.621155e-02 1.806734e-01 2.163387e-02 1.282305e-03
## [36] 3.295156e-01 1.283778e-02 4.274155e-01 3.993228e-01 9.056287e-01
## [41] 7.383414e-01 2.594774e-02 3.488206e-03 5.178298e-05 7.164193e-01
## [46] 5.925450e-01 3.882493e-03 5.010793e-02 2.762366e-01 6.641149e-01
## [51] 2.919149e-02 9.965011e-01 3.775379e-01 1.691185e-02 3.674130e-04
## [56] 3.957748e-01 5.492884e-01 9.411102e-01 2.068806e-02 3.217844e-01
## [61] 7.771635e-01 1.320304e-03 6.537981e-02 4.072965e-01 1.649004e-01
## [66] 4.827485e-01 6.532829e-01 7.985595e-04 2.079977e-01 8.271679e-02
## [71] 6.476730e-01 7.259310e-01 1.894165e-01 3.678150e-01 6.614514e-04
## [76] 4.892644e-01 2.401266e-02 2.444502e-02 1.314629e-02 3.368175e-02
## [81] 6.991194e-04 1.937475e-01 1.739353e-02 9.013114e-02 7.346542e-01
## [86] 4.241373e-02 4.936567e-01 1.958054e-03 2.434236e-03 5.758941e-02
## [91] 8.610407e-01 8.660269e-02 4.317326e-03 8.307372e-02 2.061247e-01
## [96] 8.750649e-02 7.978199e-03 6.482396e-06 1.709710e-03 1.621952e-01
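As one possible shortcut, the test and the p-value extraction can be folded into a single apply call, or the extraction can be done with vapply, which returns a numeric vector directly. The sketch below uses randomly generated stand-in data; amToy, afToy, and the other names are hypothetical.

```r
set.seed( 1 )
# Hypothetical stand-in for the expression data: 3 genes x 8 samples of random values
amToy <- matrix( rnorm( 24 ), nrow = 3,
    dimnames = list( c( "g1", "g2", "g3" ), NULL ) )
# Hypothetical grouping: the first 3 samples versus the remaining 5
afToy <- rep( c( TRUE, FALSE ), times = c( 3, 5 ) )

# One step: have the function return only the p-value
adToyPs <- apply( amToy, 1, function( ad ) t.test( ad[afToy], ad[!afToy] )$p.value )

# Two steps: keep the full test objects, then pull out one number from each
lsToyTests <- apply( amToy, 1, function( ad ) t.test( ad[afToy], ad[!afToy] ) )
adToyPsTwoStep <- vapply( lsToyTests, function( ls ) ls[["p.value"]], numeric( 1 ) )

all.equal( adToyPs, adToyPsTwoStep )
```

Unlike as.numeric, both of these routes keep the gene names attached to the p-values, which can make later lookups easier.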
dMinP <- min( adPs )
dMinP
## [1] 5.524548e-16
iMinP <- match( dMinP, adPs )
iMinP
## [1] 156
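The min-then-match lookup above can also be written with which.min, which returns the position of the first minimum directly. A minimal sketch on made-up p-values (adToyPs is hypothetical):

```r
# Hypothetical named p-values
adToyPs <- c( geneA = 0.20, geneB = 0.003, geneC = 0.51 )

iViaMatch    <- match( min( adToyPs ), adToyPs )   # two steps, as above
iViaWhichMin <- which.min( adToyPs )               # one step; keeps the element's name

iViaMatch
iViaWhichMin
```

Both give position 2 here; which.min additionally carries the name geneB along when the vector is named.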
strMostDifferentialGeneInGlucose <- dfrmExpression[["NAME"]][iMinP]
strMostDifferentialGeneInGlucose
## [1] "HXT7"
ggplot( ) + geom_line( aes( 1:ncol( dfrmData ), as.numeric( dfrmData[iMinP,] ) ) ) +
    scale_x_continuous( name = "", breaks = 1:ncol( dfrmData ), labels = colnames( dfrmData ) ) +
    theme( axis.text.x = element_text( angle = 45, hjust = 1 ) ) +
    ylab( strMostDifferentialGeneInGlucose )
ggplot( ) + geom_boxplot( aes( dfrmMetadata[["Nutrient"]], as.numeric(dfrmData[iMinP,]) ) )
As you might have noticed above, R does something funny (and often irritating) to factors (categorically valued strings) by default: it sorts their levels alphabetically. That’s what causes the plot above to be reordered. We can use a small R trick to fix this in our plot; unique is a function that returns the unique values in a vector in the order that they first occur:
unique( dfrmMetadata[["Nutrient"]] )
## [1] "Glucose" "Ammonium" "Phosphate" "Sulfate" "Leucine" "Uracil"
So we’ll replace our Nutrient factor with one containing identical values, but more intuitively reordered levels:
kNutrients <- factor( dfrmMetadata[["Nutrient"]], levels = unique( dfrmMetadata[["Nutrient"]] ) )
kNutrients
## [1] Glucose Glucose Glucose Glucose Glucose Glucose Ammonium
## [8] Ammonium Ammonium Ammonium Ammonium Ammonium Phosphate Phosphate
## [15] Phosphate Phosphate Phosphate Phosphate Sulfate Sulfate Sulfate
## [22] Sulfate Sulfate Sulfate Leucine Leucine Leucine Leucine
## [29] Leucine Leucine Uracil Uracil Uracil Uracil Uracil
## [36] Uracil
## Levels: Glucose Ammonium Phosphate Sulfate Leucine Uracil
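The effect of the levels argument is easy to check on a short made-up vector. In this sketch, astrToy is a hypothetical stand-in for the Nutrient column.

```r
# Hypothetical nutrient labels in their order of appearance
astrToy <- c( "Glucose", "Glucose", "Ammonium", "Sulfate", "Ammonium" )

astrDefault <- levels( factor( astrToy ) )                              # sorted alphabetically
astrOrdered <- levels( factor( astrToy, levels = unique( astrToy ) ) )  # order of first appearance

astrDefault
astrOrdered
```

Only the level order changes; the values stored in the factor are the same either way, so the data being plotted is untouched.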
And now plot it again (otherwise as above):
ggplot( ) + geom_boxplot( aes( kNutrients, as.numeric( dfrmData[iMinP,] ),
        color = kNutrients ) ) +
    xlab( "" ) +
    ylab( strMostDifferentialGeneInGlucose ) +
    theme( legend.position = "none" )
Bonus points! Why did R turn up a different most-differential-in-glucose gene than Python did? Is this also a reasonable result biologically?