Title: Differential Expression Analysis of a Breast Cancer Dataset (GSE25055) Using limma in R: Comparing Two Classifications ("Molecular Subtypes" and "Grading System") to Differentiate Aggressiveness Levels in Breast Cancer
Number of Functions: 3
1. DownloadGEO()
Purpose: Downloads a GEO dataset based on the specified accession number (GSEname), extracts the expression matrix, phenotype data, and annotation, and saves them as files (matrix.csv, phenotype.tsv, annot.tsv).
Key Steps:
- Downloads the dataset using GEOquery.
- Extracts the expression matrix, phenotype data, and annotation from the downloaded dataset.
- Writes these components to separate files.
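The download steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual DownloadGEO() code: the function name download_geo_sketch and the out_dir argument are hypothetical, and the sketch assumes the series contains a single platform, so the first ExpressionSet returned by getGEO() is used.

```r
# Sketch of a GEO download helper. GEOquery and Biobase (both Bioconductor
# packages) must be installed; they are called with `::` so that defining
# the function does not require them to be attached.
download_geo_sketch <- function(GSEname, out_dir = ".") {
  # With GSEMatrix = TRUE, getGEO() returns a list of ExpressionSet objects
  gse  <- GEOquery::getGEO(GSEname, GSEMatrix = TRUE)
  eset <- gse[[1]]  # assumption: the series has a single platform

  expr_matrix <- Biobase::exprs(eset)  # probes x samples
  phenotype   <- Biobase::pData(eset)  # one row per sample
  annotation  <- Biobase::fData(eset)  # one row per probe

  # File names follow the description above
  write.csv(expr_matrix, file.path(out_dir, "matrix.csv"))
  write.table(phenotype, file.path(out_dir, "phenotype.tsv"),
              sep = "\t", col.names = NA)
  write.table(annotation, file.path(out_dir, "annot.tsv"),
              sep = "\t", col.names = NA)

  invisible(list(matrix = expr_matrix,
                 phenotype = phenotype,
                 annotation = annotation))
}
```

A call such as download_geo_sketch("GSE25055") would need network access and may take several minutes for a large series.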
2. DEGanalysis()
Purpose: Performs differential expression analysis using the limma package, comparing either subtypes or grades based on user input.
Key Steps:
- Loads the pre-saved matrix, phenotype, and annotation files.
- Applies a log2 transformation to the matrix if the values are not already on a log scale.
- Groups samples by subtypes or grades and fits a linear model using the limma package.
- Outputs a table of differentially expressed genes (DEGs) with annotation and saves it as DEGs.tsv.
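The analysis steps above might look roughly like the sketch below. It is an illustration under stated assumptions rather than the actual DEGanalysis() code: needs_log2() and deg_sketch() are hypothetical names, the log-scale check is a quantile-based heuristic of the kind used in GEO2R-generated scripts (the original may decide differently), and the group vector is assumed to have no missing labels.

```r
# Heuristic check: does the matrix look like raw (non-log) intensities?
needs_log2 <- function(expr_matrix) {
  qx <- as.numeric(quantile(expr_matrix,
                            c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE))
  (qx[5] > 100) || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

# expr_matrix: numeric matrix, probes x samples
# group:       one label per sample (subtype or grade), no NAs (assumption)
# contrast:    string such as "A-B", using level names after make.names()
deg_sketch <- function(expr_matrix, group, contrast) {
  if (needs_log2(expr_matrix)) {
    # Non-positive values cannot be log-transformed
    expr_matrix[which(expr_matrix <= 0)] <- NaN
    expr_matrix <- log2(expr_matrix)
  }

  # One design column per group, no intercept
  group  <- factor(make.names(group))
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)

  # Fit the linear model, apply the contrast, moderate with empirical Bayes
  fit  <- limma::lmFit(expr_matrix, design)
  cont <- limma::makeContrasts(contrasts = contrast, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, cont))

  # Full table with logFC, P.Value, adj.P.Val (Benjamini-Hochberg), etc.
  limma::topTable(fit2, coef = 1, number = Inf, adjust.method = "BH")
}
```

The returned table could then be merged with the annotation by probe ID and written to DEGs.tsv, for example with readr::write_tsv().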
3. Makevolcano()
Purpose: Generates an interactive volcano plot using ggplot2 and plotly to visualize the results of the differential expression analysis.
Key Steps:
- Reads the DEGs from DEGs.tsv.
- Assigns colors to upregulated and downregulated genes.
- Creates an interactive volcano plot with tooltips for gene information.
- Saves the top up- and down-regulated genes as top.csv and displays the plot.
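The plotting steps above can be sketched as follows. This is a hedged illustration, not the actual Makevolcano() code: classify_degs() and volcano_sketch() are hypothetical names, the cutoffs (absolute log2 fold change of at least 1, adjusted P-value below 0.05) and colors are assumptions, and the table is assumed to use limma's topTable() column names (logFC, adj.P.Val) plus an identifier column, here called ID.

```r
# Label each gene as Up, Down, or Not significant (cutoffs are assumptions)
classify_degs <- function(degs, lfc_cutoff = 1, p_cutoff = 0.05) {
  degs$direction <- "Not significant"
  degs$direction[degs$adj.P.Val < p_cutoff & degs$logFC >=  lfc_cutoff] <- "Up"
  degs$direction[degs$adj.P.Val < p_cutoff & degs$logFC <= -lfc_cutoff] <- "Down"
  degs
}

# Build a static ggplot2 volcano plot and convert it with plotly.
# ggplot2 and plotly must be installed; they are called with `::`.
volcano_sketch <- function(degs, label_col = "ID") {
  degs <- classify_degs(degs)

  # Text shown when hovering over a point
  degs$tooltip <- paste0(degs[[label_col]],
                         "\nlogFC: ", round(degs$logFC, 2),
                         "\nadj. P: ", signif(degs$adj.P.Val, 3))

  p <- ggplot2::ggplot(
         degs,
         ggplot2::aes(x = logFC, y = -log10(adj.P.Val),
                      colour = direction, text = tooltip)) +
    ggplot2::geom_point(alpha = 0.6) +
    ggplot2::scale_colour_manual(
      values = c(Up = "red", Down = "blue", `Not significant` = "grey60")) +
    ggplot2::labs(x = "log2 fold change", y = "-log10 adjusted P-value")

  # tooltip = "text" restricts the hover box to the custom text aesthetic
  plotly::ggplotly(p, tooltip = "text")
}
```

With the results file in place, volcano_sketch(readr::read_tsv("DEGs.tsv")) would return an interactive plot object, provided the file contains the assumed columns.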
Packages Used:
- GEOquery: For downloading and processing GEO datasets.
- readr: For reading and writing tabular data files (e.g., CSV, TSV).
- dplyr: For data manipulation tasks such as filtering, mutating, and grouping.
- limma: For differential expression analysis using linear modeling techniques.
- ggplot2: For creating publication-quality visualizations such as volcano plots.
- plotly: For converting static ggplot2 charts into interactive visualizations.
- tidyverse: A meta-package that includes readr, dplyr, ggplot2, and others for streamlined data science workflows.