Project Summary

Biomarker and Pathway Discovery for Breast Cancer Aggressiveness

This project aimed to classify breast cancer aggressiveness from gene expression data and to discover the biomarkers and pathways associated with it. Two classification approaches were investigated: the grading system (Grade 1 vs. Grade 3 tumors) and molecular subtypes (Luminal A/Normal-like as non-aggressive subtypes vs. Luminal B/Basal-like/HER2 as aggressive subtypes). The study was conducted in three phases. Phase 1 compared the two approaches and prioritized the grading system because more samples carried grading information. Phase 2 applied the grading system across four datasets to identify 77 common differentially expressed genes (DEGs) as molecular signatures of breast cancer aggressiveness. Phase 3 integrated these datasets through meta-analysis, uncovering a robust set of DEGs and associated pathways and laying the groundwork for personalized diagnostic and therapeutic strategies.

To conduct this analysis, data were collected from the PubMed and Gene Expression Omnibus (GEO) databases and narrowed down to four large-scale microarray datasets: GSE7390, GSE11121, GSE25055, and GSE25065. These datasets, encompassing a total of 468 untreated patient samples, were selected based on the availability of gene expression data and information on tumor grading and molecular subtypes.

Phase 1: Evaluating Classification Approaches

Objective: To compare two classification approaches—grading system (Grade 1 vs. Grade 3) and molecular subtypes (Luminal A/Normal-like vs. Luminal B/Basal-like/HER2)—to identify differentially expressed genes (DEGs) associated with aggressiveness.

Data: The GSE25055 dataset, comprising pre-treatment tumor biopsies, was used; it provides information on both tumor grade and molecular subtype.

Methods: Differential expression analysis was performed using the limma package in R, with volcano plots visualizing significant gene expression changes.
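A volcano plot places each gene at (log2 fold change, -log10 adjusted p-value) and highlights genes that pass both cutoffs. The Python sketch below illustrates only that labeling step; it is not the limma workflow itself. The cutoffs (|log2FC| >= 1, adjusted p < 0.05), the gene names, and the toy values are assumptions made for illustration and do not come from the study.

```python
import math

def volcano_label(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Classify one gene for a volcano plot (cutoffs are illustrative assumptions)."""
    if adj_p < p_cutoff and log2_fc >= fc_cutoff:
        return "up"      # higher in the comparison group (e.g. Grade 3)
    if adj_p < p_cutoff and log2_fc <= -fc_cutoff:
        return "down"    # lower in the comparison group
    return "ns"          # fails at least one cutoff

def volcano_points(results):
    """Map {gene: (log2_fc, adj_p)} to plot-ready (gene, x, y, label) tuples."""
    points = []
    for gene, (log2_fc, adj_p) in results.items():
        y = -math.log10(adj_p)  # y-axis: larger means more significant
        points.append((gene, log2_fc, y, volcano_label(log2_fc, adj_p)))
    return points

# Hypothetical per-gene results, not data from GSE25055.
toy_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.7, 0.01),
    "GENE_C": (0.4, 0.2),
    "GENE_D": (1.5, 0.3),
}
points = volcano_points(toy_results)
```

The resulting tuples can be passed to any plotting library; genes labeled "ns" are usually drawn in a muted color.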

Results: Both classification systems revealed significant DEGs. However, due to the larger number of samples with grading information, the grading system was prioritized for subsequent phases.

Conclusion: Both classification methods yielded significant DEGs, and the greater availability of grading information set the stage for prioritizing grade-based classification in later analyses.

Phase 2: Multi-Dataset DEG Analysis

Objective: To identify DEGs using the grading system (Grade 1 vs. Grade 3) across four large-scale microarray datasets (GSE25055, GSE7390, GSE11121, GSE25065).

Data: A total of 468 untreated patient samples were analyzed, focusing on datasets that included grading information.

Methods: Differential expression analysis was performed on each dataset individually using the limma package. Genes were evaluated by fold change (FC) and false discovery rate (FDR) to call DEGs.
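The two quantities named above can be illustrated with a short Python sketch. It assumes the FDR is controlled with the Benjamini-Hochberg procedure (the default adjustment in limma's topTable; the summary does not state which method was used) and that expression values are already on a log2 scale, so the fold change is a difference of group means. The function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def log2_fold_change(mean_grade3, mean_grade1):
    """Grade 3 minus Grade 1; assumes both means are log2 expression values."""
    return mean_grade3 - mean_grade1

# Toy p-values for four hypothetical genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A gene would then be called a DEG when its adjusted p-value and absolute log2 fold change both pass thresholds chosen by the analyst.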

Results: A common set of 77 DEGs was identified across datasets, representing molecular signatures indicative of disease severity.
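Finding the common set amounts to intersecting the per-dataset DEG lists. A minimal Python sketch, with placeholder gene names standing in for the real DEG lists (which are not given in this summary):

```python
def common_degs(degs_by_dataset):
    """Return the genes called differentially expressed in every dataset."""
    gene_sets = [set(genes) for genes in degs_by_dataset.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)

# Placeholder DEG lists; the gene names are hypothetical.
degs_by_dataset = {
    "GSE25055": ["GENE_A", "GENE_B", "GENE_C"],
    "GSE7390": ["GENE_A", "GENE_C", "GENE_D"],
    "GSE11121": ["GENE_C", "GENE_A"],
    "GSE25065": ["GENE_A", "GENE_C", "GENE_E"],
}
shared = common_degs(degs_by_dataset)
```

A stricter variant could also require the same direction of change in every dataset; whether the study applied that rule is not stated here.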

Conclusion: This phase provided robust evidence of the molecular differences associated with breast cancer aggressiveness, validating the findings across multiple datasets.

Phase 3: Meta-Analysis for Robust Biomarker Discovery

Objective: To integrate findings from Phase 2 using a meta-analysis approach to enhance the reliability of identified biomarkers.

Data Integration: The four microarray datasets (GSE25055, GSE7390, GSE11121, GSE25065) were preprocessed with normalization, quality control, and batch-effect correction.

Methods: A random-effects model (REM) meta-analysis was performed using the GeneMeta package, combining DEG results from individual datasets. Functional enrichment analysis was conducted using tools such as msigdbr and PANTHER to explore biological pathways.
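To make the random-effects idea concrete, the Python sketch below pools one gene's per-dataset effect sizes using the DerSimonian-Laird estimate of between-study variance. This is a generic illustration, not a reproduction of GeneMeta's code, and the choice of estimator is an assumption. The inputs (per-dataset effect sizes and their variances) and the function name are hypothetical.

```python
import math

def random_effects_meta(effects, variances):
    """Pool per-study effects for one gene with a DerSimonian-Laird random-effects model.

    Returns (pooled_effect, standard_error, z_score, tau2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity between studies.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, pooled / se, tau2

# Toy effect sizes for one hypothetical gene in four studies.
pooled, se, z, tau2 = random_effects_meta([1.0, 1.0, 1.0, 1.0], [0.04] * 4)
```

When the studies disagree, tau2 grows, the weights even out, and the standard error widens, which is how a random-effects model damps dataset-specific effects.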

Results: Meta-analysis identified a larger set of DEGs with significant Z-scores and FDR values while reducing dataset-specific biases. Functional analysis, covering Gene Ontology (GO) terms, KEGG pathways, and protein-protein interaction (PPI) networks, highlighted key pathways and biological processes linked to aggressiveness.
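One common way to test whether a pathway is over-represented among DEGs is a one-sided hypergeometric test. The summary does not say which statistic the enrichment tools applied, so the Python sketch below is only an illustration of the general idea; the function name and the toy counts are hypothetical.

```python
from math import comb

def enrichment_p_value(n_universe, n_pathway, n_degs, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: genes measured; n_pathway: genes in the pathway;
    n_degs: DEGs drawn from the universe; n_overlap: DEGs in the pathway.
    """
    max_overlap = min(n_pathway, n_degs)
    total = comb(n_universe, n_degs)  # all ways to pick the DEG list
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_degs - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy counts: 10 genes measured, 4 in the pathway, 3 DEGs, 2 of them in the pathway.
p_value = enrichment_p_value(10, 4, 3, 2)
```

In practice the p-values from many pathways would themselves be corrected for multiple testing before pathways are reported as enriched.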

Conclusion: This phase provided a more comprehensive understanding of gene expression changes, identifying robust biomarkers with potential applications in personalized diagnostics and targeted therapies.

Overall Contribution

This three-phase project demonstrated the utility of integrating grading systems and multi-dataset analyses for discovering biomarkers and pathways associated with breast cancer aggressiveness. The inclusion of 468 samples and the prioritization of grade-based classification ensured a robust foundation for biomarker discovery. Future research will focus on subtype-based classifications to provide complementary insights. The findings lay the groundwork for advanced diagnostic tools and therapeutic strategies, with potential applications in personalized medicine.