Dr. Reema Singh

This website is maintained by Reema Singh

Welcome to the academic webpage of Dr. Reema Singh

About Me

I am a computational biologist/bioinformatician with more than 14 years of research experience.

Research Work Summary

2022 - Present: In my current position, I am applying my bioinformatics skills to understand host response to viral infections.

2017 - 2022: Whole-genome sequencing (WGS)-based approaches are widely used to track outbreaks, transmission networks, and antimicrobial resistance (AMR) patterns in pathogens that cause sexually transmitted infections, such as Neisseria gonorrhoeae and Chlamydia trachomatis. I used WGS data 1) to characterize a novel beta-lactamase-producing plasmid in Neisseria gonorrhoeae, and 2) to develop a novel one-stop computational pipeline, named Gen2Epi, that assembles short reads into full scaffolds and automatically assigns molecular epidemiological and AMR information to the assembled genomes. Gen2Epi has been validated on large WGS datasets (n=1473) previously used for genomic epidemiological surveillance in Canada, New Zealand, and several European countries. I also used WGS data to compare Neisseria gonorrhoeae outbreak strains in order to track epidemics, and to analyze Neisseria gonorrhoeae and Chlamydia trachomatis clinical isolates from Saskatchewan. In another project, I used publicly available WGS datasets to study signature erosion in Neisseria gonorrhoeae diagnostic primers. In collaborative projects, I performed 1) single nucleotide polymorphism (SNP) analysis of ferret SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) isolates using Nanopore data, 2) WGS data analysis of Escherichia coli isolates to track colibacillosis in Saskatchewan broiler flocks, and 3) WGS analysis of Mycobacterium tuberculosis clinical isolates to distinguish reinfection from novel infections in Canada.

2013 - 2016: The wealth of transcriptomic (i.e. RNA-seq) data available for social amoebas allowed me to generate de novo transcriptome assemblies for four Dictyostelid species: Dictyostelium discoideum (Ddis), Polysphondylium pallidum (Ppal), Dictyostelium fasciculatum (Dfas) and Dictyostelium lacteum (Dlac). My results showed that incorporating the new de novo transcriptome assemblies into existing annotations and gene models leads to updates in the majority of the gene annotations, including the identification of novel genes and transcripts. In a separate project, to identify stalk genes that are regulated by c-di-GMP, I identified the differentially expressed genes in RNA-seq datasets of wild-type and dgca null mutant cells in Ddis. The selected candidate genes (with at least 10-fold down-regulation in dgca- cells and a read count of >50 in the control cells) were used as markers to investigate the signal transduction pathways of c-di-GMP. Finally, I constructed a consensus phylogeny of Dictyostelia from 40 candidate proteins (>200 amino acids) selected from 14 primary metabolic pathways and 12 proteins with diverse roles in the basic cell biology of Ddis. I identified and extracted putative orthologs of these 52 candidate proteins from six newly sequenced species (i.e. Acytostelium ellipticum, Acytostelium leptosomum, Dictyostelium deminutivum, Polysphondylium violaceum, Dictyostelium polycephalum, and Dictyostelium polycarpum) and six existing Dictyostelia (i.e. Dictyostelium purpureum, Dlac, Ppal, Dfas, Acytostelium subglobosum, and Physarum polycephalum), along with two unicellular amoebozoa (i.e. Acanthamoeba castellanii and Entamoeba histolytica), and generated individual and concatenated phylogenies after manually curating the corresponding gene models.
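The candidate-gene selection described above (at least 10-fold down-regulation in the mutant and a control read count above 50) can be illustrated with a minimal sketch. The gene names, counts, and the pseudocount are hypothetical and chosen only for illustration; this is not the pipeline used in the original analysis, which would normally rely on normalized counts from a dedicated differential-expression tool.

```python
# Hypothetical per-gene read counts: (gene, wild-type count, dgca- count).
counts = [
    ("geneA", 1200, 60),   # roughly 20-fold down, well expressed in control
    ("geneB", 40, 2),      # strongly down, but control count is not above 50
    ("geneC", 500, 100),   # only about 5-fold down
    ("geneD", 300, 0),     # absent in the mutant
]

MIN_FOLD_DOWN = 10.0      # at least 10-fold down-regulation in dgca- cells
MIN_CONTROL_READS = 50    # control read count must exceed this value
PSEUDOCOUNT = 1.0         # assumption: added to avoid division by zero


def select_candidates(rows):
    """Return genes that pass both the fold-change and read-count filters."""
    selected = []
    for gene, wild_type, mutant in rows:
        fold_down = (wild_type + PSEUDOCOUNT) / (mutant + PSEUDOCOUNT)
        if wild_type > MIN_CONTROL_READS and fold_down >= MIN_FOLD_DOWN:
            selected.append(gene)
    return selected


candidates = select_candidates(counts)
print(candidates)
```

With these made-up counts, only geneA and geneD pass both filters.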

2006 - 2012: Access to large-scale genomic datasets has revolutionized the genomics field and facilitated the emergence of various databases and tools that advance our understanding of human pathogens. I exploited this wealth of information (i.e. publicly available bacterial genomes) by collecting and identifying β-lactamase antimicrobial resistance (AMR) genes, along with their quantitative resistance index, and organized this information in a database named Dlact. This database contains 2020 β-lactamases from 457 bacterial strains, which were further classified using graph-based clustering of best bidirectional hits to identify group-specific signatures of β-lactamases.
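As a rough illustration of graph-based clustering of best bidirectional hits, the sketch below links two proteins only when each is the other's best hit, then groups linked proteins into connected components. The best-hit table, the protein naming scheme, and the use of connected components are all assumptions made for this example; the source does not specify which clustering algorithm Dlact used.

```python
from collections import defaultdict

# Hypothetical best-hit table: (query protein, target genome) -> best hit.
# Protein names are assumed to look like "<genome>_<protein>".
best_hit = {
    ("g1_p1", "g2"): "g2_p1",
    ("g2_p1", "g1"): "g1_p1",
    ("g2_p1", "g3"): "g3_p1",
    ("g3_p1", "g2"): "g2_p1",
    ("g1_p2", "g2"): "g2_p1",   # not reciprocated, so no edge is created
    ("g3_p2", "g1"): "g1_p2",
    ("g1_p2", "g3"): "g3_p2",
}


def genome_of(protein):
    """Extract the genome prefix from a protein name."""
    return protein.split("_")[0]


def bidirectional_edges(hits):
    """Keep only pairs where each protein is the other's best hit."""
    edges = set()
    for (query, _target_genome), hit in hits.items():
        if query != hit and hits.get((hit, genome_of(query))) == query:
            edges.add(frozenset((query, hit)))
    return edges


def cluster(edges):
    """Group proteins into connected components of the hit graph."""
    graph = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


clusters = cluster(bidirectional_edges(best_hit))
print(clusters)
```

Connected components are the simplest choice here; a stricter method could be swapped in where loosely linked families should be split.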

PhD thesis [2009-2013]: My work focused on developing and deploying clustering and machine learning methods for high-throughput data analysis, i.e. the development of 1) an integrated gene-expression analysis (meta-analysis) protocol to identify processes and pathways associated with Mycobacterium tuberculosis infection, and 2) an integrated generic R package, named HTDA (https://r-forge.r-project.org/projects/htda/), for the analysis of high-throughput (transcriptomics, proteomics, and metabolomics) data. Integrated high-throughput data analysis is a challenging problem for two reasons. First, finding a common identifier to integrate data from different studies is intrinsically complicated. Second, data heterogeneity and differing analytic assumptions can lead to inaccurate results. My PhD thesis provides an improved gene-expression meta-analysis protocol (at the level of annotation, data processing, and functional analysis) for data integration while preserving the accuracy of the original datasets.
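The common-identifier problem mentioned above can be sketched as follows: each study reports values under its own platform-specific feature names, so these are first mapped to a shared gene identifier, collapsed to one value per gene, and only then joined. The mapping tables, feature names, and values are hypothetical, and averaging multiple features per gene is an assumption for this example rather than the method implemented in the thesis or in HTDA.

```python
from statistics import mean

# Hypothetical maps from platform-specific feature IDs to a shared gene ID.
study_a_map = {"probe_1": "Rv0001", "probe_2": "Rv0001", "probe_3": "Rv0002"}
study_b_map = {"spot_9": "Rv0001", "spot_7": "Rv0003", "spot_5": "Rv0002"}

# Hypothetical log fold-changes reported by each study.
study_a = {"probe_1": 2.0, "probe_2": 4.0, "probe_3": -1.0}
study_b = {"spot_9": 1.0, "spot_7": 0.5, "spot_5": -3.0}


def to_gene_level(values, id_map):
    """Map features to gene IDs and average features that share a gene."""
    grouped = {}
    for feature, value in values.items():
        gene = id_map.get(feature)
        if gene is not None:          # drop features without a known mapping
            grouped.setdefault(gene, []).append(value)
    return {gene: mean(vals) for gene, vals in grouped.items()}


def integrate(*studies):
    """Keep only genes measured in every study, one value per study."""
    shared = set.intersection(*(set(study) for study in studies))
    return {gene: [study[gene] for study in studies] for gene in sorted(shared)}


merged = integrate(
    to_gene_level(study_a, study_a_map),
    to_gene_level(study_b, study_b_map),
)
print(merged)
```

Restricting the result to genes present in every study is the most conservative choice; it avoids imputing values but discards genes that only some platforms measure.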