My interest in statistical programming started late in undergrad, with a computational biology class. There, I began using Linux and standard tools for molecular biology, sparking my journey in programming and computer science. After my undergraduate degree, I worked in a research lab for two years. There, my proficiency in statistical programming, R, and bash deepened, and I applied my understanding of molecular biology and experimental design. On my way to work, I often listened to podcasts covering the fundamentals of machine learning and cloud services. This experience motivated me to pursue my goal and brought me closer to my desired career as a data scientist.
I pursued a Master’s in Biostatistics to develop as a programmer and statistician. Through classes and my time as a graduate research assistant, I learned how to use R for data analysis and exploration. I also became proficient in data visualization, using ggplot2 and R Markdown to create graphics and reports. My skill set includes power and sample size simulations, bootstrapping, and Monte Carlo methods, which I have applied to genome-wide association study (GWAS) analyses of the UK Biobank dataset, with 650,000 SNPs per individual and 500,000 individuals.
In my prior role as a Bioinformatician, I briefly worked on quantitative analysis projects for the research group. I honed my skills as a programmer during my time there, drawing on my understanding of Linux and R. Because the data sets were too cumbersome to load into R directly, I expanded my expertise in the Linux tools awk and sed to manipulate and extract data. I had to pick up skills and learn quickly, adapting as needed, and would be willing to do so again.