A Fruitful Symbiosis Between an Undergrad in Computer Science and a Graduate Student in Genetics and Genomics
Question: What was the inspiration for EffectorO?

Kelsey: When I started my Ph.D. program on oomycete effector genomics in 2013, it seemed like everyone was really only using the same motifs (RXLR or LFLAK) to predict effectors. But as I dove deeper into the genome of the lettuce downy mildew pathogen, I found that there were some real effectors that did not have these motifs and published a paper on my findings (Wood et al., 2020). That got me thinking of alternative ways to predict effectors.

The first way to predict effectors that I thought of was to leverage lineage specificity, a characteristic of many effectors from species with narrow host ranges. I realized this would be pretty simple in principle using BLAST. However, the downside would be that other lineage-specific genes besides effectors, along with a lot of misannotated junk from genomes, would probably be picked up. It would also depend heavily on which other organisms have been sequenced for comparison.
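The screen described above can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: it assumes BLASTp has already been run against proteomes from other species with tabular output (`-outfmt 6`), and it flags as lineage-specific any query protein with no significant hit.

```python
# Hypothetical sketch of a lineage-specificity screen (not the EffectorO code).
# Assumes a prior search such as:
#   blastp -query proteins.fa -db other_species -outfmt 6 -evalue 1e-5 > hits.tsv
# BLAST -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#   qstart qend sstart send evalue bitscore

def lineage_specific(query_ids, hit_lines, evalue_cutoff=1e-5):
    """Return the set of query proteins with no significant hit outside the lineage."""
    with_hits = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        qseqid, evalue = fields[0], float(fields[10])  # query ID and E-value columns
        if evalue <= evalue_cutoff:
            with_hits.add(qseqid)
    return set(query_ids) - with_hits

# Toy usage: g1 has a strong hit elsewhere, g2 only a weak one, g3 none at all,
# so g2 and g3 come out as candidate lineage-specific genes.
hits = [
    "g1\tother|p1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-20\t200",
    "g2\tother|p2\t30.0\t50\t30\t2\t1\t50\t1\t50\t0.01\t25",
]
candidates = lineage_specific(["g1", "g2", "g3"], hits)
```

As noted above, such a screen is only as good as the comparison set: genes look "lineage-specific" simply because no close relative has been sequenced yet.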
In my search for a more accurate way to predict effectors, I found a paper on EffectorP 2.0 (Sperschneider et al., 2018) that used machine learning to predict effectors from fungi. I tried to use it on my oomycete genome, but it didn’t work and I was sad.
That is, until I met Munir.
Question: How did you two find each other?
Munir: During my undergraduate studies, I was eager to apply what I had learned in computer science and bioinformatics classes to help solve research problems. I saw an ad for a bioinformatics intern in the Michelmore lab and applied by email.
Kelsey: Funny story, I posted a few job ads for undergraduate interns for summer 2017 on the Michelmore lab website, and we didn’t take them down even after the job openings had expired. Munir saw the (old) ad for the bioinformatics intern and emailed me right around the time I was wanting to develop a machine-learning pipeline for oomycete effectors, and I interviewed and hired him on the spot. Moral of the story: never update your lab website. And, if you are an undergrad, don’t wait for an official job ad to reach out to labs for internships!
Question: What did you think bioinformatics research would be like versus what it was actually like?
Munir: Bioinformatics research is much more data wrangling than I thought! I think this is also true for the entire field of data science—it typically takes a lot more work to get the data ready than it does to build models and perform analyses!
Kelsey: What Munir said. And, I’m always surprised at how often you have to redo the same or similar analyses until they’re “done.” Using R for most of the graphics was a life saver because if something ended up changing we could easily rerun the scripts with the new data.
Question: What lessons did you learn during the preparation of this manuscript?
Munir: Write clean code in a reusable manner the first time around. And if you don’t, definitely get to it the second time you use the same code! Research analyses typically get rerun multiple times, as you’re constantly pulling the latest datasets that get released in the research world or tweaking some parameters to compare different models/hypotheses.
Constantly write lab notes and code documentation, since you will often be looking back at analyses you performed and code you wrote several weeks, months, or years ago.
Kelsey: I learned how valuable reviewer feedback could be, even (or especially?) criticism. One reviewer in particular had a lot of excellent critiques that forced me to rewrite several sections, which resulted in a much clearer argument for the manuscript.
Question: How did reviewers help to improve the manuscript?
Kelsey: One very useful suggestion was to perform domain prediction on our effector candidates using Pfam, which I didn’t think would be worthwhile because most effector domains are not well studied. The results largely bore this out, but the domains that were found were mostly known effector domains, which helped support the conclusions of the paper. There were also many reverse-transcription–related domains, which I think further support the conclusions, as effectors are known to reside in transposon-rich regions of the genome. The candidates with RT-related domains are probably pseudogenes, though, so this is another criterion that one could use to refine the list of candidates.
One reviewer also asked if BLE01, the Bremia lactucae effector that we predicted with EffectorO and found to be an Avr candidate, was also predicted by EffectorP 3.0 (Sperschneider and Dodds, 2022), which came out while our paper was under review. We found that it was not predicted by their pipeline. This showed that the two machine-learning algorithms predict distinct (but overlapping) sets of proteins and, thus, can be used together for prediction of oomycete effectors. Thank you so much, Reviewer #2!
Question: Why was the collaboration between you two especially fruitful?
Kelsey: I brought the biology knowledge, and Munir brought the coding skills. I learned Unix, Perl, and R scripting during grad school, but Munir knew how to code really well in Python, which was essential for this project. He was able to write code very quickly and elegantly and came up with the various evaluation metrics used for the machine-learning models. He also spent a long time working on a more computationally complex convolutional neural network model, which in the end gave results similar to those of the simpler Random Forest classifier we ultimately used.
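For readers curious what a pipeline like the one described above looks like, here is a minimal sketch, not the actual EffectorO code: it trains a scikit-learn Random Forest on amino-acid composition features and reports standard evaluation metrics. The sequences, the composition bias, and all names here are invented for illustration.

```python
# Hypothetical sketch (not the EffectorO implementation): a Random Forest
# classifier over amino-acid composition features, with basic evaluation metrics.
import random
from collections import Counter

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """20-dimensional feature vector: fraction of each amino acid in the sequence."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]

def make_seq(bias, length=200):
    """Toy sequence generator, biased toward the given residues (illustrative only)."""
    pool = AMINO_ACIDS + bias * 10
    return "".join(random.choice(pool) for _ in range(length))

random.seed(0)
# Invented toy classes: "positives" enriched in K/R, "negatives" in A/G.
X = [aa_composition(make_seq("KR")) for _ in range(100)] + \
    [aa_composition(make_seq("AG")) for _ in range(100)]
y = [1] * 100 + [0] * 100

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print(f"accuracy={accuracy_score(y_test, pred):.2f}  F1={f1_score(y_test, pred):.2f}")
```

The appeal of a Random Forest here is exactly the trade-off Kelsey describes: it trains in seconds on composition-style features, while a convolutional network needs far more tuning to reach comparable performance.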
Munir: One of the first things I learned while collaborating with Kelsey was how to effectively digest research papers. My first exercise at the lab was summarizing a collection of research articles relevant to our projects, which was invaluable in teaching me how to look in the right places for information. Kelsey’s research background also played a significant role in coming up with fresh hypotheses and methods to test them, and her plant genetics background allowed us to make better sense of the large amount of data we had.
Also, at the end of the EffectorO project, I got the opportunity to do my first PCR! This was really fun for me to do, as computer scientists and bioinformaticians don’t always have much exposure to the wet lab.
Question: What are you excited to see in future MPMI research?
Kelsey: I’m excited to see how advances in protein structure prediction will expand our knowledge of effectors with uncharacterized protein domains. I’m also excited about high-throughput assays for testing predicted effectors.
Munir: Making machine learning more accessible! I think it would be great to standardize self-service model-building interfaces, since training sets are ever expanding. This would be a way to further improve classifiers like EffectorO whenever new effectors are discovered.
Learn more about Munir and Kelsey in their InterConnections article.
References
Sperschneider, J., and Dodds, P. N. 2022. EffectorP 3.0: Prediction of apoplastic and cytoplasmic effectors in fungi and oomycetes. Mol. Plant-Microbe Interact. 35:146-156.
Sperschneider, J., Dodds, P. N., Gardiner, D. M., Singh, K. B., and Taylor, J. M. 2018. Improved prediction of fungal effector proteins from secretomes with EffectorP 2.0. Mol. Plant Pathol. 19:2094-2110.
Wood, K., Nur, M., Gil, J., Fletcher, K., Lakeman, K., et al. 2020. Effector prediction and characterization in the oomycete pathogen Bremia lactucae reveal host-recognized WY domain proteins that lack the canonical RXLR motif. PLOS Pathog. 16:e1009012.