The MCDMiner is a bioinformatics tool that allows researchers to mine the proteome for Motif Cluster Domains (MCDs), a unique type of domain identified not by sequence or structural similarity but by the concentration of multiple copies of the same motif in a short stretch of the protein. Currently, the S/T-Q Cluster Domain (SCD) remains the only well-described example of an MCD in the literature. Originally defined as the presence of at least 3 S/T-Q motifs in a protein section of 100 amino acids or fewer, the SCD is overrepresented in the targets of ATM and ATR, two kinases that regulate the DNA Damage Response (DDR) and phosphorylate their targets at S/T-Q motifs. Underscoring their functional importance, SCDs are not only much more abundant than expected by chance but are also distributed unevenly across the proteome, concentrating in DDR-related pathways.
To uncover potential novel MCDs, the MCDMiner lets researchers search for concentrations of user-defined motifs in protein stretches of a specified length. The results include a visualization of each MCD in the amino acid sequence and in the crystal structure of each protein detected in the search. Researchers can also use an ontology feature built into the tool to identify pathways where putative MCDs may be overrepresented, an important step in characterizing potentially novel MCDs. Finally, a definition optimization tool helps users find the optimal MCD definition by plotting enrichment and p-values in a given pathway as the number of motif repeats or the length of the amino acid stretch changes.
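The kind of search described above can be illustrated with a minimal sketch. This is not the MCDMiner's actual code: the function name `find_motif_clusters`, its parameters, and the choice to merge overlapping qualifying windows into one cluster are all assumptions made for illustration. The defaults mirror the original SCD definition (at least 3 S/T-Q motifs within 100 amino acids), and the motif is assumed to be a fixed-length pattern written as a regular expression.

```python
import re
from typing import List, Tuple


def find_motif_clusters(
    sequence: str,
    motif: str = "[ST]Q",   # S/T-Q motif as a regular expression
    min_motifs: int = 3,    # minimum number of motif copies
    max_span: int = 100,    # maximum stretch length in amino acids
) -> List[Tuple[int, int, int]]:
    """Return (start, end, motif_count) for each putative cluster.

    Coordinates are 0-based and `end` is exclusive. Hypothetical helper,
    written only to illustrate the definition in the text.
    """
    # A lookahead lets overlapping occurrences of longer motifs be found too.
    hits = [(m.start(1), m.end(1))
            for m in re.finditer(f"(?=({motif}))", sequence)]

    # Each entry is [index of first hit, index of last hit] in `hits`.
    spans: List[List[int]] = []
    for i in range(len(hits) - min_motifs + 1):
        j = i + min_motifs - 1
        # Does a run of `min_motifs` consecutive hits fit in `max_span`?
        if hits[j][1] - hits[i][0] <= max_span:
            if spans and i <= spans[-1][1]:
                # Shares a hit with the previous cluster: extend it
                # (assumption: overlapping windows form one cluster).
                spans[-1][1] = j
            else:
                spans.append([i, j])

    return [(hits[a][0], hits[b][1], b - a + 1) for a, b in spans]


if __name__ == "__main__":
    # Toy sequence: three S/T-Q motifs close together, one far away.
    toy = "SQ" + "A" * 10 + "TQ" + "A" * 10 + "SQ" + "A" * 200 + "SQ"
    print(find_motif_clusters(toy))
```

Because overlapping windows are merged, a reported cluster can be longer than `max_span`. Every run of `min_motifs` consecutive motifs that was merged into it still fits within that limit.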
Visit the official web site for more details [coming soon]
Source code available on GitHub
Bioaqueduct is a distributed pipeline for data processing written in Python. Although the first release is meant to process biological data, specifically to identify viruses carried by mosquitoes by efficiently analyzing millions of DNA and RNA sequences, the goal is to generalize it for processing any dataset in a distributed environment.
Visit the official web site for more details [coming soon]
Source code available on GitHub
From 2014 to 2018, in my role as Research Scientist with the Department of Computer Science at Rice University, I was part of the team that developed PlinyCompute, a platform for high-performance distributed tool and library development written in C++.
I collaborated with the PlinyCompute team, working on a Code Search project. Presently, we are using PlinyCompute to implement some of the algorithms and analytics projects that students in my Lab at UST are working on. I also use PlinyCompute as a pedagogical resource to teach and experiment with OS concepts such as processes, memory management, multi-threading, and distributed computation, among others.
Visit the official web site for more details http://plinycompute.rice.edu.
Source code available on GitHub