BDB-Lab Autumn 2022 Update
GMSC: what? why? and by whom?
Upcoming travels. Svetlana Ugarcina Perovic and Luis Pedro Coelho will be in Gothenburg (Sweden) from the 24th of September to attend EDAR6 and meet the rest of the EMBARK consortium. Luis will present at the GOTBIN Seminar Series on the 27th. Please feel free to get in touch if you will be in Gothenburg!
Autumn 2022 Focus: GMSC by Yiqian Duan
What is the GMSC, and what does the acronym stand for? GMSC is the global microbial smORFs catalog, which we created in collaboration with the Bork group at EMBL. smORFs are small open reading frames containing fewer than 100 codons, which can potentially encode small proteins. However, they are often ignored because of the limitations of computational and experimental methods. We annotated ~1 billion non-redundant smORFs from ~63k metagenomes from GMGC v2 and ~87k isolated microbial genomes from proGenomes v2. These non-redundant smORFs were then clustered into 287 million families at 90% sequence identity and 232 million families at 50% sequence identity. As a global microbial smORFs resource, GMSC can support further research into the presence, prevalence, distribution, evolution and function of smORFs at the global scale.
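To make the "fewer than 100 codons" definition concrete, here is a minimal Python sketch that scans the forward strand of a DNA sequence for short open reading frames. It is an illustration only and not the pipeline used to build GMSC: the function name `find_smorfs`, the restriction to ATG start codons and the three standard stop codons, and the choice to count the start codon but not the stop codon are all assumptions made for this example.

```python
# Illustrative sketch only; this is NOT how GMSC was built.
# Assumptions: forward strand only, ATG start, standard stop codons,
# codon count includes the start codon and excludes the stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_CODONS = 100  # smORFs contain fewer than 100 codons


def find_smorfs(seq, max_codons=MAX_CODONS):
    """Return (start, end, n_codons) for forward-strand ORFs shorter than max_codons."""
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons < max_codons:
                    # end is exclusive and includes the stop codon
                    hits.append((start, i + 3, n_codons))
                start = None
    return hits


if __name__ == "__main__":
    for start, end, n_codons in find_smorfs("CCATGAAATTTTAGCC"):
        print(start, end, n_codons)
```

A real gene predictor would also consider the reverse strand, alternative start codons and coding potential; the sketch only shows where the length cutoff applies.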
What can be done with the GMSC now? The catalog collects smORFs from metagenomes and isolated microbial genomes to address the incomplete annotation of microbial genomes. We have done a lot of in silico quality control to make the predictions more reliable, which will result in a high-quality subset of smORFs. Building on this, we can study the evolutionary conservation and functional characterization of smORFs across different species and habitats worldwide. We are also developing GMSC-mapper, a mapping tool for smORFs, to make it easier for users to explore their own sequences against GMSC.
Where do you plan to take the project in the future? I think there are many possibilities to explore in the field of small proteins. For example, because small proteins generally contain only one domain, they are good candidates for studying protein folding. Some small proteins are predicted to be antibacterial peptides or to have transmembrane properties, which may provide insights for disease treatment. In addition, I hope future analyses can answer a series of biological questions about small proteins, such as how widely smORFs and their small proteins are distributed across microbial genomes, whether this varies with the lifestyles or habitats of prokaryotes, and what the differences and similarities are in evolution and function between smORFs and full-length ORFs.
Can you also tell us a bit about yourself: what was your path to get here? I got my bachelor's degree in bioinformatics from Huazhong University of Science and Technology, China in 2020. Now I’m a PhD student at BDB-Lab in Fudan University, China. My research interest is microbiology, especially metagenomics. At present, my research is related to small proteins in the microbiome.
What are your future (scientific) plans? During my PhD, I plan to use bioinformatics, statistics and other methods to understand smORFs and the small proteins they encode in more detail. On the one hand, I am driven by curiosity. On the other hand, this is a very new field with many interesting things to learn and explore, and I hope I can contribute to its development. At the same time, I am also interested in other areas of microbiome research, such as antimicrobial resistance genes, which I think have very practical applications. Maybe we could combine them and find something new in the future.
Where can people find you and get in touch?
My email is email@example.com. If you are interested in any of my projects, feel free to contact me.
Subscribe for free to receive new updates. We send out four newsletters per year (no more, no less).
Other BDB-Lab Updates
People. This summer we had two remote interns, Jelena Somborski and Breno Lívio. Now that their internships have concluded, we will soon have some new blog posts about their work with us.
Tools. We released NGLess v1.5.0. See the release notes for a full list of new features, but the major new feature is the ability to use YAML to specify sample organization. We also updated the online NGLess script builder.
We also released SemiBin 1.1.0. This release fixes several issues, such as a problem with atomic writes on certain network filesystems, and supports .cram input and depth files from MetaBAT2.
Manuscripts. Lab alumnus Hui Chong published EXPERT: transfer learning-enabled context-aware microbial community classification in Briefings in Bioinformatics.
The manuscript Drivers and determinants of strain dynamics following fecal microbiota transplantation was published in Nature Medicine.
Recent travels. Anna Cuscó attended the 1st Applied HoloGenomics Conference in Bilbao (Spain) from the 13th to the 15th of September. She presented a poster on the pet gut microbiome: “Exploring pet gut microbiome within a large-scale investigation on animal gut metagenomes”. Overall, she enjoyed hearing experts talk about many different animal microbiomes beyond the human one.
Luis attended ECCB2022 in Sitges and presented at the Quest for Orthologs workshop.
Svetlana was at FEMS in Belgrade, presenting on “Challenges in metagenomic annotation of antibiotic resistome.”