BDB-Lab January 2026 Updates
New Year!
Upcoming travels
4-6 March. Juan will attend The Nordic AMR Conference 2026 (Helsinki).
11-12 March. Juan will attend Artificial intelligence and data-driven approaches for addressing the global challenge of antibiotic resistance (Uppsala).
Focus of the Quarter: Metagenomics for Source Attribution of Foodborne Pathogens by Anil Pokhrel
To begin, who are you? What are you working on and what motivated you?
I am Anil Pokhrel, a PhD candidate at the Big Data Biology Lab, Queensland University of Technology (QUT). I am working on developing metagenomic workflows for source attribution of foodborne pathogens.
My training started in microbiology in Nepal, followed by a Master of Science in Medical Microbiology. For my master’s thesis, I worked on the epidemiology of scrub typhus in Nepal, a mite-borne febrile illness with a clinical presentation similar to typhoid (Pokhrel et al., 2021). Before starting my PhD, I worked in a clinical microbiology setting at a referral hospital in Kathmandu, where I gained hands-on experience in routine culture workflows, biochemical testing, and molecular diagnostics, including GeneXpert for tuberculosis and antibiotic resistance in common clinical isolates.
Over time, I realised that many real-world microbiology problems are not limited by whether we can detect a microbe, but by whether we can detect the right microbe quickly enough, and with enough context to act. In outbreak situations, laboratory workflows can be shaped by selective culture steps and targeted tests, which are valuable but also constrained. Genomics enables analysis at the scale of whole genomes, while metagenomics enables analysis of complex samples more broadly. For me, the motivation was practical: to learn methods that can support faster, more complete evidence generation for surveillance and outbreak response, helping strengthen public health.
Moving into genomics and bioinformatics has been a major transition for me. At the beginning, even basic command-line concepts and high-performance computing (HPC) systems were unfamiliar, but this shift aligns with my long-term goal of combining wet-lab background with advanced genomic and bioinformatic skills for public health applications.
When you say “source attribution”, what is the real-world problem you are trying to solve?
Foodborne disease remains a major global public health burden, with very large numbers of illnesses every year, substantial mortality, and economic losses. A real challenge in foodborne outbreaks is not just identifying the causative pathogens but also determining their most probable source. Source attribution is the assignment of a pathogen to its most likely source, enabling targeted control measures before the outbreak spreads.
Many approaches still used in routine investigations and source tracking are what I refer to as conventional or legacy methods. They are usually targeted at known pathogens, and they can work well when the pathogen is successfully isolated. However, there is a critical vulnerability: if the outbreak-causing organism is missed during isolation or selection, it may never enter the pipeline for identification. This risk becomes more likely when samples are complex, contamination is mixed, or the target organism is present at low abundance.
So why do you think metagenomics is needed, if conventional methods are already in place?
Metagenomics is powerful because it is untargeted. It is less dependent on “guessing correctly” at the isolation or screening stage. In practical terms, metagenomics means sequencing the DNA present in a sample without first selecting a specific organism to culture. Because it does not depend on a narrow set of targets, it can reduce the chance of missing the causative pathogen simply because it was not recovered during isolation.
A second advantage is breadth of scope. Metagenomics can detect multiple pathogens within a single workflow, which is important when contamination is mixed.
The third advantage is depth of information. A metagenomics-based workflow can, in principle, support evidence beyond detection alone, including identifying antimicrobial resistance genes, virulence-associated genes, and single nucleotide polymorphisms (SNPs) that inform phylogeny and relatedness. Conventional methods can sometimes provide parts of this picture, but often only through a combination of separate approaches. Metagenomics can bring these layers together, which has implications for both time and completeness.
Finally, turnaround time matters for public health decision making. Earlier and better-supported decisions can enable faster control of outbreaks, reducing harm to health and helping limit downstream economic losses.
How are you approaching this problem?
We approach it through two connected components that address two practical attribution questions.
First, commodity-level attribution, or the “which food?” question:
We are developing and evaluating shotgun metagenomic workflows that extract interpretable results directly from complex samples. The goal is to produce outputs that can be compared across different food samples, enabling us to identify the most likely commodity/source and support timely outbreak control decisions.
Second, geographical source attribution, or the “which region?” question:
In parallel, we are implementing and adapting machine learning approaches for the geographical attribution of Salmonella enterica genomes. This adds a second layer of inference. Genomic evidence can indicate whether a strain is more consistent with lineages commonly observed in particular regions. This, in turn, helps interpret whether an outbreak is more likely linked to local food sources, imported supply chains, or travel-associated cases in which exposure occurs elsewhere but symptoms develop after arrival.
The two parts become most powerful when combined: together, they link “which food is most likely responsible” for an outbreak with “which region that food may have come from.” This is relevant in Australia, where consumer demand is shaped by a highly diverse population and where imported foods form a substantial part of the supply. A method that supports both commodity attribution and likely origin inference has value not only for outbreak response but also for reducing losses across agrifood supply chains.
Figure created using icons and images imported from BioArt, BioIcons, and WikiMedia Commons
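The geographical attribution step described above can be framed as supervised classification: learn the association between genomic features and region labels, then report per-genome probabilities rather than hard assignments. A minimal sketch of that framing is below; the feature encoding, region labels, and data are illustrative placeholders, not the project’s actual model or dataset.

```python
# Hypothetical sketch of geographical attribution as supervised classification.
# All names and data here are illustrative, not the project's real pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy feature matrix: rows are genomes, columns are binary markers
# (e.g. accessory-gene presence/absence or SNP alleles).
n_genomes, n_features = 200, 50
X = rng.integers(0, 2, size=(n_genomes, n_features))

# Simulate a regional signal: genomes labelled "region_B" carry the
# first five markers far more often than those from "region_A".
y = np.array(["region_A"] * 100 + ["region_B"] * 100)
X[100:, :5] = rng.random((100, 5)) < 0.9

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Class probabilities support hedged statements ("more consistent with
# lineages observed in region B") rather than definitive assignments.
proba = clf.predict_proba(X_test)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

Reporting `predict_proba` outputs, rather than a single predicted region, matches the hedged interpretation described in the interview: genomic evidence indicates which regional lineages a strain is most consistent with, not a certain origin.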
Where do you see the main challenge for metagenomics in routine food safety use?
The bottleneck is not only sequencing. The bigger challenge is making metagenomics routine and reliable within real-world public health and food testing systems. That starts with practical barriers: building and maintaining the required infrastructure like laboratory workflows, sequencing capacity, computing and data storage, and covering the ongoing costs for consumables, maintenance, and quality assurance. Even in economically strong settings, these investments are non-trivial.
A second challenge is workforce capability. Routine implementation needs staff who can run wet-lab workflows and also manage bioinformatics, data governance, and interpretation. This skills mix is still emerging in many settings, and training takes time.
Finally, there is the challenge of interpretation: metagenomics produces rich but complex outputs, and decision-makers need results that are clear, comparable, and defensible under time pressure. That requires validated pipelines, standardised reporting, and transparent handling of uncertainty. In my view, the pathway to routine use is not just “more sequencing”, but building systems that integrate infrastructure, people, and robust interpretation into a workflow that can be trusted.
Where might your work make the biggest difference in the longer term?
Our goal is to make metagenomic source attribution more practical for routine use by improving how evidence is generated, summarised, and communicated. If metagenomics can be made more accessible, reproducible, and decision-oriented, it can strengthen day-to-day public health surveillance and support more timely and evidence-based risk management during outbreaks and even across the food industry.
Ways to connect with you?
I can be reached on LinkedIn or by email.
People
Catarina Loureiro joined the BDB Lab as a Postdoctoral Researcher to work on an NWO-Rubicon funded project in collaboration with the Medema Lab (Wageningen University and Research). Her research focuses on developing computational methods to improve strain-level resolution in genome-resolved metagenomic analysis, and on applying these methods to identify and quantify strain-level heterogeneity. Specifically, she will focus on biosynthetic gene clusters (BGCs), which are responsible for the production of specialised metabolites and often occur as structural variants subject to strain-level variation, linking their function to their ecological context. Welcome Catarina!
Grants and Awards
Juan received the Translational Research Institute (TRI) Travel Award to support his participation in the conferences “The Nordic AMR Conference 2026” and “Artificial intelligence and data-driven approaches for addressing the global challenge of antibiotic resistance,” held in Finland and Sweden, respectively.
We are happy to be a part of the newly funded ARC Centre of Excellence for Advanced Peptide and Protein Engineering!
Conferences and Symposia
Anil delivered a flash talk at MiM2025, organised by ASM-Queensland on the Gold Coast (22 Nov 2025), and also at the TRI student symposium (31 Oct 2025).
Alexandre presented a poster at the TRI student symposium (31 Oct 2025).
Juan presented a poster at the Australasian Bioinformatics and Computational Biology Society (ABACBS) meeting (24-28 Nov 2025).
Luis attended the SMBE Satellite Meeting (Evolutionary Biochemistry of Insect Antimicrobial Peptides) in Houston, Texas (USA) and delivered a presentation (13-15 Oct 2025).
Luis presented a talk at the Queensland Immunology Networking Symposium (QINS25, 23-24 Oct 2025).
Luis attended the 19th International Conference on Data and Text Mining in Biomedical Informatics (DTMBIO 2025, 15-18 Dec) in Muju, Republic of Korea.
Publications
“proGenomes4: providing 2 million accurately and consistently annotated high-quality prokaryotic genomes”, co-authored by Alexandre and Luis, was published in Nucleic Acids Research. The paper describes a large-scale resource of nearly two million high-quality bacterial and archaeal genomes, accessible via the proGenomes website (proGenomes Database v4) and supporting bulk retrieval through a dedicated command-line interface.
“Marine-Inspired Antimicrobial Peptides Disrupt Gene Expression at the DNA Level”, co-authored by Juan, was published in ACS Infectious Diseases. Using tandem mass tag (TMT)-based quantitative proteomics, the study revealed extensive proteome remodeling, with 175 and 120 differentially expressed proteins (DEPs) after treatment with the L3 and L3-K peptides, respectively. The results underscore these peptides’ potential as scaffolds for next-generation antimicrobials with DNA-binding and non-membrane-lytic activity.
Software and Tools
Macrel v1.6.0 released. This release changes model storage to improve portability and compatibility across systems.
progenomes-cli v0.3.0 released. Updated release supporting retrieval of proGenomes4 data via the command line or as a Python library.
SemiBin v2.2.1 released. This update addresses compatibility with newer versions of igraph.
Visiting Speakers
Prof. Zamin Iqbal (University of Bath) visited us and delivered a seminar titled “Using historical samples and new methods to study the evolution of plasmids” on 1 December 2025 at the TRI.
Prof. Ami Bhatt from Stanford University visited us and delivered a seminar titled “Deciphering microbe‑host communication” on 17 November 2025 at the TRI.
A/Prof. Buck S. Samuel from Baylor College of Medicine delivered a seminar titled “Cultivating relationships: genetic regulation of gut microbiome form and function” on 24 September 2025 at the TRI.