Deciphering the "dark matter" in the human gut metagenome using advanced sequencing technology and machine learning algorithms
Project Description
The human gastrointestinal tract harbours the highest microbial density in the body, with thousands of species coexisting in a complex and ever-changing community. The genomes of these microbes, often collectively called our "second genome", have a profound impact on health, disease, and other physiological processes. Understanding how the gut microbiome changes under different conditions is critical, but most gut microbes cannot be readily isolated and cultured in the laboratory. Instead, scientists use metagenomic sequencing, which analyses the mixed DNA of gut microbes directly from human faecal samples. This approach allows researchers to assemble microbial genomes without culturing; these are known as metagenome-assembled genomes (MAGs).
In our project, we used advanced sequencing technologies and designed machine learning tools to study the gut microbiome more effectively (https://www.youtube.com/watch?v=voGnbAHkuDU&t=1s). Collaborating with industry partners, we identified linked-read sequencing as a cost-effective method for analysing gut microbes (https://www.youtube.com/watch?v=j6Yqt8viM8A&pp=ygUVbGlua2VkLXJlYWRzIGx1IHpoYW5n). This technique uses barcodes to tag short reads that originate from the same long DNA fragment, which makes genome assembly more efficient. However, computational tools specifically designed for linked-read metagenome assembly were lacking. To fill this gap, we developed Pangaea, a tool that uses deep learning to group and assemble co-barcoded linked reads (https://www.youtube.com/watch?v=yGaIZryvZwg&t=474s). Pangaea can improve the accuracy of microbial genome assembly, particularly for low-abundance microbes. To further refine the assembled contigs (contiguous sequences reconstructed from overlapping reads), we developed DeepMAS, a tool that detects and corrects misassemblies by transforming sequence alignment data into images and analysing them with computer vision techniques.
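The idea behind barcode-based grouping of linked reads can be illustrated with a small Python sketch. It assumes a hypothetical input of (read_id, barcode, sequence) tuples, groups reads that share a barcode, and summarises each group as a tetranucleotide (4-mer) frequency vector, a kind of sequence-composition feature commonly used in metagenomic binning. The helper names group_by_barcode and kmer_profile are illustrative only; this is not Pangaea's implementation, whose actual features and deep learning model are described in the project's publications.

```python
from collections import defaultdict
from itertools import product


def group_by_barcode(reads):
    """Group read sequences by barcode.

    `reads` is an iterable of (read_id, barcode, sequence) tuples. This input
    layout is an assumption for illustration; real linked-read data carry the
    barcode in a platform-specific place (FASTQ header, BAM tag, etc.).
    """
    groups = defaultdict(list)
    for _read_id, barcode, seq in reads:
        groups[barcode].append(seq)
    return dict(groups)


def kmer_profile(seqs, k=4):
    """Normalised k-mer frequency vector pooled over one barcode group.

    Simplification: k-mers and their reverse complements are counted
    separately, and k-mers containing non-ACGT symbols are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            j = index.get(seq[i:i + k])
            if j is not None:
                counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy example with made-up reads and barcodes.
reads = [
    ("r1", "BC01", "ACGTACGTAA"),
    ("r2", "BC01", "TTGACGTACG"),
    ("r3", "BC02", "GGGGCCCCAT"),
]
groups = group_by_barcode(reads)
profiles = {bc: kmer_profile(seqs) for bc, seqs in groups.items()}
print(sorted(groups))         # ['BC01', 'BC02']
print(len(profiles["BC01"]))  # 256
```

Pooling reads per barcode gives each long fragment a longer, more informative composition signal than any single short read provides, which is the intuition behind grouping co-barcoded reads before assembly.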
Because traditional binning tools often overlook short contigs, we introduced DeepMetaBin and METAMVGL, which use graph-based methods to link short contigs with long ones and thereby increase MAG completeness. To remove contamination from these MAGs, we developed Deepurify, which uses a multi-modal deep learning model to retain only contigs from the same species. All these tools are integrated into LRTK, a versatile toolkit for linked-read metagenomic analysis. Using these tools, we analysed over 10,000 human gut microbiome samples and generated high-quality reference genomes. This work revealed significant differences in gut microbial genomic variation between Han Chinese and Western populations, differences that may be related to diet, disease, and treatment responses. Our project not only advances gut microbiome research but also provides practical tools for scientists worldwide, paving the way for personalised medicine and a deeper understanding of human health.
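Graph-based binning can be illustrated with label propagation, a common way to let unbinned short contigs inherit bin assignments from neighbouring long contigs in an assembly or read-pair graph. The Python sketch below is a simplified, single-graph version written for illustration; propagate_bins and the toy contig names are hypothetical, and it does not reproduce the models used by DeepMetaBin or METAMVGL.

```python
from collections import Counter


def propagate_bins(edges, initial_bins, max_iter=10):
    """Spread bin labels from binned contigs to unbinned neighbours (illustrative).

    edges: iterable of (contig_a, contig_b) links, e.g. from an assembly graph.
    initial_bins: {contig: bin_id} for contigs already binned (typically long ones).
    Existing assignments are never changed; unbinned contigs take the majority
    bin among their binned neighbours, with ties going to the smallest bin ID.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    bins = dict(initial_bins)
    for _ in range(max_iter):
        updates = {}
        for contig, nbrs in neighbours.items():
            if contig in bins:
                continue
            votes = Counter(bins[n] for n in nbrs if n in bins)
            if votes:
                top = max(votes.values())
                updates[contig] = min(b for b, v in votes.items() if v == top)
        if not updates:
            break  # no unbinned contig has a binned neighbour
        bins.update(updates)
    return bins


# Toy graph: two long contigs already binned, short contigs linked to them.
edges = [
    ("long1", "short1"),
    ("short1", "short2"),
    ("long2", "short3"),
    ("short4", "short5"),  # isolated component with no binned contig
]
initial = {"long1": "A", "long2": "B"}
result = propagate_bins(edges, initial)
print(result["short2"], result["short3"])  # A B
print("short4" in result)                  # False
```

Updates in each round are computed from the previous round's assignments, so the result does not depend on the order in which contigs are visited. Contigs in components without any binned contig stay unbinned rather than being forced into a bin.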
Project Investigator
Dr ZHANG Lu (Department of Computer Science)
Project Collaborators
- Professor BIAN Zhaoxiang (Vincent V.C. Woo Chinese Medicine Clinical Research Institute)
- Dr ZHAI Lixiang (Centre for Chinese Herbal Medicine Drug Development Limited)
- Professor Pavel Pevzner (University of California San Diego, UCSD)
- Professor ZHANG Louxin (National University of Singapore, NUS)
- Professor CAI Yunpeng (Shenzhen Institutes of Advanced Technology, SIAT)
Funding/Award
- Research Grants Council - Young Collaborative Research Grant
- Health Bureau - Health and Medical Research Fund
- Innovation and Technology Commission - Guangdong-Hong Kong Technology Cooperation Funding Scheme
- Beijing Genomics Institute (BGI)
- Guangdong Science and Technology Department - Guangdong Basic and Applied Basic Research Foundation
Publications
- Bohao Zou, Jingjing Wang, Yi Ding, Zhenmiao Zhang, Yufen Huang, Xiaodong Fang, Ka Chun Cheung, Simon See and Lu Zhang*. A multi-modal deep language model for contaminant removal from metagenome-assembled genomes. Nature Machine Intelligence. https://www.nature.com/articles/s42256-024-00908-5
- Chao Yang, Zhenmiao Zhang, Yufen Huang, Xuefeng Xie, Herui Liao, Jin Xiao, Werner Pieter Veldsman, Kejing Yin, Xiaodong Fang*, Lu Zhang*. LRTK: A platform agnostic toolkit for linked-read analysis of both human genomes and metagenomes. GigaScience. https://academic.oup.com/gigascience/article/doi/10.1093/gigascience/giae028/7692299
- Zhenmiao Zhang, Jin Xiao, Hongbo Wang, Chao Yang, Yufen Huang, Zhen Yue, Yang Chen, Lijuan Han, Aiping Lyu, Xiaodong Fang, Lu Zhang*. Exploring high-quality microbial genomes by assembling short-reads with long-range connectivity. Nature Communications. https://www.nature.com/articles/s41467-024-49060-z
- Zhenmiao Zhang, Chao Yang, Werner Pieter Veldsman, Xiaodong Fang, Lu Zhang*. Benchmarking genome assembly methods on metagenomic sequencing data. Briefings in Bioinformatics. https://academic.oup.com/bib/article/24/2/bbad087/7077274
- Lixiang Zhai, Haitao Xiao, Chengyuan Lin, Yan Y Lam, Hoi Leong Xavier Wong, Mengxue Gong, Guojun Wu, Yusheng Deng, Ziwan Ning, Chunhua Huang, Yijing Zhang, Min Zhuang, Chao Yang, Lu Zhang, Ling Zhao, Chenhong Zhang, Xiaodong Fang, Wei Jia, Liping Zhao, Zhao-xiang Bian. Gut microbiota-derived tryptamine impairs insulin sensitivity. Nature Communications. https://www.nature.com/articles/s41467-023-40552-y
- Lixiang Zhai, Chunhua Huang, Ziwan Ning, Yijing Zhang, Min Zhuang, Wei Yang, Xiaolei Wang, Jingjing Wang, Lu Zhang, Haitao Xiao, Ling Zhao, Yan Y Lam, Chi Fung Willis Chow, Jiandong Huang, Shuofeng Yuan, Kui Ming Chan, Hoi Leong Xavier Wong, Zhao-xiang Bian. Ruminococcus gnavus plays a pathogenic role in diarrhea-predominant irritable bowel syndrome by increasing serotonin biosynthesis. Cell Host & Microbe. https://www.cell.com/cell-host-microbe/pdf/S1931-3128(22)00562-5.pdf
- Chao Yang, Debajyoti Chowdhury, Zhenmiao Zhang, William K. Cheung, Aiping Lu, ZhaoXiang Bian, Lu Zhang*. A review of computational tools for generating metagenome-assembled genomes from metagenomic sequencing data. Computational and Structural Biotechnology Journal. https://www.sciencedirect.com/science/article/pii/S2001037021004931
- Lu Zhang*, Xiaodong Fang et al. A Comprehensive Investigation of Metagenome Assembly by Linked-Read Sequencing. Microbiome. https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-020-00929-3