
An international research team led by Jian-Feng Mao has developed PlantLncBoost, a new computational tool that helps identify long non-coding RNAs in plants. These RNAs are crucial for numerous biological processes but vary widely between plant species. PlantLncBoost addresses this challenge with very high accuracy, offering new possibilities for genomic studies in plants. The findings were recently published in the journal New Phytologist.
Long non-coding RNAs, called lncRNAs, are transcribed from DNA like other RNAs, but they do not carry instructions for making proteins. Instead, they help control genes, guide plant development and are involved in plant responses to stresses such as drought or heat. Identifying lncRNAs has been difficult because their sequences vary considerably between plant species.
Jian-Feng Mao's team tackled the problem using machine learning, a type of artificial intelligence that is trained on large amounts of data to find patterns. They analysed more than 1,600 different features of lncRNAs and identified just three key features that effectively distinguish lncRNAs from protein-coding RNAs.
Identification of sequence patterns using mathematical parameters
What makes PlantLncBoost particularly innovative is its use of mathematical parameters that capture intrinsic sequence properties beyond traditional biological features. The research team used so-called Fourier transform-based approaches. These allowed them to detect patterns in the RNA sequences that are consistent across diverse plant species despite the high variability of the sequences themselves.
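The release does not say which Fourier-based features PlantLncBoost uses. As an illustration of the general idea only, the sketch below computes one well-known Fourier feature from the coding-potential literature, the "period-3" signal: protein-coding sequences often show a spectral peak at one third of the sequence length, reflecting the three-nucleotide codon structure, while non-coding sequences usually do not. The binary indicator (Voss) mapping and the function names `indicator_sequences`, `power_spectrum` and `period3_snr` are hypothetical choices for this example, not PlantLncBoost's code or API.

```python
import cmath


def indicator_sequences(seq):
    """Voss mapping (assumed here): one 0/1 signal per nucleotide A, C, G, U."""
    seq = seq.upper().replace("T", "U")  # treat DNA and RNA input alike
    return {base: [1 if c == base else 0 for c in seq] for base in "ACGU"}


def power_spectrum(seq):
    """Total DFT power at each frequency k, summed over the four indicator signals."""
    n = len(seq)
    spectrum = [0.0] * n
    for signal in indicator_sequences(seq).values():
        for k in range(n):
            # Naive O(n^2) discrete Fourier transform coefficient at frequency k
            coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t, x in enumerate(signal))
            spectrum[k] += abs(coeff) ** 2
    return spectrum


def period3_snr(seq):
    """Power at frequency n/3 divided by the mean power over k = 1..n-1."""
    n = len(seq)
    if n < 6:
        raise ValueError("sequence too short for a period-3 estimate")
    s = power_spectrum(seq)
    mean_power = sum(s[1:]) / (n - 1)  # skip k = 0, which only reflects base counts
    return s[round(n / 3)] / mean_power


if __name__ == "__main__":
    print(period3_snr("AUG" * 30))         # strongly codon-periodic toy sequence
    print(period3_snr("A" * 45 + "U" * 45))  # toy sequence with no period-3 structure
```

A high ratio hints at codon-like periodicity and a low one at its absence. A real classifier would combine several such numbers rather than rely on one threshold.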
“Through systematic evaluation of multiple machine learning algorithms and rigorous parameter optimization, we have developed a tool that achieves both high accuracy and strong generalization capabilities,” explains Jian-Feng Mao, Associate Professor at Umeå University, who established his lab at the Umeå Plant Science Centre in 2023.
To validate their new tool, the team tested PlantLncBoost on datasets from 20 different plant species. It correctly identified lncRNAs with more than 96% accuracy, significantly outperforming existing tools. The tool also recognised nearly all of 358 previously experimentally validated lncRNAs, including those from twelve species that were not part of the training set used to develop the tool.
New possibilities to compare long non-coding RNAs across species
“Developing PlantLncBoost was an exciting opportunity to apply machine learning to solve a complex biological problem,” says first author Xue-Chan Tian, who completed this work as part of her PhD thesis at Beijing Forestry University. “My doctoral programme focused on combining advanced computational methods with plant genomics to extract meaningful biological insights from complex sequence data.”

The project brought together experts in genomics, bioinformatics and computer science from around the world, including researchers from Sweden, China and Brazil. The tool is now freely available to the scientific community and has been integrated into a larger analysis workflow developed earlier by Jian-Feng Mao’s group. This workflow allows researchers not only to identify but also to characterise lncRNAs in plants. With PlantLncBoost built into the workflow, researchers can now identify long non-coding RNAs from different plant species much more accurately, making it easier to compare and analyse them.
The article:
Tian, X-C., Nie, S., Domingues, D.S., Paschoal, A.R., Jiang, L-B., Mao, J-F. (2025). PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization. New Phytologist. DOI: https://doi.org/10.1111/nph.70211
For questions, please contact:
Jian-Feng Mao, Umeå Plant Science Centre, Department of Plant Physiology, Umeå University
E-mail:
Phone: +46 73 672 6636
https://www.upsc.se/jianfeng_mao